Backport PR 36812 from upstream to v1.4.0-rc1 #12432

Closed
ksurent opened this issue Jan 10, 2017 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/P2

ksurent commented Jan 10, 2017

There was a bug upstream where Jobs and Pods created by a ScheduledJob run could get names that weren't unique enough, which prevented the ScheduledJob from completing.
It was fixed in kubernetes/kubernetes#36812.
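
To illustrate the naming problem, here is a minimal sketch of a per-run unique name, assuming the Job name is built from the ScheduledJob name plus the scheduled start time in Unix seconds. `jobNameFor` is a hypothetical helper for illustration only; it is not the code from kubernetes/kubernetes#36812, and the actual fix may differ.

```go
package main

import (
	"fmt"
	"time"
)

// jobNameFor is a hypothetical helper, not the controller's actual code.
// It derives a Job name from the ScheduledJob name and the scheduled start
// time (Unix seconds), so two different start times never share a name.
func jobNameFor(scheduledJobName string, scheduledUnix int64) string {
	return fmt.Sprintf("%s-%d", scheduledJobName, scheduledUnix)
}

func main() {
	// Two consecutive runs of a ScheduledJob that fires every minute.
	first := time.Date(2017, time.January, 10, 12, 0, 0, 0, time.UTC)
	second := first.Add(time.Minute)

	fmt.Println(jobNameFor("hellocron", first.Unix()))
	fmt.Println(jobNameFor("hellocron", second.Unix()))
}
```

Because each scheduled start time is distinct, the two printed names differ, so a later run cannot hit an "already exists" error caused by an earlier run's Job.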

Without that fix, it's really hard to reliably run recurring tasks in Origin, so I propose backporting it to Origin v1.4.0-rc1 and carrying it over to later v1.4.* releases.

Version

oc v1.4.0-rc1+b4e0954
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://localhost:8443
openshift v1.4.0-rc1+b4e0954
kubernetes v1.4.0+776c994

Steps To Reproduce

  1. Create a ScheduledJob that runs every minute
  2. Wait for a while (about an hour in the logs below)

Current Result

After some time, the ScheduledJob stops running.

Expected Result

The job should keep running.

Additional Information

Logs from origin-master:

1 event.go:217] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"ksurent-jobs", Name:"hellocron", UID:"aef3a4f1-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"582233", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created job hellocron-1706264388
1 event.go:217] Event(api.ObjectReference{Kind:"Job", Namespace:"ksurent-jobs", Name:"hellocron-1706264388", UID:"cac57d81-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"582319", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hellocron-1706264388-zb0vb
1 event.go:217] Event(api.ObjectReference{Kind:"Job", Namespace:"ksurent-jobs", Name:"hellocron-1706264388", UID:"cac57d81-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"582319", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hellocron-1706264388-v9jle
1 event.go:217] Event(api.ObjectReference{Kind:"Job", Namespace:"ksurent-jobs", Name:"hellocron-1706264388", UID:"cac57d81-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"582319", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hellocron-1706264388-yflxn
1 event.go:217] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"ksurent-jobs", Name:"hellocron", UID:"aef3a4f1-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"582370", FieldPath:""}): type: 'Normal' reason: 'SawCompletedJob' Saw completed job: hellocron-1706264388

... 1 hour passes ...

1 event.go:217] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"ksurent-jobs", Name:"hellocron", UID:"aef3a4f1-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"585993", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "hellocron-1706264388" already exists
1 event.go:217] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"ksurent-jobs", Name:"hellocron", UID:"aef3a4f1-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"585993", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "hellocron-1706264388" already exists
1 event.go:217] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"ksurent-jobs", Name:"hellocron", UID:"aef3a4f1-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"585993", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "hellocron-1706264388" already exists
1 event.go:217] Event(api.ObjectReference{Kind:"ScheduledJob", Namespace:"ksurent-jobs", Name:"hellocron", UID:"aef3a4f1-d427-11e6-a4dd-9cdc7163ec00", APIVersion:"batch", ResourceVersion:"585993", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "hellocron-1706264388" already exists

... keeps going like this ...

1 controller.go:163] Cannot determine if ksurent-jobs/hellocron needs to be started: Too many missed start times to list
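
To check whether a master log shows the same symptom, something like the following sketch could count the "already exists" failures per Job name. `countNameConflicts` is a hypothetical helper, the sample lines are abbreviated from the log excerpt above, and the pattern assumes the message text matches the `FailedCreate` events shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// alreadyExists matches the FailedCreate message shown in the logs above.
var alreadyExists = regexp.MustCompile(`Error creating job: jobs\.batch "([^"]+)" already exists`)

// countNameConflicts is a hypothetical helper: it returns, per Job name,
// how many log lines report that the name already exists.
func countNameConflicts(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if m := alreadyExists.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Abbreviated sample lines; real log lines carry the full event prefix.
	lines := []string{
		`type: 'Warning' reason: 'FailedCreate' Error creating job: jobs.batch "hellocron-1706264388" already exists`,
		`type: 'Normal' reason: 'SawCompletedJob' Saw completed job: hellocron-1706264388`,
	}
	for name, n := range countNameConflicts(lines) {
		fmt.Printf("%s: %d conflicting create attempts\n", name, n)
	}
}
```

A Job name that keeps showing up in this count is one the controller is repeatedly trying, and failing, to reuse.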
@ksurent ksurent changed the title Backport PR 36812 from upstream Backport PR 36812 from upstream to v1.4.0-rc1 Jan 10, 2017
@mfojtik mfojtik added kind/bug Categorizes issue or PR as related to a bug. priority/P2 labels Jan 10, 2017

soltysh commented Jan 10, 2017

ScheduledJob is an alpha feature without any support, sorry :( This bug will be fixed in 1.5 where ScheduledJob is renamed to CronJob.

@soltysh soltysh closed this as completed Jan 10, 2017