
deploy: update validation for deploymentconfigs #9193

Merged
merged 1 commit into openshift:master from update-dc-template-validation on Aug 24, 2016

Conversation

@0xmichalis
Contributor

Validate that the deployment config selector matches the pod template labels.

cc: @ironcladlou @smarterclayton @mfojtik
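
For reference, a minimal sketch of the selector-vs-template-labels check this PR adds, modeled on the upstream replication controller validation linked below. The function name and field paths are illustrative, not the exact code in this PR, and the import paths are the modern apimachinery ones (the 2016 tree used k8s.io/kubernetes/pkg/... paths):

```go
package validation

import (
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateSelectorMatchesPodTemplateLabels rejects a config whose selector
// does not select its own pod template, mirroring the RC validation.
func validateSelectorMatchesPodTemplateLabels(selector, templateLabels map[string]string, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	sel := labels.Set(selector).AsSelector()
	if sel.Empty() {
		allErrs = append(allErrs, field.Required(fldPath.Child("selector"), ""))
	} else if !sel.Matches(labels.Set(templateLabels)) {
		// Same error detail the replication controller validation emits.
		allErrs = append(allErrs, field.Invalid(
			fldPath.Child("template", "metadata", "labels"),
			templateLabels,
			"`selector` does not match template `labels`"))
	}
	return allErrs
}
```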

@liggitt
Contributor

liggitt commented Jun 7, 2016

What are the implications if existing DCs suddenly stop validating because of this?

@0xmichalis
Contributor Author

@liggitt those DCs are already broken because the same validation underpins replication controllers:
https://github.com/kubernetes/kubernetes/blob/7476d97781563b70e8b89a8bd3f99ea75ae6c290/pkg/api/validation/validation.go#L2248

@liggitt
Contributor

liggitt commented Jun 7, 2016

So they would have failed when actually deploying. But do we have any automated controllers that will try to update the now-invalid DC (to report status, bump versions to trigger deployments, etc.) and behave in a way we don't want (hot-looping, clogging workqueues, etc.)?

@0xmichalis
Contributor Author

We are gating retries in both the deployment config and config change controllers. I believe that for 1.3 we will switch to using caches and the rate-limiting queue, so I think we are covered.
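
For context, a minimal sketch of the rate-limited retry pattern being described, using client-go's workqueue. The queue calls are the standard API; the key and the sync function are hypothetical stand-ins for the controller's reconcile loop:

```go
package main

import "k8s.io/client-go/util/workqueue"

func main() {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	defer queue.ShutDown()

	queue.Add("broken/deployment-simple") // enqueue a DC key (hypothetical)

	key, _ := queue.Get()
	if err := sync(key.(string)); err != nil {
		// Failed syncs are re-queued with exponential backoff instead of
		// hot-looping, so a persistently invalid DC cannot clog the queue.
		queue.AddRateLimited(key)
	} else {
		queue.Forget(key) // reset the backoff counter on success
	}
	queue.Done(key)
}

// sync stands in for the controller's reconcile function.
func sync(key string) error { return nil }
```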

@liggitt
Contributor

liggitt commented Jun 7, 2016

What about client calls to trigger deployments? Would they get a validation error now?

@0xmichalis
Contributor Author

Yes. Much better than silently starting an invalid deployment.

@liggitt
Contributor

liggitt commented Jun 7, 2016

> Much better than silently starting an invalid deployment.

Depends. It makes sense when creating the DC. It makes sense if updating the DC from a valid selector to an invalid selector. But if you are not changing the selector and are making an unrelated API call, a validation error about the selector is somewhat nonsensical.

@0xmichalis
Contributor Author

If it were any other field, I would probably agree, but isn't it also nonsensical to allow updates on a DC when it is not working? If I ran an update on my (invalid) DC and it succeeded, I would expect it to work. But it wouldn't...

@liggitt
Contributor

liggitt commented Jun 7, 2016

> isn't it also nonsensical to allow updates on a DC when it is not working

No... different updates do different things, and if data was previously allowed in, it's not clear we should disallow unrelated updates in order to tighten validation. I'm not 100% opposed, but the bar needs to be very high for tightening validation like this, and we need to think through all the automated and manual update cases to make sure we're not going to cause bigger problems.

@0xmichalis
Contributor Author

It's not about tightening validation, it's about not letting people break themselves, or at the very least letting them know where they are broken if they are already broken. Disallowing updates like scaling or other template changes that never worked doesn't sound that bad. Plus, if someone had hit this already, we would probably know about it.

@liggitt
Contributor

liggitt commented Jun 8, 2016

> It's not about tightening validation, it's about not letting people break themselves, or at the very least letting them know where they are broken if they are already broken.

Yes, by tightening validation :)

> Disallowing updates like scaling or other template changes that never worked doesn't sound that bad.

Depends how that is being done. If a script or automated system is scaling and is not equipped to understand validation errors (because validation errors were not possible before), this isn't helpful. Like I said, I'm not 100% opposed, but we have to be super careful when tightening validation, because it introduces the potential to make unrelated operations fail.

@0xmichalis
Contributor Author

> Depends how that is being done. If a script or automated system is scaling and is not equipped to understand validation errors (because validation errors were not possible before), this isn't helpful. Like I said, I'm not 100% opposed, but we have to be super careful when tightening validation, because it introduces the potential to make unrelated operations fail.

I can't think of such a system that wouldn't already be broken. Two cases: 1) scaling events (either by an autoscaler or another automated system or script) = already broken; 2) updates to the deployment config that will cause[1] a redeployment (either automatic or manual) = already broken.

[1] Most if not all of the time you want to redeploy your latest config. Not sure if there is any value in having an updated deployment config that is not working. And I don't think there is an automated system out there that will fix the validation issue for the deployment config to work :)

@openshift-bot added and later removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 15, 2016
@smarterclayton
Contributor

It would mean, for instance, that an oc label or oc annotate call would fail until you fixed the DC. Since the DC is broken, you'd at least know that the DC was broken.

If there is no case where a DC can exist today that fails this validation but isn't broken, then this is ok. If there is a DC today that would be working, but breaks when we add this validation (can't trigger redeployment on image?), then I would be concerned. Can we conclusively prove that every DC caught by this new validation is already broken?

@0xmichalis
Contributor Author

DCs that will break with this change are already broken.
On master: http://pastebin.com/ZXxLd0WX
The previous deployment is working fine, but any update to the DC will never be reflected back to it.
The new replication controller fails validation, so we will never have a new deployment until the validation error is fixed.

From the server logs:

E0615 15:01:41.539233    1980 factory.go:69] couldn't create deployment for deployment config broken/deployment-simple: ReplicationController "deployment-simple-2" is invalid: spec.template.metadata.labels: Invalid value: {"deployment":"deployment-simple-2","deploymentconfig":"deployment-simple","name":"deployment-simple-broken"}: `selector` does not match template `labels`
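
To make the failure concrete, a hedged reconstruction of the mismatch from the log above: the template labels come from the error message, while the DC selector is hypothetical, since the pastebin contents are not reproduced here.

```go
package main

import "fmt"

// matches reports whether an equality-based selector selects the given
// labels, i.e. every selector key/value pair appears in the label set.
func matches(selector, podLabels map[string]string) bool {
	for k, v := range selector {
		if podLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical DC selector that drifted from the template labels.
	selector := map[string]string{"name": "deployment-simple"}
	// Labels taken from the "deployment-simple-2" error message above.
	templateLabels := map[string]string{
		"deployment":       "deployment-simple-2",
		"deploymentconfig": "deployment-simple",
		"name":             "deployment-simple-broken",
	}
	fmt.Println(matches(selector, templateLabels)) // false: "name" values differ
}
```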

@smarterclayton
Contributor

Then I'm ok with the change - Jordan, do you need more convincing?

@0xmichalis
Contributor Author

@liggitt ping

Validate that the deployment config selector matches the pod template labels.
@0xmichalis
Contributor Author

@liggitt ping - better get this in for 1.3 than not?

@liggitt
Contributor

liggitt commented Aug 22, 2016

yeah, I think this is ok

@0xmichalis
Contributor Author

[merge]

@openshift-bot
Contributor

openshift-bot commented Aug 22, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/8386/) (Image: devenv-rhel7_4917)

@0xmichalis
Contributor Author

quota [merge]

@mfojtik
Contributor

mfojtik commented Aug 23, 2016

@Kargakis while you are here, can you also fix #10570?

@0xmichalis
Contributor Author

@mfojtik that issue needs some discussion

#8427 [merge]

@0xmichalis
Contributor Author

[test]

@openshift-bot
Contributor

Evaluated for origin test up to 7ade0ff

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/8386/)

@0xmichalis
Contributor Author

[merge]

@0xmichalis
Contributor Author

#8427 [merge]

@openshift-bot
Contributor

Evaluated for origin merge up to 7ade0ff

@openshift-bot openshift-bot merged commit d23e2d8 into openshift:master Aug 24, 2016
@0xmichalis 0xmichalis deleted the update-dc-template-validation branch August 24, 2016 11:21