OCPBUGS-50587: Reject new NodeStatus with non-zero revision set #2208

Open
wants to merge 2 commits into
base: master
from

Conversation

everettraven
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 18, 2025
Contributor

openshift-ci bot commented Feb 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Feb 18, 2025

Hello @everettraven! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Feb 18, 2025
Contributor

openshift-ci bot commented Feb 18, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: everettraven
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@everettraven everettraven force-pushed the reject-new-nodestatus-with-nonzero-revision-set branch from 1dcee82 to f2adc00 Compare February 18, 2025 20:00
@everettraven
Contributor Author

/test

Contributor

openshift-ci bot commented Feb 18, 2025

@everettraven: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

/test build
/test e2e-aws-ovn
/test e2e-aws-ovn-hypershift
/test e2e-aws-ovn-techpreview
/test e2e-aws-serial
/test e2e-aws-serial-techpreview
/test e2e-upgrade
/test images
/test integration
/test lint
/test minor-e2e-upgrade-minor
/test minor-images
/test unit
/test verify
/test verify-client-go
/test verify-crd-schema
/test verify-deps
/test verify-feature-promotion

The following commands are available to trigger optional jobs:

/test e2e-azure
/test e2e-gcp
/test okd-scos-e2e-aws-ovn
/test okd-scos-images

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-api-master-build
pull-ci-openshift-api-master-e2e-aws-ovn
pull-ci-openshift-api-master-e2e-aws-ovn-hypershift
pull-ci-openshift-api-master-e2e-aws-ovn-techpreview
pull-ci-openshift-api-master-e2e-aws-serial
pull-ci-openshift-api-master-e2e-aws-serial-techpreview
pull-ci-openshift-api-master-e2e-azure
pull-ci-openshift-api-master-e2e-gcp
pull-ci-openshift-api-master-e2e-upgrade
pull-ci-openshift-api-master-images
pull-ci-openshift-api-master-integration
pull-ci-openshift-api-master-lint
pull-ci-openshift-api-master-minor-e2e-upgrade-minor
pull-ci-openshift-api-master-minor-images
pull-ci-openshift-api-master-okd-scos-e2e-aws-ovn
pull-ci-openshift-api-master-unit
pull-ci-openshift-api-master-verify
pull-ci-openshift-api-master-verify-client-go
pull-ci-openshift-api-master-verify-crd-schema
pull-ci-openshift-api-master-verify-deps
pull-ci-openshift-api-master-verify-feature-promotion

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@everettraven
Contributor Author

/test all

@everettraven everettraven marked this pull request as ready for review February 18, 2025 21:48
@openshift-ci openshift-ci bot requested review from bparees and knobunc February 18, 2025 21:49
@everettraven
Contributor Author

/payload-job periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-azure

Contributor

openshift-ci bot commented Feb 18, 2025

@everettraven: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-azure

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7038d640-ee42-11ef-9b4a-a5a4a7d518e4-0

@everettraven
Contributor Author

/retest

@ardaguclu
Member

/retest

@everettraven I figured out that the failures are related to this PR. Jobs will fail.

@openshift-ci openshift-ci bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 19, 2025
@ardaguclu
Member

Linking this for easy reference: #2207 is a prerequisite for this PR.

@ardaguclu
Member

/payload-job periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-azure

Contributor

openshift-ci bot commented Feb 19, 2025

@ardaguclu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-azure

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/33717610-eee5-11ef-9963-846be62b9323-0

@everettraven
Contributor Author

/payload-job periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-gcp
/payload-job periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-aws

Contributor

openshift-ci bot commented Feb 19, 2025

@everettraven: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-gcp
  • periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-aws

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/603a58a0-eee6-11ef-87b5-1ffd47bf88a2-0

@ardaguclu
Member

What do you think about running all of the blocking payload jobs?

@everettraven
Contributor Author

@ardaguclu I'm hesitant to do that because it is quite an expensive operation and my understanding is it should be used very sparingly. That being said, if it is the only way to get these jobs to run maybe it is worth it since this targets resolving component readiness failures?

@ardaguclu
Member

> @ardaguclu I'm hesitant to do that because it is quite an expensive operation and my understanding is it should be used very sparingly. That being said, if it is the only way to get these jobs to run maybe it is worth it since this targets resolving component readiness failures?

Good point. Maybe after the review, and if an approver recommends it, we can trigger them.

Contributor

openshift-ci bot commented Feb 19, 2025

@everettraven: This PR was included in a payload test run from openshift/cluster-control-plane-machine-set-operator#347
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.19-periodics-e2e-azure

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7a75e780-eeea-11ef-9c70-5891df82d5b3-0

@@ -258,15 +258,21 @@ type StaticPodOperatorStatus struct {

// NodeStatus provides information about the current state of a particular node managed by this operator.
// +kubebuilder:validation:XValidation:rule="has(self.currentRevision) || !has(oldSelf.currentRevision)",message="cannot be unset once set",fieldPath=".currentRevision"
// +kubebuilder:validation:XValidation:rule="oldSelf.hasValue() || (has(self.currentRevision) ? self.currentRevision == 0 : true) ",message="when specified on creation of a nodeStatus, currentRevision must be set to 0 on creation",optionalOldSelf=true
Contributor


Not saying this is better, but I think you could also write this as

Suggested change
// +kubebuilder:validation:XValidation:rule="oldSelf.hasValue() || (has(self.currentRevision) ? self.currentRevision == 0 : true) ",message="when specified on creation of a nodeStatus, currentRevision must be set to 0 on creation",optionalOldSelf=true
// +kubebuilder:validation:XValidation:rule="oldSelf.hasValue() || self.?currentRevision.orValue(0) == 0",message="when specified on creation of a nodeStatus, currentRevision must be set to 0 on creation",optionalOldSelf=true

Contributor Author


I personally find the way it is currently written to be more straightforward than that from a reader perspective. I'd prefer to keep the rules easier to read and understand than more succinct even if it does the same thing.

Contributor


Perfectly valid, was just giving you a second option in case you felt this way might have been easier, I'm happy with either
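
For readers comparing the two formulations discussed in this thread, here is a minimal Go sketch that models each rule as a plain function and prints both results for every input combination. It is not the CEL evaluator; the function names are hypothetical, and using a nil pointer to represent an absent `currentRevision` is an assumption made for illustration. The second function models the optional-syntax suggestion with balanced parentheses.

```go
package main

import "fmt"

// ternaryRule models the rule as written in the PR:
//   oldSelf.hasValue() || (has(self.currentRevision) ? self.currentRevision == 0 : true)
// oldExists stands in for oldSelf.hasValue(); a nil cur stands in for an absent field.
func ternaryRule(oldExists bool, cur *int32) bool {
	if oldExists {
		return true // updates are not restricted by this rule
	}
	if cur != nil {
		return *cur == 0 // on creation, a set revision must be 0
	}
	return true // on creation, an unset revision is allowed
}

// optionalRule models the reviewer's alternative:
//   oldSelf.hasValue() || self.?currentRevision.orValue(0) == 0
func optionalRule(oldExists bool, cur *int32) bool {
	value := int32(0) // orValue(0): default used when the field is absent
	if cur != nil {
		value = *cur
	}
	return oldExists || value == 0
}

func main() {
	zero, three := int32(0), int32(3)
	for _, oldExists := range []bool{false, true} {
		for _, cur := range []*int32{nil, &zero, &three} {
			fmt.Println(oldExists, cur != nil,
				ternaryRule(oldExists, cur), optionalRule(oldExists, cur))
		}
	}
}
```

Under this model the two functions return the same result for all six input combinations, which matches the reviewer's point that the choice between them is about readability rather than behavior.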

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2025
@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Feb 19, 2025
@openshift-ci-robot

@everettraven: This pull request references Jira Issue OCPBUGS-50587, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @geliu2016

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Feb 19, 2025
@openshift-ci openshift-ci bot requested a review from geliu2016 February 19, 2025 18:44
@everettraven
Contributor Author

Huge thanks to @JoelSpeed for getting https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-cluster-control-plane-machine-set-operator-347-openshift-api-2208-e2e-azure-periodic-pre/1892272835435433984 run - looks like this change doesn't immediately break the payload job we noticed the regression on.

I think we should be good to merge these changes pending review, and then watch the 7-day lookback that Ben mentioned while we work on the backport process for 4.18.

@ardaguclu
Member

/retest

@ardaguclu
Member

As far as I can see, the manual runs for machine-set-operator didn't give us any clues. We need to run the blocking payload jobs to get a signal:
/payload 4.19 nightly blocking

Contributor

openshift-ci bot commented Feb 20, 2025

@ardaguclu: trigger 15 job(s) of type blocking for the nightly release of OCP 4.19

  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.19-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.19-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-driver-toolkit
  • periodic-ci-openshift-release-master-nightly-4.19-fips-payload-scan
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-ipv6
  • periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance
  • periodic-ci-openshift-microshift-release-4.19-periodics-e2e-aws-ovn-ocp-conformance-serial
  • periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c4ee4200-ef47-11ef-87e1-c1f8e3a92868-0

@ardaguclu
Member

This PR openshift/cluster-control-plane-machine-set-operator#348 could have successfully run e2e-azure-periodic-pre

@ardaguclu
Member

/retest

@ardaguclu
Member

/payload-job periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm
which failed in bootstrapping

Contributor

openshift-ci bot commented Feb 20, 2025

@ardaguclu: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b7b85cb0-ef5b-11ef-96cb-e64ebe137026-0

@ardaguclu
Member

ardaguclu commented Feb 20, 2025

> This PR openshift/cluster-control-plane-machine-set-operator#348 could have successfully run e2e-azure-periodic-pre

https://prow.ci.openshift.org/view/gs/test-platform-results/logs/multi-pr-openshift-cluster-control-plane-machine-set-operator-348-openshift-api-2208-e2e-azure-periodic-pre/1892443956839452672 passed. Running a couple of more...

…n not set to 0

Entries of static pod operators' node statuses are co-managed:
- A node controller, responsible for ensuring a node status entry exists for each control plane node
- An installer controller, responsible for managing operand revisions for each entry

A new node status entry with a non-zero revision populated signals that a single client
is managing both aspects, which is not intentional. This validation should serve as a pragmatic
guard against multi-writer field errors.

Signed-off-by: Bryce Palmer <[email protected]>
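
The intent described in the commit message above can be sketched in plain Go as a check over a single entry's transition. This is an illustrative model only, not the generated CRD validation: the trimmed-down `NodeStatus` type, the `validateTransition` helper, and the node names are all hypothetical, and a nil pointer is assumed to mean that `currentRevision` is not set.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeStatus is a trimmed-down, hypothetical stand-in for the real API type.
type NodeStatus struct {
	NodeName        string
	CurrentRevision *int32 // nil means the field is not set
}

// validateTransition mirrors the intent of the two XValidation rules.
// old is nil when the entry is being created.
func validateTransition(old, updated *NodeStatus) error {
	if old == nil {
		// Creation: the node controller should add entries without a revision
		// (or with revision 0); a non-zero value suggests a single writer.
		if updated.CurrentRevision != nil && *updated.CurrentRevision != 0 {
			return errors.New("currentRevision must be set to 0 on creation")
		}
		return nil
	}
	// Update: once a revision is set it cannot be removed.
	if old.CurrentRevision != nil && updated.CurrentRevision == nil {
		return errors.New("currentRevision cannot be unset once set")
	}
	return nil
}

func main() {
	zero, five, seven := int32(0), int32(5), int32(7)

	// Node controller creates the entry; installer controller later bumps it.
	created := &NodeStatus{NodeName: "master-0", CurrentRevision: &zero}
	fmt.Println("create with 0:", validateTransition(nil, created))
	rolled := &NodeStatus{NodeName: "master-0", CurrentRevision: &five}
	fmt.Println("update to 5:", validateTransition(created, rolled))

	// A single client creating an entry with a revision already populated.
	combined := &NodeStatus{NodeName: "master-1", CurrentRevision: &seven}
	fmt.Println("create with 7:", validateTransition(nil, combined))
}
```

In this model only the last call returns an error, which corresponds to the case the new rule is meant to reject.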
@everettraven everettraven force-pushed the reject-new-nodestatus-with-nonzero-revision-set branch from 1b31db9 to 46fc4d5 Compare February 20, 2025 12:57
@everettraven
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm

Contributor

openshift-ci bot commented Feb 20, 2025

@everettraven: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-bm

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ad79f130-ef8a-11ef-9669-7234f2a65e3a-0

@everettraven
Contributor Author

/retest

@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 20, 2025
@everettraven
Contributor Author

/testwith e2e-azure openshift/origin#29558

Contributor

openshift-ci bot commented Feb 20, 2025

@everettraven, testwith: Error processing request. ERROR:

could not determine job runs: requested job is invalid. needs to be formatted like: <org>/<repo>/<branch>/<variant?>/<job>. instead it was: e2e-azure
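
As a rough illustration of the shape the error message asks for, the sketch below splits a job argument into `<org>/<repo>/<branch>/<variant?>/<job>` segments. It is not the bot's actual parser: the `jobSpec` type and `parseJobSpec` function are hypothetical, and it assumes that no segment contains a slash.

```go
package main

import (
	"fmt"
	"strings"
)

// jobSpec holds the segments of <org>/<repo>/<branch>/<variant?>/<job>.
// Variant is empty when the optional segment is omitted.
type jobSpec struct {
	Org, Repo, Branch, Variant, Job string
}

// parseJobSpec accepts four segments (no variant) or five (with variant).
func parseJobSpec(s string) (jobSpec, error) {
	parts := strings.Split(s, "/")
	for _, p := range parts {
		if p == "" {
			return jobSpec{}, fmt.Errorf("empty segment in %q", s)
		}
	}
	switch len(parts) {
	case 4:
		return jobSpec{Org: parts[0], Repo: parts[1], Branch: parts[2], Job: parts[3]}, nil
	case 5:
		return jobSpec{Org: parts[0], Repo: parts[1], Branch: parts[2], Variant: parts[3], Job: parts[4]}, nil
	default:
		return jobSpec{}, fmt.Errorf("expected 4 or 5 segments in %q, got %d", s, len(parts))
	}
}

func main() {
	for _, arg := range []string{"e2e-azure", "openshift/api/master/e2e-azure"} {
		spec, err := parseJobSpec(arg)
		fmt.Printf("%q -> %+v, err=%v\n", arg, spec, err)
	}
}
```

A shape check like this only confirms the number of segments; whether the named branch actually has a CI configuration is a separate lookup.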

@everettraven
Contributor Author

/testwith openshift/api/main/e2e-azure openshift/origin#29558

Contributor

openshift-ci bot commented Feb 20, 2025

@everettraven, testwith: could not generate prow job. ERROR:

could not determine ci op config from metadata: got unexpected http 404 status code from configresolver: failed to get config: could not find any config for branch main on repo openshift/api

@everettraven
Contributor Author

/testwith openshift/api/master/e2e-azure openshift/origin#29558

Contributor

openshift-ci bot commented Feb 21, 2025

@everettraven: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | ce61d0a | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-ovn | ce61d0a | link | true | /test e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-hypershift | ce61d0a | link | true | /test e2e-aws-ovn-hypershift |
| ci/prow/e2e-aws-ovn-techpreview | ce61d0a | link | true | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-gcp | ce61d0a | link | false | /test e2e-gcp |
| ci/prow/e2e-azure | ce61d0a | link | false | /test e2e-azure |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants