Make openshift-ansible use static pods to install the control plane, make nodes prefer bootstrapping #6916

Merged
9 commits merged into openshift:master on Mar 11, 2018

Conversation

@smarterclayton (author) commented Jan 28, 2018

  1. Nodes continue to be configured for bootstrapping (as today).
  2. For bootstrap nodes, we write a generic bootstrap-node-config.yaml that contains static pod references and any bootstrap config, then use it to start a child kubelet via --write-flags instead of launching the node ourselves. If a node-config.yaml is laid down in /etc/origin/node, it takes precedence.
  3. For 3.10 we want dynamic node config from Kubernetes to pull down additional files, but there are functional gaps. For now, the openshift SDN container has a sidecar that syncs node config to disk and updates labels (the kubelet does not update node labels from flags if they change; see kubernetes/kubernetes#59314).
  4. On the masters, if openshift_master_bootstrap_enabled is set, we generate the master-config.yaml and the etcd config, but we don't start etcd or the masters (no services are installed).
  5. On the masters, we copy the static pod files into the correct pod-manifest-path (/etc/origin/node/pods or similar); see the sketch after this list.
  6. The kubelet should then automatically pick up the new static files and launch the components.
  7. We wait for them to converge.
  8. We install openshift-sdn as the first component, which allows nodes to go ready and start installing things. There is a gap here where the masters are up and the nodes can bootstrap, but the nodes are not ready because no network plugin is installed yet.
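
To make steps 5 and 6 concrete, here is a hedged sketch of the kind of static pod manifest that ends up in the pod-manifest-path; the name, image, and command are illustrative, not the exact manifests this PR generates:

```yaml
# Illustrative static pod only; the real manifests are templated by openshift-ansible.
apiVersion: v1
kind: Pod
metadata:
  name: master-api
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: api
    image: openshift/origin:v3.9   # placeholder image tag
    command:
    - /usr/bin/openshift
    - start
    - master
    - api
    - --config=/etc/origin/master/master-config.yaml
    volumeMounts:
    - name: master-config
      mountPath: /etc/origin/master
      readOnly: true
  volumes:
  - name: master-config
    hostPath:
      path: /etc/origin/master
```

Once a file like this is dropped into /etc/origin/node/pods, the kubelet's file watcher creates the pod directly, without going through the API server, which is what lets the control plane self-host.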

Challenges at this point:

  • The master shims (master-logs and master-restart) need to deal with CRI-O and systemd. Ideally these are temporary shims until we remove systemd for these components and have cri-ctl installed.
  • We need to test failure modes of the static pods
  • Testing

Areas for further exploration:

@derekwaynecarr @sdodson @liggitt @deads2k this is the core of self-hosting.

@openshift-ci-robot added the do-not-merge/work-in-progress and size/L labels on Jan 28, 2018
@smarterclayton force-pushed the static_masters branch 3 times, most recently from 831cb45 to 13a4c61 on January 31, 2018
@openshift-ci-robot added the size/XL label and removed the size/L label on Jan 31, 2018
@smarterclayton force-pushed the static_masters branch 2 times, most recently from bd51401 to 0c08ca0 on January 31, 2018
@openshift-ci-robot added the size/XXL label and removed the size/XL label on Jan 31, 2018
@openshift-merge-robot added the needs-rebase label on Feb 1, 2018
@smarterclayton (author) commented:

@dcbw @knobunc

@dcbw commented Feb 1, 2018

@danwinship as well...

- mountPath: /var/run/openshift-sdn
  name: host-var-run-openshift-sdn
# CNI related mounts which we take over
- mountPath: /host/opt/cni/bin
@dcbw commented on this diff, Feb 1, 2018:

@smarterclayton nevermind, I'm mixing up my volume mounts... or am I? isn't volumeMount->mountPath the path inside the container? If so, then wouldn't we need a change to the SDN code to look at /host/opt/cni/bin instead of just /opt/cni/bin? I'm not really sure what you're trying to do here with /host/opt/cni/bin...

@smarterclayton (author) replied:

I just took over all the master paths related to CNI when this pod starts up. These were the directories the previous content referenced.

@smarterclayton (author) replied:

Although this one may be a typo. Will check.
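
For reference on the question above: mountPath is indeed the path inside the container, and the host-side directory comes from the paired hostPath volume. A hedged sketch of the pattern under discussion (names and image are placeholders, not the actual daemonset template in this PR):

```yaml
# Illustrative only: the host's /opt/cni/bin appears inside the container at /host/opt/cni/bin.
apiVersion: v1
kind: Pod
metadata:
  name: sdn-mount-example
spec:
  containers:
  - name: sdn
    image: example/openshift-sdn     # placeholder
    volumeMounts:
    - name: host-cni-bin
      mountPath: /host/opt/cni/bin   # path seen inside the container
  volumes:
  - name: host-cni-bin
    hostPath:
      path: /opt/cni/bin             # path on the host
```

With that wiring, anything the container wants to land in the host's /opt/cni/bin has to be written under /host/opt/cni/bin, which is the mismatch with the SDN code that dcbw is pointing at.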

@@ -35,19 +46,12 @@ masterKubeConfig: node.kubeconfig
networkConfig:
  mtu: {{ openshift_node_group_network_mtu }}
  networkPluginName: {{ openshift_node_group_network_plugin }}
nodeIP: ""
podManifestConfig: null
networkPluginName: {{ openshift_node_group_network_plugin }}
@dcbw commented on this diff, Feb 1, 2018:

@smarterclayton isn't this new networkPluginName a dupe? it's specified just above in networkConfig already. (see below, this might be a leftover from something... but on the node, networkPluginName is part of networkConfig AFAICS).

How this should all work is...

  1. the kubelet has a separate network plugin argument which allows passing either nothing (e.g. docker default networking), "kubenet" (the upstream default), or "cni" (openshift-sdn and 3rd-party network plugins); see the sketch after this list
  2. the Network daemonset gets its own network plugin flag for singletenant, multitenant, or networkpolicy. Or even better, we somehow detect that at runtime (this would take a bit of work)
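
For concreteness, a hedged sketch of the kubelet-facing side of point 1, in node-config.yaml form (illustrative, not necessarily what this PR templates):

```yaml
# The kubelet only needs to know which plugin framework to use; "" (docker default
# networking) and "kubenet" are the other values mentioned above.
kubeletArguments:
  network-plugin:
  - cni
```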

@smarterclayton (author) replied:

What requirements does kubenet have?

volumeConfig:
  localQuota:
    perFSGroup: null
volumeDirectory: {{ openshift_node_group_node_data_dir }}/openshift.local.volumes
enable-controller-attach-detach:
- 'true'
networkPluginName: {{ openshift_node_group_network_plugin }}
@dcbw commented on this diff:

Ok, this is odd. I can't see how this networkPluginName would be used at all... Our config defines networkPluginName as part of the NetworkConfig struct, not top-level in the node config.

@smarterclayton (author) replied:

This is the legacy field. It continues to be supported on the external struct and takes precedence over the inner. We can probably simplify.
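
To illustrate the two fields being discussed, here's a hedged node-config.yaml fragment (the plugin value shown is one of the usual openshift-sdn mode names, not necessarily the template's exact output):

```yaml
# Illustrative fragment only.
networkConfig:
  mtu: 1450
  networkPluginName: redhat/openshift-ovs-subnet   # nested field on networkConfig
networkPluginName: redhat/openshift-ovs-subnet     # legacy top-level field; wins if both are set
```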

@dcbw commented Feb 1, 2018

@smarterclayton so the only two things I see worth discussion are:

  1. the /host/opt/cni/bin thing not paired with a corresponding change to the SDN code (unless I misunderstand how volume mounts work)
  2. how networkPluginName gets handled on the node, for both the kubelet itself and for the SDN. It would be nice not to require the IsOpenShiftNetworkPlugin() stuff in pkg/cmd/server/kubernetes/node/options/options.go, which we could do by either (a) making the SDN code figure that out automatically on the node, or (b) allowing separate networkPluginName config options for the kubelet and the SDN daemonset

@smarterclayton (author) commented Feb 2, 2018 via email (quoted in the reply below)

@openshift-merge-robot removed the needs-rebase label on Feb 2, 2018
@dcbw commented Feb 2, 2018

> Do we have any need to support the old plugin names? I think only the network plugin needs to respect it, the kubelet won't have anything other than cni (I don't think?) Nodes won't have any networking config in their config, instead the network daemonset decides that.

@smarterclayton Yeah, the node itself won't need any networkConfig block unless we want to allow people to use pure docker networking or kubenet like upstream supports. In that case we need to pass something other than --network-plugin=cni to kubelet.

Then the network daemonset needs its own networkConfig block which includes MTU and networkPluginName so that we can distinguish between single/multitenant and networkpolicy.
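
A hedged sketch of what that daemonset-owned block might look like (this is the proposal being discussed, not something the PR implements; field names simply mirror the existing node config):

```yaml
# Hypothetical config consumed by the SDN daemonset rather than the kubelet.
networkConfig:
  mtu: 1450
  networkPluginName: redhat/openshift-ovs-networkpolicy   # or -subnet / -multitenant
```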

@danwinship commented:

> the node itself won't need any networkConfig block unless we want to allow people to use pure docker networking or kubenet

"oc cluster up" uses kubenet, right?

> Then the network daemonset needs its own networkConfig block which includes MTU and networkPluginName so that we can distinguish between single/multitenant and networkpolicy.

Though if there's going to be some mandatory configuration-munging step as part of the upgrade maybe we should change the name/values of this option to better reflect the fact that it's choosing the mode that the single openshift-sdn plugin runs in, not selecting between three different plugins? (Something like network-isolation=none, network-isolation=legacy-multitenant, network-isolation=networkpolicy, with the default being networkpolicy?)

Also, we should think about how people are going to configure whether to use openshift-sdn or ovn-kubernetes.

@dcbw commented Feb 2, 2018

"oc cluster up" uses kubenet, right?

oc cluster up literally doesn't do anything with networking. So you get docker default networking (and yes, it doesn't look like oc cluster up supports CRI-O yet). I assume that means networkPluginName="".

> Though if there's going to be some mandatory configuration-munging step as part of the upgrade maybe we should change the name/values of this option to better reflect the fact that it's choosing the mode that the single openshift-sdn plugin runs in, not selecting between three different plugins? (Something like network-isolation=none, network-isolation=legacy-multitenant, network-isolation=networkpolicy, with the default being networkpolicy?)

That would be nice, though right now I don't think we are forced to change that. The SDN daemonset still gets the networkPluginName argument AFAICS. But it's still just a config option, so we can put whatever we want in there for the SDN daemonset, though it would be a bit odd to have networkPluginName: network-isolation=XXX.

> Also, we should think about how people are going to configure whether to use openshift-sdn or ovn-kubernetes.

On the node, they just don't run the SDN daemonset. Although this is a bit complicated right now because "openshift network" does more than just the SDN+proxy, it also does some DNS stuff too.

@smarterclayton so what we should probably do is split the SDN out from even the DNS stuff and just have an SDN daemonset alone. That way users can either run it, or run ovn-kubernetes, or something else entirely.

@smarterclayton force-pushed the static_masters branch 2 times, most recently from 1083a6a to a8727eb on February 4, 2018
@smarterclayton (author) commented:

oc cluster up is going to be changed to install and use ovs and sdn by default unless we can think of a reason it wouldn't work. Can kubenet be configured via CNI without gymnastics?

DNS is going to be split into CoreDNS at some point. We're going to have to come back to it probably before 3.10 launches though.

@smarterclayton (author) commented:

Without openshift/origin#18430, if any cert expires the kubelets won't reestablish contact. I'll consider backporting it.

@smarterclayton changed the title from "WIP - Static master config" to "Make openshift-ansible use static pods to install the control plane, make nodes prefer bootstrapping" on Feb 4, 2018
@openshift-ci-robot removed the do-not-merge/work-in-progress label on Feb 4, 2018
@jlebon commented Mar 9, 2018

bot, retest this please

@smarterclayton (author) commented:

GCP is a flake, logging may have regressed

@smarterclayton (author) commented:

bot, retest this please

@smarterclayton (author) commented:

/retest

1 similar comment
@smarterclayton (author) commented:

/retest

@smarterclayton (author) commented:

/retest

Going to merge on green

@smarterclayton added the lgtm label on Mar 10, 2018
@smarterclayton (author) commented:

Two changes - default openshift_push_dns to true, and replace “${version}” with openshift_image_tag in oreg_url.

A follow-on PR will re-enable the system containers job, disable the openshift_master role, turn both openshift_master and openshift_node bootstrapping on by default, and then begin working on the upgrade steps.
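
For context, a hedged sketch of the inventory variables involved (the registry host and image name are placeholders):

```yaml
# Illustrative inventory/group_vars entries only.
openshift_push_dns: true
oreg_url: "registry.example.com/openshift3/ose-${component}:${version}"
# with this change, ${version} is resolved from openshift_image_tag by the role
```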

@openshift-ci-robot removed the lgtm label on Mar 10, 2018
Commits (message excerpts):

  • Is a replacement to the openshift_master role. OpenShift SDN role serves two purposes - managing the network (kube proxy, dns, and SDN if enabled) as well as temporarily performing a sync loop.
  • Remove any overlapping master implications from the node role for now.
  • The node now uses /usr/bin/openshift-node to generate the config into kubelet flags and launch the kubelet.
  • This is the configuration that is baked into AMIs and starts the node waiting for content. If a subsequent task lays down a node-config.yaml into /etc/origin/node, the bootstrap config is ignored.
  • Use static pod restarts everywhere, and remove openvswitch from nodes.
  • Otherwise some code will check the wrong value.
@smarterclayton added the lgtm and retest-not-required labels on Mar 11, 2018
@smarterclayton (author) commented:

And the boot comes

@openshift-merge-robot commented:

Automatic merge from submit-queue.

@openshift-merge-robot merged commit 4d51b56 into openshift:master on Mar 11, 2018
@openshift-ci-robot commented:

@smarterclayton: The following tests failed, say /retest to rerun them all:

Test name | Commit | Rerun command
ci/openshift-jenkins/system-containers | b9fc854 | /test system-containers
ci/openshift-jenkins/extended_conformance_install_crio | b9fc854 | /test crio
ci/openshift-jenkins/logging | c826c43 | /test logging

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


fi
fi

# TODO: move to cri-ctl
A member commented on this diff:

just catching up on this PR... are there tracking issues for all these TODOs?

Labels
lgtm, retest-not-required, size/XXL