Make openshift-ansible use static pods to install the control plane, make nodes prefer bootstrapping #6916

Merged
9 commits merged into openshift:master on Mar 11, 2018

Conversation

@smarterclayton (author) commented Jan 28, 2018

  1. Nodes continue to be configured for bootstrapping (as today).
  2. For bootstrap nodes, we write a generic bootstrap-node-config.yaml that contains static pod references and any bootstrap config, then use it to start a child kubelet via --write-flags instead of launching the node ourselves. If a node-config.yaml is laid down in /etc/origin/node, it takes precedence.
  3. For 3.10 we want dynamic node config from Kubernetes to pull down additional files, but there are functional gaps. For now, the openshift SDN container has a sidecar that syncs node config to disk and updates labels (the kubelet does not update node labels from flags if they change; see kubernetes/kubernetes#59314).
  4. On the masters, if openshift_master_bootstrap_enabled is set, we generate the master-config.yaml and the etcd config, but we don't start etcd or the masters (no services are installed).
  5. On the masters, we copy the static pod files into the correct pod-manifest-path (/etc/origin/node/pods or similar); see the sketch after this list.
  6. The kubelet should then automatically pick up the new static files and launch the components.
  7. We wait for them to converge.
  8. We install openshift-sdn as the first component, which allows nodes to go ready and start installing things. There is a gap here where the masters are up and the nodes can bootstrap, but the nodes are not ready because no network plugin is installed yet.
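
To make steps 5 and 6 concrete, here is a hedged sketch of the kind of static pod manifest that ends up in the pod-manifest-path; the name, image, and command are illustrative, not the exact manifests this PR generates:

```yaml
# Illustrative static pod only; the real manifests are templated by openshift-ansible.
apiVersion: v1
kind: Pod
metadata:
  name: master-api
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: api
    image: openshift/origin:v3.9   # placeholder image tag
    command:
    - /usr/bin/openshift
    - start
    - master
    - api
    - --config=/etc/origin/master/master-config.yaml
    volumeMounts:
    - name: master-config
      mountPath: /etc/origin/master
      readOnly: true
  volumes:
  - name: master-config
    hostPath:
      path: /etc/origin/master
```

Once a file like this is dropped into /etc/origin/node/pods, the kubelet's file watcher creates the pod directly, without going through the API server, which is what lets the control plane self-host.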

Challenges at this point:

  • The master shims (master-logs and master-restart) need to deal with CRI-O and systemd. Ideally these are temporary shims until we remove systemd for these components and have cri-ctl installed.
  • We need to test failure modes of the static pods
  • Testing

Areas for further exploration:

@derekwaynecarr @sdodson @liggitt @deads2k this is the core of self-hosting.

@openshift-ci-robot added the do-not-merge/work-in-progress and size/L labels on Jan 28, 2018
@smarterclayton force-pushed the static_masters branch 3 times, most recently from 831cb45 to 13a4c61 on January 31, 2018
@openshift-ci-robot added the size/XL label and removed the size/L label on Jan 31, 2018
@smarterclayton force-pushed the static_masters branch 2 times, most recently from bd51401 to 0c08ca0 on January 31, 2018
@openshift-ci-robot added the size/XXL label and removed the size/XL label on Jan 31, 2018
@openshift-merge-robot added the needs-rebase label on Feb 1, 2018
@smarterclayton (author) commented:

@dcbw @knobunc

@dcbw commented Feb 1, 2018

@danwinship as well...

- mountPath: /var/run/openshift-sdn
  name: host-var-run-openshift-sdn
# CNI related mounts which we take over
- mountPath: /host/opt/cni/bin
@dcbw commented on this diff, Feb 1, 2018:

@smarterclayton nevermind, I'm mixing up my volume mounts... or am I? isn't volumeMount->mountPath the path inside the container? If so, then wouldn't we need a change to the SDN code to look at /host/opt/cni/bin instead of just /opt/cni/bin? I'm not really sure what you're trying to do here with /host/opt/cni/bin...

@smarterclayton (author) replied:

I just took over all the master paths related to CNI when this pod starts up. These were the directories the previous content referenced.

@smarterclayton (author) replied:

Although this one may be a typo. Will check.
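
For reference on the question above: mountPath is indeed the path inside the container, and the host-side directory comes from the paired hostPath volume. A hedged sketch of the pattern under discussion (names and image are placeholders, not the actual daemonset template in this PR):

```yaml
# Illustrative only: the host's /opt/cni/bin appears inside the container at /host/opt/cni/bin.
apiVersion: v1
kind: Pod
metadata:
  name: sdn-mount-example
spec:
  containers:
  - name: sdn
    image: example/openshift-sdn     # placeholder
    volumeMounts:
    - name: host-cni-bin
      mountPath: /host/opt/cni/bin   # path seen inside the container
  volumes:
  - name: host-cni-bin
    hostPath:
      path: /opt/cni/bin             # path on the host
```

With that wiring, anything the container wants to land in the host's /opt/cni/bin has to be written under /host/opt/cni/bin, which is the mismatch with the SDN code that dcbw is pointing at.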

@@ -35,19 +46,12 @@ masterKubeConfig: node.kubeconfig
networkConfig:
  mtu: {{ openshift_node_group_network_mtu }}
  networkPluginName: {{ openshift_node_group_network_plugin }}
nodeIP: ""
podManifestConfig: null
networkPluginName: {{ openshift_node_group_network_plugin }}
@dcbw commented on this diff, Feb 1, 2018:

@smarterclayton isn't this new networkPluginName a dupe? it's specified just above in networkConfig already. (see below, this might be a leftover from something... but on the node, networkPluginName is part of networkConfig AFAICS).

How this should all work is...

  1. the kubelet has a separate network plugin argument which allows passing either nothing (e.g. docker default networking), "kubenet" (the upstream default), or "cni" (openshift-sdn and 3rd-party network plugins); see the sketch after this list
  2. the Network daemonset gets its own network plugin flag for singletenant, multitenant, or networkpolicy. Or even better, we somehow detect that at runtime (this would take a bit of work)
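
For concreteness, a hedged sketch of the kubelet-facing side of point 1, in node-config.yaml form (illustrative, not necessarily what this PR templates):

```yaml
# The kubelet only needs to know which plugin framework to use; "" (docker default
# networking) and "kubenet" are the other values mentioned above.
kubeletArguments:
  network-plugin:
  - cni
```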

@smarterclayton (author) replied:

What requirements does kubenet have?

volumeConfig:
  localQuota:
    perFSGroup: null
volumeDirectory: {{ openshift_node_group_node_data_dir }}/openshift.local.volumes
enable-controller-attach-detach:
- 'true'
networkPluginName: {{ openshift_node_group_network_plugin }}
@dcbw commented on this diff:

Ok, this is odd. I can't see how this networkPluginName would be used at all... Our config defines networkPluginName as part of the NetworkConfig struct, not top-level in the node config.

@smarterclayton (author) replied:

This is the legacy field. It continues to be supported on the external struct and takes precedence over the inner. We can probably simplify.
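
To illustrate the two fields being discussed, here's a hedged node-config.yaml fragment (the plugin value shown is one of the usual openshift-sdn mode names, not necessarily the template's exact output):

```yaml
# Illustrative fragment only.
networkConfig:
  mtu: 1450
  networkPluginName: redhat/openshift-ovs-subnet   # nested field on networkConfig
networkPluginName: redhat/openshift-ovs-subnet     # legacy top-level field; wins if both are set
```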

@dcbw commented Feb 1, 2018

@smarterclayton so the only two things I see worth discussion are:

  1. the /host/opt/cni/bin thing not paired with a corresponding change to the SDN code (unless I misunderstand how volume mounts work)
  2. how networkPluginName gets handled on the node, for both the kubelet itself and for the SDN. It would be nice not to require the IsOpenShiftNetworkPlugin() stuff in pkg/cmd/server/kubernetes/node/options/options.go, which we could do by either (a) making the SDN code figure that out automatically on the node, or (b) allowing separate networkPluginName config options for the kubelet and the SDN daemonset

@smarterclayton (author) commented Feb 2, 2018 via email (quoted in the reply below)

@openshift-merge-robot removed the needs-rebase label on Feb 2, 2018
@dcbw commented Feb 2, 2018

> Do we have any need to support the old plugin names? I think only the network plugin needs to respect it, the kubelet won't have anything other than cni (I don't think?) Nodes won't have any networking config in their config, instead the network daemonset decides that.

@smarterclayton Yeah, the node itself won't need any networkConfig block unless we want to allow people to use pure docker networking or kubenet like upstream supports. In that case we need to pass something other than --network-plugin=cni to kubelet.

Then the network daemonset needs its own networkConfig block which includes MTU and networkPluginName so that we can distinguish between single/multitenant and networkpolicy.
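
A hedged sketch of what that daemonset-owned block might look like (this is the proposal being discussed, not something the PR implements; field names simply mirror the existing node config):

```yaml
# Hypothetical config consumed by the SDN daemonset rather than the kubelet.
networkConfig:
  mtu: 1450
  networkPluginName: redhat/openshift-ovs-networkpolicy   # or -subnet / -multitenant
```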

@danwinship commented:

> the node itself won't need any networkConfig block unless we want to allow people to use pure docker networking or kubenet

"oc cluster up" uses kubenet, right?

> Then the network daemonset needs its own networkConfig block which includes MTU and networkPluginName so that we can distinguish between single/multitenant and networkpolicy.

Though if there's going to be some mandatory configuration-munging step as part of the upgrade maybe we should change the name/values of this option to better reflect the fact that it's choosing the mode that the single openshift-sdn plugin runs in, not selecting between three different plugins? (Something like network-isolation=none, network-isolation=legacy-multitenant, network-isolation=networkpolicy, with the default being networkpolicy?)

Also, we should think about how people are going to configure whether to use openshift-sdn or ovn-kubernetes.

@dcbw commented Feb 2, 2018

"oc cluster up" uses kubenet, right?

oc cluster up literally doesn't do anything with networking. So you get docker default networking (and yes, it doesn't look like oc cluster up supports CRI-O yet). I assume that means networkPluginName="".

> Though if there's going to be some mandatory configuration-munging step as part of the upgrade maybe we should change the name/values of this option to better reflect the fact that it's choosing the mode that the single openshift-sdn plugin runs in, not selecting between three different plugins? (Something like network-isolation=none, network-isolation=legacy-multitenant, network-isolation=networkpolicy, with the default being networkpolicy?)

That would be nice, though right now I don't think we are forced to change that. The SDN daemonset still gets the networkPluginName argument AFAICS. But it's still just a config option, so we can put whatever we want in there for the SDN daemonset, though it would be a bit odd to have networkPluginName: network-isolation=XXX.

> Also, we should think about how people are going to configure whether to use openshift-sdn or ovn-kubernetes.

On the node, they just don't run the SDN daemonset. Although this is a bit complicated right now because "openshift network" does more than just the SDN+proxy, it also does some DNS stuff too.

@smarterclayton so what we should probably do is split the SDN out from even the DNS stuff and just have an SDN daemonset alone. That way users can either run it, or run ovn-kubernetes, or something else entirely.

@smarterclayton force-pushed the static_masters branch 2 times, most recently from 1083a6a to a8727eb on February 4, 2018
@smarterclayton (author) commented:

oc cluster up is going to be changed to install and use ovs and sdn by default unless we can think of a reason it wouldn't work. Can kubenet be configured via CNI without gymnastics?

DNS is going to be split into CoreDNS at some point. We're going to have to come back to it probably before 3.10 launches though.

@smarterclayton (author) commented:

Without openshift/origin#18430, if any cert expires the kubelets won't reestablish contact. I'll consider backporting it.

@smarterclayton changed the title from "WIP - Static master config" to "Make openshift-ansible use static pods to install the control plane, make nodes prefer bootstrapping" on Feb 4, 2018
@openshift-ci-robot removed the do-not-merge/work-in-progress label on Feb 4, 2018
@jlebon commented Mar 9, 2018

bot, retest this please

@smarterclayton (author) commented:

GCP is a flake, logging may have regressed

@smarterclayton (author) commented:

bot, retest this please

@smarterclayton (author) commented:

/retest

1 similar comment
@smarterclayton (author) commented:

/retest

@smarterclayton (author) commented:

/retest

Going to merge on green

@smarterclayton added the lgtm label on Mar 10, 2018
@smarterclayton (author) commented:

Two changes - default openshift_push_dns to true, and replace “${version}” with openshift_image_tag in oreg_url.

A follow-on PR will re-enable the system containers job, disable the openshift_master role, turn both openshift_master and openshift_node bootstrapping on by default, and then begin working on the upgrade steps.
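
For context, a hedged sketch of the inventory variables involved (the registry host and image name are placeholders):

```yaml
# Illustrative inventory/group_vars entries only.
openshift_push_dns: true
oreg_url: "registry.example.com/openshift3/ose-${component}:${version}"
# with this change, ${version} is resolved from openshift_image_tag by the role
```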

@openshift-ci-robot removed the lgtm label on Mar 10, 2018
Commits (message excerpts):

  • Is a replacement to the openshift_master role. OpenShift SDN role serves two purposes - managing the network (kube proxy, dns, and SDN if enabled) as well as temporarily performing a sync loop.
  • Remove any overlapping master implications from the node role for now.
  • The node now uses /usr/bin/openshift-node to generate the config into kubelet flags and launch the kubelet.
  • This is the configuration that is baked into AMIs and starts the node waiting for content. If a subsequent task lays down a node-config.yaml into /etc/origin/node, the bootstrap config is ignored.
  • Use static pod restarts everywhere, and remove openvswitch from nodes.
  • Otherwise some code will check the wrong value.
@smarterclayton added the lgtm and retest-not-required labels on Mar 11, 2018
@smarterclayton (author) commented:

And the boot comes

@openshift-merge-robot commented:

Automatic merge from submit-queue.

@openshift-merge-robot merged commit 4d51b56 into openshift:master on Mar 11, 2018
@openshift-ci-robot commented:

@smarterclayton: The following tests failed, say /retest to rerun them all:

Test name | Commit | Rerun command
ci/openshift-jenkins/system-containers | b9fc854 | /test system-containers
ci/openshift-jenkins/extended_conformance_install_crio | b9fc854 | /test crio
ci/openshift-jenkins/logging | c826c43 | /test logging

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


fi
fi

# TODO: move to cri-ctl
A member commented on this diff:

just catching up on this PR... are there tracking issues for all these TODOs?

Labels
lgtm, retest-not-required, size/XXL