Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow nodes to bootstrap and add oc cluster join for nodes #9547

Merged
merged 5 commits into from
Jan 24, 2017

Conversation

smarterclayton
Copy link
Contributor

@smarterclayton smarterclayton commented Jun 24, 2016

Leverage the alpha CSR endpoint in Kube to allow nodes to bootstrap themselves. On startup, the node now accepts --bootstrap which instructs it to perform the following steps:

  1. Connect to the master using --kubeconfig and attempt to create a CSR for a client cert on system:nodes, with the common name system:node:NAME
  2. Given a successful client cert, use that cert to request a serving cert for the node hostnames
  3. Given that, attempt to download the ConfigMap node-config in namespace openshift-infra if it exists
  4. Create a default node-config.yaml if none was download (if the config map does not exist)
  5. Merge any command line flags with the config file
  6. Start

The server creates a new infra service account on startup - node-bootstrap in openshift-infra. This account is granted the ability to create client CSRs (and nothing else). The CSR controller is initialized (if no defaults are set) to allow the system:service accounts:openshift-infra group to request client CSRs and the system:nodes group to request server CSRs. The controller can be disabled by setting an invalid or unused group.

Add a new command oc serviceaccounts create-kubeconfig which generates a kubeconfig file for a service account (to make it easy to generate a kubeconfig that can be used to bootstrap the node).

Add a new command oc cluster join which launches a container that acts as a node.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2016
@openshift-bot openshift-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 22, 2016
@openshift-bot openshift-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 2, 2016
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 18, 2016
@smarterclayton smarterclayton force-pushed the joincommand branch 3 times, most recently from f6480bd to d9c6a4a Compare November 20, 2016 01:25
@smarterclayton
Copy link
Contributor Author

@liggitt @deads2k this is ready for a once over.

@smarterclayton
Copy link
Contributor Author

Created a node-bootstrap.kubeconfig on creation that is a client cert for the account. This simplifies small clusters (because you can just use that file as your bootstrap). However, it's not the ideal outcome, and most people should use oc sa create-kubeconfig -n openshift-infra node-bootstrap > as their node-bootstrap.kubeconfig.

@smarterclayton
Copy link
Contributor Author

[test]

@smarterclayton smarterclayton changed the title WIP - enable oc cluster join an existing cluster Allow nodes to bootstrap and add oc cluster join for nodes Nov 20, 2016
@smarterclayton smarterclayton added this to the 1.5.0 milestone Nov 20, 2016
@smarterclayton
Copy link
Contributor Author

@dgoodwin this is similar to the upstream work, but leverages our existing crypto infra and is capable of starting existing nodes. I think future work may converge, but will be some difference for a while.

@smarterclayton smarterclayton force-pushed the joincommand branch 3 times, most recently from c6b970f to d6e1c68 Compare November 21, 2016 23:00
Copy link
Contributor

@dgoodwin dgoodwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments from trying to test this out.

Add a new node to an existing OpenShift cluster

.PP
Uses an existing connection to a Docker daemon to start an OpenShift node. You must provide a secret to connect to the master. Before running command, ensure that you can execute docker commands successfully (ie. 'docker ps').
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get more specific on what that secret is and how you obtain it? I am assuming it's kubeconfig output from create-kubeconfig but per below I don't seem to have a node-bootstrap SA.

It might be very useful to spell out exactly how to do this in the output from oc cluster up, or in the "by example" docs somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect that the secret might one day also allow you to use the discovery bits likely landing in 1.6? We would need to plan for how you find an API server if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The secret is a kubeconfig file, that has the ability to request CSR. Expectation here is that your file contains discovery info (we're not focused on self-hosting in OpenShift until it's had time to bake in Kube, so the discovery bits aren't really necessary).

`)
)

// NewCmdJoin creates a command that joins an existing OpenShift cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW we found kubeadm reset pretty handy for both devs and users, oc cluster leave might be a nice addition at some point to just clear everything out and allow user to try again.

KeyFile: path.Join(certDir, "node-bootstrap.key"),
},
UnqualifiedUser: "node-bootstrap",
User: "system:serviceaccount:openshift-infra:node-bootstrap",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to be created by default after oc cluster up? (this did not seem to happen for me) Anything I might be missing that needs to be done to get it created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is created automatically on a new cluster - you would need to rebuild the image if you haven't.

)
if err != nil {
panic(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To above question, this looks like it should be creating a node-bootstrap SA, I do see other SA's defined here, but not this one.

@dgoodwin
Copy link
Contributor

Where does this fit in larger scope of things, is oc cluster up/join becoming less focused on dev experimentation and headed for something supported?

signClient = true
break
}
if len(a.approveKubeletServingCSRsForGroup) > 0 && g == a.approveKubeletServingCSRsForGroup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you sign serving certificates without confirming that my identity actually owns the hostname that you're signing for, then I can exploit the API to gain a serving cert for another node which in combination with an owned node network will allow me to steal any used secret in the system even if we close the API authorizer gaps.

I haven't dug into the pull, but if you allowed me to get signed cert for the address of the master, I could also steal any node credentials I wanted.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 26, 2016
@smarterclayton
Copy link
Contributor Author

Where does this fit in larger scope of things, is oc cluster up/join becoming less focused on dev experimentation and headed for something supported?

No, it would still be focused on the simplest possible workflow that can lead a cluster.

Name: fmt.Sprintf("node-bootstrapper-server-%s", safeSecretName(o.NodeArgs.NodeName)),
},
Spec: certificates.CertificateSigningRequestSpec{
Request: csrData,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO to limit to server usage in 1.6

if err != nil {
return err
}
if err := o.NodeArgs.MergeSerializeableNodeConfig(nodeConfig); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you sanity-checked if there's any difference between this and start node --write-config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but will do so again.

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jan 24, 2017 via email

@liggitt
Copy link
Contributor

liggitt commented Jan 24, 2017

things we'll want to fix before this is no longer marked experimental:

  • make the cli work with a version of the certificates api other than v1alpha1
  • switch the auto-sign to gate on a permission check, not a hard-coded supergroup (I plan to do this upstream shortly)
  • reduce magic filenames in bootstrap file

@liggitt
Copy link
Contributor

liggitt commented Jan 24, 2017

make sure it's clearly marked experimental, make sure there are follow-ups for the items above. no other comments.

@smarterclayton
Copy link
Contributor Author

My execution plan for this is:

  1. merge
  2. extend openshift-ansible to have a role that sets up the node agent as a bootstrapper rather than a service model
  3. change ci.openshift.org to use that config model when baking nodes (prove baking nodes works)
  4. build an auto approver for ci.openshift.org that uses observe + cert data
  5. get experience in all the failure modes
  6. in 1.6 or 1.7 update node configuration as necessary, and possibly support "real" kube bootstrap mode (which looks like 1.7 at the earliest)

When the --bootstrap flag is provided to `start node`, require
--kubeconfig, connect to the master and perform a CSR request for a
client cert and a server cert. The node then attempts to read a
node-config ConfigMap from openshift-infra - if no value exists,
it uses the default config. If the CSR is not approved within a few
minutes, the node will crash loop.

On the master, enable the cert signer and grant the node-bootstrapper
service account the appropriate permission to request CSRs.
@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jan 24, 2017 via email

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 2525370

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 2525370

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13220/) (Base Commit: c971421)

@openshift-bot
Copy link
Contributor

openshift-bot commented Jan 24, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13232/) (Base Commit: 22d4240) (Image: devenv-rhel7_5761)

@debianmaster
Copy link

can you please add a simple readme on how to use this feature?

@debianmaster
Copy link

still not clear how to generate the boostrap secret. can you please add simple readme or post commands here i will write a READMe i tried multiple options but could not get it right

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jul 15, 2017 via email

@debianmaster
Copy link

looks like a typo in node-boostrappe im going go try this and let you know

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jul 16, 2017 via email

@debianmaster
Copy link

debianmaster commented Jul 16, 2017

Tried with correct spelling now.

Here is what i tried on mac.

master

oc cluster up --docker-machine --create-machine 
oc serviceaccounts create-kubeconfig -n openshift-infra node-bootstrapper > secret 

node1

node1
oc cluster join  --docker-machine=n1 --create-machine --secret='conentent of secret' 

Failed with following error on node1 docker-machine

-- Finding server IP ...
   Using docker-machine IP 192.168.99.101 as the host IP
   Using 192.168.99.101 as the server IP
-- Joining OpenShift cluster ...
   Starting OpenShift Node using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     No log available from "origin" container

@Davidsoff
Copy link

Davidsoff commented Jul 20, 2017

same issue here using:

master

oc cluster up --use-existing-config --host-data-dir=/var/lib/origin/openshift.local.etcd/ --host-volumes-dir=/var/lib/origin/openshift.local.volumes/ --host-pv-dir=/var/lib/orign/openshift.local.pv/

node

oc login -t <token>
oc serviceaccounts create-kubeconfig -n openshift-infra node-bootstrapper > secret 
oc cluster join  --secret='conentent of secret'

result

failed on the node with the following error:

-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v3.6.0-rc.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 127.0.0.1 as the server IP
-- Joining OpenShift cluster ...
   Starting OpenShift Node using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     No log available from "origin" container

@debianmaster
Copy link

cc/ @smarterclayton

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants