
Make node to auto detect openshift network plugin from master #9147

Merged

Conversation


@pravisankar
Author

@openshift/networking @openshift/api-review PTAL

@smarterclayton
Contributor

API approved

@danwinship
Contributor

Maybe update ValidateClusterNetworkUpdate to only allow changing the plugin if you're switching between the two openshift-sdn plugins? (Or at least, you can only switch to one of the openshift-sdn plugins if you're also switching from one of the openshift-sdn plugins. Some third-party plugin might also support switching from openshift-sdn, but we don't support switching from third-party plugins to ours.)
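
For illustration only, here is a minimal Go sketch (not the PR's actual code) of what such a restriction could look like. The helper mirrors the IsOpenShiftNetworkPlugin function referenced later in this PR; the package placement and plugin-name strings are assumptions:

```go
package validation // illustrative placement

import "fmt"

// isOpenShiftNetworkPlugin is a stand-in for the helper used elsewhere in
// this PR (sdnovs.IsOpenShiftNetworkPlugin); the name strings are assumptions.
func isOpenShiftNetworkPlugin(name string) bool {
	return name == "redhat/openshift-ovs-subnet" || name == "redhat/openshift-ovs-multitenant"
}

// validatePluginNameUpdate rejects a plugin change unless both the old and
// the new value are openshift-sdn plugins (i.e. subnet <-> multitenant).
func validatePluginNameUpdate(oldPlugin, newPlugin string) error {
	if oldPlugin == newPlugin {
		return nil
	}
	if isOpenShiftNetworkPlugin(oldPlugin) && isOpenShiftNetworkPlugin(newPlugin) {
		return nil
	}
	return fmt.Errorf("cannot change network plugin from %q to %q", oldPlugin, newPlugin)
}
```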

@smarterclayton
Contributor

smarterclayton commented Jun 6, 2016 via email

@danwinship
Contributor

Maybe validation isn't the right place to put it, but if you switch from a non-openshift-sdn plugin to an openshift-sdn plugin in a non-empty cluster, you're probably going to end up with a broken network. (And given that the previous plugin might have done anything to the nodes'/pods' networking configuration, and there's no way to force it to clean up after itself, there's no way we could make such a switchover work, other than by rebooting all the nodes.) But maybe the best solution is just to document "don't do that then".

@smarterclayton
Contributor

If the user deleted all the ClusterNetworks, would that be enough to trigger a cleanup (obviously they wouldn't have their old data)?


@danwinship
Contributor

We don't currently watch for changes on ClusterNetwork at all, but even if we did, the HostSubnet and Pod records implicitly reference the ClusterNetwork too (since they contain IPs allocated out of the ClusterNetwork range), so you'd have to delete them too.

@pravisankar
Author

While reviewing QE test cases, I noticed a bunch of issues related to ClusterNetwork:

  • If someone migrates from the redhat subnet/multitenant plugin to some other k8s network plugin, then oc get clusternetwork/hostsubnets shows stale output, which is confusing.
  • Unless we allow nodes with different network plugins in a cluster (unlikely?), the 'networkPluginName' config param on the node is redundant and can cause plugin mismatches.
  • We are allowing creation of ClusterNetworks other than 'default', and there is no way for the user to specify which ClusterNetwork to use. I don't see any benefit to enabling this.

To alleviate some of these problems, I was thinking of doing the following:

  • Remove existing clusternetwork/hostsubnets if the current network plugin is not subnet/multitenant.
  • Deprecate the 'networkPluginName' node config and get the plugin name from the master. If the node config is still used, then enforce the plugin mismatch check.
  • Do not allow creation of ClusterNetworks other than 'default'.
  • Ensure new nodes (i.e., nodes with these changes) work with old masters and old nodes work with new masters. We may not be able to perform plugin mismatch checks in the case of an old master or old node.

@knobunc @eparis @danwinship @dcbw @smarterclayton reasonable?

@danwinship
Contributor

> Remove existing clusternetwork/hostsubnets if the current network plugin is not subnet/multitenant.

It would be nice if we could make it clearer that those types are specific to openshift-sdn too, not origin-generic. (eg, we don't want to let people manually create a clusternetwork record if they aren't using openshift-sdn, because that wouldn't make any sense)

> Deprecate the 'networkPluginName' node config and get the plugin name from the master. If the node config is still used, then enforce the plugin mismatch check.

Agreed, I suggested that in the other bug too

> Do not allow creation of ClusterNetworks other than 'default'.

Yes, I think. I was looking into something similar while working on the egress firewall stuff, and was surprised that kubernetes doesn't have any concept of singleton types... Maybe that's something we want to try to address upstream.

@smarterclayton
Contributor

> Remove existing clusternetwork/hostsubnets if the current network plugin is not subnet/multitenant.
>
> It would be nice if we could make it clearer that those types are specific to openshift-sdn too, not origin-generic.

Why not? Why would a third party plugin not be able to use cluster network records?

> Do not allow creation of ClusterNetworks other than 'default'.
>
> Yes, I think. I was looking into something similar while working on the egress firewall stuff, and was surprised that kubernetes doesn't have any concept of singleton types... Maybe that's something we want to try to address upstream.

That's intentional, we're unlikely to add it.



@danwinship
Contributor

> Why not? Why would a third party plugin not be able to use cluster network records?

No other plugin has wanted to; it would make them OpenShift-only as opposed to being able to work with kubernetes in general.

I guess if origin itself created the ClusterNetwork record, then other plugins might want to consider using it to read their configuration. (But currently the ClusterNetwork gets created by the openshift-sdn master code, so if you're using an exec-based or CNI network plugin, then no ClusterNetwork record is going to get created, and there's no hook that would allow you to create it yourself.)

HostSubnet is more clearly openshift-sdn-specific. Eg, it assumes that each node has a single pod subnet, which isn't compatible with some network plugins.

@smarterclayton
Contributor

I think my point is that the API objects are intended to be used to support plugins. From an API perspective, we don't really care whether it's openshift-sdn or not, so we certainly wouldn't write code that says "only openshift-sdn can use these APIs". We can certainly say "we don't support you mucking with these on your own".


@pravisankar pravisankar force-pushed the add-plugin-to-clusternetwork branch from 84dea25 to 6155b62 on June 10, 2016
@pravisankar pravisankar changed the title from "Added pluginName field to ClusterNetwork resource" to "Make node to auto detect openshift network plugin from master" on Jun 10, 2016
@pravisankar
Author

@openshift/networking @openshift/api-review updated, PTAL

```go
if clusterNetwork.PluginName == "" {
	// Do not return error in this case
	glog.Warningf("either there is network plugin mismatch between OpenShift node(%q) and master or OpenShift master is running an older version where we did not persist plugin name", pluginName)
} else {
```
Contributor

Invert this condition and return earlier.
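
For illustration, the early-return shape being suggested might look like the following sketch; it assumes the enclosing function returns an error, and the body of the original else branch is elided because it is not shown in the excerpt:

```go
if clusterNetwork.PluginName == "" {
	// Do not return error in this case
	glog.Warningf("either there is network plugin mismatch between OpenShift node(%q) and master or OpenShift master is running an older version where we did not persist plugin name", pluginName)
	return nil
}
// ...handle the non-empty PluginName case here, without the extra nesting...
```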

@pravisankar pravisankar force-pushed the add-plugin-to-clusternetwork branch from 6155b62 to 15d00e6 on June 10, 2016
@pravisankar
Author

@danwinship @dcbw @knobunc @smarterclayton updated, PTAL

```diff
@@ -582,6 +573,23 @@ func (c *MasterConfig) GetRestStorage() map[string]rest.Storage {
 		"clusterRoles": clusterRoleStorage,
 	}

+	if sdnovs.IsOpenShiftNetworkPlugin(c.Options.NetworkConfig.NetworkPluginName) {
```
Contributor

You can't do this - it's a breaking change when cluster config changes, and we don't do it. All storage should always be enabled.
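
In other words, the registration should stay unconditional. A sketch of the shape being asked for, with hypothetical storage variable names mirroring the map in the excerpt above:

```go
// Always register the SDN storage, regardless of the configured plugin,
// so that an upgrade never silently disables existing API endpoints.
storage["clusterNetworks"] = clusterNetworkStorage // hypothetical names
storage["hostSubnets"] = hostSubnetStorage
```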

Author

When a user switches from the openshift network plugin to a third-party plugin, oc get clusternetwork/hostsubnets provides stale info and it will be confusing (clusternetwork will show the multitenant/subnet pluginName when they are using a different one). I preferred not to delete the existing clusternetwork/hostsubnets because any unintentional typo in the plugin name would wipe out the configuration.
This is similar to storage for builds: build endpoints are enabled based on user config. These REST endpoints are specific to our SDN, so if users are not using our plugin they shouldn't care about these endpoints?
@smarterclayton

Contributor

Unfortunately these have always been enabled; we cannot disable them without admin intervention. With your code change, they would suddenly be disabled when the admin upgraded. That violates our backwards API guarantees, where we don't change / break the behavior of the API without admin intervention.

For 1.3 I don't think disabling these is worth the time or effort, since it's unlikely users would use them. If you want to make this better we should document that users should delete stale resources when they change plugins.


@pravisankar pravisankar force-pushed the add-plugin-to-clusternetwork branch 2 times, most recently from 1e74669 to ea57398 on June 13, 2016
@pravisankar
Author

@danwinship @smarterclayton
Reverted blocking clusternetwork creation and disabling the SDN-specific endpoints.
I will update the docs to say that users should delete stale resources when they switch to a different plugin.

@smarterclayton
Contributor

[test]

@smarterclayton
Contributor

Compile failure:

```
# github.com/openshift/origin/pkg/cmd/server/kubernetes
_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/node_config.go:177: undefined: ovs.IsOpenShiftNetworkPlugin
_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/node_config.go:421: undefined: ovs.IsOpenShiftNetworkPlugin
!!! Error in hack/../hack/common.sh:302
```

Changes look good, update that and once I see a green test I'll merge.

@pravisankar
Author

Needs the corresponding openshift-sdn PR openshift/openshift-sdn#318 for this to work. I will sync the new changes from the openshift-sdn repo.

@pravisankar pravisankar force-pushed the add-plugin-to-clusternetwork branch from ea57398 to 1c6ea24 on June 15, 2016
@pravisankar
Author

[test]

@openshift-bot
Contributor

openshift-bot commented Jun 15, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5007/) (Image: devenv-rhel7_4394)

@liggitt
Contributor

liggitt commented Jun 15, 2016

Please hold this for the rebase, which is in the merge queue.

@openshift-bot openshift-bot added the needs-rebase label on Jun 16, 2016
Ravi Sankar Penta added 5 commits June 16, 2016 10:36
Master persists the pluginName and all nodes fetch this info to detect network plugin mismatch.
Node config nodeConfig.NetworkPluginName is optional for the openshift network plugin.
If the plugin is not specified, it is auto-detected by looking at ClusterNetwork.
If the plugin is specified, we validate whether node and master are using the same network plugin.
nodeConfig.NetworkPluginName is not deprecated, in order to support third-party network plugins.
We can't expect or enforce a third-party node network plugin to auto-detect from the master.

When the openshift network plugin is used, fix the check to block node start when the kubelet is enabled but not the plugin, and vice versa.
…etwork resource and node networkPluginName changes
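
For illustration, a minimal Go sketch of the auto-detection flow these commits describe; the function and type names here are hypothetical, and the real logic lives in the node startup path:

```go
package node // illustrative placement

import "fmt"

// ClusterNetwork mirrors only the field relevant here (illustrative type).
type ClusterNetwork struct {
	PluginName string
}

// resolveNetworkPluginName picks the node's plugin: auto-detect from the
// master when the node config leaves it empty, otherwise verify that node
// and master agree.
func resolveNetworkPluginName(configured string, cn *ClusterNetwork) (string, error) {
	switch {
	case cn.PluginName == "":
		// Older master that did not persist the plugin name; we cannot
		// validate, so fall back to the node config (the PR logs a warning).
		return configured, nil
	case configured == "":
		// Plugin not set in node config: auto-detect from the master.
		return cn.PluginName, nil
	case configured != cn.PluginName:
		// Both sides specify a plugin and they disagree: fail fast.
		return "", fmt.Errorf("network plugin mismatch between OpenShift node(%q) and master(%q)", configured, cn.PluginName)
	default:
		return configured, nil
	}
}
```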
@pravisankar pravisankar force-pushed the add-plugin-to-clusternetwork branch from 1c6ea24 to e4ef036 on June 16, 2016
@openshift-bot openshift-bot removed the needs-rebase label on Jun 16, 2016
@pravisankar
Author

re[test]

@openshift-bot
Contributor

Evaluated for origin test up to e4ef036

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/5007/)

@pravisankar
Author

[merge]

@openshift-bot
Contributor

Evaluated for origin merge up to e4ef036
