
Clean up OpenShift SDN network artifacts when using third party network plugin #11947

Closed

Conversation

xiangpengzhao
Contributor

If users forget to clean up OpenShift SDN network artifacts when migrating between SDN plugins, the CLI functionality from oadm pod-network validation #11880 will return incorrect results because of the stale clusternetwork info. This PR cleans up OpenShift SDN network artifacts, if any exist, when a third-party network plugin is in use, so users don't need to clean them up manually.

cc @knobunc @pravisankar
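
For orientation, a minimal sketch of the approach the PR describes: if the configured plugin name is not one of the OpenShift SDN plugins, the leftover SDN API objects are removed. The sdnClient interface and its delete methods below are illustrative stand-ins for the origin OpenShift client; only the names IsOpenShiftNetworkPlugin and cleanupOpenShiftNetworkArtifacts correspond to identifiers discussed in this PR.

```go
package example

import "fmt"

// sdnClient is an illustrative stand-in for the origin OpenShift client;
// only the delete calls the cleanup needs are modelled here.
type sdnClient interface {
	DeleteClusterNetwork(name string) error
	DeleteAllHostSubnets() error
	DeleteAllNetNamespaces() error
}

// isOpenShiftNetworkPlugin mirrors what osapi.IsOpenShiftNetworkPlugin checks:
// whether the configured name is one of the built-in OpenShift SDN plugins.
func isOpenShiftNetworkPlugin(name string) bool {
	switch name {
	case "redhat/openshift-ovs-subnet", "redhat/openshift-ovs-multitenant":
		return true
	}
	return false
}

// cleanupOpenShiftNetworkArtifacts removes the objects a previous
// OpenShift SDN deployment left behind (clusternetwork, hostsubnets,
// netnamespaces).
func cleanupOpenShiftNetworkArtifacts(c sdnClient) error {
	if err := c.DeleteClusterNetwork("default"); err != nil {
		return fmt.Errorf("failed to delete ClusterNetwork: %v", err)
	}
	if err := c.DeleteAllHostSubnets(); err != nil {
		return fmt.Errorf("failed to delete HostSubnets: %v", err)
	}
	return c.DeleteAllNetNamespaces()
}

// startMaster shows where the gate sits: cleanup only runs when a
// third-party plugin is configured.
func startMaster(pluginName string, c sdnClient) error {
	if !isOpenShiftNetworkPlugin(pluginName) {
		return cleanupOpenShiftNetworkArtifacts(c)
	}
	// ...normal OpenShift SDN master startup...
	return nil
}
```

The review below centres on two questions about this gate: whether it scales to out-of-tree plugins, and whether deleting state here is safe at all.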

@pweil- added the component/networking, kind/bug, and priority/P2 labels Nov 17, 2016
@knobunc
Contributor

Thanks for the PR.

I like the intent, and the code seems reasonable enough... but how does this scale? As we add more and more networking plugins, will we have to call the clean-up code for all of the ones we aren't running (or when they all live outside the main tree, how would we even do that)?

@openshift/networking @liggitt Any thoughts?

@danwinship
Contributor

I feel like we had this discussion in a previous bug and decided against it. Maybe because we wanted to support the possibility of other plugins using HostSubnet, etc, if they wanted?

Another bad side effect: if you try to switch from "redhat/openshift-ovs-subnet" to "redhat/openshift-ovs-multitenan" [sic], then it will delete all of your openshift-sdn network state, and then tell you that no such plugin exists, and exit with an error. So if we are going to do cleanup, it would be better if we didn't do it until after verifying that the plugin name is actually valid.

If the problem is just that ClusterNetwork is out of sync once you switch to a non-osdn plugin, maybe we should just move the ClusterNetwork handling out of the plugin and have it always be updated at startup, regardless of plugin.
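
To make the ordering concern concrete, a hypothetical sketch in which the plugin name is validated before cleanup is even considered; the knownPlugins table and the use of "cni" as a stand-in for third-party plugins are assumptions for illustration, not origin code.

```go
package example

import "fmt"

// knownPlugins is illustrative; real validation of the plugin name lives
// elsewhere in origin.
var knownPlugins = map[string]bool{
	"redhat/openshift-ovs-subnet":      true,
	"redhat/openshift-ovs-multitenant": true,
	"cni":                              true, // stand-in for a third-party plugin
}

func startMaster(pluginName string) error {
	// Fail fast on a typo such as "redhat/openshift-ovs-multitenan" so that
	// existing openshift-sdn state is never deleted by mistake.
	if !knownPlugins[pluginName] {
		return fmt.Errorf("unknown network plugin %q", pluginName)
	}
	if pluginName != "redhat/openshift-ovs-subnet" &&
		pluginName != "redhat/openshift-ovs-multitenant" {
		// Only now, with a validated third-party plugin name, would cleanup
		// be safe to consider:
		// cleanupOpenShiftNetworkArtifacts(...)
	}
	return nil
}
```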

@xiangpengzhao
Contributor Author

It seems there is no scaling issue. It checks the plugin in use via IsOpenShiftNetworkPlugin and decides whether to clean up and return. No matter which third-party network plugin is used, the clean-up removes the artifacts that OpenShift SDN produced earlier, in the same way.

I like the intent, and the code seems reasonable enough... but how does this scale? As we add more and more networking plugins, will we have to call the clean-up code for all of the ones we aren't running (or when they all live outside the main tree, how would we even do that)?

@xiangpengzhao
Contributor Author

Reasonable. But if other plugins want to use HostSubnet, they would have to start the cluster with OpenShift SDN first in order to produce the HostSubnet objects, and then migrate to their own plugins. Is that a good approach?
Maybe there were also other reasons in the discussion you mentioned?

I feel like we had this discussion in a previous bug and decided against it. Maybe because we wanted to support the possibility of other plugins using HostSubnet, etc, if they wanted?

IMO, this will not happen. If we want to switch from "redhat/openshift-ovs-subnet" to "redhat/openshift-ovs-multitenan", the condition !osapi.IsOpenShiftNetworkPlugin(networkConfig.NetworkPluginName) will be false, so the clean-up will not be called.

Another bad side effect: if you try to switch from "redhat/openshift-ovs-subnet" to "redhat/openshift-ovs-multitenan" [sic], then it will delete all of your openshift-sdn network state, and then tell you that no such plugin exists, and exit with an error. So if we are going to do cleanup, it would be better if we didn't do it until after verifying that the plugin name is actually valid.

Agreed.

If the problem is just that ClusterNetwork is out of sync once you switch to a non-osdn plugin, maybe we should just move the ClusterNetwork handling out of the plugin and have it always be updated at startup, regardless of plugin.

@xiangpengzhao
Contributor Author

Further discussion, if necessary, is welcome. If we don't have enough reasons for the cleanup, just close the PR :)
cc @knobunc @danwinship

@danwinship
Contributor

IMO, this will not happen. If we want to switch from "redhat/openshift-ovs-subnet" to "redhat/openshift-ovs-multitenan", the condition !osapi.IsOpenShiftNetworkPlugin(networkConfig.NetworkPluginName) will be false, so the clean-up will not be called.

Note the typo in "redhat/openshift-ovs-multitenan" (missing "t" at the end). My point was that if someone tries to switch between the two osdn plugins but gets the name wrong, then with this patch OpenShift will destroy their existing networking data first, and only then notice the typo and be unable to continue.

@knobunc
Contributor

knobunc commented Nov 18, 2016

It seems there is no scaling issue. It checks the plugin in use via IsOpenShiftNetworkPlugin and decides whether to clean up and return. No matter which third-party network plugin is used, the clean-up removes the artifacts that OpenShift SDN produced earlier, in the same way.

Sorry, not a scaling issue in terms of performance. But rather scaling in terms of what happens as we, and other people, add more networking plugins.

@danwinship
Contributor

openshift-sdn is a special case, in that parts of it exist whether or not you're using it, which can cause problems as with #11880. We don't have to worry about arbitrary plugins. (Though also, as above, I think the fix for #11880 is that we should have ClusterNetwork always reflect the currently-selected plugin, even when it's not openshift-sdn.)
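
A hypothetical sketch of that alternative: master startup always writes the configured plugin name into the ClusterNetwork record, whichever plugin is in use. The clusterNetworkRecord type and clusterNetworkStore interface are illustrative stand-ins, not the origin ClusterNetwork API.

```go
package example

// clusterNetworkRecord is an illustrative stand-in for the ClusterNetwork
// resource; the real object also carries the cluster CIDR, host subnet
// length and service CIDR.
type clusterNetworkRecord struct {
	Name       string
	PluginName string
}

type clusterNetworkStore interface {
	Get(name string) (*clusterNetworkRecord, error)
	Create(cn *clusterNetworkRecord) error
	Update(cn *clusterNetworkRecord) error
}

// syncClusterNetwork runs at master startup regardless of which plugin is
// configured, so the stored record always names the plugin actually in use.
func syncClusterNetwork(store clusterNetworkStore, pluginName string) error {
	cn, err := store.Get("default")
	if err != nil {
		// Treat any error as "not found" for brevity; real code would
		// distinguish error kinds.
		return store.Create(&clusterNetworkRecord{Name: "default", PluginName: pluginName})
	}
	if cn.PluginName != pluginName {
		cn.PluginName = pluginName
		return store.Update(cn)
	}
	return nil
}
```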

@xiangpengzhao
Contributor Author

@knobunc @danwinship 100% agree with both of you. Sorry for the misunderstanding in my last comments. 😅

@danwinship With regard to your last comment, I asked "Should we store a fake clusternetwork info with only the plugin name when using a third party plugin?" in an inline comment on #11880. I wasn't sure whether that approach is good enough to reflect the currently-selected plugin.

```diff
@@ -27,6 +27,7 @@ type OsdnMaster struct {

 func StartMaster(networkConfig osconfigapi.MasterNetworkConfig, osClient *osclient.Client, kClient *kclient.Client) error {
 	if !osapi.IsOpenShiftNetworkPlugin(networkConfig.NetworkPluginName) {
+		cleanupOpenShiftNetworkArtifacts(osClient)
```
@smarterclayton
Contributor

This is unfortunately not safe. A misconfiguration on the master can remove all client info. Instead, it would be more appropriate to log an event and an error to glog if any of the resources exist, and instruct administrators to run:

$ oc delete cluster-network,host-subnets,net-namespaces --all
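
A sketch of that warn-but-don't-delete behaviour, assuming the caller has already listed the resources; glog is the logging package origin used at the time, and the function and parameter names are illustrative.

```go
package example

import "github.com/golang/glog"

// warnAboutStaleSDNArtifacts logs when openshift-sdn objects are still
// present while a third-party plugin is configured, and tells the
// administrator how to remove them; nothing is deleted automatically.
// The boolean parameters are illustrative; the caller would determine
// them by listing the resources.
func warnAboutStaleSDNArtifacts(haveClusterNetwork, haveHostSubnets, haveNetNamespaces bool) {
	if haveClusterNetwork || haveHostSubnets || haveNetNamespaces {
		glog.Errorf("found leftover OpenShift SDN resources while a third-party " +
			"network plugin is configured; clean them up with: " +
			"oc delete cluster-network,host-subnets,net-namespaces --all")
	}
}
```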

@xiangpengzhao
Contributor Author

@smarterclayton Agreed. Thanks Clayton!
Do you mean that we should record an event in some way like recorder.Eventf? If so, where would the event be seen by users? I'm not familiar with the event record mechanism, sorry :) Is just logging an error enough? It seems there are no other events recorded in the master startup process.

@smarterclayton
Contributor

I think it's ok to warn the admin this content exists, we just can't delete it wholesale. Logging and firing an event would be appropriate.
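
For the recorder.Eventf question above, firing a warning event looks roughly like the sketch below. The import paths are current client-go ones (the 2016 origin tree vendored equivalent packages under k8s.io/kubernetes), and the choice of object to attach the event to is left open, as in the discussion.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
)

// reportStaleSDNArtifacts fires a warning event in addition to logging.
// Which object the event is attached to (and therefore where users would
// see it) is exactly the open question in the comment above; a node or
// namespace object would be a plausible choice.
func reportStaleSDNArtifacts(recorder record.EventRecorder, obj runtime.Object) {
	recorder.Eventf(obj, corev1.EventTypeWarning, "StaleSDNArtifacts",
		"OpenShift SDN resources exist but a third-party network plugin is configured; "+
			"run 'oc delete cluster-network,host-subnets,net-namespaces --all' to remove them")
}
```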

@pweil-

pweil- commented Jun 26, 2017

closing due to inactivity

@pweil- closed this Jun 26, 2017