Minimize direct API calls in SDN master by using informer cache #18911

pravisankar · 2018-03-09T04:08:09Z

https://trello.com/c/Uifetuz3/597-5-minimize-direct-api-calls-in-sdn-using-shared-informers

pravisankar · 2018-03-09T04:09:15Z

@openshift/sig-networking PTAL

pravisankar · 2018-03-09T18:48:06Z

/retest

danwinship · 2018-03-09T19:01:19Z

Allow allocator to mark assigned subnet

What is this commit doing?

danwinship · 2018-03-09T19:12:40Z

Use existing network informers from openshift controller context

This seems completely independent of everything else, and needs a pkg/cmd approver, so you probably should split it into a separate PR

- Currently, we need to pass all allocated subnets when the subnet allocator is created (the inUse arg to NewSubnetAllocator()). This means we need to know all existing subnets beforehand.
- This change exposes an additional method so that we can mark a specific subnet as already allocated dynamically (after the subnet allocator is created).

Precursor for openshift#18911
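A minimal Go sketch of the idea in this commit message, assuming a simplified allocator that hands out /24 subnets from one IPv4 cluster network. The type and the `newSubnetAllocator`, `MarkAllocated` and `AllocateNext` names are hypothetical, not the actual pkg/network/master API. The point is only that subnets can be recorded as used after construction instead of being passed in up front through `inUse`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// subnetAllocator is a hypothetical, simplified allocator handing out /24
// subnets from an IPv4 cluster network. All names are illustrative.
type subnetAllocator struct {
	network *net.IPNet
	used    map[string]bool // CIDR string -> allocated
	next    int             // index of the next /24 to try
	count   int             // number of /24 subnets in the network
}

func newSubnetAllocator(cidr string) (*subnetAllocator, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ones, bits := network.Mask.Size()
	if bits != 32 || ones > 24 {
		return nil, errors.New("need an IPv4 network of /24 or larger")
	}
	return &subnetAllocator{network: network, used: map[string]bool{}, count: 1 << uint(24-ones)}, nil
}

// subnetAt returns the i-th /24 inside the cluster network as a CIDR string.
func (sa *subnetAllocator) subnetAt(i int) string {
	b := sa.network.IP.To4()
	base := uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
	v := base + uint32(i)<<8
	ip := net.IPv4(byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
	return ip.String() + "/24"
}

// MarkAllocated records a subnet that is already in use (for example, one
// learned from a HostSubnet event) after the allocator has been created.
func (sa *subnetAllocator) MarkAllocated(cidr string) error {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return err
	}
	if ones, _ := subnet.Mask.Size(); ones != 24 || !sa.network.Contains(subnet.IP) {
		return fmt.Errorf("%s is not a /24 inside %s", cidr, sa.network)
	}
	key := subnet.String()
	if sa.used[key] {
		return fmt.Errorf("%s is already allocated", key)
	}
	sa.used[key] = true
	return nil
}

// AllocateNext returns the next /24 that has not been marked or allocated.
func (sa *subnetAllocator) AllocateNext() (string, error) {
	for tries := 0; tries < sa.count; tries++ {
		s := sa.subnetAt(sa.next)
		sa.next = (sa.next + 1) % sa.count
		if !sa.used[s] {
			sa.used[s] = true
			return s, nil
		}
	}
	return "", errors.New("no free subnets left")
}

func main() {
	sa, err := newSubnetAllocator("10.128.0.0/16")
	if err != nil {
		panic(err)
	}
	// A HostSubnet for 10.128.0.0/24 shows up after the allocator exists.
	if err := sa.MarkAllocated("10.128.0.0/24"); err != nil {
		panic(err)
	}
	s, err := sa.AllocateNext()
	fmt.Println(s, err)
}
```

Because marking is incremental, the allocator no longer needs the full set of existing subnets at construction time, which is what lets it be fed from informer events.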

pravisankar · 2018-03-15T21:34:06Z

On Fri, Mar 9, 2018 at 11:01 AM, Dan Winship wrote:
> Allow allocator to mark assigned subnet
> What is this commit doing?

Added more details in #18999, which needs a pkg/util approver.

pravisankar · 2018-03-15T21:35:24Z

On Fri, Mar 9, 2018 at 11:12 AM, Dan Winship wrote:
> Use existing network informers from openshift controller context
> This seems completely independent of everything else, and needs a pkg/cmd approver, so you probably should split it into a separate PR

Created #18998

Automatic merge from submit-queue (batch tested with PRs 18999, 18543).

Allow subnet allocator to mark assigned subnet dynamically

- Moved subnet allocator from pkg/util/netutils to pkg/network/master. Subnet allocator is specific to SDN master and not used anywhere else.
- Currently, we need to pass all allocated subnets during subnet allocator creation time (inUse arg to NewSubnetAllocator()). This means we need to know all existing subnets beforehand.
- This change exposes an additional method so that we can mark a specific subnet as already allocated dynamically (after the subnet allocator is created).

Precursor for #18911

pravisankar · 2018-03-21T21:31:28Z

Dependent PRs got merged; this is now ready for review/merge.
@openshift/networking @danwinship PTAL

dcbw · 2018-03-22T18:49:37Z

pkg/network/master/subnets.go

+}
+
+func (master *OsdnMaster) handleDeleteNode(obj interface{}) {
+	node := obj.(*kapi.Node)


do we need the "Tombstone" pattern here? e.g.

	ns, ok := obj.(*kapi.Namespace)
	if !ok {
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			logrus.Errorf("couldn't get object from tombstone %+v", obj)
			return
		}
		ns, ok = tombstone.Obj.(*kapi.Namespace)
		if !ok {
			logrus.Errorf("tombstone contained object that is not a namespace %#v", obj)
			return
		}
	}

No, the 'Tombstone' pattern is already handled in InformerFuncs():
https://github.com/pravisankar/origin/blob/c367802e15a6070aefe9ca83823153da9b31be96/pkg/network/master/subnets.go#L45
https://github.com/pravisankar/origin/blob/c367802e15a6070aefe9ca83823153da9b31be96/pkg/network/common/informers.go#L27

pravisankar · 2018-03-22T23:08:29Z

/retest

danwinship · 2018-03-23T14:17:24Z

pkg/network/master/master.go

+
+func (master *OsdnMaster) startSubSystems(pluginName string) {
+	if err := master.SubnetStartMaster(master.networkInfo.ClusterNetworks); err != nil {
+		panic(err)


Use glog.Fatalf here rather than panic
(likewise below and in other commits)

danwinship · 2018-03-23T14:52:46Z

pkg/network/master/vnids.go

@@ -77,26 +75,17 @@ func (vmap *masterVNIDMap) isAdminNamespace(nsName string) bool {
 	return false
 }

-func (vmap *masterVNIDMap) populateVNIDs(netNamespaceInformer networkinformers.NetNamespaceInformer) error {


This patch doesn't work. You're removing the initial bulk NetNamespace->netIDManager initialization, but there isn't any lazy NetNamespace->netIDManager initialization anywhere, so the master will never learn the existing namespace->VNID mappings. When you start up the master the second time, it will see Namespace Added events for each pre-existing Namespace and call assignVNID. That calls allocateNetID, which calls getVNID to see whether the Namespace is already known. It isn't, so getVNID returns !found, and allocateNetID then calls vmap.netIDManager.AllocateNext() to allocate a new VNID, tries to Create a NetNamespace object with that VNID, fails because the NetNamespace already exists, and logs an error.

Even if you updated handleAddOrUpdateNetNamespace to do the right thing, you'd still have to deal with the fact that you don't know if you're going to see Namespace "foo" or NetNamespace "foo" first. So I think basically you can't get rid of populateVNIDs.
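The ordering problem can be reduced to a small model. In this sketch, a hypothetical `vnidAllocator` stands in for netIDManager plus the namespace->VNID map, and `populate` plays the role of populateVNIDs: once the existing NetNamespace mappings are loaded, a restarted master reuses a known VNID instead of allocating a fresh one. Names and VNID ranges are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// vnidAllocator is a hypothetical stand-in for netIDManager plus the
// namespace -> VNID map.
type vnidAllocator struct {
	min, max uint32 // max must stay below math.MaxUint32 for the loop below
	inUse    map[uint32]bool
	byNS     map[string]uint32
}

func newVNIDAllocator(min, max uint32) *vnidAllocator {
	return &vnidAllocator{min: min, max: max, inUse: map[uint32]bool{}, byNS: map[string]uint32{}}
}

// populate records every existing NetNamespace -> VNID mapping before any
// Namespace event is handled.
func (a *vnidAllocator) populate(existing map[string]uint32) {
	for ns, id := range existing {
		a.byNS[ns] = id
		a.inUse[id] = true
	}
}

// assign returns the VNID for ns, allocating a new one only if ns is unknown.
// The bool reports whether the caller would have to create a NetNamespace.
func (a *vnidAllocator) assign(ns string) (uint32, bool, error) {
	if id, ok := a.byNS[ns]; ok {
		return id, false, nil
	}
	for id := a.min; id <= a.max; id++ {
		if !a.inUse[id] {
			a.inUse[id] = true
			a.byNS[ns] = id
			return id, true, nil
		}
	}
	return 0, false, errors.New("VNID range exhausted")
}

func main() {
	// NetNamespaces that already exist when the master restarts.
	existing := map[string]uint32{"foo": 10, "bar": 11}

	withCache := newVNIDAllocator(10, 20)
	withCache.populate(existing)
	fmt.Println(withCache.assign("foo")) // reuses foo's VNID, nothing to create
	fmt.Println(withCache.assign("baz")) // a genuinely new namespace

	noCache := newVNIDAllocator(10, 20)
	// Without populate, "bar" looks unknown and is handed VNID 10, which
	// already belongs to "foo"; creating bar's NetNamespace would also fail.
	fmt.Println(noCache.assign("bar"))
}
```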

Fixed; the VNID allocator is now initialized in 00bf4a7.

danwinship · 2018-03-28T21:04:15Z

/lol

danwinship · 2018-03-29T15:21:28Z

pkg/network/master/master.go

 	go master.startSubSystems(networkConfig.NetworkPluginName)

 	return nil
 }

+func (master *OsdnMaster) initSubSystems() error {


There's no reason this has to be a separate function, is there? You could just as easily call initSubnetMaster() at the start of startSubSystems().

Actually is there even any reason left at this point for separate init and start methods? It's true that we do that in some other files but that's usually because the start method does things we don't want to do from unit tests. But there are no unit tests of these files (and the init method does things we probably wouldn't want done from unit tests as well anyway).

Same comment applies to vnid code

- Move HostSubnet reconciliation inside watchSubnets(), which will help us detect and fix issues sooner rather than at the next restart.
- Move host IP validations for existing subnets inside watchSubnets(). The subnet watch gets both existing and new subnets, so this avoids duplicate host IP validations.
- We no longer need to call release subnet in all the cluster networks.
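A sketch of why moving these checks into the watch handler avoids doing them twice: a shared informer delivers every pre-existing object as an Add event during its initial list, so one handler sees both existing and new HostSubnets. It assumes, for illustration only, that reconciliation means dropping a HostSubnet whose node no longer exists and that host IP validation is just a parse check; the real checks in watchSubnets() may differ, and all names here are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// hostSubnet is a hypothetical stand-in for the HostSubnet API object.
type hostSubnet struct {
	Host   string
	HostIP string
	Subnet string
}

type subnetWatcher struct {
	nodes     map[string]bool   // node names present in the node cache
	allocated map[string]string // subnet CIDR -> host
	stale     []string          // hosts whose HostSubnet reconciliation would delete
	invalid   []string          // hosts whose HostSubnet failed host IP validation
}

// handleHostSubnet runs for every HostSubnet add/update event, so the same
// validation and reconciliation apply to old and new HostSubnets alike.
func (w *subnetWatcher) handleHostSubnet(hs hostSubnet) {
	if net.ParseIP(hs.HostIP) == nil {
		w.invalid = append(w.invalid, hs.Host)
		return
	}
	if !w.nodes[hs.Host] {
		w.stale = append(w.stale, hs.Host)
		return
	}
	w.allocated[hs.Subnet] = hs.Host
}

func main() {
	w := &subnetWatcher{
		nodes:     map[string]bool{"node-a": true, "node-c": true},
		allocated: map[string]string{},
	}
	events := []hostSubnet{
		{"node-a", "192.168.1.10", "10.128.0.0/24"}, // existing, from the initial list
		{"node-b", "192.168.1.11", "10.128.1.0/24"}, // existing, but its node is gone
		{"node-c", "not-an-ip", "10.128.2.0/24"},    // new, fails validation
	}
	for _, hs := range events {
		w.handleHostSubnet(hs)
	}
	fmt.Println(w.allocated, w.stale, w.invalid)
}
```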

- Removed inUse arg to newSubnetAllocator (no longer used)
- Renamed methods

pravisankar · 2018-03-30T17:36:42Z

@danwinship PTAL

pravisankar · 2018-03-31T02:19:59Z

/retest

pravisankar · 2018-03-31T08:30:23Z

/test gcp

danwinship · 2018-04-02T16:56:31Z

/test extended_networking

danwinship · 2018-04-02T17:18:51Z

/lgtm

openshift-ci-robot · 2018-04-02T17:18:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, pravisankar

Needs approval from an approver in each of these files:

~~pkg/network/OWNERS~~ [danwinship,pravisankar]

danwinship · 2018-04-02T17:19:32Z

/hold
until extended_networking passes

pravisankar · 2018-04-02T18:31:44Z

extended_networking tests passed
/hold cancel

openshift-ci-robot requested review from juanvallejo and knobunc March 9, 2018 04:08

openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 9, 2018

openshift-ci-robot added the sig/networking label Mar 9, 2018

pravisankar requested review from danwinship, dcbw and rajatchopra March 9, 2018 04:10

openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2018

pravisankar mentioned this pull request Mar 15, 2018

Allow subnet allocator to mark assigned subnet dynamically #18999

Merged

pravisankar added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 15, 2018

pravisankar force-pushed the sdn-minimize-live-apicalls branch from 02ecfac to 5920c3c March 20, 2018 20:46

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2018

openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2018

pravisankar force-pushed the sdn-minimize-live-apicalls branch from 5920c3c to 9be3468 March 21, 2018 21:15

pravisankar removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2018

dcbw reviewed Mar 22, 2018

danwinship suggested changes Mar 23, 2018

openshift-ci-robot added the ¯\_(ツ)_/¯ label Mar 28, 2018

danwinship reviewed Mar 29, 2018

Ravi Sankar Penta added 9 commits March 30, 2018 10:32

SDN master: Start subnet and vnid sub-systems asynchronously

3973ae1

Wait for informer sync before starting sub-systems in SDN master

14ed1e0

Use node and host subnet cache in pkg/network/master/subnets.go

d2211b3

Use netNamespace cache in pkg/network/master/vnids.go

d78c7bb

SDN master VNID cleanup

41293e8

Split allocators from subnet handling in pkg/network/master/subnets.go

3f3be98

Subnet allocator cleanup

9f87155

Call deleteNode() only if node UID matches

ae0f3e1
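The last commit above, "Call deleteNode() only if node UID matches", can be illustrated with a small model. This sketch assumes the HostSubnet remembers the UID of the node it was created for, so a late delete event for an earlier node with the same name does not release the subnet of its replacement. The types and the UID bookkeeping are hypothetical, not the actual pkg/network/master code.

```go
package main

import "fmt"

// node and hostSubnet are hypothetical stand-ins for the API objects.
type node struct {
	Name string
	UID  string
}

type hostSubnet struct {
	Host    string
	NodeUID string // assumed: UID of the node this subnet was created for
	Subnet  string
}

type master struct {
	subnets map[string]hostSubnet // keyed by host name
}

// handleDeleteNode releases the node's subnet only when the HostSubnet belongs
// to the exact node object that was deleted; a delete event for an older node
// with the same name is ignored.
func (m *master) handleDeleteNode(n node) bool {
	hs, ok := m.subnets[n.Name]
	if !ok || hs.NodeUID != n.UID {
		return false
	}
	delete(m.subnets, n.Name)
	return true
}

func main() {
	m := &master{subnets: map[string]hostSubnet{
		"node-a": {Host: "node-a", NodeUID: "uid-2", Subnet: "10.128.0.0/24"},
	}}
	// Stale delete for the previous incarnation of node-a (uid-1): keep the subnet.
	fmt.Println(m.handleDeleteNode(node{Name: "node-a", UID: "uid-1"}))
	// Delete for the current node-a (uid-2): release the subnet.
	fmt.Println(m.handleDeleteNode(node{Name: "node-a", UID: "uid-2"}))
}
```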

pravisankar force-pushed the sdn-minimize-live-apicalls branch from d2baa57 to ae0f3e1 March 30, 2018 17:34

openshift-ci-robot assigned danwinship Apr 2, 2018

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 2, 2018

danwinship approved these changes Apr 2, 2018

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2018

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2018

openshift-merge-robot merged commit 2cf54b9 into openshift:master Apr 2, 2018

danwinship mentioned this pull request Apr 4, 2018

hostsubnet and netnamespace controllers can hang on start #19217

Closed

pravisankar mentioned this pull request Apr 9, 2018

Hoist sdn informer into Start() #19285

Merged
