sdn: set veth TX queue length to unblock QoS #11126

dcbw · 2016-09-27T20:09:51Z

docker/libnetwork appear to create the pod veth interfaces with
an explicitly 0 txqlen instead of leaving the default 1000 that
the kernel would assign. A txqlen of 0 blocks any packets when
QoS is enabled for a port due to the way the Linux HTB shaper
works. Explicitly set the veth txqlen to make QoS work again.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1378000

docker/libnetwork appear to create the pod veth interfaces with an explicitly 0 txqlen instead of leaving the default 1000 that the kernel would assign. A txqlen of 0 blocks any packets when QoS is enabled for a port due to the way the Linux HTB shaper works. Explicitly set the veth txqlen to make QoS work again. Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1378000

dcbw · 2016-09-27T20:10:05Z

@openshift/networking @eparis

knobunc

LGTM, but should we open an issue with docker / libnetwork too?

dcbw · 2016-09-27T20:19:14Z

See https://bugzilla.redhat.com/show_bug.cgi?id=1152231 for background on why veth interfaces now work this way...

dcbw · 2016-09-27T20:35:29Z

LGTM, but should we open an issue with docker / libnetwork too?

Its pretty murky; the explicit txqlen:0 was done back in 2014 to work around the bug that the kernel change from 1152231 fixes. So maybe it's time to revisit. But since we may not use docker for creating the veth in the future anyway, it probably doesn't matter much to us.

danwinship · 2016-09-27T21:55:11Z

OK... so...

With old kernels, veths were created by default in such a way that they had bad performance (docker-archive/libcontainer#193). Setting txqueuelen to 0 fixes the performance problem but would cause explicitly-added qdiscs to fail (docker-archive/libcontainer#193 (comment)). Docker went with that approach anyway though, because they decided qdiscs were an advanced feature they didn't need to support (docker-archive/libcontainer#193 (comment)).

Except meanwhile, it turns out that the original theory that setting txqueuelen=0 would cause explicitly-added qdiscs to fail was mostly wrong, and in particular doesn't apply to the "htb" qdisc which we are using (http://www.spinics.net/lists/netdev/msg337586.html) (although it defaults to txqueuelen=2 in this case, not 1000, which seems like it might be... poor?) So with old kernels, OpenShift worked, even when using QoS.

Phil and Jesper eventually fixed the original "veths were created by default in such a way that they had bad performance" bug (https://bugzilla.redhat.com/show_bug.cgi?id=1152231), which means that the original libcontainer "set txqueuelen to 0" workaround is no longer needed (yay!), but in the process they also removed the workaround for txqueuelen==0 in htb (https://www.mail-archive.com/[email protected]/msg74390.html), which means that the "set txqueuelen to 0" workaround is no longer allowed (boo!). (This seems like it might count as the sort of thing that kernel hackers consider an API break? Or maybe not, depending on exactly how poorly the txqueuelen of 2 worked...)

Anyway... with this patch:

old kernel: libcontainer sets txqueuelen to 0 and then brings up the veth, which causes the kernel to create it with no qdisc, which means it has good performance. Your patch then sets txqueuelen to 1000 before adding the htb qdisc, which has the desired effect (which is probably better than the old behavior)
new kernel: libcontainer sets txqueuelen to 0 and then brings up the veth, which would have been created with no qdisc anyway, and the txqueuelen has no effect. Your patch then sets txqueuelen to 1000 before adding the htb qdisc, which has the desired effect (which is certainly better than the old behavior of attaching the htb qdisc with txqueuelen 0, which broke everything).

So LGTM. [merge]

openshift-bot · 2016-09-27T22:02:32Z

[Test]ing while waiting on the merge queue

openshift-bot · 2016-09-27T22:22:52Z

Evaluated for origin test up to 8a2fbcf

openshift-bot · 2016-09-28T00:22:27Z

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9384/)

dcbw · 2016-09-28T03:20:27Z

Travis failure is #11015 so re-[merge]

pravisankar · 2016-09-28T20:17:16Z

LGTM, @danwinship thx for the nice explanation

knobunc · 2016-09-29T11:58:23Z

[merge]

dcbw · 2016-09-29T22:44:16Z

flake is #11165 re-[merge]

openshift-bot · 2016-09-29T22:47:25Z

Evaluated for origin merge up to 8a2fbcf

openshift-bot · 2016-09-29T23:37:24Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9526/) (Image: devenv-rhel7_5102)

It is a clear misconfiguration to attach a qdisc to a device with tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred, htb, plug and sfb) inherit/copy this value as their queue length. Why should the kernel catch such a misconfiguration? Because prior to introducing the IFF_NO_QUEUE device flag, userspace found a loophole in the qdisc config system that allowed them to achieve the equivalent of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from a device. The loophole on older kernels is setting tx_queue_len=0, *prior* to device qdisc init (the config time is significant, simply setting tx_queue_len=0 doesn't trigger the loophole). This loophole is currently used by Docker[1] to get better performance and scalability out of the veth device. The Docker developers were warned[1] that they needed to adjust the tx_queue_len if ever attaching a qdisc. The OpenShift project didn't remember this warning and attached a qdisc, this were caught and fixed in[2]. [1] docker-archive/libcontainer#193 [2] openshift/origin#11126 Instead of fixing every userspace program that used this loophole, and forgot to reset the tx_queue_len, prior to attaching a qdisc. Let's catch the misconfiguration on the kernel side. Signed-off-by: Jesper Dangaard Brouer <[email protected]>

It is a clear misconfiguration to attach a qdisc to a device with tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred, htb, plug and sfb) inherit/copy this value as their queue length. Why should the kernel catch such a misconfiguration? Because prior to introducing the IFF_NO_QUEUE device flag, userspace found a loophole in the qdisc config system that allowed them to achieve the equivalent of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from a device. The loophole on older kernels is setting tx_queue_len=0, *prior* to device qdisc init (the config time is significant, simply setting tx_queue_len=0 doesn't trigger the loophole). This loophole is currently used by Docker[1] to get better performance and scalability out of the veth device. The Docker developers were warned[1] that they needed to adjust the tx_queue_len if ever attaching a qdisc. The OpenShift project didn't remember this warning and attached a qdisc, this were caught and fixed in[2]. [1] docker-archive/libcontainer#193 [2] openshift/origin#11126 Instead of fixing every userspace program that used this loophole, and forgot to reset the tx_queue_len, prior to attaching a qdisc. Let's catch the misconfiguration on the kernel side. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>

knobunc approved these changes Sep 27, 2016

View reviewed changes

dcbw added the component/networking label Sep 28, 2016

openshift-bot merged commit dfab2c1 into openshift:master Sep 30, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sdn: set veth TX queue length to unblock QoS #11126

sdn: set veth TX queue length to unblock QoS #11126

dcbw commented Sep 27, 2016

dcbw commented Sep 27, 2016

knobunc left a comment

dcbw commented Sep 27, 2016

dcbw commented Sep 27, 2016

danwinship commented Sep 27, 2016

openshift-bot commented Sep 27, 2016

openshift-bot commented Sep 27, 2016

openshift-bot commented Sep 28, 2016

dcbw commented Sep 28, 2016

pravisankar commented Sep 28, 2016

knobunc commented Sep 29, 2016

dcbw commented Sep 29, 2016

openshift-bot commented Sep 29, 2016

openshift-bot commented Sep 29, 2016 •

edited

Loading

sdn: set veth TX queue length to unblock QoS #11126

sdn: set veth TX queue length to unblock QoS #11126

Conversation

dcbw commented Sep 27, 2016

dcbw commented Sep 27, 2016

knobunc left a comment

Choose a reason for hiding this comment

dcbw commented Sep 27, 2016

dcbw commented Sep 27, 2016

danwinship commented Sep 27, 2016

openshift-bot commented Sep 27, 2016

openshift-bot commented Sep 27, 2016

openshift-bot commented Sep 28, 2016

dcbw commented Sep 28, 2016

pravisankar commented Sep 28, 2016

knobunc commented Sep 29, 2016

dcbw commented Sep 29, 2016

openshift-bot commented Sep 29, 2016

openshift-bot commented Sep 29, 2016 • edited Loading

openshift-bot commented Sep 29, 2016 •

edited

Loading