
EgressNetworkPolicy rewrite drops traffic for too long #19276

Closed
mjudeikis opened this issue Apr 9, 2018 · 16 comments

@mjudeikis
Contributor

mjudeikis commented Apr 9, 2018

I have multiple EgressNetworkPolicies in my cluster.

Policy:

{
    "kind": "EgressNetworkPolicy",
    "apiVersion": "v1",
    "metadata": {
        "name": "default"
    },
    "spec": {
        "egress": [
            {
                "type": "Allow",
                "to": {
                    "cidrSelector": "1.2.3.0/24"
                }
            },
            {
                "type": "Allow",
                "to": {
                    "dnsName": "www.foo.com"
                }
            },
            {
                "type": "Deny",
                "to": {
                    "cidrSelector": "0.0.0.0/0"
                }
            }
        ]
    }
}

namespace:

[root@console-REPL ~]# oc describe netnamespace labs-dev                                                 
Name:           labs-dev                            
Created:        About an hour ago                   
Labels:         <none>                              
Annotations:    <none>                              
Name:           labs-dev                            
ID:             1674326                             
Egress IPs:     <none>                              

So I have the rules created:

[root@node1 ~]# ovs-ofctl -O OpenFlow13 dump-flows br0 | grep 198c56                                                                                                                                               
 cookie=0x0, duration=3105.808s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.26.73,nw_frag=later actions=load:0x198c56->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80                    
 cookie=0x0, duration=3105.801s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.26.73,tp_dst=8080 actions=load:0x198c56->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80                     
 cookie=0x0, duration=3140.219s, table=80, n_packets=0, n_bytes=0, priority=100,reg0=0x198c56,reg1=0x198c56 actions=output:NXM_NX_REG2[]                                                                           
 cookie=0x0, duration=253.253s, table=101, n_packets=0, n_bytes=0, priority=3,ip,reg0=0x198c56,nw_dst=1.2.3.0/24 actions=output:2                                                                                  
 cookie=0x0, duration=253.247s, table=101, n_packets=0, n_bytes=0, priority=2,ip,reg0=0x198c56,nw_dst=107.23.81.155 actions=output:2                                                                               
 cookie=0x0, duration=253.242s, table=101, n_packets=0, n_bytes=0, priority=2,ip,reg0=0x198c56,nw_dst=107.23.205.115 actions=output:2                                                                              
 cookie=0x0, duration=253.233s, table=101, n_packets=0, n_bytes=0, priority=1,ip,reg0=0x198c56 actions=drop                                                                                                        
[root@node1 ~]#

All good and nice. But there is this piece of code:
https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L426
which runs roughly every 30 minutes.

Now, my www.foo.com is behind an internal DNS server, and resolution is slower than usual.

Spoiler: from this point on I am making assumptions without knowing the code base.
This part of the code takes a long time to execute and hangs (?):
https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L447-L487

DNS should be taken from the cache, but by any chance it could be blocked by this:
https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L490

Long story short, the time between:
otx.AddFlow("table=101, reg0=%d, cookie=1, priority=65535, actions=drop", vnid)
and
otx.DeleteFlows("table=101, reg0=%d, cookie=1/1", vnid)
in the code causes downtime.
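
To make that window concrete, here is a rough Go sketch of the pattern as I understand it (not the actual ovscontroller.go code; ovsTransaction and egressRule are simplified stand-ins, and net.LookupIP stands in for the cached egressDNS.GetIPs): the temporary drop flow goes in first, the per-rule flows are regenerated one by one with any DNS resolution happening inline, and only then is the drop flow removed.

package main

import (
    "fmt"
    "net"
)

// Simplified stand-in for the real OVS flow transaction type.
type ovsTransaction struct{}

func (t *ovsTransaction) AddFlow(format string, args ...interface{}) {
    fmt.Printf("add-flow  "+format+"\n", args...)
}

func (t *ovsTransaction) DeleteFlows(format string, args ...interface{}) {
    fmt.Printf("del-flows "+format+"\n", args...)
}

type egressRule struct {
    policy       string // "Allow" or "Deny"
    cidrSelector string // set for CIDR-based rules
    dnsName      string // set for DNS-based rules
}

func updateEgressFlows(otx *ovsTransaction, vnid uint32, rules []egressRule) {
    // 1. Block all egress traffic for this VNID while the rules are rebuilt.
    otx.AddFlow("table=101, reg0=%d, cookie=1, priority=65535, actions=drop", vnid)
    // 2. Delete the old per-rule flows and regenerate them one by one.
    otx.DeleteFlows("table=101, reg0=%d, cookie=0/1", vnid)
    priority := len(rules)
    for _, rule := range rules {
        action := "output:2"
        if rule.policy == "Deny" {
            action = "drop"
        }
        if rule.dnsName != "" {
            // Slow part: in the real code this is egressDNS.GetIPs(), which can
            // block on a shared lock while the resync goroutine is re-resolving
            // names against a slow DNS server.
            ips, err := net.LookupIP(rule.dnsName)
            if err != nil {
                continue
            }
            for _, ip := range ips {
                otx.AddFlow("table=101, reg0=%d, priority=%d, ip, nw_dst=%s, actions=%s",
                    vnid, priority, ip, action)
            }
        } else {
            otx.AddFlow("table=101, reg0=%d, priority=%d, ip, nw_dst=%s, actions=%s",
                vnid, priority, rule.cidrSelector, action)
        }
        priority--
    }
    // 3. Only now is the temporary drop flow removed; all the time spent in
    //    step 2 is downtime for every pod in the namespace.
    otx.DeleteFlows("table=101, reg0=%d, cookie=1/1", vnid)
}

func main() {
    otx := &ovsTransaction{}
    updateEgressFlows(otx, 0x198c56, []egressRule{
        {policy: "Allow", cidrSelector: "1.2.3.0/24"},
        {policy: "Allow", dnsName: "www.foo.com"},
        {policy: "Deny", cidrSelector: "0.0.0.0/0"},
    })
}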

Still trying to debug on-site with the cluster; I will update as I find something, but any comments are welcome.

@mjudeikis
Contributor Author

ping @openshift/networking ?

@mjudeikis
Contributor Author

So this is a dump of one such case:

[root@ocpd00068 flows]# grep 0x7f33b6 ./flows.2018_03_09_21:54_52 | grep table=101
		cookie=0x1,  table=101,   priority=65535,reg0=0x7f33b6 actions=drop
cookie=0x0,  table=101,   priority=21,ip,reg0=0x7f33b6,nw_dst=10.254.4.130 actions=output:2
cookie=0x0,  table=101,   priority=20,ip,reg0=0x7f33b6,nw_dst=10.254.0.101 actions=output:2
cookie=0x0,  table=101,   priority=19,ip,reg0=0x7f33b6,nw_dst=10.0.27.1 actions=output:2
cookie=0x0,  table=101,   priority=18,ip,reg0=0x7f33b6,nw_dst=10.239.98.20 actions=output:2
cookie=0x0,  table=101,   priority=17,ip,reg0=0x7f33b6,nw_dst=10.0.17.2 actions=output:2
cookie=0x0,  table=101,   priority=16,ip,reg0=0x7f33b6,nw_dst=10.0.17.1 actions=output:2
cookie=0x0,  table=101,   priority=15,ip,reg0=0x7f33b6,nw_dst=10.254.0.100 actions=output:2
cookie=0x0,  table=101,   priority=14,ip,reg0=0x7f33b6,nw_dst=10.0.17.199 actions=output:2
cookie=0x0,  table=101,   priority=13,ip,reg0=0x7f33b6,nw_dst=10.254.20.100 actions=output:2
cookie=0x0,  table=101,   priority=12,ip,reg0=0x7f33b6,nw_dst=10.0.21.59 actions=output:2
cookie=0x0,  table=101,   priority=11,ip,reg0=0x7f33b6,nw_dst=10.253.0.28 actions=output:2
cookie=0x0,  table=101,   priority=10,ip,reg0=0x7f33b6,nw_dst=10.253.2.101 actions=output:2
cookie=0x0,  table=101,   priority=9,ip,reg0=0x7f33b6,nw_dst=10.253.2.122 actions=output:2
cookie=0x0,  table=101,   priority=8,ip,reg0=0x7f33b6,nw_dst=10.0.34.105 actions=output:2
cookie=0x0,  table=101,   priority=7,ip,reg0=0x7f33b6,nw_dst=10.239.98.125 actions=output:2
cookie=0x0,  table=101,   priority=6,ip,reg0=0x7f33b6,nw_dst=10.239.98.125 actions=output:2

[root@ocpd00068 flows]# grep 0x7f33b6 ./flows.2018_03_09_21:54_53 | grep table=101
		cookie=0x1,  table=101,   priority=65535,reg0=0x7f33b6 actions=drop
cookie=0x0,  table=101,   priority=21,ip,reg0=0x7f33b6,nw_dst=10.254.4.130 actions=output:2
cookie=0x0,  table=101,   priority=20,ip,reg0=0x7f33b6,nw_dst=10.254.0.101 actions=output:2
cookie=0x0,  table=101,   priority=19,ip,reg0=0x7f33b6,nw_dst=10.0.27.1 actions=output:2
cookie=0x0,  table=101,   priority=18,ip,reg0=0x7f33b6,nw_dst=10.239.98.20 actions=output:2
cookie=0x0,  table=101,   priority=17,ip,reg0=0x7f33b6,nw_dst=10.0.17.2 actions=output:2
cookie=0x0,  table=101,   priority=16,ip,reg0=0x7f33b6,nw_dst=10.0.17.1 actions=output:2
cookie=0x0,  table=101,   priority=15,ip,reg0=0x7f33b6,nw_dst=10.254.0.100 actions=output:2
cookie=0x0,  table=101,   priority=14,ip,reg0=0x7f33b6,nw_dst=10.0.17.199 actions=output:2
cookie=0x0,  table=101,   priority=13,ip,reg0=0x7f33b6,nw_dst=10.254.20.100 actions=output:2
cookie=0x0,  table=101,   priority=12,ip,reg0=0x7f33b6,nw_dst=10.0.21.59 actions=output:2
cookie=0x0,  table=101,   priority=11,ip,reg0=0x7f33b6,nw_dst=10.253.0.28 actions=output:2
cookie=0x0,  table=101,   priority=10,ip,reg0=0x7f33b6,nw_dst=10.253.2.101 actions=output:2
cookie=0x0,  table=101,   priority=9,ip,reg0=0x7f33b6,nw_dst=10.253.2.122 actions=output:2
cookie=0x0,  table=101,   priority=8,ip,reg0=0x7f33b6,nw_dst=10.0.34.105 actions=output:2
cookie=0x0,  table=101,   priority=7,ip,reg0=0x7f33b6,nw_dst=10.239.98.125 actions=output:2
cookie=0x0,  table=101,   priority=6,ip,reg0=0x7f33b6,nw_dst=10.239.98.125 actions=output:2
cookie=0x0,  table=101,   priority=5,ip,reg0=0x7f33b6,nw_dst=10.254.7.66 actions=output:2
cookie=0x0,  table=101,   priority=4,ip,reg0=0x7f33b6,nw_dst=10.254.4.66 actions=output:2
cookie=0x0,  table=101,   priority=3,ip,reg0=0x7f33b6,nw_dst=10.252.2.2 actions=output:2
cookie=0x0,  table=101,   priority=2,ip,reg0=0x7f33b6,nw_dst=10.254.0.83 actions=output:2
cookie=0x0,  table=101,   priority=1,ip,reg0=0x7f33b6 actions=drop

So the diff between the first file and the second file is 1 second, and we can see the rules being repopulated, as the second file has more records. Because the drop rule exists, traffic can't get out. There were cases when this table rebuild took up to 6 seconds.

@danwinship
Contributor

This part of the code takes a long time to execute and hangs (?):
https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L447-L487

DNS should be taken from the cache, but by any chance it could be blocked by this:
https://github.com/openshift/origin/blob/master/pkg/network/node/ovscontroller.go#L490

490 is

if err := common.CheckDNSResolver(); err != nil {

which just checks if dig is installed. The delay is probably coming earlier, at:

ips := egressDNS.GetIPs(policies[0], rule.To.DNSName)

because that has to wait for a lock, so it might block on the resync in another thread. We need to rewrite this to generate the rules first and then write them out all at once after.
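
Roughly along these lines (a minimal sketch of that approach, not the actual patch; ovsTransaction and egressRule are the same kind of simplified stand-ins as in the sketch above, and net.LookupIP stands in for the cached egressDNS.GetIPs()): do all of the potentially slow work up front, so that only fast OVS writes happen between the temporary drop flow being added and removed.

package main

import (
    "fmt"
    "net"
)

// Same kind of simplified stand-ins as in the earlier sketch.
type ovsTransaction struct{}

func (t *ovsTransaction) AddFlow(format string, args ...interface{}) {
    fmt.Printf("add-flow  "+format+"\n", args...)
}

func (t *ovsTransaction) DeleteFlows(format string, args ...interface{}) {
    fmt.Printf("del-flows "+format+"\n", args...)
}

type egressRule struct {
    policy, cidrSelector, dnsName string
}

// generateFlows does all of the potentially slow work (DNS resolution and, in
// the real code, waiting on the shared egressDNS lock) before OVS is touched.
func generateFlows(vnid uint32, rules []egressRule) []string {
    var flows []string
    priority := len(rules)
    for _, rule := range rules {
        action := "output:2"
        if rule.policy == "Deny" {
            action = "drop"
        }
        dsts := []string{rule.cidrSelector}
        if rule.dnsName != "" {
            dsts = nil
            ips, err := net.LookupIP(rule.dnsName) // stand-in for egressDNS.GetIPs()
            if err == nil {
                for _, ip := range ips {
                    dsts = append(dsts, ip.String())
                }
            }
        }
        for _, dst := range dsts {
            flows = append(flows, fmt.Sprintf(
                "table=101, reg0=%d, priority=%d, ip, nw_dst=%s, actions=%s",
                vnid, priority, dst, action))
        }
        priority--
    }
    return flows
}

// applyFlows only performs fast OVS writes, so the window between adding and
// removing the temporary cookie=1 drop flow shrinks to almost nothing.
func applyFlows(otx *ovsTransaction, vnid uint32, flows []string) {
    otx.AddFlow("table=101, reg0=%d, cookie=1, priority=65535, actions=drop", vnid)
    otx.DeleteFlows("table=101, reg0=%d, cookie=0/1", vnid)
    for _, f := range flows {
        otx.AddFlow("%s", f)
    }
    otx.DeleteFlows("table=101, reg0=%d, cookie=1/1", vnid)
}

func main() {
    rules := []egressRule{
        {policy: "Allow", cidrSelector: "1.2.3.0/24"},
        {policy: "Allow", dnsName: "www.foo.com"},
        {policy: "Deny", cidrSelector: "0.0.0.0/0"},
    }
    vnid := uint32(0x198c56)
    applyFlows(&ovsTransaction{}, vnid, generateFlows(vnid, rules))
}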

@mjudeikis
Contributor Author

@danwinship what is the probability this will happen soon? If it's low, I might just do it myself.

@danwinship
Contributor

I'll probably get to it soonish

@mojsha

mojsha commented Apr 12, 2018

@danwinship We use the ovs-multitenant plugin and have an issue when using EgressNetworkPolicy objects. It happens "randomly", but basically any pod that has been restarted after the incident (I'm calling the point at which the bug starts taking effect the "incident"), or any newly provisioned pod, is unreachable by other pods. Eventually the entire cluster becomes unusable and we have to restart the nodes for everything to work again. After removing all EgressNetworkPolicy objects, the problem goes away.

Are we understanding it correctly that you drop all traffic to get a snapshot/dump of the OVS rules, then do some manipulation and write down the changes, and then allow traffic to start again? And could this be a reason for the above?

@mjudeikis
Contributor Author

@mojsha do you use your upstream organization DNS for this?
Are you using NodePort?
Do you use full FQDNs in your EgressNetworkPolicy files, or just IPs?

@mojsha

mojsha commented Apr 12, 2018

@mjudeikis We use OpenShift's SkyDNS in combination with an upstream DNS for non-cluster addresses.
We use a mix of ClusterIP and NodePort. Does it make any difference what type it is?
We use a combination of DNS names (FQDN and non-FQDN), IPs, and CIDR addresses in the EgressNetworkPolicy objects.

@danwinship
Contributor

@mojsha that's not this bug. This bug only results in temporary disruptions.

Your bug sounds like #13965, which was fixed in 3.7. Are you running a really old version of origin?

@mojsha

mojsha commented Apr 12, 2018

@danwinship No, I'm using 3.6, and we are moving to 3.7.2. What makes you think that this issue was fixed in 3.7 (it's not apparent when I'm looking at the issue)?

Can you confirm that the traffic is temporarily "blocked"? If so, is there a workaround for this? I am thinking of our production setup, where we will be using this and we don't want any service interruption.

@danwinship
Contributor

I know it's fixed in 3.7 from when it was committed.

There's no workaround for the bug discussed here, but the fix will probably get backported to 3.7. (OCP 3.7 that is, not Origin. But you shouldn't be running old versions of Origin.)

@mojsha

mojsha commented Apr 13, 2018

Fair enough. We're in the process of moving to OCP.

Can you clarify my second paragraph: will an update of an EgressNetworkPolicy object result in network connectivity being dropped/stopped for a short duration, and will it lead to a service interruption if there is an ongoing connection/session or a new incoming one?

@mjudeikis
Contributor Author

@danwinship bz:1558484

@danwinship
Contributor

@mojsha most users have not noticed any disruption. It may be that the disruption only occurs if, e.g., you have a DNS-based rule and the hostname it refers to has slow/flaky DNS servers.

Traffic is silently dropped during the interruption; it is not actively rejected. Assuming the interruption is brief, then the packets will get retried and you won't even notice.

@mjudeikis
Contributor Author

@mojsha I can confirm that the cluster where I observed this contains hundreds of egress rules, and they use FQDNs, and the DNS is not the fastest. It's not perfect, but the impact is minimal if you don't overuse egress rules. So know the limitations, but overall it's not critical.

If we move to fully IP-based rules, there is no issue at all.

@mojsha

mojsha commented Apr 13, 2018

@danwinship @mjudeikis Thanks guys. Makes me much more comfortable re-enabling the usage of EgressNetworkPolicy objects once we complete the upgrade to 3.7.
