Router: Changed default resource resync interval from 10mins to 30mins #17012

pravisankar · 2017-10-23T23:49:57Z

Resyncs are mainly intended for robustness. Mainly to handle the case
where the resource handler failed to process the item and we hope this
will be fixed if we process the item again after sometime(resync interval).
Yes, this may fix some transient errors but if we resync frequently then
there could be big penalities.
Currently router watches these resources: routes, endpoints, nodes,
namespaces, ingresses and secrets. When we have many routes
(like several thousand in online case), processing these items takes
long time, router reload itself takes few seconds (not milliseconds).
Due to short resync interval there will be constant churn of reprocessing
of all the items for all these resources.
Earlier we needed shorter resync interval because sharded router was depending
on this interval but with Sharded router based on namespace labels should notice routes immediately #16039 that limitation is removed.

10 mins seems aggressive for some rare transient errors, changed defaults
to 30 mins. Admin can edit router deployment config if they need custom resync interval.

Fixed project sync interval in router

pravisankar · 2017-10-23T23:51:03Z

@knobunc @eparis @rajatchopra PTAL

Rationale: - Resyncs are mainly intended for robustness. Mainly to handle the case where the resource handler failed to process the item and we hope this will be fixed if we process the item again after sometime(resync interval). Yes, this may fix some transient errors but if we resync frequently then there could be big penalities. - Currently router watches these resources: routes, endpoints, nodes, namespaces, ingresses and secrets. When we have many routes (like several thousand in online case), processing these items takes long time, router reload itself takes few seconds (not milliseconds). Due to short resync interval there will be constant churn of reprocessing of all the items for all these resources. - Earlier we needed shorter resync interval because sharded router was depending on this interval but with openshift#16039 that limitation is removed. 10 mins seems aggressive for some rare transient errors, changed defaults to 30 mins. Admin can edit router deployment config if they need custom resync interval.

ResyncInterval is tunable param and it can go as low as 1 second.

php-coder · 2017-10-23T23:57:51Z

/unassign

eparis · 2017-10-24T01:32:56Z

/hold
Is there a bz or other specific problem this is trying to address? Admittedly we should never need the resync, but 30 minutes seems like an aweful long time...

pravisankar · 2017-10-24T02:20:09Z

@eparis
We use 30 mins resync interval in openshift SDN. I'm assuming transient problems are rare and in most cases when it fails first time, it is highly likely that it will also fail next time. Any recommendations?
There is no open bug for this but the second commit fixes the issue where project sync interval could be negative if tunable resync interval is very low.

imcsk8 · 2017-10-24T15:22:40Z

LGTM

eparis · 2017-10-24T15:32:54Z

/lgtm
but lets leave this held until master opens for 3.8.

openshift-merge-robot · 2017-10-24T15:33:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eparis, pravisankar

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~pkg/cmd/OWNERS~~ [eparis]
~~pkg/router/OWNERS~~ [eparis]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

pravisankar · 2017-11-16T19:29:08Z

/retest

openshift-bot · 2017-11-17T03:34:28Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2017-11-17T15:40:28Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2017-11-18T01:45:28Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-merge-robot · 2017-11-18T14:50:37Z

/test all [submit-queue is verifying that this PR is safe to merge]

openshift-merge-robot · 2017-11-18T16:18:05Z

Automatic merge from submit-queue (batch tested with PRs 17012, 17243).

openshift-ci-robot · 2017-11-18T17:20:16Z

@pravisankar: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/openshift-jenkins/extended_conformance_install_update	`5e20571`	link	`/test extended_conformance_install_update`
ci/openshift-jenkins/extended_conformance_install	`5e20571`	link	`/test extended_conformance_install`
ci/openshift-jenkins/extended_conformance_gce	`5e20571`	link	`/test extended_conformance_gce`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 23, 2017

openshift-merge-robot assigned pecameron and php-coder Oct 23, 2017

pravisankar added component/networking kind/bug Categorizes issue or PR as related to a bug. sig/networking labels Oct 23, 2017

pravisankar requested review from rajatchopra and knobunc October 23, 2017 23:51

Ravi Sankar Penta added 2 commits October 23, 2017 16:55

Fix project sync interval in router

5e20571

ResyncInterval is tunable param and it can go as low as 1 second.

pravisankar force-pushed the router-bump-resyncinterval branch from 1ebc7b1 to 5e20571 Compare October 23, 2017 23:56

pravisankar added component/routing and removed component/networking sig/networking labels Oct 23, 2017

openshift-ci-robot unassigned php-coder Oct 23, 2017

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 24, 2017

openshift-ci-robot assigned eparis Oct 24, 2017

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2017

openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 24, 2017

0xmichalis added this to the 3.8.0 milestone Nov 2, 2017

pravisankar added sig/networking and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. labels Nov 14, 2017

openshift-merge-robot merged commit 366bc48 into openshift:master Nov 18, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Router: Changed default resource resync interval from 10mins to 30mins #17012

Router: Changed default resource resync interval from 10mins to 30mins #17012

pravisankar commented Oct 23, 2017 •

edited

Loading

pravisankar commented Oct 23, 2017

php-coder commented Oct 23, 2017

eparis commented Oct 24, 2017

pravisankar commented Oct 24, 2017

imcsk8 commented Oct 24, 2017

eparis commented Oct 24, 2017

openshift-merge-robot commented Oct 24, 2017

pravisankar commented Nov 16, 2017

openshift-bot commented Nov 17, 2017

openshift-bot commented Nov 17, 2017

openshift-bot commented Nov 18, 2017

openshift-merge-robot commented Nov 18, 2017

openshift-merge-robot commented Nov 18, 2017

openshift-ci-robot commented Nov 18, 2017

Router: Changed default resource resync interval from 10mins to 30mins #17012

Router: Changed default resource resync interval from 10mins to 30mins #17012

Conversation

pravisankar commented Oct 23, 2017 • edited Loading

pravisankar commented Oct 23, 2017

php-coder commented Oct 23, 2017

eparis commented Oct 24, 2017

pravisankar commented Oct 24, 2017

imcsk8 commented Oct 24, 2017

eparis commented Oct 24, 2017

openshift-merge-robot commented Oct 24, 2017

pravisankar commented Nov 16, 2017

openshift-bot commented Nov 17, 2017

openshift-bot commented Nov 17, 2017

openshift-bot commented Nov 18, 2017

openshift-merge-robot commented Nov 18, 2017

openshift-merge-robot commented Nov 18, 2017

openshift-ci-robot commented Nov 18, 2017

pravisankar commented Oct 23, 2017 •

edited

Loading