Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestScopedProjectAccess integration failed #16716

Closed
mfojtik opened this issue Oct 6, 2017 · 19 comments
Closed

TestScopedProjectAccess integration failed #16716

mfojtik opened this issue Oct 6, 2017 · 19 comments
Assignees
Labels
area/security component/auth help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/test-flake Categorizes issue or PR as related to test flakes. priority/P0

Comments

@mfojtik
Copy link
Contributor

mfojtik commented Oct 6, 2017

Not clear to me what failed here: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/16715/test_pull_request_origin_integration/8960/

@soltysh
Copy link
Contributor

soltysh commented Oct 6, 2017

From my digging it looks like it's the startup of the server that is failing. Still haven't got to root cause, yet.

@soltysh soltysh self-assigned this Oct 6, 2017
@soltysh
Copy link
Contributor

soltysh commented Oct 6, 2017

Additional test that I can consistently fail on my machine is TestInvalidRoleRefs from that same file.

@enj
Copy link
Contributor

enj commented Oct 7, 2017

https://github.com/enj/origin/blob/1eed81f10256f22d4f3426722b44ae47eaf3fba7/test/integration/project_test.go#L488 is where it fails locally for me (the all project list returns 1,2,3 instead of 1,2,3,4. @deads2k any hints on what changed to break this?

@enj
Copy link
Contributor

enj commented Oct 7, 2017

If I put that part of the test into a loop, it fails twice and then is successful with getting a list of all 4 projects. So it seems like a cache backing the project ACL is lagging behind more than before.

@enj enj added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Oct 9, 2017
@bparees
Copy link
Contributor

bparees commented Oct 9, 2017

@adelton
Copy link
Contributor

adelton commented Oct 11, 2017

I'm reading the issue and the logs but I somehow fail to see why this is component/auth - what seems to be the auth-related cause behind this?

@bparees
Copy link
Contributor

bparees commented Oct 11, 2017

@adelton the assumption is that "TestScopedProjectAccess" would be an auth thing. If it's not, please determine who would own such a test.

@bparees
Copy link
Contributor

bparees commented Oct 11, 2017

(and if it is indeed an Auth test but if failed due to something not auth related, then please determine what component did fail and assign it to the appropriate component)

@adelton
Copy link
Contributor

adelton commented Oct 11, 2017

@bparees, thanks.

In #16716 (comment), @enj suggests "cache backing the project ACL is lagging behind more than before". What component is handling the cache backing?

@enj
Copy link
Contributor

enj commented Oct 11, 2017

This is auth because the project list is the only special ACL we have (because RBAC cannot express "list the subset of the projects I can see" - it can only express "list all projects"). The cache logic is in pkg/project/auth/cache.go, but I doubt its the actual cache that got slower. The simplest approach may be a git bisect from a week before this started.

@smarterclayton
Copy link
Contributor

@mrogers950
Copy link
Contributor

Bisecting and running TestScopedProjectAccess points to 84f6103

@adelton
Copy link
Contributor

adelton commented Oct 11, 2017

@deads2k, any idea what in 84f6103 could be causing the problem?

@deads2k
Copy link
Contributor

deads2k commented Oct 11, 2017

@deads2k, any idea what in 84f6103 could be causing the problem?

Nothing comes to mind.

@bparees
Copy link
Contributor

bparees commented Oct 11, 2017

unless a fix is imminent can we disable this test? (while leaving this issue open to resolve the flake and re-enable it)

@enj
Copy link
Contributor

enj commented Oct 12, 2017

I believe I have determined the root cause and will open a PR tomorrow.

@simo5 @mrogers950

@deads2k
Copy link
Contributor

deads2k commented Oct 12, 2017

I believe I have determined the root cause and will open a PR tomorrow.

margin too small?

openshift-merge-robot added a commit that referenced this issue Oct 13, 2017
Automatic merge from submit-queue.

Make admin project creation wait for SAR

This change adds a SAR check to direct project creation to ensure that the designated user can get the project which was created for them.  It also updates the project integration tests to be more tolerant of the project ACL being out of date.  This race condition became more apparent as we moved to the generated clients since those clients were smaller and had their own rate limiters (instead of a one big client that could do everything and shared the same rate limiter).  Since the new clients would perform actions at a faster pace, the race against the project ACL would occur more frequently.

Signed-off-by: Monis Khan <[email protected]>

Fixes #16716

/kind bug

/assign @smarterclayton @simo5 @deads2k
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/security component/auth help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/test-flake Categorizes issue or PR as related to test flakes. priority/P0
Projects
None yet
Development

No branches or pull requests

9 participants