Deployment broken in default namespace #180

Open
andrewazores opened this issue Mar 18, 2021 · 14 comments
Labels
bug Something isn't working

Comments

@andrewazores
Member

See previous discussion: #176

andrewazores added the bug label Mar 18, 2021
andrewazores self-assigned this Mar 18, 2021
@andrewazores
Member Author

I'm seeing the same problem as here:

https://bugzilla.redhat.com/show_bug.cgi?id=1934177

My CRC environment is indeed on 4.7, so I think it is the exact same failure scenario. I have tried many permutations of runAsNonRoot and runAsUser/runAsGroup and haven't had any success deploying kube-rbac-proxy:v0.8.0.

@ebaron
Member

ebaron commented Mar 19, 2021

It sounds like this is exclusive to the distroless image. We could try registry.access.redhat.com/ubi8/ubi-minimal:latest?
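If that pans out, wiring it in would just be an image swap in the auth proxy patch. A minimal sketch, assuming a UBI-based rebuild of kube-rbac-proxy gets published somewhere (the image name below is illustrative, not a real tag):

      containers:
      - name: kube-rbac-proxy
        # Hypothetical UBI-based rebuild of kube-rbac-proxy;
        # quay.io/example/kube-rbac-proxy-ubi is an illustrative name,
        # not a published image.
        image: quay.io/example/kube-rbac-proxy-ubi:0.8.0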

@andrewazores
Member Author

This may be better suited for a separate bug, but DEPLOY_NAMESPACE=foo make undeploy will also attempt to destroy the foo namespace. That's probably not good if a user is trying to deploy this in a namespace they're actually using for anything beyond sandbox testing of our operator.

@ebaron
Member

ebaron commented Mar 24, 2021

I've noticed this as well. I'm surprised it's the default behaviour. We could remove the namespace object from config/manager/manager.yaml and create it manually (if necessary) in our Makefile, I suppose.
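For reference, the object in question in a stock operator-sdk layout looks roughly like the following (standard scaffolding; kustomize renames "system" via its namePrefix and namespace transformers):

# config/manager/manager.yaml -- the scaffolded Namespace object. Dropping it
# from the manifest set keeps `make undeploy` from deleting the namespace itself.
apiVersion: v1
kind: Namespace
metadata:
  labels:
    control-plane: controller-manager
  name: system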

andrewazores added a commit to andrewazores/cryostat-operator that referenced this issue Mar 25, 2021
@andrewazores
Member Author

I addressed the Namespace creation/deletion in my operator-sdk 1.5.0 PR (linked above). If a user tries to manually deploy to a namespace that doesn't exist, the deployment will simply fail.

I've tried updating kube-rbac-proxy to v0.8.0 along with various permutations of ubi/distroless base images, runAsNonRoot/runAsUser, and other securityContext settings, but still haven't found a configuration that works in both default and non-default namespaces. I'll go through the permutations that seemed like the most likely candidates and document what they are and how they fail.

@andrewazores
Member Author

andrewazores commented Mar 26, 2021

  • Updating kube-rbac-proxy to v0.8.0 and setting runAsNonRoot: true for its container:
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -10,7 +10,7 @@ spec:
     spec:
       containers:
       - name: kube-rbac-proxy
-        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
         args:
         - "--secure-listen-address=0.0.0.0:8443"
         - "--upstream=http://127.0.0.1:8080/"
@@ -19,6 +19,8 @@ spec:
         ports:
         - containerPort: 8443
           name: https
+        securityContext:
+          runAsNonRoot: true
       - name: manager
         args:
         - "--health-probe-bind-address=:8081"

This results in a working deployment in default, but non-default namespaces fail with a CreateContainerError and the following reason in oc describe pod:

Error: container create failed: time="2021-03-26T16:04:45Z" level=error msg="container_linux.go:366: starting container process caused: chdir to cwd (\"/home/nonroot\") set in config.json failed: permission denied"

The manager container starts and runs seemingly normally.
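One workaround that might be worth trying for this particular failure (untested; the workingDir override is my assumption, not something from the linked reports) is to point the container at a working directory any arbitrary UID can enter:

      containers:
      - name: kube-rbac-proxy
        # Untested sketch: the v0.8.0 distroless image sets its cwd to
        # /home/nonroot, which an arbitrary non-root UID cannot chdir into;
        # override it with a world-readable directory instead.
        workingDir: /
        securityContext:
          runAsNonRoot: true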

@andrewazores
Member Author

  • Updating kube-rbac-proxy to v0.8.0 and setting runAsNonRoot: true at the pod level of the deployment:
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -8,9 +8,11 @@ metadata:
 spec:
   template:
     spec:
+      securityContext:
+        runAsNonRoot: true
       containers:
       - name: kube-rbac-proxy
-        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
         args:
         - "--secure-listen-address=0.0.0.0:8443"
         - "--upstream=http://127.0.0.1:8080/"

This has the same behaviour as the previous case.

@andrewazores
Member Author

  • The current upstream configuration, with kube-rbac-proxy v0.5.0 and no other modifications to the deployment/containers, fails in the default namespace with the following reason:

Error: container has runAsNonRoot and image will run as root (pod: "container-jfr-operator-controller-manager-5cf4cb9dfb-rsxk9_default(34a428d2-3ba0-4c71-a51f-51608a7475d0)", container: kube-rbac-proxy)

and succeeds in other namespaces.

@andrewazores
Member Author

  • Leaving kube-rbac-proxy at v0.5.0 but overriding runAsNonRoot: false on its container, like so:
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -11,6 +11,8 @@ spec:
       containers:
       - name: kube-rbac-proxy
         image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+        securityContext:
+          runAsNonRoot: false
         args:
         - "--secure-listen-address=0.0.0.0:8443"
         - "--upstream=http://127.0.0.1:8080/"

seems to produce a working deployment in all namespaces; however, it is undesirable to run this container as root.

@andrewazores
Member Author

  • Updating kube-rbac-proxy to v0.8.0 and setting runAsNonRoot: true and runAsUser: 65532, as suggested in some related bug reports on other repos, like so:
--- a/config/default/manager_auth_proxy_patch.yaml
+++ b/config/default/manager_auth_proxy_patch.yaml
@@ -10,7 +10,10 @@ spec:
     spec:
       containers:
       - name: kube-rbac-proxy
-        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.5.0
+        image: gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
+        securityContext:
+          runAsNonRoot: true
+          runAsUser: 65532
         args:
         - "--secure-listen-address=0.0.0.0:8443"
         - "--upstream=http://127.0.0.1:8080/"

results in a working deployment in default, but in other namespaces the Deployment's ReplicaSet gets stuck at 1 Desired, 0 Current, 0 Ready. oc get -o yaml on the ReplicaSet reveals this failure message:

'pods "container-jfr-operator-controller-manager-744dbc77d8-" is forbidden:
        unable to validate against any security context constraint: [spec.containers[0].securityContext.runAsUser:
        Invalid value: 65532: must be in the ranges: [1000610000, 1000619999]]'
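For context, OpenShift assigns that allowed range per-namespace and records it in an annotation on the Namespace object, so it can be checked directly with oc get namespace <target-namespace> -o yaml. Abridged output, with an illustrative value:

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    # start/size form; restricted-SCC pods in this namespace may only run
    # as UIDs 1000610000-1000619999
    openshift.io/sa.scc.uid-range: 1000610000/10000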

@andrewazores
Member Author

All of the failure classes have to do specifically with the kube-rbac-proxy container and not the manager container, so I have left out any previously tested scenarios where I changed the Dockerfile base image from ubi to distroless. This doesn't seem relevant to the particular failure modes observed, and the manager container can be made to run on either base.

@andrewazores
Member Author

operator-framework/operator-sdk#4684

Looks the same or closely related.

@ebaron
Member

ebaron commented Mar 26, 2021

> operator-framework/operator-sdk#4684
>
> Looks the same or closely related.

Indeed. I'm not sure if there's much we can do until this is resolved in the SDK.

ebaron pushed a commit that referenced this issue Mar 26, 2021
* PROJECT sdk 1.5.0 upgrade

* Update controller-runtime

* Update labels, scorecard image versions

* Rename healthz and readyz handlers

* Do not create/destroy namespace on (un)deploy

Related to #180

* Correct auth proxy service account name

* Correct order of Recording/ContainerJFR deletion on undeploy
ebaron added this to the 1.0 Release milestone Mar 30, 2021
@ebaron
Member

ebaron commented Apr 29, 2021

This has been mitigated for now, so removing from the 1.0 milestone.
