This repository has been archived by the owner on Jul 23, 2020. It is now read-only.

Workspaces cannot be created after OpenShift v3.7.9 update #1666

Closed
rhopp opened this issue Dec 14, 2017 · 33 comments

Comments

@rhopp
Collaborator

rhopp commented Dec 14, 2017

This affects both multi-tenant and single-tenant Che.
When creating a workspace, the deployment gets created, but nothing happens after that.
image

Log from che-master pod:


2017-12-14 06:54:23,275[nio-8080-exec-7]  [INFO ] [o.e.c.a.w.s.WorkspaceManager 807]    - Workspace 'che/dfdgdfgdgdfg-dza9s' with id 'workspace1caavxqds6y8kkzs' created by user 'che'
--
  | 2017-12-14 06:54:25,669[aceSharedPool-0]  [WARN ] [.e.c.p.d.m.MachineProviderImpl 569]  - Failed to check image che/vertx availability. Cause: Unable connect to unix socket: '/var/run/docker.sock'
  | 2017-12-14 06:54:28,038[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1050] - Created ImageStream registry.devshift.net_che_vertx.
  | Dec 14, 2017 6:54:28 AM com.redhat.che.keycloak.shared.KeycloakSettings pullFromApiEndpointIfNecessary
  | INFO: Pulling Keycloak settings from URL :http://localhost:8080/api/keycloak/settings
  | Dec 14, 2017 6:54:28 AM com.redhat.che.keycloak.shared.KeycloakSettings pullFromApiEndpointIfNecessary
  | INFO: KeycloakSettings = {che.keycloak.disabled=false, che.keycloak.auth_server_url=https://sso.openshift.io/auth , che.keycloak.client_id=openshiftio-public, che.keycloak.realm=fabric8, che.keycloak.oso.endpoint=https://sso.openshift.io/auth/realms/fabric8/broker/openshift-v3/token , che.keycloak.github.endpoint=https://auth.openshift.io/api/token?for=https://github.com }
  | 2017-12-14 06:54:30,271[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1825] - Created ImageStreamTag registry.devshift.net_che_vertx:1caavxqds6y8kkzs76s2fml5nqp6e0 in namespace kkanova-osiotest1-che
  | 2017-12-14 06:54:30,280[aceSharedPool-0]  [WARN ] [o.e.c.p.o.c.OpenShiftConnector 569]  - Didn't find image 'eclipse-che/workspace1caavxqds6y8kkzs_machine76s2fml5nqp6e07j_che_dev-machine' in the map of created images
  | 2017-12-14 06:54:30,771[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1616] - imageStreamTag dockerImageConfig empty. Using dockerImageMetadata to get image info
  | 2017-12-14 06:54:31,471[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1616] - imageStreamTag dockerImageConfig empty. Using dockerImageMetadata to get image info
  | 2017-12-14 06:54:31,474[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 655]  - Che property 'che.openshift.workspace.memory.override' used to override workspace memory limit to 1900Mi.
  | 2017-12-14 06:54:32,064[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1679] - OpenShift service che-ws-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,376[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route wsagent-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,513[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route exec-agent-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,620[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route server-4403-tcp-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,711[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route terminal-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,767[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route server-22-tcp-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,827[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route vertx-debug-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,884[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route tomcat8-debug-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:32,943[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route vertx-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:33,001[aceSharedPool-0]  [INFO ] [.c.p.o.c.OpenShiftRouteCreator 82]   - OpenShift route codeserver-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:33,002[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1713] - Creating OpenShift deployment che-ws-1caavxqds6y8kkzs
  | 2017-12-14 06:54:33,002[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1718] - Adding container 1caavxqds6y8kkzs-machine76s2fml5nqp6e07j-che-dev-machine to OpenShift deployment che-ws-1caavxqds6y8kkzs
  | 2017-12-14 06:54:33,014[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 35]   - Container environment variables:
  | 2017-12-14 06:54:33,028[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - EXEC_AGENT_LOGS_DIR_SETTER=export EXEC_AGENT_LOGS_DIR="$HOME/che/exec-agent/logs"
  | 2017-12-14 06:54:33,028[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - MAVEN_OPTS=-XX:+UseG1GC -XX:+UseStringDeduplication -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:MaxRAM=1200m -Xms256m
  | 2017-12-14 06:54:33,028[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - CHE_LOCAL_CONF_DIR=/mnt/che/conf
  | 2017-12-14 06:54:33,028[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - CHE_LOGS_DIR_SETTER=
  | 2017-12-14 06:54:33,029[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - USER_TOKEN=dummy_token
  | 2017-12-14 06:54:33,029[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - CHE_MACHINE_NAME=dev-machine
  | 2017-12-14 06:54:33,029[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - JAVA_OPTS=-XX:+UseG1GC -XX:+UseStringDeduplication -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:MaxRAM=1200m -Xms256m
  | 2017-12-14 06:54:33,030[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - CHE_WORKSPACE_ID=workspace1caavxqds6y8kkzs
  | 2017-12-14 06:54:33,030[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - CHE_PROJECTS_ROOT=/projects
  | 2017-12-14 06:54:33,030[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - CHE_IS_DEV_MACHINE=true
  | 2017-12-14 06:54:33,030[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.k.KubernetesEnvVar 43]   - - CHE_API=http://che-host:8080/wsmaster/api
  | 2017-12-14 06:54:33,765[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1781] - OpenShift deployment che-ws-1caavxqds6y8kkzs created
  | 2017-12-14 06:54:34,880[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 722]  - Error while creating Pod, removing deployment
  | 2017-12-14 06:54:34,881[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 723]  - Pod with deployment name che-ws-1caavxqds6y8kkzs not found
  | 2017-12-14 06:54:35,281[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route codeserver-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,628[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route exec-agent-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,648[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route server-22-tcp-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,671[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route server-4403-tcp-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,681[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route terminal-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,691[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route tomcat8-debug-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,700[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route vertx-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,708[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route vertx-debug-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,720[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route wsagent-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,731[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 99]   - Removing OpenShift Service che-ws-1caavxqds6y8kkzs
  | 2017-12-14 06:54:35,791[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 104]  - Removing OpenShift Deployment che-ws-1caavxqds6y8kkzs
  | 2017-12-14 06:54:36,001[aceSharedPool-0]  [ERROR] [o.e.c.a.w.s.WorkspaceManager 683]    - null
  | org.eclipse.che.api.core.ServerException: null
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startInstance(CheEnvironmentEngine.java:1005)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startEnvironmentQueue(CheEnvironmentEngine.java:797)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.start(CheEnvironmentEngine.java:263)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.startEnvironmentAndPublishEvents(WorkspaceRuntimes.java:694)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.access$100(WorkspaceRuntimes.java:103)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartTask.call(WorkspaceRuntimes.java:986)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartTask.call(WorkspaceRuntimes.java:948)
  | at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalCallable.call(CopyThreadLocalCallable.java:31)
  | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  | at java.lang.Thread.run(Thread.java:748)
  | Caused by: org.eclipse.che.api.core.ServerException: null
  | at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.startService(MachineProviderImpl.java:401)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.lambda$startEnvironmentQueue$6(CheEnvironmentEngine.java:768)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startInstance(CheEnvironmentEngine.java:942)
  | ... 11 common frames omitted
  | Caused by: java.lang.NullPointerException: null
  | at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl$DeploymentReaper.reap(DeploymentOperationsImpl.java:178)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:555)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:68)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:596)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:584)
  | at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:63)
  | at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:32)
  | at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:160)
  | at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:59)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftDeploymentCleaner.cleanUpWorkspaceResources(OpenShiftDeploymentCleaner.java:105)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftDeploymentCleaner.cleanDeploymentResources(OpenShiftDeploymentCleaner.java:42)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftDeploymentCleaner.cleanDeploymentResources(OpenShiftDeploymentCleaner.java:48)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftConnector.createContainer(OpenShiftConnector.java:725)
  | at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.createContainer(MachineProviderImpl.java:648)
  | at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.startService(MachineProviderImpl.java:357)
  | ... 13 common frames omitted
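
The trace above nests three exceptions. The innermost one, the `NullPointerException` in `DeploymentReaper.reap`, is reached from `OpenShiftDeploymentCleaner` while cleaning up after the "Pod ... not found" error, which appears to be why the workspace failure surfaces only as `null`. Below is a minimal Python sketch for pulling the innermost cause out of such a log. The function name `root_cause` and the regular expressions are illustrative assumptions, not part of Che or the fabric8 client.

```python
import re
from typing import List, Optional, Tuple

# Exception header, with or without the "  | " prefix and "Caused by:" marker.
_HEADER = re.compile(r"^\W*(?:Caused by:\s*)?([\w.$]+(?:Exception|Error)):\s*(.*)$")
# Stack frame such as "  | at pkg.Class.method(File.java:123)".
_FRAME = re.compile(r"^\W*at\s+([\w.$<>]+)\(([^)]*)\)")


def root_cause(lines: List[str]) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (exception, message, first_frame) for the innermost cause, or None."""
    cause = None
    want_frame = False
    for line in lines:
        header = _HEADER.match(line)
        if header:
            # Each later header is a deeper cause, so it replaces the previous one.
            cause = [header.group(1), header.group(2), None]
            want_frame = True
            continue
        frame = _FRAME.match(line)
        if frame and want_frame and cause is not None:
            # Keep only the first frame after the header: where it was thrown.
            cause[2] = f"{frame.group(1)}({frame.group(2)})"
            want_frame = False
    return tuple(cause) if cause else None
```

Fed the log lines above, this should report the `NullPointerException` together with the `DeploymentReaper.reap` frame rather than the outer `ServerException`.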

@aditya-konarde

+1. Blocks user from doing anything on the workspace.

@sbose78
Collaborator

sbose78 commented Dec 14, 2017

@l0rd @ibuziuk could you please have a look?

@ibuziuk
Collaborator

ibuziuk commented Dec 14, 2017

@rhopp anything suspicious in the OpenShift events?

@rhopp
Collaborator Author

rhopp commented Dec 14, 2017

@ibuziuk Nothing. No events at all.

@ibuziuk
Collaborator

ibuziuk commented Dec 14, 2017

Indeed, there are no workspace events at all. The deployment is simply never scaled up for some reason.
image

image

@ibuziuk
Collaborator

ibuziuk commented Dec 14, 2017

The very same problem is reproducible on prod-preview (https://console.free-stg.openshift.com).
No wonder, since the OpenShift Online versions on prod and prod-preview are currently quite similar:

https://console.free-stg.openshift.com

OpenShift Master: v3.7.9 (online version 3.6.0.78)
Kubernetes Master: v1.7.6+a08f5eeb62

https://console.starter-us-east-2.openshift.com

OpenShift Master: v3.7.9 (online version 3.6.0.73.0)
Kubernetes Master: v1.7.6+a08f5eeb62

@sleshchenko

sleshchenko commented Dec 14, 2017

This may help: workspaces on Che 6 Server start successfully. You can try it here: http://che-che6-server.dev.rdu2c.fabric8.io
I think Che 6 Server works fine because it creates pods directly instead of using deployments (I do not know whether that is the right way or not).

@l0rd
Collaborator

l0rd commented Dec 14, 2017

The difference I see in the deployment yaml:

- apiVersion: extensions/v1beta1
+ apiVersion: apps/v1beta1
kind: Deployment

@jfchevrette
Contributor

@l0rd Deployments are new in OpenShift 3.7. However, not all nodes have been upgraded to 3.7 yet, so I'm not sure how this is handled by the controllers.

Since this issue was opened, a few nodes have been upgraded to 3.7 and the upgrade process is still ongoing.

I was able to observe that the deployment object is created properly but pods are not created. No events in OpenShift. Notice that replicas is set to 0.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  creationTimestamp: '2017-12-14T15:22:19Z'
  generation: 2
  labels:
    deployment: che-ws-zc8r9jnzu81k8lt9
  name: che-ws-zc8r9jnzu81k8lt9
  namespace: jchevret-osiotest1-che
  resourceVersion: '764688366'
  selfLink: >-
    /apis/apps/v1beta1/namespaces/jchevret-osiotest1-che/deployments/che-ws-zc8r9jnzu81k8lt9
  uid: 92fa6a0c-e0e2-11e7-b113-0233cba325d9
spec:
  replicas: 0
  selector:
    matchLabels:
      deployment: che-ws-zc8r9jnzu81k8lt9
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        deployment: che-ws-zc8r9jnzu81k8lt9
    spec:
      containers:
        - env:
            - name: EXEC_AGENT_LOGS_DIR_SETTER
              value: export EXEC_AGENT_LOGS_DIR="$HOME/che/exec-agent/logs"
            - name: MAVEN_OPTS
              value: >-
                -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MinHeapFreeRatio=20
                -XX:MaxHeapFreeRatio=40 -XX:MaxRAM=1200m -Xms256m
            - name: CHE_LOCAL_CONF_DIR
              value: /mnt/che/conf
            - name: CHE_LOGS_DIR_SETTER
            - name: USER_TOKEN
              value: dummy_token
            - name: CHE_MACHINE_NAME
              value: dev-machine
            - name: JAVA_OPTS
              value: >-
                -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MinHeapFreeRatio=20
                -XX:MaxHeapFreeRatio=40 -XX:MaxRAM=1200m -Xms256m
            - name: CHE_WORKSPACE_ID
              value: workspacezc8r9jnzu81k8lt9
            - name: CHE_PROJECTS_ROOT
              value: /projects
            - name: CHE_IS_DEV_MACHINE
              value: 'true'
            - name: CHE_API
              value: 'http://che-host:8080/wsmaster/api'
          image: >-
            172.30.98.11:5000/jchevret-osiotest1-che/registry.devshift.net_che_vertx:zc8r9jnzu81k8lt9u74t5b7mjn8pux
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 300
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 4401
            timeoutSeconds: 1
          name: zc8r9jnzu81k8lt9-machineu74t5b7mjn8pux5t-che-dev-machine
          ports:
            - containerPort: 4401
              name: wsagent
              protocol: TCP
            - containerPort: 4412
              name: exec-agent
              protocol: TCP
            - containerPort: 4403
              name: server-4403-tcp
              protocol: TCP
            - containerPort: 4411
              name: terminal
              protocol: TCP
            - containerPort: 22
              name: server-22-tcp
              protocol: TCP
            - containerPort: 5005
              name: vertx-debug
              protocol: TCP
            - containerPort: 8000
              name: tomcat8-debug
              protocol: TCP
            - containerPort: 8080
              name: vertx
              protocol: TCP
            - containerPort: 9876
              name: codeserver
              protocol: TCP
          resources:
            limits:
              memory: 1900Mi
            requests:
              memory: 1100Mi
          securityContext:
            privileged: false
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /projects
              name: claim-che-workspace
              subPath: testosio1-q1iyp
            - mountPath: /workspace-logs
              name: claim-che-workspace
              subPath: testosio1-q1iyp-logs
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 0
      volumes:
        - name: claim-che-workspace
          persistentVolumeClaim:
            claimName: claim-che-workspace
status: {}

Log from che-master pod:

2017-12-14 15:22:19,258[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 1781] - OpenShift deployment che-ws-zc8r9jnzu81k8lt9 created
--
  | 2017-12-14 15:22:20,367[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 722]  - Error while creating Pod, removing deployment
  | 2017-12-14 15:22:20,367[aceSharedPool-0]  [INFO ] [o.e.c.p.o.c.OpenShiftConnector 723]  - Pod with deployment name che-ws-zc8r9jnzu81k8lt9 not found
  | 2017-12-14 15:22:21,048[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route codeserver-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,752[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route exec-agent-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,761[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route server-22-tcp-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,772[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route server-4403-tcp-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,781[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route terminal-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,792[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route tomcat8-debug-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,803[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route vertx-debug-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,818[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route vertx-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,829[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 93]   - Removing OpenShift Route wsagent-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,838[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 99]   - Removing OpenShift Service che-ws-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,870[aceSharedPool-0]  [INFO ] [o.c.OpenShiftDeploymentCleaner 104]  - Removing OpenShift Deployment che-ws-zc8r9jnzu81k8lt9
  | 2017-12-14 15:22:21,977[aceSharedPool-0]  [ERROR] [o.e.c.a.w.s.WorkspaceManager 683]    - null
  | org.eclipse.che.api.core.ServerException: null
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startInstance(CheEnvironmentEngine.java:1005)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startEnvironmentQueue(CheEnvironmentEngine.java:797)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.start(CheEnvironmentEngine.java:263)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.startEnvironmentAndPublishEvents(WorkspaceRuntimes.java:694)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.access$100(WorkspaceRuntimes.java:103)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartTask.call(WorkspaceRuntimes.java:986)
  | at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartTask.call(WorkspaceRuntimes.java:948)
  | at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalCallable.call(CopyThreadLocalCallable.java:31)
  | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  | at java.lang.Thread.run(Thread.java:748)
  | Caused by: org.eclipse.che.api.core.ServerException: null
  | at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.startService(MachineProviderImpl.java:401)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.lambda$startEnvironmentQueue$6(CheEnvironmentEngine.java:768)
  | at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startInstance(CheEnvironmentEngine.java:942)
  | ... 11 common frames omitted
  | Caused by: java.lang.NullPointerException: null
  | at io.fabric8.kubernetes.client.dsl.internal.DeploymentOperationsImpl$DeploymentReaper.reap(DeploymentOperationsImpl.java:178)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:555)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:68)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:596)
  | at io.fabric8.kubernetes.client.dsl.base.BaseOperation.delete(BaseOperation.java:584)
  | at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:63)
  | at io.fabric8.kubernetes.client.handlers.DeploymentHandler.delete(DeploymentHandler.java:32)
  | at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:160)
  | at io.fabric8.kubernetes.client.dsl.internal.NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.delete(NamespaceVisitFromServerGetWatchDeleteRecreateWaitApplicableImpl.java:59)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftDeploymentCleaner.cleanUpWorkspaceResources(OpenShiftDeploymentCleaner.java:105)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftDeploymentCleaner.cleanDeploymentResources(OpenShiftDeploymentCleaner.java:42)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftDeploymentCleaner.cleanDeploymentResources(OpenShiftDeploymentCleaner.java:48)
  | at org.eclipse.che.plugin.openshift.client.OpenShiftConnector.createContainer(OpenShiftConnector.java:725)
  | at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.createContainer(MachineProviderImpl.java:648)
  | at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.startService(MachineProviderImpl.java:357)
  | ... 13 common frames omitted
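
The manifest in this comment shows `spec.replicas: 0` and an empty `status`, which matches the observation that the deployment object exists but is never scaled up. Below is a minimal Python sketch that flags those conditions on a Deployment loaded as a dict, for example from the JSON printed by `oc get deployment <name> -o json`. The function name `scale_up_findings` is hypothetical and the checks are heuristics for triage, not a statement of how the OpenShift controllers decide anything.

```python
from typing import Any, Dict, List


def scale_up_findings(deployment: Dict[str, Any]) -> List[str]:
    """List reasons why a Deployment object looks like it was never scaled up."""
    findings: List[str] = []
    metadata = deployment.get("metadata") or {}
    spec = deployment.get("spec") or {}
    status = deployment.get("status") or {}

    # An unset replicas field defaults to 1, so only an explicit 0 is flagged.
    if spec.get("replicas", 1) == 0:
        findings.append("spec.replicas is 0")

    if not status:
        # No status at all suggests no controller has reported on the object yet.
        findings.append("status is empty")
    elif status.get("observedGeneration", 0) < metadata.get("generation", 0):
        # The controller has reported before, but not on the latest spec.
        findings.append("latest generation not observed")

    return findings
```

For the manifest above (generation 2, replicas 0, `status: {}`) this returns both the replicas finding and the empty-status finding.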

@jfchevrette
Contributor

@ibuziuk free-int was also running v3.7.9 (the latest GA)

@xyntrix

xyntrix commented Dec 14, 2017

@ibuziuk FWIW free-int was (before 3.8 upgrade):

OpenShift Master: 3.7.9 (online version 3.6.0.83)
Kubernetes Master: v1.7.6+a08f5eeb62

@ibuziuk
Collaborator

ibuziuk commented Dec 14, 2017

@jfchevrette @mmclanerh could we update console.starter-us-east-2.openshift.com to

OpenShift Master: 3.7.9 (online version 3.6.0.83)
Kubernetes Master: v1.7.6+a08f5eeb62

The current version of Online looks different: #1666 (comment)

@jfchevrette
Contributor

I am being told that a very similar issue with deployments was observed by the OpenShift team in their Jenkins talking to OpenShift 3.7. The issue was resolved by upgrading the fabric8 kubernetes client in their Jenkins plugin. Is this something we can try?

@jfchevrette
Contributor

@jfchevrette
Contributor

This issue was 'resolved' by restarting the controllers on the master nodes. Upstream is still investigating the root cause.

@xyntrix

xyntrix commented Dec 14, 2017

@rhopp @sbose78 @ibuziuk @l0rd if prod looks good now, we can get this closed (or at least, no longer a SEV1). Is everything looking good now?

@ibuziuk
Collaborator

ibuziuk commented Dec 15, 2017

@mmclanerh awesome news! Since it is not fixed on prod-preview I would still keep it open and add [prod-preview] to the description + change severity

@ibuziuk changed the title from "Workspaces cannot be created since starter-us-east-2.openshift moved to 3.7" to "[prod-preview] Workspaces cannot be created since starter-us-east-2.openshift moved to 3.7" on Dec 15, 2017
@ibuziuk changed the title from "[prod-preview] Workspaces cannot be created since starter-us-east-2.openshift moved to 3.7" to "[prod-preview] Workspaces cannot be created after OpensShift v3.7.9 update" on Dec 15, 2017

@xyntrix

xyntrix commented Dec 15, 2017

Controllers restarted in prod-preview. This should alleviate the pod replica issues.

@xyntrix

xyntrix commented Dec 15, 2017

prod-preview is able to spin up pods and spawn replicas correctly now. @ibuziuk @rhopp @l0rd good with closing this?

@rhopp
Collaborator Author

rhopp commented Dec 18, 2017

My account on prod-preview is currently broken, so I have no way to verify.

@aditya-konarde changed the title from "[prod-preview] Workspaces cannot be created after OpensShift v3.7.9 update" to "Workspaces cannot be created after OpensShift v3.7.9 update" on Jan 4, 2018
@aditya-konarde changed the title from "Workspaces cannot be created after OpensShift v3.7.9 update" to "Workspaces cannot be created after OpenShift v3.7.9 update" on Jan 4, 2018

@ibuziuk
Collaborator

ibuziuk commented Jan 4, 2018

OpenShift Online cluster upgrades are ongoing now; we need to wait until they finish and retry.

@ldimaggi
Collaborator

ldimaggi commented Jan 4, 2018

Seeing this error again today: Could not start workspace a6lex. Reason: Start of environment 'default' failed. Error: null

@ibuziuk
Collaborator

ibuziuk commented Jan 4, 2018

Upstream bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1526165

In order to reproduce the problem on the prod / prod-preview clusters, the sample [1] provided by @jfchevrette was used. After applying the YAML on the prod / prod-preview clusters, the deployment is never scaled up.

BTW, the problem is not reproducible on the free-int cluster, which was previously used for prod-preview:

OpenShift Master: v3.8.18 (online version 3.6.0.83)
Kubernetes Master: v1.8.1+0d5291c

[1] https://gist.github.com/jfchevrette/2833c0fb2f685f4eaf221f681dfc755b
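
The reproduction described above boils down to applying a deployment and observing that it never gets a running replica. Below is a minimal Python sketch of that check as a polling loop with a timeout. Everything here is hypothetical scaffolding: `wait_for_scale_up` and its parameters are made-up names, and `ready_replicas` is a caller-supplied function that is assumed to return the current number of ready replicas, however the caller chooses to fetch it.

```python
import time
from typing import Callable


def wait_for_scale_up(
    ready_replicas: Callable[[], int],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until at least one replica is ready; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if ready_replicas() > 0:
            return True
        if clock() >= deadline:
            # Corresponds to the "never scaled up" symptom in this issue.
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with fake time in a test; in normal use the defaults apply.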

@xyntrix

xyntrix commented Jan 4, 2018

production is now working for pod spin ups

@ibuziuk
Collaborator

ibuziuk commented Jan 4, 2018

Since workspace creation is working fine on prod after the controllers restart, changing to SEV2.

@xyntrix

xyntrix commented Jan 4, 2018

If everything is satisfactory, I believe this issue can be closed out.

@rhopp
Collaborator Author

rhopp commented Jan 8, 2018

Aaaand it's back.
It's impossible to create workspaces on prod. (it was possible half an hour ago).

@jfchevrette
Contributor

The OpenShift controllers have been restarted to work around this issue. Upstream has a fix, openshift/origin#17855, which is making its way to our cluster soon. Unfortunately, I don't have an ETA yet.

@ibuziuk
Collaborator

ibuziuk commented Jan 11, 2018

The issue is back again on prod; changing label to SEV1.

@pbergene
Collaborator

Controllers restarted and deployments work. For future reference, the service is atomic-openshift-master-controllers.service. The hotfix is applied to free-stg, but does not yet appear to be on the starter clusters; my impression is that it should hit starter-us-east-2 within a few working days.
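
For reference, the workaround used throughout this thread (restarting the controllers on the master nodes) can be sketched in Python as below. The unit name comes from the comment above. The helper names, the dry-run default, and the assumption that the script runs on a master node with sufficient privileges are illustrative only.

```python
import subprocess
from typing import List

# Unit name as given in the comment above.
CONTROLLERS_UNIT = "atomic-openshift-master-controllers.service"


def restart_command(unit: str = CONTROLLERS_UNIT) -> List[str]:
    """Build the systemctl invocation that restarts the given unit."""
    return ["systemctl", "restart", unit]


def restart_controllers(dry_run: bool = True) -> List[str]:
    """Restart the controllers unit; with dry_run=True only return the command."""
    cmd = restart_command()
    if not dry_run:
        # Assumes this runs on a master node with permission to restart units.
        subprocess.run(cmd, check=True)
    return cmd
```

Keeping `dry_run=True` as the default means that importing or calling the helper by accident does not restart anything.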

@ibuziuk
Collaborator

ibuziuk commented Jan 16, 2018

Closing. The fix for the prod cluster is applied / bugzilla [1] for starter-us-east-2 has "verified" status.
A controllers restart is not required anymore.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1526165

@ibuziuk ibuziuk closed this as completed Jan 16, 2018