Cluster up with --use-existing-config does not start openshift/origin container #14842

Closed
benjaminapetersen opened this issue Jun 22, 2017 · 14 comments
Labels: component/composition, kind/bug, priority/P1

@benjaminapetersen
Contributor

While running OpenShift locally on a Mac using Docker for Mac with the following:

$ oc cluster up --version=latest --service-catalog=true --host-data-dir=$HOME/go/data/openshift.local.etcd
#
# Add template to project, create new project, etc.
#
$ oc cluster down
#
# Now check to see if the data persisted:
$  oc cluster up --version=latest --service-catalog=true --use-existing-config --host-data-dir=$HOME/go/data/openshift.local.etcd

Attempting to log in:

$ oc login -u developer -p developer
error: dial tcp 127.0.0.1:8443: getsockopt: connection refused - verify you have provided the correct host and port and that the server is currently running.
$ oc login -u system:admin
error: dial tcp 127.0.0.1:8443: getsockopt: connection refused - verify you have provided the correct host and port and that the server is currently running.
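
A quick way to confirm that nothing is listening on the API port, rather than this being a credentials problem, is to hit the master health endpoint directly (a sketch, not part of the original report; it assumes the default 127.0.0.1:8443 address from the commands above):

# Sketch: probe the master health endpoint; connection refused here means
# the origin container never started listening.
$ curl -k https://127.0.0.1:8443/healthz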

Listing Docker containers shows openshift/origin:latest is not running:

$ docker ps
CONTAINER ID        IMAGE                                                                                               COMMAND                  CREATED             STATUS              PORTS                                                              NAMES
b2a99d799e26        centos/mongodb-32-centos7@sha256:aa37993c3be2d4731db79c0c7aba11db3e9352b1adc586c7e9054f57808789c0   "container-entrypo..."   10 minutes ago      Up 10 minutes                                                                          k8s_mongodb_mongodb-1-pjbnm_myproject_72a1b580-577d-11e7-806a-9a85c517db53_0
3fe47e9b09af        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_apiserver-882628510-pr6s9_service-catalog_a32109de-577c-11e7-806a-9a85c517db53_0
318e574d53ad        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_controller-manager-207524488-8n1n1_service-catalog_a338881b-577c-11e7-806a-9a85c517db53_0
7b3ed7efca4f        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_docker-registry-1-1dvn1_default_bce568b0-577c-11e7-806a-9a85c517db53_0
22dc7e8e5b50        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_nodejs-mongo-persistent-1-jqkqj_myproject_a0ba9552-577d-11e7-806a-9a85c517db53_0
fe6afaa373b8        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes       0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 0.0.0.0:1936->1936/tcp   k8s_POD_router-1-5k0pc_default_bd61fab1-577c-11e7-806a-9a85c517db53_0
2a602644f93d        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_mongodb-1-pjbnm_myproject_72a1b580-577d-11e7-806a-9a85c517db53_0

Using the -a flag shows that the openshift/origin:latest container exited:

$ docker ps -a
# other containers above...
ac997360afa2        openshift/origin:latest                                                                                                     "/bin/bash -c '#!/..."   11 minutes ago      Exited (0) 11 minutes ago                                                                        upbeat_hawking
bdd96201be79        openshift/origin:latest                                                                                                     "/usr/bin/openshif..."   11 minutes ago      Exited (255) 11 minutes ago                                                                      origin

Spoke with @bparees & shared logs from exited containers. He suggested it may be an aggregator issue. I can share the logs again if that would be helpful.
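
For reference, the logs from the exited containers can be pulled with plain docker commands (a sketch, using the container IDs from the docker ps -a output above):

# Sketch: capture logs from the exited origin containers listed above
$ docker logs bdd96201be79 > origin.log 2>&1
$ docker logs ac997360afa2 > origin-bash.log 2>&1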

Version:

$ oc version
oc v3.6.0-alpha.2+4423ff5-468-dirty
kubernetes v1.6.1+5115d708d7
features: Basic-Auth

Server https://127.0.0.1:8443
kubernetes v1.6.1+5115d708d7
@benjaminapetersen
Contributor Author

@jwforres @spadgett

@bparees bparees self-assigned this Jun 22, 2017
@spadgett
Member

cc @csrwng

@bparees bparees added the kind/bug and priority/P1 labels Jun 22, 2017
@benjaminapetersen
Contributor Author

benjaminapetersen commented Jun 22, 2017

Some portions of the docker logs bdd96201be79 output that indicate errors and may be helpful:

W0622 19:10:30.085489   15657 start_master.go:291] Warning: assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, master start will continue.
W0622 19:10:30.085614   15657 start_master.go:291] Warning: assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, master start will continue.
W0622 19:10:30.085622   15657 start_master.go:291] Warning: auditConfig.auditFilePath: Required value: audit can not be logged to a separate file, master start will continue.
2017-06-22 19:10:30.090724 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:4001: getsockopt: connection refused"; Reconnecting to {127.0.0.1:4001 <nil>}
#
#
#
2017-06-22 19:10:31.396348 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.396971 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.408698 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
#
#
I0622 19:10:31.412470   15657 clusterquotamapping.go:160] Starting ClusterQuotaMappingController controller
E0622 19:10:31.419957   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.PolicyBinding: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/policybindings?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0622 19:10:31.420013   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.Policy: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/policies?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
2017-06-22 19:10:31.420062 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
E0622 19:10:31.420150   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.ClusterPolicyBinding: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/clusterpolicybindings?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
#
#
#
E0622 19:10:31.420305   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.ClusterPolicy: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/clusterpolicies?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0622 19:10:31.420889   15657 reflector.go:201] github.com/openshift/origin/pkg/quota/generated/informers/internalversion/factory.go:45: Failed to list *api.ClusterResourceQuota: Get https://127.0.0.1:8443/apis/quota.openshift.io/v1/clusterresourcequotas?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
2017-06-22 19:10:31.428706 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.451436 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.452487 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
W0622 19:10:31.635538   15657 genericapiserver.go:295] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
#
#
#
E0622 19:10:40.499517   15657 remote_runtime.go:203] StartContainer "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60" from runtime service failed: rpc error: code = 2 desc = failed to create symbolic link "/var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log" to the container log file "/var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log" for container "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60": symlink /var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log /var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log: file exists
E0622 19:10:40.499672   15657 kuberuntime_manager.go:719] container start failed: rpc error: code = 2 desc = failed to create symbolic link "/var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log" to the container log file "/var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log" for container "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60": symlink /var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log /var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log: file exists: Start Container Failed
E0622 19:10:40.499776   15657 pod_workers.go:182] Error syncing pod 72a1b580-577d-11e7-806a-9a85c517db53 ("mongodb-1-pjbnm_myproject(72a1b580-577d-11e7-806a-9a85c517db53)"), skipping: failed to "StartContainer" for "mongodb" with rpc error: code = 2 desc = failed to create symbolic link "/var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log" to the container log file "/var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log" for container "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60": symlink /var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log /var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log: file exists: "Start Container Failed"
W0622 19:10:40.667412   15657 docker_sandbox.go:263] Couldn't find network status for myproject/nodejs-mongo-persistent-1-jqkqj through plugin: invalid network status for
F0622 19:10:40.863206   15657 start_master.go:457] failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1alpha1: an error on the server ("Error: 'dial tcp 172.30.1.2:443: getsockopt: connection refused'\nTrying to reach: 'https://172.30.1.2/apis/servicecatalog.k8s.io/v1alpha1'") has prevented the request from succeeding
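
The fatal line above suggests the master exits because API discovery cannot reach the aggregated servicecatalog.k8s.io endpoint. A rough way to poke at that endpoint directly (a sketch reusing the address from the log; it only works from somewhere that can reach the cluster service network, e.g. inside the origin container):

# Sketch: probe the aggregated service-catalog API reported in the fatal error
$ curl -k https://172.30.1.2/apis/servicecatalog.k8s.io/v1alpha1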

@bparees
Contributor

bparees commented Jun 22, 2017

@deads2k any chance you can eyeball this log and say yes/no to it being an aggregator issue? seeing a lot of missing resources reported.

@bparees
Contributor

bparees commented Jun 23, 2017

for what it's worth i couldn't recreate this on linux. cluster up, cluster down, cluster back up, no issue (w/ SC enabled in both cluster ups)

@bparees
Contributor

bparees commented Jun 23, 2017

oops, i didn't use existing config. trying again.

@bparees
Contributor

bparees commented Jun 23, 2017

ok, recreated. still not really sure what's up, still suspect aggregator issues.

@deads2k
Contributor

deads2k commented Jun 23, 2017

@deads2k any chance you can eyeball this log and say yes/no to it being an aggregator issue? seeing a lot of missing resources reported.

I would guess it's missing #14595, but that's been in for a while.

@bparees
Contributor

bparees commented Jun 23, 2017

I would guess it's missing #14595, but that's been in for a while.

@deads2k yeah i'm hitting this on a very recent master.

@deads2k
Contributor

deads2k commented Jun 23, 2017

@deads2k yeah i'm hitting this on a very recent master.

Can you zip up the folder with your masterconfig and related artifacts for me to try?
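
A sketch of what that might look like (the paths are assumptions based on the commands in this thread: cluster up keeps the generated master config under its host config dir, and the etcd data lives under whatever --host-data-dir was passed):

# Sketch: bundle the generated config and etcd data for sharing
$ tar czf cluster-up-config.tgz /var/lib/origin/openshift.local.config /tmp/etcddata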

@bparees
Contributor

bparees commented Jun 23, 2017

@benjaminapetersen i can't recreate this anymore on master, here's what i'm running:

rm -rf /var/lib/origin
rm -rf /tmp/etcddata

oc cluster up --version=latest --service-catalog --host-data-dir=/tmp/etcddata

oc cluster down

oc cluster up --version=latest --service-catalog --host-data-dir=/tmp/etcddata --use-existing-config

can you build new images from master and see if it's still an issue for you?
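
For anyone following along, a sketch of what that entails (assuming the standard origin build flow at the time; the exact make targets may differ):

# Sketch: build fresh origin images from a current master checkout
$ git clone https://github.com/openshift/origin.git && cd origin
$ make release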

@bparees
Contributor

bparees commented Jun 23, 2017

argh. i keep forgetting the issue isn't having it come up, it's having it be available. it is still broken.

@bparees
Contributor

bparees commented Jun 23, 2017

sent config/etcd files to @deads2k

@deads2k
Contributor

deads2k commented Jun 26, 2017

sent config/etcd files to @deads2k

Thanks. Opened #14881
