Cluster up with --use-existing-config does not start openshift/origin container #14842

Closed
benjaminapetersen opened this issue Jun 22, 2017 · 14 comments
Labels: component/composition, kind/bug, priority/P1

@benjaminapetersen
Contributor

While running OpenShift locally on a Mac using Docker for Mac with the following:

$ oc cluster up --version=latest --service-catalog=true --host-data-dir=$HOME/go/data/openshift.local.etcd
#
# Add template to project, create new project, etc.
#
$ oc cluster down
#
# Now check to see if the data persisted:
$  oc cluster up --version=latest --service-catalog=true --use-existing-config --host-data-dir=$HOME/go/data/openshift.local.etcd

Attempting to log in:

$ oc login -u developer -p developer
error: dial tcp 127.0.0.1:8443: getsockopt: connection refused - verify you have provided the correct host and port and that the server is currently running.
$ oc login -u system:admin
error: dial tcp 127.0.0.1:8443: getsockopt: connection refused - verify you have provided the correct host and port and that the server is currently running.
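
A quick way to confirm that nothing is listening on the API port, rather than this being a credentials problem, is to hit the master health endpoint directly (a sketch, not part of the original report; it assumes the default 127.0.0.1:8443 address from the commands above):

# Sketch: probe the master health endpoint; connection refused here means
# the origin container never started listening.
$ curl -k https://127.0.0.1:8443/healthz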

Listing Docker containers shows openshift/origin:latest is not running:

$ docker ps
CONTAINER ID        IMAGE                                                                                               COMMAND                  CREATED             STATUS              PORTS                                                              NAMES
b2a99d799e26        centos/mongodb-32-centos7@sha256:aa37993c3be2d4731db79c0c7aba11db3e9352b1adc586c7e9054f57808789c0   "container-entrypo..."   10 minutes ago      Up 10 minutes                                                                          k8s_mongodb_mongodb-1-pjbnm_myproject_72a1b580-577d-11e7-806a-9a85c517db53_0
3fe47e9b09af        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_apiserver-882628510-pr6s9_service-catalog_a32109de-577c-11e7-806a-9a85c517db53_0
318e574d53ad        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_controller-manager-207524488-8n1n1_service-catalog_a338881b-577c-11e7-806a-9a85c517db53_0
7b3ed7efca4f        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_docker-registry-1-1dvn1_default_bce568b0-577c-11e7-806a-9a85c517db53_0
22dc7e8e5b50        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_nodejs-mongo-persistent-1-jqkqj_myproject_a0ba9552-577d-11e7-806a-9a85c517db53_0
fe6afaa373b8        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes       0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp, 0.0.0.0:1936->1936/tcp   k8s_POD_router-1-5k0pc_default_bd61fab1-577c-11e7-806a-9a85c517db53_0
2a602644f93d        openshift/origin-pod:latest                                                                         "/usr/bin/pod"           10 minutes ago      Up 10 minutes                                                                          k8s_POD_mongodb-1-pjbnm_myproject_72a1b580-577d-11e7-806a-9a85c517db53_0

Using the -a flag shows that the openshift/origin:latest container exited:

$ docker ps -a
# other containers above...
ac997360afa2        openshift/origin:latest                                                                                                     "/bin/bash -c '#!/..."   11 minutes ago      Exited (0) 11 minutes ago                                                                        upbeat_hawking
bdd96201be79        openshift/origin:latest                                                                                                     "/usr/bin/openshif..."   11 minutes ago      Exited (255) 11 minutes ago                                                                      origin

Spoke with @bparees & shared logs from exited containers. He suggested it may be an aggregator issue. I can share the logs again if that would be helpful.
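
For reference, the logs from the exited containers can be pulled with plain docker commands (a sketch, using the container IDs from the docker ps -a output above):

# Sketch: capture logs from the exited origin containers listed above
$ docker logs bdd96201be79 > origin.log 2>&1
$ docker logs ac997360afa2 > origin-bash.log 2>&1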

Version:

$ oc version
oc v3.6.0-alpha.2+4423ff5-468-dirty
kubernetes v1.6.1+5115d708d7
features: Basic-Auth

Server https://127.0.0.1:8443
kubernetes v1.6.1+5115d708d7
@benjaminapetersen
Contributor Author

@jwforres @spadgett

@bparees bparees self-assigned this Jun 22, 2017
@spadgett
Member

cc @csrwng

@bparees bparees added the kind/bug and priority/P1 labels Jun 22, 2017
@benjaminapetersen
Contributor Author

benjaminapetersen commented Jun 22, 2017

Some portions of the docker logs bdd96201be79 output that indicate errors and may be helpful:

W0622 19:10:30.085489   15657 start_master.go:291] Warning: assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, master start will continue.
W0622 19:10:30.085614   15657 start_master.go:291] Warning: assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, master start will continue.
W0622 19:10:30.085622   15657 start_master.go:291] Warning: auditConfig.auditFilePath: Required value: audit can not be logged to a separate file, master start will continue.
2017-06-22 19:10:30.090724 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:4001: getsockopt: connection refused"; Reconnecting to {127.0.0.1:4001 <nil>}
#
#
#
2017-06-22 19:10:31.396348 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.396971 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.408698 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
#
#
I0622 19:10:31.412470   15657 clusterquotamapping.go:160] Starting ClusterQuotaMappingController controller
E0622 19:10:31.419957   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.PolicyBinding: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/policybindings?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0622 19:10:31.420013   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.Policy: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/policies?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
2017-06-22 19:10:31.420062 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
E0622 19:10:31.420150   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.ClusterPolicyBinding: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/clusterpolicybindings?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
#
#
#
E0622 19:10:31.420305   15657 reflector.go:201] github.com/openshift/origin/pkg/authorization/generated/informers/internalversion/factory.go:45: Failed to list *api.ClusterPolicy: Get https://127.0.0.1:8443/apis/authorization.openshift.io/v1/clusterpolicies?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0622 19:10:31.420889   15657 reflector.go:201] github.com/openshift/origin/pkg/quota/generated/informers/internalversion/factory.go:45: Failed to list *api.ClusterResourceQuota: Get https://127.0.0.1:8443/apis/quota.openshift.io/v1/clusterresourcequotas?resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
2017-06-22 19:10:31.428706 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.451436 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
2017-06-22 19:10:31.452487 I | etcdserver/api/v3rpc: Failed to dial [::]:4001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry.
W0622 19:10:31.635538   15657 genericapiserver.go:295] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
#
#
#
E0622 19:10:40.499517   15657 remote_runtime.go:203] StartContainer "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60" from runtime service failed: rpc error: code = 2 desc = failed to create symbolic link "/var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log" to the container log file "/var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log" for container "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60": symlink /var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log /var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log: file exists
E0622 19:10:40.499672   15657 kuberuntime_manager.go:719] container start failed: rpc error: code = 2 desc = failed to create symbolic link "/var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log" to the container log file "/var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log" for container "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60": symlink /var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log /var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log: file exists: Start Container Failed
E0622 19:10:40.499776   15657 pod_workers.go:182] Error syncing pod 72a1b580-577d-11e7-806a-9a85c517db53 ("mongodb-1-pjbnm_myproject(72a1b580-577d-11e7-806a-9a85c517db53)"), skipping: failed to "StartContainer" for "mongodb" with rpc error: code = 2 desc = failed to create symbolic link "/var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log" to the container log file "/var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log" for container "b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60": symlink /var/lib/docker/containers/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60/b2a99d799e26c6300e29bb95c4c0ce300fbafc249db32c1020a67a89bb3f7b60-json.log /var/log/pods/72a1b580-577d-11e7-806a-9a85c517db53/mongodb_0.log: file exists: "Start Container Failed"
W0622 19:10:40.667412   15657 docker_sandbox.go:263] Couldn't find network status for myproject/nodejs-mongo-persistent-1-jqkqj through plugin: invalid network status for
F0622 19:10:40.863206   15657 start_master.go:457] failed to get supported resources from server: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1alpha1: an error on the server ("Error: 'dial tcp 172.30.1.2:443: getsockopt: connection refused'\nTrying to reach: 'https://172.30.1.2/apis/servicecatalog.k8s.io/v1alpha1'") has prevented the request from succeeding
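
The fatal line above suggests the master exits because API discovery cannot reach the aggregated servicecatalog.k8s.io endpoint. A rough way to poke at that endpoint directly (a sketch reusing the address from the log; it only works from somewhere that can reach the cluster service network, e.g. inside the origin container):

# Sketch: probe the aggregated service-catalog API reported in the fatal error
$ curl -k https://172.30.1.2/apis/servicecatalog.k8s.io/v1alpha1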

@bparees
Contributor

bparees commented Jun 22, 2017

@deads2k any chance you can eyeball this log and say yes/no to it being an aggregator issue? seeing a lot of missing resources reported.

@bparees
Contributor

bparees commented Jun 23, 2017

for what it's worth i couldn't recreate this on linux. cluster up, cluster down, cluster back up, no issue (w/ SC enabled in both cluster ups)

@bparees
Contributor

bparees commented Jun 23, 2017

oops, i didn't use existing config. trying again.

@bparees
Contributor

bparees commented Jun 23, 2017

ok, recreated. still not really sure what's up, still suspect aggregator issues.

@deads2k
Contributor

deads2k commented Jun 23, 2017

@deads2k any chance you can eyeball this log and say yes/no to it being an aggregator issue? seeing a lot of missing resources reported.

I would guess it's missing #14595, but that's been in for a while.

@bparees
Contributor

bparees commented Jun 23, 2017

I would guess it's missing #14595, but that's been in for a while.

@deads2k yeah i'm hitting this on a very recent master.

@deads2k
Contributor

deads2k commented Jun 23, 2017

@deads2k yeah i'm hitting this on a very recent master.

Can you zip up the folder with your masterconfig and related artifacts for me to try?
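
A sketch of what that might look like (the paths are assumptions based on the commands in this thread: cluster up keeps the generated master config under its host config dir, and the etcd data lives under whatever --host-data-dir was passed):

# Sketch: bundle the generated config and etcd data for sharing
$ tar czf cluster-up-config.tgz /var/lib/origin/openshift.local.config /tmp/etcddata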

@bparees
Contributor

bparees commented Jun 23, 2017

@benjaminapetersen i can't recreate this anymore on master, here's what i'm running:

rm -rf /var/lib/origin
rm -rf /tmp/etcddata

oc cluster up --version=latest --service-catalog --host-data-dir=/tmp/etcddata

oc cluster down

oc cluster up --version=latest --service-catalog --host-data-dir=/tmp/etcddata --use-existing-config

can you build new images from master and see if it's still an issue for you?
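
For anyone following along, a sketch of what that entails (assuming the standard origin build flow at the time; the exact make targets may differ):

# Sketch: build fresh origin images from a current master checkout
$ git clone https://github.com/openshift/origin.git && cd origin
$ make release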

@bparees
Contributor

bparees commented Jun 23, 2017

argh. i keep forgetting the issue isn't having it come up, it's having it be available. it is still broken.

@bparees
Contributor

bparees commented Jun 23, 2017

sent config/etcd files to @deads2k

@deads2k
Contributor

deads2k commented Jun 26, 2017

sent config/etcd files to @deads2k

Thanks. Opened #14881
