Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove /var/lib/docker volume from containerized node #15105

Closed

Conversation

sjenning
Copy link
Contributor

@sjenning sjenning commented Jul 7, 2017

xref https://bugzilla.redhat.com/show_bug.cgi?id=1467824

tl;dr upstream docker does not allow the forced removal of containers anymore. Before, a forced removal, when using devicemapper graph driver, simply orphaned a thin device and those devices would eventually consume the pool. The upstream docker change was backported to docker-1.12.6-40 to fix this issue.

However, the mounting of /var/lib/docker into the containerized node is causing other openshift services from being able to (re)start successfully since the systemd unit files first remove the old container, which fails due to the containerized node holding the mount point, then tries a docker run, which fails because a container with that name already exists.

To avoid this, the PR removes /var/lib/docker from the volumes mounted into the containerized node.

@derekwaynecarr @smarterclayton @eparis

@sjenning
Copy link
Contributor Author

sjenning commented Jul 7, 2017

@rhvgoyal just saw your update on the BZ. Is this not needed anymore?

@rhvgoyal
Copy link

@sjenning I have created a PR so that oci-umount plugin will take care of removing multiple mounts inside container. But I think modifying unit file to not mount /var/lib/docker/ to begin with is a good idea.

IOW, while my PR to oci-umount will take care of this, but we should merge this change anyway (irrespective of my PR).

@rhvgoyal
Copy link

containers/oci-umount#10

@sjenning
Copy link
Contributor Author

[test]

@smarterclayton
Copy link
Contributor

@csrwng do we need to change cluster up?

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 38b2082

giuseppe added a commit to giuseppe/origin that referenced this pull request Jul 10, 2017
@csrwng
Copy link
Contributor

csrwng commented Jul 10, 2017

@smarterclayton yes we need to remove it there as well, will create a PR

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/3031/) (Base Commit: 18c5730) (PR Branch Commit: 38b2082)

@csrwng
Copy link
Contributor

csrwng commented Jul 10, 2017

@sjenning @rhvgoyal under what circumstances should we not mount /var/lib/docker anymore? Is it based on the Docker version? or the version of the kubelet? Will things break, for example, if I try to run origin v1.5.1 images on top of Docker docker-1.12.6-30 and I don't mount /var/lib/docker ?

@rhvgoyal
Copy link

@csrwng I am expecting that you don't have to mount /var/lib/docker/ in any circumstance at all. There is one logging case of fluentd which still needs it. Apart from that, if there are other use cases, I want to know.

@csrwng
Copy link
Contributor

csrwng commented Jul 10, 2017

@rhvgoyal So for the one fluentd case, is it no longer required?

@rhvgoyal
Copy link

@rhvgoyal I think for fluentd case you still require it. As I understand it, for json log based driver, they still go trough container logs under /var/lib/docker.

@rhvgoyal
Copy link

@csrwng I myself am trying to figure out who else needs it and what breaks if we don't mount it.

giuseppe added a commit to giuseppe/origin that referenced this pull request Jul 11, 2017
with /var/lib/docker/containers

so we don't leak /var/lib/docker/devicemapper mount points
into the container.

Reference: openshift#15105

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Copy link
Member

would a build still work? I've tried this change in the system container and if there is no /var/lib/docker/containers present, the build fails with:

# oc logs -f bc/hello-openshift-plus
failed to open log file "/var/log/pods/7267b90a-65dc-11e7-9b7c-525400874d80/docker-build_0.log": open /var/log/pods/7267b90a-65dc-11e7-9b7c-525400874d80/docker-build_0.log: no such file or directory[root@69ccc7cff516 /]# oc logs -f bc/hello-openshift-plus

That happens since files under /var/log/pods are symbolic links to /var/lib/docker/containers/, e.g.

ls -l /var/log/pods/bd5e4ab0-65df-11e7-80e7-525400874d80/docker-build_0.log
lrwxrwxrwx. 1 root root 165 Jul 11 02:22 /var/log/pods/bd5e4ab0-65df-11e7-80e7-525400874d80/docker-build_0.log -> /var/lib/docker/containers/37eb94514154a15d542856c03a66cd08fc0d010a1ff9b327e79ceffffe8b4a29/37eb94514154a15d542856c03a66cd08fc0d010a1ff9b327e79ceffffe8b4a29-json.log

I have not tested this with the Docker container, so please ignore my comment if it doesn't happen there. If so though, what is done differently than files in /var/log/pods have the right content?

giuseppe added a commit to giuseppe/origin that referenced this pull request Jul 11, 2017
with /var/lib/docker/containers

so we don't leak /var/lib/docker/devicemapper mount points
into the container.

Reference: openshift#15105

Signed-off-by: Giuseppe Scrivano <[email protected]>
@rhvgoyal
Copy link

@giuseppe These seems to be just accessing json logs of a container. /var/lib/docker/containers/ is just a bind mount onto itself. So even if I unmount it, logs are still there.

So I doubt this breakage is due to unmounting of /var/lib/docker/containers. That directory and logs with-in should still be accessible.

@giuseppe
Copy link
Member

@rhvgoyal yes true, my point was that if we just drop the /var/lib/docker mount point (as currently the patch seems to do) we lose completely access to /var/lib/docker/containers from the node container.

Do we need to add a mount point for /var/lib/docker/containers since /var/lib/docker is not mounted anymore or how that will be accessible? I needed this in the system container case to fix the issue I reported above.

@rhvgoyal
Copy link

I don't understand openshift. So you will have to simplify it for me. So why do we need access to logs while doing a build?

@giuseppe
Copy link
Member

I am not expert either but my guess is that it is needed for oc logs. If I use oc logs I can see the docker build log there.

@rhvgoyal
Copy link

Ok, so this is the case where "oc logs" is being run from inside a container and it is trying to access logs of a container? This sounds similar to fluentd accessing container logs.

So why are we not using "docker logs" to access logs?

@rhvgoyal
Copy link

Anyway, if some container does need /var/lib/docker, then yes they will have to volume mount it in. I don't know which containers need it. So openshift service/container needs to decide which containers really need it.

oci-umount should take care of unmounting /var/lib/docker/devicemapper and /var/lib/docker/containers from inside the containers.

@smarterclayton
Copy link
Contributor

smarterclayton commented Jul 11, 2017 via email

@sjenning
Copy link
Contributor Author

sjenning commented Jul 11, 2017

@smarterclayton I was trying to prove-by-doing but containerized install is busted #15147. I'll have to look at the code in the meantime.

@sjenning
Copy link
Contributor Author

This is the only thing I can find

https://github.com/openshift/origin/blob/master/pkg/cmd/server/kubernetes/node/node.go#L61-L63

For that, /var/lib/docker doesn't have to be mounted in, but it does need to exist.

@openshift-merge-robot openshift-merge-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 24, 2017
@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sjenning
We suggest the following additional approver: deads2k

Assign the PR to them by writing /assign @deads2k in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 28, 2017
@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sjenning
We suggest the following additional approver: deads2k

Assign the PR to them by writing /assign @deads2k in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 28, 2017
@deads2k
Copy link
Contributor

deads2k commented Jul 31, 2017

/assign sjenning
/unassign

giuseppe added a commit to giuseppe/origin that referenced this pull request Nov 27, 2017
with /var/lib/docker/containers

so we don't leak /var/lib/docker/devicemapper mount points
into the container.

Reference: openshift#15105

Signed-off-by: Giuseppe Scrivano <[email protected]>
openshift-merge-robot added a commit that referenced this pull request Nov 29, 2017
Automatic merge from submit-queue (batch tested with PRs 17476, 17143, 15115, 17094, 17500).

node, syscontainer: drop /var/lib/docker mount point

Reference: #15105

Signed-off-by: Giuseppe Scrivano <[email protected]>
@sjenning
Copy link
Contributor Author

@rhvgoyal do we still need this?

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2018
@sjenning sjenning closed this Feb 27, 2018
@sjenning
Copy link
Contributor Author

I think this is handled via other means now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants