cluster up: add persistent volumes on startup #12456

Merged
merged 1 commit on Jan 17, 2017

Conversation

csrwng
Contributor

@csrwng csrwng commented Jan 11, 2017

Adds persistence support to 'cluster up'

  • Starts a job that creates a set of persistent volumes on cluster up start.
  • The directory where persistent volumes are created can be specified through a command line argument.
  • The registry, metrics, and logging are switched to use persistence.
  • The Jenkins auto provisioner is now using the jenkins-persistent template.
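For reference, a hostPath-backed persistent volume of the general shape this feature provisions might look like the following sketch (the name, capacity, access modes, and reclaim policy here are illustrative assumptions, not the exact template the PR ships):

```yaml
# Illustrative hostPath PV -- names and sizes are assumptions for this sketch
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  - ReadWriteMany
  - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/origin/openshift.local.pv/pv0001
```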

@csrwng
Contributor Author

csrwng commented Jan 11, 2017

@bparees ptal - I marked this PR as WIP because I'd like it to be tried out and tested to make sure it meets our use cases for persistence.

@csrwng
Contributor Author

csrwng commented Jan 11, 2017

@jorgemoralespou @GrahamDumpleton your feedback is greatly appreciated.

@bparees
Contributor

bparees commented Jan 11, 2017

lgtm

@GrahamDumpleton

GrahamDumpleton commented Jan 11, 2017

Are you providing auto volume provisioning? If not, how do you specify the number of volumes to pre-create? How do you override the size they are created with?

There is also still the question of how to deal with volumes when released. The persistent volumes need to be set to 'Retain', as it is too risky to delete them automatically. Plus you can't necessarily delete them, depending on how file ownership is set up.

FWIW, in our latest wrappers we pre-create 10 persistent volumes of 10Gi. In my version of the wrapper I allow volume count and volume size to be overridden.

Our wrappers also provide commands for manually adding volumes later. These can be additional anonymous volumes, or volumes you associate with a specific directory and set up with a claim, so they can be matched to a specific application more easily when the directory is not empty and already holds that application's data.

@bparees
Contributor

bparees commented Jan 11, 2017

Are you providing auto volume provisioning?

no. it was investigated but auto provisioning doesn't handle permissions properly on the directories it creates.

If not, how do you specify the number of volumes to pre-create?

it creates 1000.

How do you override the size they are created with?

you don't. they are all created as 100Gi volumes. openshift will assign you the smallest available volume that is greater than or equal to your requested size, so by doing this all the volumes should be viable for any PVC. For any reasonable use of oc cluster up, that should be sufficient for all needs.
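To illustrate the binding behavior described above: because a claim binds to the smallest available volume whose capacity satisfies the request, even a tiny hypothetical claim like this one would bind to one of the 100Gi volumes:

```yaml
# Hypothetical 1Gi claim: with only 100Gi PVs available it binds to one
# of them, since binding picks the smallest PV satisfying the request.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```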

Our wrappers also provide commands for manually adding volumes later.

our assumption is 1000 ought to be enough for the intended purposes of oc cluster up (which does not include long running/managed clusters), but there is a plan to document how to create additional PVs if desired. (hopefully still on @csrwng's checklist, as part of the doc for this feature).

- ReadWriteOnce
hostPath:
  path: ${BASEDIR}/${NAME}
persistentVolumeReclaimPolicy: Retain
Contributor

does recycle not work for hostpath volumes?

Contributor Author

@bparees I tested this and it does work. Once you delete a PVC, the contents of the PV directory are wiped out. Trying to understand whether this is what we should be using ('Recycle' instead of 'Retain').

@jorgemoralespou

Is the volume created on the origin container or on the host?

@jorgemoralespou

Closed by accident.

@bparees
Contributor

bparees commented Jan 12, 2017

Is the volume created on the origin container or on the host?

the actual storage is a directory on the host filesystem.

@GrahamDumpleton

It should be noted that mounting volumes from the real underlying host (not the VM) on MacOS X will not work properly if using VirtualBox. This is why we only support Docker for Mac with our wrappers, and not older Docker versions based on boot2docker.

When using docker-machine, the volume path isn't on the local host but lives inside whichever docker-machine instance is created to run Docker.

@csrwng
Contributor Author

csrwng commented Jan 12, 2017

It should be noted that mounting volumes from the real underlying host (not the VM) on MacOS X will not work properly if using VirtualBox. This is why we only support Docker for Mac with our wrappers, and not older Docker versions based on boot2docker.

Correct, however even then you get the benefit of persistence, even though it's not accessible on your host machine.

@csrwng
Contributor Author

csrwng commented Jan 13, 2017

The persistent volumes need to be set to 'Retain', as it is too risky to delete them automatically.

@GrahamDumpleton I want to better understand the risk. If you get rid of the PVC, should the contents of the directory not go away?

@jorgemoralespou

jorgemoralespou commented Jan 13, 2017 via email

@bparees
Contributor

bparees commented Jan 13, 2017

Again, I never know what's the end goal of "cluster up" :-D

it's definitely not to run 2 clusters on the same host.

@csrwng
Contributor Author

csrwng commented Jan 13, 2017

@jorgemoralespou in such a use case (where you have a specific PV you want to reuse across different clusters) I would consider the PV more like a "pet" rather than "cattle". In that case, you're better off creating your own PV with the appropriate retention policy. At the very least we can have documentation to help you do that. However, the purpose of this pull is to add cattle-type PVs to cluster up environments.

@csrwng csrwng force-pushed the clusterup_persistence branch from 0d35314 to e885542 on January 13, 2017 16:46
@csrwng
Contributor Author

csrwng commented Jan 13, 2017

@bparees @jorgemoralespou there are 2 concerns I still have regarding this pull:

  1. naming of the pv argument for pv dirs. We currently have the following:

    --host-config-dir
    --host-data-dir
    --host-volumes-dir
    

    I'm adding a new argument

    --persistent-volumes-dir
    

    which doesn't match the naming convention so far. Would you prefer

    --host-pv-dir 
    

    or

    --host-persistent-volumes-dir
    

    ?

  2. currently, if provisioning started during a previous cluster start (detected by checking for the existence of a service account), provisioning is not attempted again. However, now I'm thinking that if I start my cluster and very quickly bring it down, maybe only a small number of PVs get provisioned. I'm thinking that maybe I should kick off the provisioning process regardless. Thoughts?

@jorgemoralespou

@bparees

it's definitely not to run 2 clusters on the same host.

Why not? I'm not saying at the same time (concurrently), but I definitely see many people (developers) creating multiple clusters with different configurations and starting/stopping the appropriate cluster at will.

@csrwng Why not use a Job to do the PV creation, so if it fails, it will get launched again on the next start? Also, PV creation (the Job) should be idempotent, so it doesn't re-create PVs that already exist.
Finally, startup time then does not depend on PV creation; the Job gets launched as soon as the platform is up.

Does it make sense?
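A restartable, idempotent provisioning Job along the lines suggested here could be shaped roughly like this sketch (the Job name, image, and script path are placeholder assumptions, not what the PR actually uses):

```yaml
# Hypothetical provisioning Job: restartPolicy OnFailure plus an
# idempotent creation script means an interrupted run simply resumes,
# and PVs that already exist are skipped rather than re-created.
apiVersion: batch/v1
kind: Job
metadata:
  name: persistent-volume-setup
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: setup
        image: openshift/origin:v1.5.0-alpha.1
        command: ["/bin/bash", "-c", "/scripts/provision-pvs.sh"]
```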

@bparees
Contributor

bparees commented Jan 13, 2017

Why not? not saying at the same time (concurrently), but definitely I see many people (developers) creating multiple clusters with different information and starting/stopping the appropriate cluster at will.

oc cluster up is not for maintaining and managing long lived clusters. if you're using it to stop+restart clusters, you're already off the use cases we are targeting.

@bparees
Contributor

bparees commented Jan 13, 2017

@csrwng --host-pv-dir is my vote but i don't have strong feelings.

@jorgemoralespou

@bparees

oc cluster up is not for maintaining and managing long lived clusters. if you're using it to stop+restart clusters, you're already off the use cases we are targeting.

Then why are we providing keep-config? What are long lived clusters to you? I don't agree with your assessment here.

@jorgemoralespou

@csrwng tested.

Errors with following configuration:

oc cluster up --public-hostname '127.0.0.1' \
    --host-data-dir '/Users/jmorales/.oc/profiles/pv/data' \
    --host-config-dir '/Users/jmorales/.oc/profiles/pv/config' \
    --routing-suffix 'apps.lcup' \
    --use-existing-config
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0-alpha.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using public hostname IP 127.0.0.1 as the host IP
   Using 127.0.0.1 as the server IP
-- Starting OpenShift container ...
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
FAIL
   Error: cannot start OpenShift daemon
   Caused By:
     Error: cannot start container c0beda67dae8d2548893b877bc44ac54dcc4d967e829716d36331c841d4a884a
     Caused By:
       Error: API error (502): Mounts denied: nfo.
       .
       The path /var/lib/origin/openshift.local.pv
       is not shared from OS X and is not known to Docker.
       You can configure shared paths from Docker -> Preferences... -> File Sharing.
       See https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more i

[ERROR] Cluster has not started correctly. Profile configuration will be preserved

Included the flag:

oc cluster up --public-hostname '127.0.0.1' \
    --host-data-dir '/Users/jmorales/.oc/profiles/pv/data' \
    --host-config-dir '/Users/jmorales/.oc/profiles/pv/config' \
    --persistent-volumes-dir '/Users/jmorales/.oc/profiles/pv/pv' \
    --routing-suffix 'apps.lcup' \
    --use-existing-config
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ...
   Deleted existing OpenShift container
-- Checking for openshift/origin:v1.5.0-alpha.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
   Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using public hostname IP 127.0.0.1 as the host IP
   Using 127.0.0.1 as the server IP
-- Starting OpenShift container ...
   Creating initial OpenShift configuration
   Starting OpenShift using container 'origin'
FAIL
   Error: cannot start OpenShift daemon
   Caused By:
     Error: cannot start container a1599887542dfb7f091a2c0ce8085145d9a292434a1e261c17413f92c78f387d
     Caused By:
       Error: API error (502): Mounts denied: nfo.
       .
       The path /var/lib/origin/openshift.local.pv
       is not shared from OS X and is not known to Docker.
       You can configure shared paths from Docker -> Preferences... -> File Sharing.
       See https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more i

[ERROR] There's been an error creating the cluster, the profile [] will be removed

@jorgemoralespou

jorgemoralespou commented Jan 13, 2017 via email

@GrahamDumpleton

When I talk about risk, my opinion may be clouded by what may have been incorrect behaviour seen in early versions of oc cluster up. I will have to test again, but my recollection was that when it recycled the directories, it wouldn't always delete everything in them. For example, if the application had deliberately made files read-only, the cleanup process wouldn't remove them. This meant those files carried over to the next use of the directory as a volume and could cause problems.

As far as use cases for oc cluster up go, if the official stance is that the only interest is in the specific use case engineering has for testing, then it shouldn't be promoted to end users as something they can use at all. To me this is the wrong attitude, though. There should be acceptance that other people will want to use this differently from what you have in mind, and whatever you do to make it work for your use case, you simply need to keep in mind that what other people do will be different, and not do things that specifically block other use cases or make them much harder. An example is being flexible in changing things when necessary, as was done with the developer account logging in with a fixed password on every up. That was relaxed, which meant we could set up the identity provider as htpasswd and use real passwords.

So always make things optional, easily disabled, or reconfigurable. Don't force a particular way of thinking or doing things that then makes it unusable to others. If this is seen as just too hard, then a variant of the oc cluster concept should be broken out as a distinct project and let the community guide its direction as to what it should do. To that end, it perhaps should be a separate project under the MiniShift umbrella, given that MiniShift will likely also want to drive this down a path more useful to developers of applications, rather than testers.

@bparees
Contributor

bparees commented Jan 13, 2017

The challenge is we don't want oc cluster up to turn into an alternative cluster management tool. We don't want to dilute our dev or test resources in that way. If you need a highly configurable/manageable cluster, that's what ansible is for, possibly it's what the CDK/minishift are for too.

@GrahamDumpleton

We aren't expecting all the separate things we are doing in wrappers to be pushed down into oc cluster up. We are happy simply if things are done in oc cluster up in a way which still enables us to do extra things, and doesn't block us. MiniShift will no doubt have the same concerns. So I wouldn't necessarily be rushing to add extra features just because we have them in our wrappers.

@GrahamDumpleton

For v1.4.0-rc1 on MacOS X, the Recycle policy on volumes appears to work fine even in the case where files are made read-only. I will need to test with Linux and Windows and with v1.3.2, but if that checks out okay, I will flip the default reclaim policy for the initially created volumes in my wrapper to Recycle.

I have also added a --reclaim-policy option to the command for creating a new volume. For explicitly created volumes I will keep the default as Retain, to be safe, so that users do not accidentally delete their existing files if they point at a directory with content in it. They can override it if need be with the option.

@csrwng csrwng force-pushed the clusterup_persistence branch from e885542 to 1fa88b8 on January 16, 2017 18:24
@csrwng
Contributor Author

csrwng commented Jan 16, 2017

I've now updated the job to be restartable and changed the name of the host dir flag to be more consistent with the other hostdir flags.

@csrwng csrwng changed the title [WIP] cluster up: add persistent volumes on startup cluster up: add persistent volumes on startup Jan 16, 2017
@jorgemoralespou

@csrwng I still think 1000 PVs is quite a lot.

@csrwng
Contributor Author

csrwng commented Jan 16, 2017

@jorgemoralespou what do we gain by reducing the number?

@csrwng csrwng force-pushed the clusterup_persistence branch from 1fa88b8 to b37416f on January 17, 2017 04:05
@jorgemoralespou

jorgemoralespou commented Jan 17, 2017 via email

@csrwng csrwng force-pushed the clusterup_persistence branch from b37416f to b255e6f on January 17, 2017 18:49
@csrwng
Contributor Author

csrwng commented Jan 17, 2017

ok, now creating 100 PVs with all access modes (RWO, RWX, ROX)

@csrwng
Contributor Author

csrwng commented Jan 17, 2017

Tested on Mac, Win, Linux
@bparees can I get a final signoff?

@bparees
Contributor

bparees commented Jan 17, 2017

lgtm

@csrwng
Contributor Author

csrwng commented Jan 17, 2017

[merge]

@jorgemoralespou

jorgemoralespou commented Jan 17, 2017 via email

@csrwng
Contributor Author

csrwng commented Jan 17, 2017

@jorgemoralespou yes

@csrwng
Contributor Author

csrwng commented Jan 17, 2017

Flake #12530
[merge]

@openshift-bot
Contributor

Evaluated for origin merge up to b255e6f

@openshift-bot
Contributor

openshift-bot commented Jan 17, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12965/) (Base Commit: eaa36ed) (Image: devenv-rhel7_5703)
