Observe command #4196

Merged: 4 commits into openshift:master from sync, Sep 2, 2016

Conversation

@smarterclayton (Contributor) commented Aug 16, 2015

Based in part on kubernetes/kubernetes#5164

It should be easy for a third party to build a "controller" - a loop that iterates over all Kubernetes or OpenShift resources and invokes a script to maintain an invariant. It should also be easy for a third party to understand the implications of building a controller - there are many hard edge cases that users should be exposed to gradually.

This PR adds the observe command to oc. It takes a resource type and a flag to apply to the current namespace or all namespaces, then creates a DeltaFIFO that watches those resources. By default the user gets only adds and updates and cannot sync on deletes; the user can provide a --names command to handle external synchronization and a --delete command to handle deletes.
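
For example, a delete-aware loop might be wired up like this (a sketch only - register_dns.sh, unregister_dns.sh, and list_dns.sh are hypothetical user scripts):

$ oc observe services --each register_dns.sh \
    --delete unregister_dns.sh \
    --names list_dns.sh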

The --each command is invoked once for each resource. By default it takes namespace and name as the first two arguments, and a user can specify additional arguments with -a in JSONPath or Go template format. An admin can receive the event type ('Sync', 'Updated', 'Deleted', 'Added') in an env var by specifying --type-env-var, and can use the --object-env-var flag to name an env var that will contain the full serialized JSON of the object.
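
As a minimal sketch of what an --each handler sees (OBSERVE_TYPE and OBSERVE_OBJECT are illustrative env var names, assumed to have been chosen via --type-env-var and --object-env-var):

#!/bin/bash
# $1 and $2 are the namespace and name passed by default;
# any further arguments come from -a templates
namespace=$1
name=$2
# the env vars below exist only if the corresponding flags were set
echo "event=${OBSERVE_TYPE:-unknown} ns=${namespace} name=${name} object_bytes=${#OBSERVE_OBJECT}"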

Two examples are described in the help - future examples will be created that demonstrate more complex scenarios.

$ oc observe namespaces --each set_annotations.sh
$ oc observe services --each register_dns.sh -a '{ .spec.clusterIP }' --names registered_services.sh
$ oc observe namespaces
#2016-08-30T14:17:55-04:00 Sync started
#2016-08-30T14:17:55-04:00 Sync 33438  "" bar
#2016-08-30T14:17:55-04:00 Sync 33370  "" default
#2016-08-30T14:17:55-04:00 Sync 33407  "" foo
#2016-08-30T14:17:55-04:00 Sync 33371  "" kube-system
#2016-08-30T14:17:55-04:00 Sync 36878  "" openshift
#2016-08-30T14:17:55-04:00 Sync 33373  "" openshift-infra
#2016-08-30T14:17:55-04:00 Sync 33374  "" test-cmd-images-2

The process exposes a health check, readiness check, and metrics for exec duration. By default the loop terminates with a non-zero exit code after 20 errors are encountered, which forces admins to deal with poorly performing loops and surfaces them in their intended use (as controllers in pods).
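
For a loop that should keep running regardless, the limit can be disabled via the --maximum-errors flag added in this PR (sync_service.sh is a hypothetical script):

$ oc observe services --each sync_service.sh --maximum-errors=-1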

Todo:

  • good write up and call for experimentation
  • allow bulk sync vs individual sync
  • mode for direct piping
  • timeouts on process execution
  • parallel workers
  • document / demonstrate each controller pattern (level driven, resync, cleanup controller)
  • more examples in examples/observer
  • label selectors and field selectors
  • more complex filter rules
  • performance work

@thockin commented Aug 17, 2015

Why not make sync just emit to stdout and let the processing be done by the caller? It's much more composable and flexible.

$ oc sync -t {.metadata.namespace} -t {.metadata.name} -t {.spec.clusterIP} \
    | while read NS NAME IP; do \
        echo "name:$NAME namespace:$NS ip:$IP"; \
    done

or even

$ oc sync -t {.metadata.namespace} -t {.metadata.name} -t {.spec.clusterIP} | grep -v "hack-hack-hack" \
    | while read NS NAME IP; do \
        echo "name:$NAME namespace:$NS ip:$IP"; \
    done

You'll never have enough builtin matchers and processors to keep people happy, just punt the whole thing to UNIX tools.

@smarterclayton (Contributor, Author)

Because sync is fundamentally different from a regular update - I think there are cases where you want to reduce the amount of bash crap that goes into this. The mindset I had coming in was the same one we apply to the controllers: no one should be implementing a controller - they should be implementing a small set of specific behaviors for a use case.

So sync should probably be a batch operation (or at least have begin and
end). Individual updates should be incremental.

However, I did intend to have the formatting to STDOUT so that someone could script that way. It's just that I don't think that's enough to make this usable - I expect most of these loops to be keeping two systems in sync, so the harder problems, like reconciling the two maps, need to be solved somehow.


@thockin commented Aug 17, 2015

I don't follow. You're writing a CLI tool so people can do controller-like things outside of the core code. Whether they choose to script in bash, python, perl, or whatever - a universal interface is "here is a piece of text structured in a way we agree on". Having the --each flag is (to my mind) pointless and redundant when it's trivial to implement in terms of the universal interface (lines to stdout).

I am not sure what you mean by "should probably be a batch operation". Is there atomicity that matters? The beauty of a watch stream is that it is sequential - you start with a state dump and you process discrete mutations to that state, in series. What value does a begin/end marker offer other than information? Why not make a new meta-field for update-transaction-number and prefix every output line?

$ oc sync -t {.metadata.namespace} -t {.metadata.name} -t {.spec.clusterIP} \
    | while read NS NAME IP; do \
        echo "name:$NAME namespace:$NS ip:$IP"; \
    done
0 ADD default foo 1.2.3.4
0 ADD default bar 1.2.3.5
1 UPDATE default foo 1.2.3.4
2 REMOVE default bar 1.2.3.5
^C

@thockin commented Aug 17, 2015

Actually, how is this different than get -w ?

@smarterclayton (Contributor, Author)

There's a difference between "trivial to implement for Tim Hockin" and "trivial to implement for a sysadmin who needs to understand that sync loops have lots of corner cases they need to consider".

As an example, consider the "for every PVC that comes in, dynamically
provision a PV in GCE/EC2" use case for sync (to allow admins to do dynamic
provisioning).

The naive implementation is "for every PVC that is unbound, there should be
a GCE disk and a PV containing that disk". But we know that's not enough.
There is the next requirement "I need to ensure that there are no GCE disks
that aren't bound to a PV due to a crash". Then there is "I need to ensure
that GCE disks that aren't stored in a PV are really not bound to a PV,
instead of this just being a race condition". Also there is "if the GCE
disk is deleted, delete the PV".

So I'm approaching this from the perspective of: yes, you could go do get -w and pipe it to bash. Should you? Probably not. You should probably have scripts that do different things - resync vs. incremental add. You'll also need a script that does cleanup. As an admin, I'm going to screw this up, so I want to have enough debug output that I can reconstruct the failure. You also want to make it easy for someone to run this in a container, and hugely long command lines with pipes are terrible inside of a container. It's about engineering a sync loop vs. hacking one together.
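
A rough sketch of the PVC use case above with this command - everything here is illustrative: the scripts are hypothetical and the gcloud calls are only an outline:

$ oc observe pvc --each provision.sh \
    --delete deprovision.sh \
    --names list_disks.sh

# provision.sh - ensure a GCE disk and a PV exist for the claim
namespace=$1; name=$2
gcloud compute disks create "pv-${namespace}-${name}" --size=10GB
# ... then create a PV object pointing at the disk (oc create -f ...)

# list_disks.sh - list currently provisioned names so deletes can be detected
gcloud compute disks list --format='value(name)'

# deprovision.sh - the claim is gone; release the disk
gcloud compute disks delete "pv-${1}-${2}" --quiet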


@smarterclayton (Contributor, Author)

I'm also not saying that I want to take away the simple thing - but even stuff like command arguments is important to get right. What about timeouts on commands? What about parallel workers? My goal is to build something that captures the heart of a controller loop for a few patterns that will be common, and do enough of the heavy lifting that admins can easily script the "react to events" part rather than spend their time doing timeouts in bash.


@thockin commented Aug 17, 2015

So this is get -w with more specific semantics for control-loops? Fine. I still don't see --each (it's an exec, right?) being more useful than pipe-to-shell. I don't buy that anyone who is writing a controller around this tool is not capable of writing a shell script. Or rather, if they get the shell script wrong, they'll get this wrong too.

@smarterclayton (Contributor, Author)

So this command is intended to solve "allow someone to build a controller
that can be scripted and give them all the tools and examples and
experience to prevent them from cutting themselves on the rough edges".
It's not intended to solve get -w | bash except at the on ramp.

Are you saying you think the approach of providing an interface (commands
corresponding to methods that have equivalents in the internal controller
patterns library) usable from bash is flawed? Where would you take it?

My goal is:
"Any sysadmin capable of calling out to an external system can produce a
reasonable stable, resilient, and comprehensible controller / sync pattern
for the common external integration use cases in bash in a container, and
do so in an incremental fashion where hard problems are made tractable for
them".

I think if you need to write a bash script with looping, extensive timeout
control, process management, deep logging, or implement core parts of the
controller framework, then the tool will have failed to give you the
assistance you needed. It should be possible to step up to Python or Ruby
or JS fairly easily and not have to change much.


@smarterclayton (Contributor, Author)

Also, I want to be able to avoid deep conditional logic in the bash code as much as possible by offering helpers that can do the hard stuff.

@thockin commented Aug 19, 2015

> Are you saying you think the approach of providing an interface (commands corresponding to methods that have equivalents in the internal controller patterns library) usable from bash is flawed? Where would you take it?

No, I just want to make sure that the plumbing is sufficient to handle "real" scripting and not just simple exec callouts. I see those exec callouts as a subset of the interface. If having exec callouts solves a real use case (which I have not seen), then fine.

Can you sketch a script (pseudocode is fine) using this feature that
implements a "stable, resilient, and comprehensible controller" ? Maybe I
just don't see the whole feature.


@openshift-bot repeatedly added and removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD): Oct 22, Nov 4, and Nov 13, 2015; Jan 26 and Feb 6, 2016
@danmcp closed this Jan 18, 2016

// add additional arguments / info to the server
cmd.Flags().StringVar(&options.templateType, "use", options.templateType, "Format for template arguments. jsonpath and gotemplate are supported.")
cmd.Flags().BoolVar(&options.strictTemplates, "strict-templates", false, "If true, return an error on any field or map key that is missing in a template.")
A Contributor commented on the diff:
I still don't think that I want to have a "make it fail" option. An env var indicating the failure ought to be sufficient.

cmd.Flags().StringVar(&options.objectEnvVar, "object-env-var", "", "The name of an env var to serialize the object to when calling --each, optional.")

// control retries of individual commands
cmd.Flags().IntVar(&options.maximumErrors, "maximum-errors", options.maximumErrors, "Exit after this many errors have been detected. May be set to -1 for no maximum.")
@deads2k (Contributor) commented on the diff:
I don't think so. I think you're in a pod, and if you hit enough errors
you restart. There is no sliding window that will work for all users. We
could do % for sync but it wouldn't have any meaning after sync. In
practice, your scripts should never error. If you're erroring, you want to
restart.

10 errors out of 20 attempts is a lot different than 10 errors out of 20,000 attempts. Having one value is silly.

@smarterclayton (Contributor, Author) replied:

The point of this is to get you to fix your bugs. The worst outcome to me
is you have a lot of uncovered edge cases and you never discover it because
you don't look at your logs. The best outcome is by default you get a nice
error "hey, you hit too many errors, maybe you should think about fixing
your stuff". I am worried about hitting the limit on sync and not getting
through the whole list (so maybe we can defer the exit until after the
first sync is complete). OTOH if you have > 50 items and you fail the first
twenty I don't see much point to continuing.

I think of this as gamifying error handling - you are a new user, we will
gently nudge you to do things that handle errors, without trying to
surprise you.


@deads2k (Contributor) commented Aug 30, 2016

I want downward API, not arbitrary API.

About the env vars: I don't see us ending up in a place where this command still makes sense but a watch on a resource doesn't have:

  1. type
  2. namespace
  3. name
  4. uid
  5. resource
  6. group
  7. version
  8. content

Why would I make a user explicitly opt-in (maybe even have to describe it using templates) and end up with scripts that each use slightly different values? Seems like that's just making it hard to work with.

@smarterclayton force-pushed the sync branch 2 times, most recently from 41aa0ff to b4d4f52 on August 31, 2016 15:21
@smarterclayton (Contributor, Author)

For those things you mentioned, each of them is in the JSON. I think JSON as an env var is best (for any advanced case, do what you want in your script). The only other field you can't get at is the type, so that's an env var as well.
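
As a sketch, assuming --type-env-var OBSERVE_TYPE and --object-env-var OBSERVE_OBJECT were passed (names illustrative) and jq is available, the fields in that list are one expression away:

# namespace, name, uid, group/version, and kind all come from the JSON
echo "${OBSERVE_OBJECT}" | jq -r '.metadata.namespace, .metadata.name, .metadata.uid, .apiVersion, .kind'
# the event type is the one value not in the object, hence its own env var
echo "${OBSERVE_TYPE}"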

@smarterclayton (Contributor, Author)

Changed --use to --output, but made -a behave like --template. It's really hard to have an unambiguous pattern of defaulting, so I ran it by Jessica, and she said given the choice of simplicity or flexibility she'd pick simplicity (because most of the time you won't need the flexibility) and you can do complex things in the script. Given that, I'd prefer to stick with "-a 'jsonpathstuff' --output jsonpath".

No other changes outstanding.
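
Concretely, the settled form would read something like this (register_dns.sh is a hypothetical script):

$ oc observe services -a '{ .spec.clusterIP }' --output jsonpath --each register_dns.sh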


@smarterclayton (Contributor, Author)

[test]


@smarterclayton (Contributor, Author)

[test]


@smarterclayton (Contributor, Author)

[test]

The 'observe' command tracks changes to objects and invokes a named script each time an object is updated. The command can also assist users in building external integrations by helping to reconcile differences between data recorded in one resource and data recorded in an external system.

See the help documentation for more.
@openshift-bot (Contributor)

Evaluated for origin test up to 185e992

@openshift-bot (Contributor)

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/8648/)

@smarterclayton (Contributor, Author)

[merge] for great justice

@openshift-bot (Contributor) commented Sep 2, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/8648/) (Image: devenv-rhel7_4972)

@openshift-bot (Contributor)

Evaluated for origin merge up to 185e992

@openshift-bot openshift-bot merged commit 0a9f79a into openshift:master Sep 2, 2016