Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jenkins memory cleanup #442

Merged

Conversation

jim-minter
Copy link
Contributor

No description provided.

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 13, 2017
@jim-minter
Copy link
Contributor Author

@bparees this is a starting point - want to get things about right in run-jnlp-client, then maybe squash, then carry the same changes through the Jenkins startup. Not tested yet either. ptal for early feedback and discussion?

Copy link
Contributor

@bparees bparees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one immediate worry I have: We're now going to pick the java arch based on the jnlp JVM's heap. Which means someone who tunes for a small JNLP heap will get 32bit java and then their maven jvms will also be run in 32bit java.

I wonder if we should pick 32bit java purely based on the container's cgroup limit. (less than a 2gig container? you get 32bit. otherwise you get 64bit and if you want something else you tune it yourself).

DEFAULT_MEMORY_CEILING=$((2**40-1))
if [ "${CONTAINER_MEMORY_IN_BYTES}" -lt "${DEFAULT_MEMORY_CEILING}" ]; then
if [[ -z "${JAVA_TOOL_OPTIONS}" ]]; then
# these options will automatically be picked up by any child JVM process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any child jvm process or just "any jvm process"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will change to "any JVM process". I mean(t) the spawned JNLP JVM and any subsequent JVMs it may spawn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for my education, JAVA_OPTS was a first class env var of sorts, specifically looked for by say maven and/or gradle or some such, and hence we are avoiding overlap, right?

echo "max heap in MB is ${CONTAINER_HEAP_MAX} and 64 bit was not explicitly set so using 32 bit Java"
alternatives --set java $JVMPath32bit
export MALLOC_ARENA_MAX=1
if [[ $CONTAINER_HEAP_LIMIT_IN_MB && $CONTAINER_HEAP_MAX -gt $CONTAINER_HEAP_LIMIT_IN_MB ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call this(CONTAINER_HEAP_LIMIT_IN_MB) "JNLP_MAX_HEAP_UPPER_BOUND_MB"? yeah it's wordy but nothing else offers the same clarity that:

  1. this is for the jnlp jvm (presumably we'll want a different, or no, value for the other JVMs)
  2. it's the upper bound for the max heap setting (vs the upper bound for the min heap setting, or just being the max heap limit setting for the jvm)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

# set, use 32 bit JVM for space efficiency
if [[ -z $OPENSHIFT_JENKINS_JVM_ARCH ]]; then
if [[ "${CONTAINER_HEAP_MAX}" -gt 2048 ]]; then
echo "max heap in MB is ${CONTAINER_HEAP_MAX}, using 64 bit Java"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're cleaning things up, both of these messages(here and the 32 bit one below) should say "OPENSHIFT_JENKINS_JVM_ARCH was not set and max heap in MB is X. Using Y bit Java"

else
# no k8s/docker memory limit set
if [[ -z $OPENSHIFT_JENKINS_JVM_ARCH ]]; then
echo "no cgroup memory limit found, using 64 bit Java"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPENSHIFT_JENKINS_JVM_ARCH is not set and no cgroup memory limit was found, using 64 bit Java"

if [ -z "${JAVA_OPTS}" ]; then
JAVA_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true"
if [[ -z "${CONTAINER_JAVA_OPTIONS}" ]]; then
CONTAINER_JAVA_OPTIONS="$JAVA_GC_OPTS $JAVA_INITIAL_HEAP_PARAM $JAVA_MAX_HEAP_PARAM $JAVA_CORE_LIMIT $JAVA_DIAGNOSTICS"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JNLP_JAVA_OPTIONS. Same elsewhere as applicable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do - am assuming to keep the already established CONTAINER_CORE_LIMIT / CONTAINER_INITIAL_PERCENT etc. as they are.

@bparees
Copy link
Contributor

bparees commented Dec 13, 2017

/cc @gabemontero since he was involved in a lot of this logic.

@gabemontero
Copy link
Contributor

At first blush I like @bparees 's "I wonder if we should pick 32bit java purely based on the container's cgroup limit. (less than a 2gig container? you get 32bit. otherwise you get 64bit and if you want something else you tune it yourself)."

We'll see if that still holds after parsing @jim-minter 's changes in detail and @bparees 's feedback.

@@ -31,7 +31,7 @@ elif [ "${OPENSHIFT_JENKINS_JVM_ARCH}" == "x86_64" ]; then
else
echo "OPENSHIFT_JENKINS_JVM_ARCH is set to ${OPENSHIFT_JENKINS_JVM_ARCH} so using 32 bit Java"
alternatives --set java $JVMPath32bit
export MALLOC_ARENA_MAX=1
export MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX:-1}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an fyi, via commit 8f60691 this stemmed from purely duplicating advice from the JBOSS side of the house.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely see that it could make a difference to export MALLOC_ARENA_MAX; I'm just trying to ensure that things are reasonably overridable.

@jim-minter
Copy link
Contributor Author

We're now going to pick the java arch based on the jnlp JVM's heap. Which means someone who tunes for a small JNLP heap will get 32bit java and then their maven jvms will also be run in 32bit java.

I wonder if we should pick 32bit java purely based on the container's cgroup limit. (less than a 2gig container? you get 32bit. otherwise you get 64bit and if you want something else you tune it yourself).

I think that's a good call (IMO an alternative in the JNLP slave would be entirely removing the 32/64-bit JVM selection).

I think that due to a bug the existing checked in behaviour is effectively: 32-bit if (overridden && !overridden to 64-bit) || (!overridden && cgroup limit set && cgroup limit < 8 GiB). @bparees you want to change that to 2GiB (i.e. 1GiB heap) or 4GiB (i.e. expected 2GiB heap)? This PR should currently be doing the latter, I'm actually in favour of the former.

@jim-minter jim-minter force-pushed the trello142-jenkins-memory-cleanup branch from 3774e40 to 5e94fff Compare December 13, 2017 20:43
@jim-minter
Copy link
Contributor Author

[the above also has the advantage of getting all the OPENSHIFT_JENKINS_JVM_ARCH code in one place, so that it can then go in a separate bash function]

Copy link
Contributor

@gabemontero gabemontero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A variety of historical footnotes, questions, and a few minor request for changes

thanks

DEFAULT_MEMORY_CEILING=$((2**40-1))
if [ "${CONTAINER_MEMORY_IN_BYTES}" -lt "${DEFAULT_MEMORY_CEILING}" ]; then
if [[ -z "${JAVA_TOOL_OPTIONS}" ]]; then
# these options will automatically be picked up by any child JVM process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for my education, JAVA_OPTS was a first class env var of sorts, specifically looked for by say maven and/or gradle or some such, and hence we are avoiding overlap, right?

# USE_JAVA_DIAGNOSTICS and JAVA_DIAGNOSTICS are legacy and may be removed in
# a future version of this script.
if [[ "${USE_JAVA_DIAGNOSTICS}" ]]; then
JAVA_DIAGNOSTICS="-XX:NativeMemoryTracking=summary -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UnlockDiagnosticVMOptions"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside from the comment, how about we echo out that message if those variables were in fact set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

echo "64 bit Java explicitly set in OPENSHIFT_JENKINS_JVM_ARCH"
alternatives --set java $JVMPath64bit
else
elif [[ "${OPENSHIFT_JENKINS_JVM_ARCH}" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't noticed the duplicate logging in the master logs (but maybe I've just always missed it?), but does it make sense if the same fix planned in either this PR a separate PR for the master run script? Or was something different that prevented duplicate logging from occurring there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#423 ;-)
I'll keep a single PR for now but close that issue off when it goes in.

CONTAINER_INITIAL_HEAP=$(echo "${CONTAINER_HEAP_MAX} ${CONTAINER_INITIAL_PERCENT}" | awk '{ printf "%d", $1 * $2 }')
if [[ -z "$JAVA_INITIAL_HEAP_PARAM" ]]; then
JAVA_INITIAL_HEAP_PARAM="-Xms${CONTAINER_INITIAL_HEAP}m"
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if I caught the gist of the recent exchanges with Severin correctly (not tagging him on purpose to save him some additional chatter) the gist was that the JVM will do a better job at picking the default when used in conjunction with the honor cgroup flags, right?

if so, a comment explaining all that would be good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JVM default is Xms=1/64 of Xmx, the 64 is configurable using a JVM flag. Severin made a change to stop us setting Xms==Xmx and set it to 7% instead. In fact I think he could have gone further, I think there's no harm in using the JVM default here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for that ...now, maybe it wasn't clear ... I meant a comment in the code

# child JVMs). -Xmx can be calculated as a percentage, capped to a maximum,
# or specified straight. -Xms can be calculated as a percentage or
# specified straight. For the JNLP slave by default we specify -Xmx of 50%,
# uncapped; -Xms unspecified (JVM default is 1/64 of -Xmx).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I'm reviewing commit by commit ... this comment is probably sufficient wrt my ask for a comment elsewhere, unless @jim-minter feels so inclined

# if the calculated max heap is < 2GiB and OPENSHIFT_JENKINS_JVM_ARCH is not
# set, use 32 bit JVM for space efficiency
if [[ -z $OPENSHIFT_JENKINS_JVM_ARCH ]]; then
if [[ "${CONTAINER_HEAP_MAX}" -gt 2048 ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit curious as to how the old logic broke down, but it has been so long, I don't precisely remember the various test scenarios I did in fact run (though I certainly ran something).

Any break down here or off line would be cool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    HEAP_LIMIT_FOR_32BIT=$((2**32-1))
    HEAP_LIMIT_FOR_32BIT_IN_MB=$((${HEAP_LIMIT_FOR_32BIT}/1024**2))

means that HEAP_LIMIT_FOR_32BIT_IN_MB comes out at 4095 - this was then tested against the heap size, not the container size.

@bparees
Copy link
Contributor

bparees commented Dec 13, 2017

@bparees you want to change that to 2GiB (i.e. 1GiB heap) or 4GiB (i.e. expected 2GiB heap)? This PR should currently be doing the latter, I'm actually in favour of the former.

Why do you prefer the former? Just because you don't want to default people into 32bit jvms except in very small containers?

@jim-minter
Copy link
Contributor Author

Yes.

@bparees
Copy link
Contributor

bparees commented Dec 13, 2017

@jim-minter i guess that's fine with me.

@jim-minter
Copy link
Contributor Author

just for my education, JAVA_OPTS was a first class env var of sorts, specifically looked for by say maven and/or gradle or some such, and hence we are avoiding overlap, right?

@gabemontero Yes. Gradle, among others, but not Maven.

@jim-minter jim-minter force-pushed the trello142-jenkins-memory-cleanup branch from 5e94fff to fd4b441 Compare December 13, 2017 23:09
@jim-minter
Copy link
Contributor Author

repushed, should include all changes.

@bparees
Copy link
Contributor

bparees commented Dec 13, 2017

this looks like a fine start to me, no further comments on this iteration.

@gabemontero
Copy link
Contributor

same here - let's squash and merge !

I'll start a test run in the interim.

[test]

- use set -x instead of duplicating command line
- allow MALLOC_ARENA_MAX to be overridden if necessary (unlikely)
- use JAVA_TOOL_OPTIONS to pass -XX:+UseCGroupMemoryLimitForHeap to all child
  JVMs (e.g. spawned maven or gradle builds).  Avoid use of JAVA_OPTS name as
  this could conflict with other tools which recognise JAVA_OPTS.
- better document JAVA_GC_OPTS
- indicate deprecation of USE_JAVA_DIAGNOSTICS and JAVA_DIAGNOSTICS
- add JNLP_JAVA_OPTIONS and JNLP_JAVA_OVERRIDES variables to allow users to
  entirely overwrite or append command line arguments to java invocation
- base JVM architecture on container size, not heap size.  Fix logic error
  where *heaps* < 4095MiB were resulting in a 32-bit JVM: fix to *containers*
  < 2GiB to result in a 32-bit JVM
- avoid setting -Xms unless CONTAINER_INITIAL_PERCENT is specified.  Allow
  JAVA_INITIAL_HEAP_PARAM to be overridden
- add JNLP_MAX_HEAP_UPPER_BOUND_MB variable to cap -Xmx; allow override of
  JAVA_MAX_HEAP_PARAM
- clarify calculations and remove intermediate environment variables
- prefer [[ ]] to [ ]
- carry changes through to Jenkins master startup code (2/contrib/s2i/run)
@jim-minter jim-minter force-pushed the trello142-jenkins-memory-cleanup branch from 5b717aa to f809b59 Compare December 14, 2017 21:13
sed -i "s#<image>\(docker.io/\)\{0,1\}openshift/jenkins-slave-maven-centos7\(:.*\)\{0,1\}</image>#<image>docker.io/openshift/jenkins-slave-maven-centos7:${JENKINS_SLAVE_IMAGE_TAG}</image>#" /var/lib/jenkins/config.xml
sed -i "s#<image>\(docker.io/\)\{0,1\}openshift/jenkins-slave-nodejs-centos7\(:.*\)\{0,1\}</image>#<image>docker.io/openshift/jenkins-slave-nodejs-centos7:${JENKINS_SLAVE_IMAGE_TAG}</image>#" /var/lib/jenkins/config.xml
# replace [docker.io/]openshift/jekins-slave-xxxx-centos7[:optionaltag] with [docker.io/]openshift/jenkins-slave-xxxx-centos7:VersionTag
sed -i "s#<image>\(docker.io/\)\{0,1\}openshift/jenkins-slave-maven-centos7\(:.*\)\{0,1\}</image>#<image>\1openshift/jenkins-slave-maven-centos7:${JENKINS_SLAVE_IMAGE_TAG}</image>#" /var/lib/jenkins/config.xml
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bparees ptal - needed this for testing - do you want it in a separate PR?

@jim-minter
Copy link
Contributor Author

[test]

@jim-minter jim-minter force-pushed the trello142-jenkins-memory-cleanup branch from f809b59 to a1050a9 Compare December 14, 2017 21:27
@jim-minter
Copy link
Contributor Author

[test]

@openshift-bot
Copy link
Contributor

Evaluated for jenkins test up to a1050a9

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_jenkins_images/122/) (Base Commit: cd54876) (PR Branch Commit: a1050a9)

@jim-minter jim-minter changed the title [WIP] Jenkins memory cleanup Jenkins memory cleanup Dec 18, 2017
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2017
@jim-minter
Copy link
Contributor Author

I believe this is ready to be merged.

@gabemontero
Copy link
Contributor

[merge]

@openshift-bot
Copy link
Contributor

Evaluated for jenkins merge up to a1050a9

@openshift-bot
Copy link
Contributor

openshift-bot commented Dec 18, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_jenkins_images/70/) (Base Commit: d3830da) (PR Branch Commit: a1050a9) (Image: devenv-rhel7_71)

@openshift-bot openshift-bot merged commit 26a73bd into openshift:master Dec 18, 2017
@jim-minter
Copy link
Contributor Author

Thanks @gabemontero. Do I need to do anything to get new images built and pushed to docker hub?

@gabemontero
Copy link
Contributor

gabemontero commented Dec 19, 2017 via email

@jim-minter
Copy link
Contributor Author

@gabemontero You bet. Thanks and happy holidays!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants