
origin:v3.7.2 origin-master-controllers HPA works for deployments but not deploymentconfigs #19045

Closed
mshutt opened this issue Mar 21, 2018 · 12 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/pod

Comments

@mshutt

mshutt commented Mar 21, 2018

HPA no longer functions correctly for DeploymentConfigs, still works for Deployments

Version

oc version

oc v3.7.2+26304a3-2
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server [redacted]
openshift v3.7.2+26304a3-2
kubernetes v1.7.6+a08f5eeb62

Steps To Reproduce
  1. Create an HPA against a DeploymentConfig:
# oc autoscale dc/friendlyhello --min 1 --max 10 --cpu-percent=5
  2. Monitor the HPA status with oc describe hpa/friendlyhello
Current Result
# oc describe hpa/friendlyhello
Name:							friendlyhello
Namespace:						test-dockerfile-build
Labels:							<none>
Annotations:						<none>
CreationTimestamp:					Tue, 20 Mar 2018 23:42:28 +0000
Reference:						DeploymentConfig/friendlyhello
Metrics:						( current / target )
  resource cpu on pods  (as a percentage of request):	<unknown> / 5%
Min replicas:						2
Max replicas:						10
Conditions:
  Type		Status	Reason		Message
  ----		------	------		-------
  AbleToScale	False	FailedGetScale	the HPA controller was unable to get the target's current scale: no kind "Scale" is registered for version "extensions/v1beta1"
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----				-------------	--------	------		-------
  18m		18m		2	horizontal-pod-autoscaler		Warning		FailedGetScale	no kind "Scale" is registered for version "extensions/v1beta1"
  16m		1m		31	horizontal-pod-autoscaler		Warning		FailedGetScale	no kind "Scale" is registered for version "extensions/v1beta1"
Expected Result
# oc describe hpa/friendlyhello
Name:							friendlyhello
Namespace:						test-dockerfile-build
Labels:							<none>
Annotations:						<none>
CreationTimestamp:					Fri, 26 Jan 2018 22:30:28 +0000
Reference:						DeploymentConfig/friendlyhello
Metrics:						( current / target )
  resource cpu on pods  (as a percentage of request):	0% (0) / 5%
Min replicas:						1
Max replicas:						10
Conditions:
  Type			Status	Reason			Message
  ----			------	------			-------
  AbleToScale		True	ReadyForNewScale	the last scale time was sufficiently old as to warrant a new scale
  ScalingActive		True	ValidMetricFound	the HPA was able to succesfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited	True	TooFewReplicas		the desired replica count was zero
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  5m		5m		1	horizontal-pod-autoscaler		Normal		SuccessfulRescale	New size: 4; reason: cpu resource utilization (percentage of request) above target
Additional Information

This works correctly for Deployments

Here is the output of LOGLEVEL=6 from origin-master-controllers while it is running the HPA:

Mar 20 23:42:28 r1009.assets.rivet.example.net origin-master-controllers[5957]: I0320 23:42:28.299323       1 graph_builder.go:475] GraphBuilder process object: autoscaling/v1/HorizontalPodAutoscaler, namespace test-dockerfile-build, name friendlyhello, uid 597eecbb-2c98-11e8-8ef4-40a8f02674ac, event type add
<SNIP>
Mar 20 23:42:58 r1009.assets.rivet.example.net origin-master-controllers[5957]: I0320 23:42:58.302131       1 round_trippers.go:405] GET https://r1009.assets.rivet.example.net/apis/apps.openshift.io/v1/namespaces/test-dockerfile-build/deploymentconfigs/friendlyhello/scale 200 OK in 2 milliseconds
Mar 20 23:42:58 r1009.assets.rivet.example.net origin-master-controllers[5957]: I0320 23:42:58.306473       1 round_trippers.go:405] POST https://r1009.assets.rivet.example.net/api/v1/namespaces/test-dockerfile-build/events 201 Created in 3 milliseconds
Mar 20 23:42:58 r1009.assets.rivet.example.net origin-master-controllers[5957]: I0320 23:42:58.306631       1 round_trippers.go:405] PUT https://r1009.assets.rivet.example.net/apis/autoscaling/v1/namespaces/test-dockerfile-build/horizontalpodautoscalers/friendlyhello/status 200 OK in 4 milliseconds
Mar 20 23:42:58 r1009.assets.rivet.example.net origin-master-controllers[5957]: I0320 23:42:58.306638       1 graph_builder.go:475] GraphBuilder process object: autoscaling/v1/HorizontalPodAutoscaler, namespace test-dockerfile-build, name friendlyhello, uid 597eecbb-2c98-11e8-8ef4-40a8f02674ac, event type update
Mar 20 23:42:58 r1009.assets.rivet.example.net origin-master-controllers[5957]: I0320 23:42:58.306728       1 horizontal.go:633] Successfully updated status for friendlyhello
Mar 20 23:42:58 r1009.assets.rivet.example.net origin-master-controllers[5957]: E0320 23:42:58.306840       1 horizontal.go:206] failed to query scale subresource for DeploymentConfig/test-dockerfile-build/friendlyhello: no kind "Scale" is registered for version "extensions/v1beta1"
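
The error in that last log line is a client-side decoding failure: the GET of the deploymentconfigs/.../scale subresource succeeds (200 OK two lines earlier), but the controller cannot decode the returned Scale object because its apiVersion is not registered with the client. A minimal illustrative sketch of that mechanism (plain Python, not the actual Go controller code; the registry contents are an assumption for the sketch):

```python
# Illustrative only: a tiny decoder registry mirroring how a Kubernetes
# client maps (apiVersion, kind) pairs to known types. The set of
# registered versions here is assumed, not taken from the 3.7.2 source.
registry = {
    ("apps/v1beta1", "Scale"): dict,    # Deployment scale payloads decode fine
    ("autoscaling/v1", "Scale"): dict,
}

def decode(payload):
    key = (payload["apiVersion"], payload["kind"])
    if key not in registry:
        # Same shape of error as the controller log above
        raise ValueError(
            f'no kind "{payload["kind"]}" is registered for version "{payload["apiVersion"]}"'
        )
    return registry[key](payload)

decode({"apiVersion": "apps/v1beta1", "kind": "Scale"})  # succeeds
try:
    decode({"apiVersion": "extensions/v1beta1", "kind": "Scale"})
except ValueError as err:
    print(err)
```

The HTTP request succeeding while decoding fails is why the round_trippers lines show 200 OK immediately before the FailedGetScale error.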

Here is a raw query of the scale subresource for the deployment:

# oc get --raw https://localhost/apis/apps/v1beta1/namespaces/test-dockerfile-build/deployments/busybox/scale | jq .
{
  "kind": "Scale",
  "apiVersion": "apps/v1beta1",
<SNIP>

And here is a raw query of the scale subresource for the deploymentconfig:

# oc get --raw https://r1009.assets.rivet.example.net/apis/apps.openshift.io/v1/namespaces/test-dockerfile-build/deploymentconfigs/friendlyhello/scale | jq .
{
  "kind": "Scale",
  "apiVersion": "extensions/v1beta1",

This seems somehow inversely related to:

https://bugzilla.redhat.com/show_bug.cgi?id=1549873
#17517

@liggitt - Any thoughts? I am not a go developer (yet), but I'll help any way that I can!

/label bug
/label question

@mshutt
Author

mshutt commented Mar 26, 2018

Heya!

Not sure if this is related, but:

# oc version
oc v3.7.2+5eda3fa-5
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://[redacted]:443
openshift v3.7.2+5eda3fa-5
kubernetes v1.7.6+a08f5eeb62
# oc get --raw https://localhost/apis/extensions/v1beta1 | jq . | egrep scale
      "name": "deployments/scale",
      "name": "replicasets/scale",
      "name": "replicationcontrollers/scale",
# oc get --raw https://localhost/apis/apps/v1beta1 | jq . | egrep scale
      "name": "deployments/scale",
# oc get --raw https://localhost/apis/apps.openshift.io/v1 | jq . | egrep scale
      "name": "deploymentconfigs/scale",
# oc get --raw https://localhost/apis/apps.openshift.io/v1/namespaces/test-dockerfile-build/deploymentconfigs/friendlyhello/scale | jq .  | egrep apiVersion
  "apiVersion": "extensions/v1beta1",

Thoughts?

@mshutt
Author

mshutt commented Mar 26, 2018

I've also tried changing the scaleTargetRef apiVersion in the HPA spec to extensions/v1beta1, then to apps/v1beta1, and finally to apps.openshift.io/v1 as it was in 3.7.0; as you can guess, every variant produced the same error.

@mshutt
Author

mshutt commented Mar 26, 2018

This is what oc autoscale creates:

    scaleTargetRef:
      apiVersion: v1
      kind: DeploymentConfig
      name: friendlyhello
    targetCPUUtilizationPercentage: 5
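
For context, a minimal sketch of the full HPA manifest that the snippet above sits in (standard autoscaling/v1 structure; the field values restate the thread, and minReplicas reflects the oc autoscale command rather than the later describe output):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: friendlyhello
  namespace: test-dockerfile-build
spec:
  scaleTargetRef:
    apiVersion: v1        # as written by oc autoscale; other values were tried to no avail
    kind: DeploymentConfig
    name: friendlyhello
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 5
```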

@jwforres
Member

@openshift/sig-pod

@jwforres jwforres added the kind/bug Categorizes issue or PR as related to a bug. label Mar 27, 2018
@jwforres
Member

I know we already resolved a number of bugs related to this; it might be fixed in master already.

@DirectXMan12
Contributor

Yes, please double-check in master, it should be fixed there.

@davidaah

davidaah commented Mar 27, 2018

@DirectXMan12 @jwforres could you clarify which fix in master you are referring to? Unfortunately, the similar change applied to master (#17587) was made some time ago, so much of the code has since been significantly refactored (especially in apiserver/apiserver.go).

As best I can tell, the change from 3.7.0 (where HPA worked) relates to a change in what the deploymentconfigs/scale subresource returns, introduced in 3.7.1/3.7.2 (#17517), but clarification from @liggitt would help if possible.

@sjenning sjenning removed their assignment Apr 12, 2018
@sjenning
Contributor

@mshutt could you verify this is fixed in 3.9?

@mshutt
Author

mshutt commented Apr 13, 2018

@sjenning We'll do the 3.9 upgrade in our lab as soon as possible, hopefully early next week. We're fully containerized, and I saw that another user tripped over the etcd upgrade issue with Origin vs. the paid bits (which I'd previously reported working around by setting openshift_etcd_upgrade: false and then re-running byo/config.yml after the upgrade plays, openshift/openshift-ansible#6931).

@liggitt
Contributor

liggitt commented Apr 19, 2018

This was related to the HPA using a faulty client. It is resolved in 3.9. A fix for 3.7 is in #19437, but I don't know whether more 3.7 releases are planned.

@mshutt
Author

mshutt commented May 3, 2018

@sjenning @liggitt 3.9 upgrade indeed has fixed this. Thank you all for your tireless efforts!

@liggitt
Contributor

liggitt commented May 24, 2018

Fixed in the release-3.7 branch in #19437.
