You probably scaled the Kafka cluster down in the past with an older Strimzi version, and Kafka still has this node registered but invisible because of missing APIs. This is not a Strimzi bug but a Kafka KRaft limitation; it should be addressed only in Kafka 4.0. You have to work around it manually by unregistering the node using the Kafka Admin API.
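For readers who hit this, a minimal sketch of the manual unregistration, assuming the Kafka distribution in the broker image ships the `unregister` subcommand of `kafka-cluster.sh` (whether it does is discussed below) and that the cluster is named `my-cluster` in namespace `kafka`:

```sh
# Run the tool from inside a broker pod, against the client bootstrap
# service. The node ID (3 here) is the one reported in the error message.
kubectl -n kafka exec -it my-cluster-kafka-0 -- \
  bin/kafka-cluster.sh unregister \
    --bootstrap-server my-cluster-kafka-bootstrap:9092 \
    --id 3
```

Under the hood this performs the same `unregisterBroker` operation that the Kafka Admin API exposes.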
-
@scholzj Thanks, that could be it. It seems the command line tools to list and unregister nodes aren't available in 3.9.0; is this correct?
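One way to check what a given version ships (a sketch, run inside a broker pod; pod name assumed):

```sh
# Print the tool's usage; look for an "unregister" subcommand in the output.
kubectl -n kafka exec -it my-cluster-kafka-0 -- bin/kafka-cluster.sh --help
```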
-
There was no command line tool for it. But I'm not sure I ever checked in Kafka 3.9; maybe someone added it in that version. You could also try to scale up (add the node reported in the error message) and scale it down again. New Strimzi versions try to work around this Kafka limitation and should unregister the node. But if it was a controller, the scaling is tricky, as that is another unsupported thing :-/. You can also try to add it to the `.status.registeredNodeIds` list in the Kafka CR with `kubectl edit kafka my-cluster --subresource=status` to trigger the unregistration.
-
Triaged on 23.1.2025: it seems the
-
Not sure I did the right thing, but it complains that the "given broker ID was not registered" (tried with id >= 3).
-
How did you work out which broker ID it was when you scaled down?
-
0, 1, 2 are the current broker/controller combos, and at some point there were 0, 1, 2 as brokers and 3, 4, 5 as controllers only, IIRC. At least I assumed those were the IDs needed for `kafka-cluster.sh unregister`.
-
So you scaled down controllers, which is something not really supported by KRaft right now. The quorum is static; dynamic quorum (with support for scaling controllers down) will come with Kafka 4.x. So I guess this is the reason why the unregister doesn't work: it's for brokers.
Also, can you describe the steps you took to go from brokers 0, 1, 2 (in one node pool) and controllers 3, 4, 5 (in another node pool) to combined brokers/controllers 0, 1, 2? I could try to replicate what you had.
-
@ppatierno If you mean "You can also try to add it to the `.status.registeredNodeIds` list in the Kafka CR with `kubectl edit kafka my-cluster --subresource=status` to trigger the unregistration": I did try that, by adding 3, 4, 5 to `.status.registeredNodeIds`, which contained 0, 1, 2. It had no effect and only 0, 1, 2 remained. It was a little while ago, but I recall doing this:
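For reference, the edit in question looks roughly like this (a sketch; cluster name `my-cluster` assumed, and the behaviour described is the suggestion above, not something guaranteed by every operator version):

```sh
# Edit the status subresource of the Kafka CR directly; per the suggestion
# above, the operator should try to unregister any IDs listed in
# .status.registeredNodeIds that no longer exist in the cluster.
kubectl edit kafka my-cluster --subresource=status
# ...then add the stale IDs (e.g. 3, 4, 5) to .status.registeredNodeIds.
```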
-
Hey. We have the same issue, with Strimzi operator 0.44.
Creating a 2nd cluster and migrating the data there with KafkaMirrorMaker, then switching the Kubernetes services to the new location, looks more straightforward. WDYT?
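If anyone goes down that route, a minimal sketch of the mirroring piece, assuming `KafkaMirrorMaker2` and clusters named `old-cluster` and `new-cluster` (all names and settings here are placeholders, not a tested migration plan):

```sh
# Mirror all topics and consumer groups from the old cluster to the new one;
# once caught up, repoint clients (e.g. by switching Kubernetes Services).
kubectl apply -f - <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: migration
spec:
  version: 3.9.0
  replicas: 1
  connectCluster: new-cluster
  clusters:
    - alias: old-cluster
      bootstrapServers: old-cluster-kafka-bootstrap:9092
    - alias: new-cluster
      bootstrapServers: new-cluster-kafka-bootstrap:9092
  mirrors:
    - sourceCluster: old-cluster
      targetCluster: new-cluster
      sourceConnector:
        config:
          replication.factor: -1   # -1 inherits the broker default
      checkpointConnector:
        config:
          checkpoints.topic.replication.factor: -1
      topicsPattern: ".*"
      groupsPattern: ".*"
EOF
```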
-
Not an option of what? You should probably share the full command and the full output.
Not sure if this is related, as this is really Kafka business. But you should probably run it against your own listener and not against 9090, especially with older Kafka versions where the 9090 port is mostly unresponsive to most commands.
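For context, in Strimzi 9090 is the internal control-plane port (9091 is replication); the client-facing listeners are typically 9092 (plain) and 9093 (TLS). A quick way to find the right bootstrap address (a sketch; the label value assumes a cluster named `my-cluster`):

```sh
# List the cluster's services; use the *-kafka-bootstrap service with the
# 9092/9093 listener port, not the internal 9090/9091 ports.
kubectl get svc -l strimzi.io/cluster=my-cluster
```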
-
The Strimzi controllers have only three ports. Am I correct that we must connect to the brokers instead? `--subresource=status` is not an option because the operator doesn't provide meaningful information regarding the lost broker node. The cluster with
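As a side note, the controller pods' ports can be inspected directly (a sketch; the pod name assumes a node pool called `controller`):

```sh
# Show the container ports on a controller pod; expect the Strimzi-internal
# control-plane (9090) and replication (9091) listeners among them, which
# do not serve normal admin-client traffic.
kubectl get pod my-cluster-controller-0 \
  -o jsonpath='{.spec.containers[0].ports}'
```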
-
A note for other readers: we have
It is tough to make mistakes in
-
Regarding the suggestion to connect to brokers instead of controllers: 🤷
We may need to wait for Kafka 4, but mirroring the data to a new cluster is the simplest way for now, as far as I can see. P.S. I tried to unregister ID 6, which we don't have in our cluster, just to be sure. The outcome is the same.
-
We fixed the issue. 🎉 We scaled the controller pool (not the brokers) to 7 pods and then scaled it back to 3. If you see that the controllers go crazy and cannot elect a leader within 10-15 minutes, kill/restart them.
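A sketch of that scale-up/scale-down, assuming a `KafkaNodePool` named `controller` (adjust names and replica counts to your setup):

```sh
# Scale the controller node pool up...
kubectl patch kafkanodepool controller --type merge -p '{"spec":{"replicas":7}}'
# ...wait until the new pods are ready and the quorum is stable,
# then scale back down to the original count.
kubectl patch kafkanodepool controller --type merge -p '{"spec":{"replicas":3}}'
```

Note the caveat earlier in the thread: scaling KRaft controllers is not officially supported before dynamic quorum arrives in Kafka 4.x.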
-
Bug Description
I have upgraded the operator from 0.44.0 to 0.45.0 and then edited the Kafka CR to change spec.kafka.version from 3.8.0 to 3.9.0. The pods were recreated with the new image, but the upgrade did not complete. The cluster now has this status:
I tried a manual upgrade:
Steps to reproduce
No response
Expected behavior
No response
Strimzi version
0.45.0
Kubernetes version
1.27.11
Installation method
Helm
Infrastructure
Bare-metal
Configuration files and logs
No response
Additional context
No response