You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@kafka.apache.org by Sameer Kumar <sa...@gmail.com> on 2017/05/03 13:04:33 UTC

Kafka Streams Failed to rebalance error

Hi,



I ran two nodes in my streams compute cluster, they were running fine for
few hours before outputting with failure to rebalance errors.


I couldnt understand why this happened but I saw one strange behaviour...

at 16:53 on node1, I saw "Failed to lock the state directory" error, this
might have caused the partitions to relocate and hence the error.



I am attaching detailed logs for both the nodes, please see if you can help.



Some of the logs for quick reference are these.



2017-05-03 16:53:53 ERROR Kafka10Base:44 - Exception caught in thread
StreamThread-2

org.apache.kafka.streams.errors.StreamsException: stream-thread
[StreamThread-2] Failed to rebalance

                at org.apache.kafka.streams.processor.internals.
StreamThread.runLoop(StreamThread.java:612)

                at org.apache.kafka.streams.processor.internals.
StreamThread.run(StreamThread.java:368)

Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread
[StreamThread-2] failed to suspend stream tasks

                at org.apache.kafka.streams.processor.internals.
StreamThread.suspendTasksAndState(StreamThread.java:488)

                at org.apache.kafka.streams.processor.internals.
StreamThread.access$1200(StreamThread.java:69)

                at org.apache.kafka.streams.processor.internals.
StreamThread$1.onPartitionsRevoked(StreamThread.java:259)

                at org.apache.kafka.clients.consumer.internals.
ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:396)

                at org.apache.kafka.clients.consumer.internals.
AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:329)

                at org.apache.kafka.clients.consumer.internals.
AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:303)

                at org.apache.kafka.clients.consumer.internals.
ConsumerCoordinator.poll(ConsumerCoordinator.java:286)

                at org.apache.kafka.clients.consumer.KafkaConsumer.
pollOnce(KafkaConsumer.java:1030)

                at org.apache.kafka.clients.consumer.KafkaConsumer.poll(
KafkaConsumer.java:995)

                at org.apache.kafka.streams.processor.internals.
StreamThread.runLoop(StreamThread.java:582)

                ... 1 more

Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit
cannot be completed since the group has already rebalanced and assigned the
partitions to another member. This means that the time between subsequent
calls to poll() was longer than the configured max.poll.interval.ms, which
typically implies that the poll loop is spending too much time message
processing. You can address this either by increasing the session timeout
or by reducing the maximum size of batches returned in poll() with
max.poll.records.

                at org.apache.kafka.clients.consumer.internals.
ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:698)

                at org.apache.kafka.clients.consumer.internals.
ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:577)

                at org.apache.kafka.clients.consumer.KafkaConsumer.
commitSync(KafkaConsumer.java:1125)

                at org.apache.kafka.streams.processor.internals.
StreamTask.commitOffsets(StreamTask.java:296)

                at org.apache.kafka.streams.processor.internals.
StreamThread$3.apply(StreamThread.java:535)

                at org.apache.kafka.streams.processor.internals.
StreamThread.performOnAllTasks(StreamThread.java:503)

                at org.apache.kafka.streams.processor.internals.
StreamThread.commitOffsets(StreamThread.java:531)

                at org.apache.kafka.streams.processor.internals.
StreamThread.suspendTasksAndState(StreamThread.java:480)

                ... 10 more



2017-05-03 16:53:57 WARN  StreamThread:1184 - Could not create task 1_38.
Will retry.

org.apache.kafka.streams.errors.LockException: task [1_38] Failed to lock
the state directory: /data/streampoc/LIC2-5/1_38

                at org.apache.kafka.streams.processor.internals.
ProcessorStateManager.<init>(ProcessorStateManager.java:102)

                at org.apache.kafka.streams.processor.internals.
AbstractTask.<init>(AbstractTask.java:73)

                at org.apache.kafka.streams.processor.internals.
StreamTask.<init>(StreamTask.java:108)

                at org.apache.kafka.streams.processor.internals.
StreamThread.createStreamTask(StreamThread.java:834)

                at org.apache.kafka.streams.processor.internals.
StreamThread$TaskCreator.createTask(StreamThread.java:1207)

                at org.apache.kafka.streams.processor.internals.
StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:1180)

                at org.apache.kafka.streams.processor.internals.
StreamThread.addStreamTasks(StreamThread.java:937)

                at org.apache.kafka.streams.processor.internals.
StreamThread.access$500(StreamThread.java:69)

                at org.apache.kafka.streams.processor.internals.
StreamThread$1.onPartitionsAssigned(StreamThread.java:236)


Regards,

-Sameer.