Posted to user@flume.apache.org by wenxing zheng <we...@gmail.com> on 2017/09/28 06:20:47 UTC

Failure in committing offset due to group rebalance

Dear all,

We are running Flume v1.7.0 with an HTTP source paired with an HDFS sink,
using Kafka as the channel. We often see the following exception in the
HDFSEventSink:

> 28 Sep 2017 11:52:14,683 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle:550)  - Error ILLEGAL_GENERATION occurred while committing offsets for group csdn.flume.http.kafka.hdfs
> 28 Sep 2017 11:52:14,684 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:447)  - process failed
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
>         at org.apache.flume.channel.kafka.KafkaChannel$ConsumerAndRecords.commitOffsets(KafkaChannel.java:684)
>         at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction.doCommit(KafkaChannel.java:567)
>         at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:433)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>         at java.lang.Thread.run(Thread.java:745)
> 28 Sep 2017 11:52:14,716 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:158)  - Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:451)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance


Is the problem related to the JIRA ticket
https://issues.apache.org/jira/browse/KAFKA-3409, and do we need to upgrade the
Kafka library to 0.10.0.0?

Any advice would be appreciated.
Kind Regards, Wenxing

Re: Failure in committing offset due to group rebalance

Posted by wenxing zheng <we...@gmail.com>.
We are using the Kafka version shipped with Confluent 3.0.0, so I think it
should be 0.10.0.0-cp1.

We need Flume to recover from the timeout so that it starts working
again. Any advice?

On Fri, Sep 29, 2017 at 10:34 PM, Matt Sicker <bo...@gmail.com> wrote:

> What version of Kafka broker are you using? Up until one of the 0.10.x
> releases (forget which), you have to use the same version or earlier of the
> client library from what I remember. Compatibility is getting better from
> 0.11 onward (especially by the 1.0 release), but it's still rather
> confusing.

Re: Failure in committing offset due to group rebalance

Posted by Matt Sicker <bo...@gmail.com>.
What version of Kafka broker are you using? Up until one of the 0.10.x
releases (forget which), you have to use the same version or earlier of the
client library from what I remember. Compatibility is getting better from
0.11 onward (especially by the 1.0 release), but it's still rather
confusing.

On 28 September 2017 at 22:49, wenxing zheng <we...@gmail.com>
wrote:

> By the way, following https://issues.apache.org/jira/browse/KAFKA-3409,
> we tried to upgrade the Kafka client package to 0.10.0.0, but
> Confluent failed to start up.
> It seems to be a compatibility issue.


-- 
Matt Sicker <bo...@gmail.com>

Re: Failure in committing offset due to group rebalance

Posted by wenxing zheng <we...@gmail.com>.
By the way, following https://issues.apache.org/jira/browse/KAFKA-3409,
we tried to upgrade the Kafka client package to 0.10.0.0, but
Confluent failed to start up.
It seems to be a compatibility issue.

On Fri, Sep 29, 2017 at 11:37 AM, wenxing zheng <we...@gmail.com>
wrote:

> Thanks, Ferenc.
>
> We have made various adjustments to those settings, and we found that the
> cause was saturation of the network bandwidth: no matter what we set,
> it still times out.
> But the problem is that after the network is restored, Flume does not resume
> working.

Re: Failure in committing offset due to group rebalance

Posted by wenxing zheng <we...@gmail.com>.
Thanks, Ferenc.

We have made various adjustments to those settings, and we found that the
cause was saturation of the network bandwidth: no matter what we set, it
still times out.
But the problem is that after the network is restored, Flume does not resume
working.

On Thu, Sep 28, 2017 at 8:40 PM, Ferenc Szabo <fs...@cloudera.com> wrote:

> Dear Wenxing,
>
> If I guess correctly, you have time periods with very few messages, and that
> is when the issue happens.
> If that is the case, try increasing
> kafka.consumer.heartbeat.interval.ms
> and
> kafka.consumer.session.timeout.ms
> (session.timeout.ms has to be larger than the heartbeat interval),
>
> or lower
> kafka.consumer.max.partition.fetch.bytes to a little more than the
> maximum size of one event.
>
> Basically, you can tweak Kafka settings with
> <channel>.kafka.consumer.*
> and
> <channel>.kafka.producer.*
>
> Any setting you find here: http://kafka.apache.org/090/documentation.html
> can be set with this method.
>
> Let us know if that helped or if some other config modification solved the
> issue.
>
> Best Regards,
> Ferenc Szabo

Re: Failure in committing offset due to group rebalance

Posted by Ferenc Szabo <fs...@cloudera.com>.
Dear Wenxing,

If I guess correctly, you have time periods with very few messages, and that
is when the issue happens.
If that is the case, try increasing
kafka.consumer.heartbeat.interval.ms
and
kafka.consumer.session.timeout.ms
(session.timeout.ms has to be larger than the heartbeat interval),

or lower
kafka.consumer.max.partition.fetch.bytes to a little more than the maximum
size of one event.

Basically, you can tweak Kafka settings with
<channel>.kafka.consumer.*
and
<channel>.kafka.producer.*

Any setting you find here: http://kafka.apache.org/090/documentation.html
can be set with this method.
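
As a rough sketch of the advice above, the overrides could look something
like the fragment below in the agent's properties file. The agent name a1,
the channel name c1, the broker address, the topic name, and every numeric
value are assumptions chosen for illustration, not recommendations; only the
group id is taken from the log in the original post. Keep in mind that the
broker bounds the session timeout (group.min.session.timeout.ms and
group.max.session.timeout.ms), so a larger value may also need a broker-side
change.

```properties
# Hypothetical agent "a1" with a Kafka channel "c1"; names and values are illustrative.
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = broker1:9092
a1.channels.c1.kafka.topic = flume-channel
a1.channels.c1.kafka.consumer.group.id = csdn.flume.http.kafka.hdfs

# The heartbeat interval has to stay below the session timeout.
a1.channels.c1.kafka.consumer.heartbeat.interval.ms = 10000
a1.channels.c1.kafka.consumer.session.timeout.ms = 30000

# Alternative: cap the per-partition fetch size slightly above the largest
# event (assumes no single event exceeds roughly 64 KiB).
a1.channels.c1.kafka.consumer.max.partition.fetch.bytes = 65536
```

Anything after the kafka.consumer. prefix is passed to the Kafka consumer
as-is, so the same pattern applies to other consumer settings.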

Let us know if that helped or if some other config modification solved the
issue.

Best Regards,
Ferenc Szabo
