You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Han JU <ju...@gmail.com> on 2016/01/22 16:13:43 UTC

Stuck consumer with new consumer API in 0.9

Hi,

I'm prototyping with the new consumer API of kafka 0.9 and I'm particularly
interested in the `ConsumerRebalanceListener`.

My test setup is like the following:
  - 5M messages pre-loaded in one node kafka 0.9
  - 12 partitions, auto offset commit set to false
  - in `onPartitionsRevoked`, commit offset and flush the local state

The test run is like the following:
  - launch one process with 2 consumers and let it consume for a while
  - launch another process with 2 consumers, this triggers a rebalancing,
and let these 2 processes run until messages are all consumed

The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd

So at first, the 2 consumers of the first process each got 6 partitions.
And after the rebalancing, each consumer got 3 partitions. It's confirmed
by logging inside the `onPartitionAssigned` callback.

But after the rebalancing, one of the 2 consumers of the first process stop
receiving messages, even if it has partitions assigned to:

balance-1 pulled 7237 msgs ...
balance-0 pulled 7263 msgs ...
2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
group is rebalancing, try to re-join group.
balance-1 flush @ 536637
balance-1 committed offset for List(balance-11, balance-10, balance-9,
balance-8, balance-7, balance-6)
2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
group is rebalancing, try to re-join group.
balance-0 flush @ 543845
balance-0 committed offset for List(balance-5, balance-4, balance-3,
balance-2, balance-1, balance-0)
balance-0 got assigned List(balance-5, balance-4, balance-3)
balance-1 got assigned List(balance-11, balance-10, balance-9)
balance-1 pulled 3625 msgs ...
balance-0 pulled 3621 msgs ...
balance-0 pulled 3631 msgs ...
balance-0 pulled 3631 msgs ...
balance-1 pulled 0 msgs ...
balance-0 pulled 3643 msgs ...
balance-0 pulled 3643 msgs ...
balance-1 pulled 0 msgs ...
balance-0 pulled 3622 msgs ...
balance-0 pulled 3632 msgs ...
balance-1 pulled 0 msgs ...
balance-0 pulled 3637 msgs ...
balance-0 pulled 3641 msgs ...
balance-0 pulled 3640 msgs ...
balance-1 pulled 0 msgs ...
balance-0 pulled 3632 msgs ...
balance-0 pulled 3630 msgs ...
balance-1 pulled 0 msgs ...
......

`balance-0` and `balance-1` are the names of the consumer thread. So after
the rebalancing, thread `balance-1` continues to poll but no message
arrive, given that it has got 3 partitions assigned to after the
rebalancing.

Finally other 3 consumers pulls all their partitions' message, the
situation is like

GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1

So you can see, partition [6, 7, 8] still has messages, but the consumer
can't pull them after the rebalancing.

I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both server/broker
and client.

I hope the code is clear enough to illustrate/reproduce the problem. It's
quite a surprise for me because this is the main feature of the new
consumer API, but it does not seem to work properly.
Feel free to talk to me for any details.
-- 
*JU Han*

Software Engineer @ Teads.tv

+33 0619608888

Re: Stuck consumer with new consumer API in 0.9

Posted by Bruno Rassaerts <br...@novazone.be>.
We do not seek in the onPartitionAssigned.

In our test setup (evaluating kafka for a new project) we put a constant load on one of the topics.
We have a consumer group pulling messages from the different partitions on the topic.

At a certain point in time, the poll() does not return any messages anymore.
When this happens, we just seek() to the current offset, and messages come in again.

It is a bit annoying as we do seeks when there are no messages as well, but at least it prevents stalling the client.

Bruno


> On 25 Jan 2016, at 16:07, Han JU <ju...@gmail.com> wrote:
> 
> Hi Bruno,
> 
> Can you tell me a little bit more about that? A seek() in the
> `onPartitionAssigned`?
> 
> Thanks.
> 
> 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> 
>> Ok I'll create a JIRA issue on this.
>> 
>> Thanks!
>> 
>> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <br...@novazone.be>:
>> 
>>> +1 here
>>> 
>>> As a workaround we seek to the current offset which resets the current
>>> clients internal states and everything continues.
>>> 
>>> Regards,
>>> Bruno Rassaerts | Freelance Java Developer
>>> 
>>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
>>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
>>> bruno.rassaerts@novazone.be -www.novazone.be
>>> 
>>>> On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Can you please file an issue in JIRA so that we make sure this is
>>>> investigated?
>>>> 
>>>> Ismael
>>>> 
>>>>> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com>
>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I'm prototyping with the new consumer API of kafka 0.9 and I'm
>>> particularly
>>>>> interested in the `ConsumerRebalanceListener`.
>>>>> 
>>>>> My test setup is like the following:
>>>>> - 5M messages pre-loaded in one node kafka 0.9
>>>>> - 12 partitions, auto offset commit set to false
>>>>> - in `onPartitionsRevoked`, commit offset and flush the local state
>>>>> 
>>>>> The test run is like the following:
>>>>> - launch one process with 2 consumers and let it consume for a while
>>>>> - launch another process with 2 consumers, this triggers a
>>> rebalancing,
>>>>> and let these 2 processes run until messages are all consumed
>>>>> 
>>>>> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>>>>> 
>>>>> So at first, the 2 consumers of the first process each got 6
>>> partitions.
>>>>> And after the rebalancing, each consumer got 3 partitions. It's
>>> confirmed
>>>>> by logging inside the `onPartitionAssigned` callback.
>>>>> 
>>>>> But after the rebalancing, one of the 2 consumers of the first process
>>> stop
>>>>> receiving messages, even if it has partitions assigned to:
>>>>> 
>>>>> balance-1 pulled 7237 msgs ...
>>>>> balance-0 pulled 7263 msgs ...
>>>>> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
>>>>> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>>>> group is rebalancing, try to re-join group.
>>>>> balance-1 flush @ 536637
>>>>> balance-1 committed offset for List(balance-11, balance-10, balance-9,
>>>>> balance-8, balance-7, balance-6)
>>>>> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
>>>>> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>>>> group is rebalancing, try to re-join group.
>>>>> balance-0 flush @ 543845
>>>>> balance-0 committed offset for List(balance-5, balance-4, balance-3,
>>>>> balance-2, balance-1, balance-0)
>>>>> balance-0 got assigned List(balance-5, balance-4, balance-3)
>>>>> balance-1 got assigned List(balance-11, balance-10, balance-9)
>>>>> balance-1 pulled 3625 msgs ...
>>>>> balance-0 pulled 3621 msgs ...
>>>>> balance-0 pulled 3631 msgs ...
>>>>> balance-0 pulled 3631 msgs ...
>>>>> balance-1 pulled 0 msgs ...
>>>>> balance-0 pulled 3643 msgs ...
>>>>> balance-0 pulled 3643 msgs ...
>>>>> balance-1 pulled 0 msgs ...
>>>>> balance-0 pulled 3622 msgs ...
>>>>> balance-0 pulled 3632 msgs ...
>>>>> balance-1 pulled 0 msgs ...
>>>>> balance-0 pulled 3637 msgs ...
>>>>> balance-0 pulled 3641 msgs ...
>>>>> balance-0 pulled 3640 msgs ...
>>>>> balance-1 pulled 0 msgs ...
>>>>> balance-0 pulled 3632 msgs ...
>>>>> balance-0 pulled 3630 msgs ...
>>>>> balance-1 pulled 0 msgs ...
>>>>> ......
>>>>> 
>>>>> `balance-0` and `balance-1` are the names of the consumer thread. So
>>> after
>>>>> the rebalancing, thread `balance-1` continues to poll but no message
>>>>> arrive, given that it has got 3 partitions assigned to after the
>>>>> rebalancing.
>>>>> 
>>>>> Finally other 3 consumers pulls all their partitions' message, the
>>>>> situation is like
>>>>> 
>>>>> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
>>>>> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
>>>>> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
>>>>> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
>>>>> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
>>> 127.0.0.1
>>>>> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
>>> 127.0.0.1
>>>>> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
>>> 127.0.0.1
>>>>> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
>>>>> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
>>>>> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
>>>>> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
>>>>> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
>>>>> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
>>>>> 
>>>>> So you can see, partition [6, 7, 8] still has messages, but the
>>> consumer
>>>>> can't pull them after the rebalancing.
>>>>> 
>>>>> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
>>> server/broker
>>>>> and client.
>>>>> 
>>>>> I hope the code is clear enough to illustrate/reproduce the problem.
>>> It's
>>>>> quite a surprise for me because this is the main feature of the new
>>>>> consumer API, but it does not seem to work properly.
>>>>> Feel free to talk to me for any details.
>>>>> --
>>>>> *JU Han*
>>>>> 
>>>>> Software Engineer @ Teads.tv
>>>>> 
>>>>> +33 0619608888
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> *JU Han*
>> 
>> Software Engineer @ Teads.tv
>> 
>> +33 0619608888
>> 
> 
> 
> 
> -- 
> *JU Han*
> 
> Software Engineer @ Teads.tv
> 
> +33 0619608888


Re: Stuck consumer with new consumer API in 0.9

Posted by Guozhang Wang <wa...@gmail.com>.
Han,

>From your logs it seems the thread which cannot fetch more data is
rebalance-1, which is assigned with partitions [balance-11, balance-10,
balance-9];

>From your consumer-group command the partitions that is lagging are [balance-6,
balance-7, balance-8] which is not assigned to this process, and [11, 10,
9] are all caught up.

So I'm a little confused here, since it seems reasonable for rebalance-1 to
fetch no data as all has been consumed, and [6,7,8] are indeed lagging but
not clear why as we cannot see the logs from the other process.

BTW, the bug that Jason mentioned is KAFKA-2978
<https://issues.apache.org/jira/browse/KAFKA-2978>, and is already merged
into 0.9.0 branch, so if you still see this in 0.9.0 branch, it should not
be the cause of your issue.

Guozhang


On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Apologies for the late arrival to this thread. There was a bug in the
> 0.9.0.0 release of Kafka which could cause the consumer to stop fetching
> from a partition after a rebalance. If you're seeing this, please checkout
> the 0.9.0 branch of Kafka and see if you can reproduce this problem. If you
> can, then it would be really helpful if you file a JIRA with the steps to
> reproduce.
>
> From Han's initial example, it kind of looks like the problem might be in
> the usage. The consumer lag as shown by the kafka-consumer-groups script
> relies on the last committed position to determine lag. To update progress,
> you need to commit offsets regularly. In the gist, offsets are only
> committed on shutdown or when a rebalance occurs. When the group is stable,
> no progress will be seen because there are no commits to update the
> position.
>
> Thanks,
> Jason
>
> On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk> wrote:
>
> > Thanks!
> >
> > Ismael
> >
> > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com> wrote:
> >
> > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > >
> > > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> > >
> > > > Hi Bruno,
> > > >
> > > > Can you tell me a little bit more about that? A seek() in the
> > > > `onPartitionAssigned`?
> > > >
> > > > Thanks.
> > > >
> > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > > >
> > > >> Ok I'll create a JIRA issue on this.
> > > >>
> > > >> Thanks!
> > > >>
> > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > bruno.rassaerts@novazone.be
> > > >:
> > > >>
> > > >>> +1 here
> > > >>>
> > > >>> As a workaround we seek to the current offset which resets the
> > current
> > > >>> clients internal states and everything continues.
> > > >>>
> > > >>> Regards,
> > > >>> Bruno Rassaerts | Freelance Java Developer
> > > >>>
> > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > > >>>
> > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
> > > >>> >
> > > >>> > Hi,
> > > >>> >
> > > >>> > Can you please file an issue in JIRA so that we make sure this is
> > > >>> > investigated?
> > > >>> >
> > > >>> > Ismael
> > > >>> >
> > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.felix@gmail.com
> >
> > > >>> wrote:
> > > >>> >>
> > > >>> >> Hi,
> > > >>> >>
> > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > > >>> particularly
> > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > >>> >>
> > > >>> >> My test setup is like the following:
> > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > >>> >>  - 12 partitions, auto offset commit set to false
> > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> > state
> > > >>> >>
> > > >>> >> The test run is like the following:
> > > >>> >>  - launch one process with 2 consumers and let it consume for a
> > > while
> > > >>> >>  - launch another process with 2 consumers, this triggers a
> > > >>> rebalancing,
> > > >>> >> and let these 2 processes run until messages are all consumed
> > > >>> >>
> > > >>> >> The code is here:
> > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > >>> >>
> > > >>> >> So at first, the 2 consumers of the first process each got 6
> > > >>> partitions.
> > > >>> >> And after the rebalancing, each consumer got 3 partitions. It's
> > > >>> confirmed
> > > >>> >> by logging inside the `onPartitionAssigned` callback.
> > > >>> >>
> > > >>> >> But after the rebalancing, one of the 2 consumers of the first
> > > >>> process stop
> > > >>> >> receiving messages, even if it has partitions assigned to:
> > > >>> >>
> > > >>> >> balance-1 pulled 7237 msgs ...
> > > >>> >> balance-0 pulled 7263 msgs ...
> > > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> > since
> > > >>> the
> > > >>> >> group is rebalancing, try to re-join group.
> > > >>> >> balance-1 flush @ 536637
> > > >>> >> balance-1 committed offset for List(balance-11, balance-10,
> > > balance-9,
> > > >>> >> balance-8, balance-7, balance-6)
> > > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> > since
> > > >>> the
> > > >>> >> group is rebalancing, try to re-join group.
> > > >>> >> balance-0 flush @ 543845
> > > >>> >> balance-0 committed offset for List(balance-5, balance-4,
> > balance-3,
> > > >>> >> balance-2, balance-1, balance-0)
> > > >>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> > > >>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> > > >>> >> balance-1 pulled 3625 msgs ...
> > > >>> >> balance-0 pulled 3621 msgs ...
> > > >>> >> balance-0 pulled 3631 msgs ...
> > > >>> >> balance-0 pulled 3631 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3643 msgs ...
> > > >>> >> balance-0 pulled 3643 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3622 msgs ...
> > > >>> >> balance-0 pulled 3632 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3637 msgs ...
> > > >>> >> balance-0 pulled 3641 msgs ...
> > > >>> >> balance-0 pulled 3640 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3632 msgs ...
> > > >>> >> balance-0 pulled 3630 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> ......
> > > >>> >>
> > > >>> >> `balance-0` and `balance-1` are the names of the consumer
> thread.
> > So
> > > >>> after
> > > >>> >> the rebalancing, thread `balance-1` continues to poll but no
> > message
> > > >>> >> arrive, given that it has got 3 partitions assigned to after the
> > > >>> >> rebalancing.
> > > >>> >>
> > > >>> >> Finally other 3 consumers pulls all their partitions' message,
> the
> > > >>> >> situation is like
> > > >>> >>
> > > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG,
> > OWNER
> > > >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
> > > >>> 127.0.0.1
> > > >>> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
> > > >>> 127.0.0.1
> > > >>> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
> > > >>> 127.0.0.1
> > > >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >>
> > > >>> >> So you can see, partition [6, 7, 8] still has messages, but the
> > > >>> consumer
> > > >>> >> can't pull them after the rebalancing.
> > > >>> >>
> > > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
> > > >>> server/broker
> > > >>> >> and client.
> > > >>> >>
> > > >>> >> I hope the code is clear enough to illustrate/reproduce the
> > problem.
> > > >>> It's
> > > >>> >> quite a surprise for me because this is the main feature of the
> > new
> > > >>> >> consumer API, but it does not seem to work properly.
> > > >>> >> Feel free to talk to me for any details.
> > > >>> >> --
> > > >>> >> *JU Han*
> > > >>> >>
> > > >>> >> Software Engineer @ Teads.tv
> > > >>> >>
> > > >>> >> +33 0619608888
> > > >>> >>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> *JU Han*
> > > >>
> > > >> Software Engineer @ Teads.tv
> > > >>
> > > >> +33 0619608888
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > *JU Han*
> > > >
> > > > Software Engineer @ Teads.tv
> > > >
> > > > +33 0619608888
> > > >
> > >
> > >
> > >
> > > --
> > > *JU Han*
> > >
> > > Software Engineer @ Teads.tv
> > >
> > > +33 0619608888
> > >
> >
>



-- 
-- Guozhang

Re: Stuck consumer with new consumer API in 0.9

Posted by Rajiv Kurian <ra...@signalfx.com>.
Thanks Jun.

On Tue, Jan 26, 2016 at 3:48 PM, Jun Rao <ju...@confluent.io> wrote:

> Rajiv,
>
> We haven't released 0.9.0.1 yet. To try the fix, you can build a new client
> jar off the 0.9.0 branch.
>
> Thanks,
>
> Jun
>
> On Mon, Jan 25, 2016 at 12:03 PM, Rajiv Kurian <ra...@signalfx.com> wrote:
>
> > Thanks Jason. We are using an affected client I guess.
> >
> > Is there a 0.9.0 client available on maven? My search at
> > http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
> > the 0.9.0.0 client which seems to have this issue.
> >
> >
> > Thanks,
> > Rajiv
> >
> > On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> > > https://issues.apache.org/jira/browse/KAFKA-2978.
> > >
> > > -Jason
> > >
> > > On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com>
> > wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > Was this a server bug or a client bug?
> > > >
> > > > Thanks,
> > > > Rajiv
> > > >
> > > > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <
> jason@confluent.io>
> > > > wrote:
> > > >
> > > > > Apologies for the late arrival to this thread. There was a bug in
> the
> > > > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> > > fetching
> > > > > from a partition after a rebalance. If you're seeing this, please
> > > > checkout
> > > > > the 0.9.0 branch of Kafka and see if you can reproduce this
> problem.
> > If
> > > > you
> > > > > can, then it would be really helpful if you file a JIRA with the
> > steps
> > > to
> > > > > reproduce.
> > > > >
> > > > > From Han's initial example, it kind of looks like the problem might
> > be
> > > in
> > > > > the usage. The consumer lag as shown by the kafka-consumer-groups
> > > script
> > > > > relies on the last committed position to determine lag. To update
> > > > progress,
> > > > > you need to commit offsets regularly. In the gist, offsets are only
> > > > > committed on shutdown or when a rebalance occurs. When the group is
> > > > stable,
> > > > > no progress will be seen because there are no commits to update the
> > > > > position.
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Issue created:
> https://issues.apache.org/jira/browse/KAFKA-3146
> > > > > > >
> > > > > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > > > >
> > > > > > > > Hi Bruno,
> > > > > > > >
> > > > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > > > `onPartitionAssigned`?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > > > > >
> > > > > > > >> Ok I'll create a JIRA issue on this.
> > > > > > > >>
> > > > > > > >> Thanks!
> > > > > > > >>
> > > > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > > > bruno.rassaerts@novazone.be
> > > > > > > >:
> > > > > > > >>
> > > > > > > >>> +1 here
> > > > > > > >>>
> > > > > > > >>> As a workaround we seek to the current offset which resets
> > the
> > > > > > current
> > > > > > > >>> clients internal states and everything continues.
> > > > > > > >>>
> > > > > > > >>> Regards,
> > > > > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > > > > >>>
> > > > > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > > > > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > > > > > > >>>
> > > > > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <ismael@juma.me.uk
> >
> > > > wrote:
> > > > > > > >>> >
> > > > > > > >>> > Hi,
> > > > > > > >>> >
> > > > > > > >>> > Can you please file an issue in JIRA so that we make sure
> > > this
> > > > is
> > > > > > > >>> > investigated?
> > > > > > > >>> >
> > > > > > > >>> > Ismael
> > > > > > > >>> >
> > > > > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> > > > ju.han.felix@gmail.com
> > > > > >
> > > > > > > >>> wrote:
> > > > > > > >>> >>
> > > > > > > >>> >> Hi,
> > > > > > > >>> >>
> > > > > > > >>> >> I'm prototyping with the new consumer API of kafka 0.9
> and
> > > I'm
> > > > > > > >>> particularly
> > > > > > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > > > > > >>> >>
> > > > > > > >>> >> My test setup is like the following:
> > > > > > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > > > > > >>> >>  - 12 partitions, auto offset commit set to false
> > > > > > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the
> > > local
> > > > > > state
> > > > > > > >>> >>
> > > > > > > >>> >> The test run is like the following:
> > > > > > > >>> >>  - launch one process with 2 consumers and let it
> consume
> > > for
> > > > a
> > > > > > > while
> > > > > > > >>> >>  - launch another process with 2 consumers, this
> triggers
> > a
> > > > > > > >>> rebalancing,
> > > > > > > >>> >> and let these 2 processes run until messages are all
> > > consumed
> > > > > > > >>> >>
> > > > > > > >>> >> The code is here:
> > > > > > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > > > > > >>> >>
> > > > > > > >>> >> So at first, the 2 consumers of the first process each
> > got 6
> > > > > > > >>> partitions.
> > > > > > > >>> >> And after the rebalancing, each consumer got 3
> partitions.
> > > > It's
> > > > > > > >>> confirmed
> > > > > > > >>> >> by logging inside the `onPartitionAssigned` callback.
> > > > > > > >>> >>
> > > > > > > >>> >> But after the rebalancing, one of the 2 consumers of the
> > > first
> > > > > > > >>> process stop
> > > > > > > >>> >> receiving messages, even if it has partitions assigned
> to:
> > > > > > > >>> >>
> > > > > > > >>> >> balance-1 pulled 7237 msgs ...
> > > > > > > >>> >> balance-0 pulled 7263 msgs ...
> > > > > > > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > > > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> > > failed
> > > > > > since
> > > > > > > >>> the
> > > > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > > > >>> >> balance-1 flush @ 536637
> > > > > > > >>> >> balance-1 committed offset for List(balance-11,
> > balance-10,
> > > > > > > balance-9,
> > > > > > > >>> >> balance-8, balance-7, balance-6)
> > > > > > > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > > > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> > > failed
> > > > > > since
> > > > > > > >>> the
> > > > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > > > >>> >> balance-0 flush @ 543845
> > > > > > > >>> >> balance-0 committed offset for List(balance-5,
> balance-4,
> > > > > > balance-3,
> > > > > > > >>> >> balance-2, balance-1, balance-0)
> > > > > > > >>> >> balance-0 got assigned List(balance-5, balance-4,
> > balance-3)
> > > > > > > >>> >> balance-1 got assigned List(balance-11, balance-10,
> > > balance-9)
> > > > > > > >>> >> balance-1 pulled 3625 msgs ...
> > > > > > > >>> >> balance-0 pulled 3621 msgs ...
> > > > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > > >>> >> balance-0 pulled 3622 msgs ...
> > > > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > > >>> >> balance-0 pulled 3637 msgs ...
> > > > > > > >>> >> balance-0 pulled 3641 msgs ...
> > > > > > > >>> >> balance-0 pulled 3640 msgs ...
> > > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > > > >>> >> balance-0 pulled 3630 msgs ...
> > > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > > >>> >> ......
> > > > > > > >>> >>
> > > > > > > >>> >> `balance-0` and `balance-1` are the names of the
> consumer
> > > > > thread.
> > > > > > So
> > > > > > > >>> after
> > > > > > > >>> >> the rebalancing, thread `balance-1` continues to poll
> but
> > no
> > > > > > message
> > > > > > > >>> >> arrive, given that it has got 3 partitions assigned to
> > after
> > > > the
> > > > > > > >>> >> rebalancing.
> > > > > > > >>> >>
> > > > > > > >>> >> Finally other 3 consumers pulls all their partitions'
> > > message,
> > > > > the
> > > > > > > >>> >> situation is like
> > > > > > > >>> >>
> > > > > > > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET,
> > > LAG,
> > > > > > OWNER
> > > > > > > >>> >> balance-test, balance, 9, 417467, 417467, 0,
> consumer-2_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 10, 417467, 417467, 0,
> consumer-2_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 11, 417467, 417467, 0,
> consumer-2_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 6, 180269, 417467, 237198,
> > > consumer-2_/
> > > > > > > >>> 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 7, 180036, 417468, 237432,
> > > consumer-2_/
> > > > > > > >>> 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 8, 180197, 417467, 237270,
> > > consumer-2_/
> > > > > > > >>> 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 3, 417467, 417467, 0,
> consumer-1_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 4, 417468, 417468, 0,
> consumer-1_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 5, 417468, 417468, 0,
> consumer-1_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 0, 417467, 417467, 0,
> consumer-1_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 1, 417467, 417467, 0,
> consumer-1_/
> > > > > > 127.0.0.1
> > > > > > > >>> >> balance-test, balance, 2, 417467, 417467, 0,
> consumer-1_/
> > > > > > 127.0.0.1
> > > > > > > >>> >>
> > > > > > > >>> >> So you can see, partition [6, 7, 8] still has messages,
> > but
> > > > the
> > > > > > > >>> consumer
> > > > > > > >>> >> can't pull them after the rebalancing.
> > > > > > > >>> >>
> > > > > > > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for
> > both
> > > > > > > >>> server/broker
> > > > > > > >>> >> and client.
> > > > > > > >>> >>
> > > > > > > >>> >> I hope the code is clear enough to illustrate/reproduce
> > the
> > > > > > problem.
> > > > > > > >>> It's
> > > > > > > >>> >> quite a surprise for me because this is the main feature
> > of
> > > > the
> > > > > > new
> > > > > > > >>> >> consumer API, but it does not seem to work properly.
> > > > > > > >>> >> Feel free to talk to me for any details.
> > > > > > > >>> >> --
> > > > > > > >>> >> *JU Han*
> > > > > > > >>> >>
> > > > > > > >>> >> Software Engineer @ Teads.tv
> > > > > > > >>> >>
> > > > > > > >>> >> +33 0619608888
> > > > > > > >>> >>
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> *JU Han*
> > > > > > > >>
> > > > > > > >> Software Engineer @ Teads.tv
> > > > > > > >>
> > > > > > > >> +33 0619608888
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > *JU Han*
> > > > > > > >
> > > > > > > > Software Engineer @ Teads.tv
> > > > > > > >
> > > > > > > > +33 0619608888
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > *JU Han*
> > > > > > >
> > > > > > > Software Engineer @ Teads.tv
> > > > > > >
> > > > > > > +33 0619608888
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Stuck consumer with new consumer API in 0.9

Posted by Jun Rao <ju...@confluent.io>.
Rajiv,

We haven't released 0.9.0.1 yet. To try the fix, you can build a new client
jar off the 0.9.0 branch.

Thanks,

Jun

On Mon, Jan 25, 2016 at 12:03 PM, Rajiv Kurian <ra...@signalfx.com> wrote:

> Thanks Jason. We are using an affected client I guess.
>
> Is there a 0.9.0 client available on maven? My search at
> http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
> the 0.9.0.0 client which seems to have this issue.
>
>
> Thanks,
> Rajiv
>
> On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-2978.
> >
> > -Jason
> >
> > On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com>
> wrote:
> >
> > > Hi Jason,
> > >
> > > Was this a server bug or a client bug?
> > >
> > > Thanks,
> > > Rajiv
> > >
> > > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Apologies for the late arrival to this thread. There was a bug in the
> > > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> > fetching
> > > > from a partition after a rebalance. If you're seeing this, please
> > > checkout
> > > > the 0.9.0 branch of Kafka and see if you can reproduce this problem.
> If
> > > you
> > > > can, then it would be really helpful if you file a JIRA with the
> steps
> > to
> > > > reproduce.
> > > >
> > > > From Han's initial example, it kind of looks like the problem might
> be
> > in
> > > > the usage. The consumer lag as shown by the kafka-consumer-groups
> > script
> > > > relies on the last committed position to determine lag. To update
> > > progress,
> > > > you need to commit offsets regularly. In the gist, offsets are only
> > > > committed on shutdown or when a rebalance occurs. When the group is
> > > stable,
> > > > no progress will be seen because there are no commits to update the
> > > > position.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >
> > > > > Thanks!
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > > > >
> > > > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > > >
> > > > > > > Hi Bruno,
> > > > > > >
> > > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > > `onPartitionAssigned`?
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > > > >
> > > > > > >> Ok I'll create a JIRA issue on this.
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >>
> > > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > > bruno.rassaerts@novazone.be
> > > > > > >:
> > > > > > >>
> > > > > > >>> +1 here
> > > > > > >>>
> > > > > > >>> As a workaround we seek to the current offset which resets
> the
> > > > > current
> > > > > > >>> clients internal states and everything continues.
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > > > >>>
> > > > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > > > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > > > > > >>>
> > > > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > > > >>> >
> > > > > > >>> > Hi,
> > > > > > >>> >
> > > > > > >>> > Can you please file an issue in JIRA so that we make sure
> > this
> > > is
> > > > > > >>> > investigated?
> > > > > > >>> >
> > > > > > >>> > Ismael
> > > > > > >>> >
> > > > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> > > ju.han.felix@gmail.com
> > > > >
> > > > > > >>> wrote:
> > > > > > >>> >>
> > > > > > >>> >> Hi,
> > > > > > >>> >>
> > > > > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and
> > I'm
> > > > > > >>> particularly
> > > > > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > > > > >>> >>
> > > > > > >>> >> My test setup is like the following:
> > > > > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > > > > >>> >>  - 12 partitions, auto offset commit set to false
> > > > > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the
> > local
> > > > > state
> > > > > > >>> >>
> > > > > > >>> >> The test run is like the following:
> > > > > > >>> >>  - launch one process with 2 consumers and let it consume
> > for
> > > a
> > > > > > while
> > > > > > >>> >>  - launch another process with 2 consumers, this triggers
> a
> > > > > > >>> rebalancing,
> > > > > > >>> >> and let these 2 processes run until messages are all
> > consumed
> > > > > > >>> >>
> > > > > > >>> >> The code is here:
> > > > > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > > > > >>> >>
> > > > > > >>> >> So at first, the 2 consumers of the first process each
> got 6
> > > > > > >>> partitions.
> > > > > > >>> >> And after the rebalancing, each consumer got 3 partitions.
> > > It's
> > > > > > >>> confirmed
> > > > > > >>> >> by logging inside the `onPartitionAssigned` callback.
> > > > > > >>> >>
> > > > > > >>> >> But after the rebalancing, one of the 2 consumers of the
> > first
> > > > > > >>> process stop
> > > > > > >>> >> receiving messages, even if it has partitions assigned to:
> > > > > > >>> >>
> > > > > > >>> >> balance-1 pulled 7237 msgs ...
> > > > > > >>> >> balance-0 pulled 7263 msgs ...
> > > > > > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> > failed
> > > > > since
> > > > > > >>> the
> > > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > > >>> >> balance-1 flush @ 536637
> > > > > > >>> >> balance-1 committed offset for List(balance-11,
> balance-10,
> > > > > > balance-9,
> > > > > > >>> >> balance-8, balance-7, balance-6)
> > > > > > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> > failed
> > > > > since
> > > > > > >>> the
> > > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > > >>> >> balance-0 flush @ 543845
> > > > > > >>> >> balance-0 committed offset for List(balance-5, balance-4,
> > > > > balance-3,
> > > > > > >>> >> balance-2, balance-1, balance-0)
> > > > > > >>> >> balance-0 got assigned List(balance-5, balance-4,
> balance-3)
> > > > > > >>> >> balance-1 got assigned List(balance-11, balance-10,
> > balance-9)
> > > > > > >>> >> balance-1 pulled 3625 msgs ...
> > > > > > >>> >> balance-0 pulled 3621 msgs ...
> > > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3622 msgs ...
> > > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3637 msgs ...
> > > > > > >>> >> balance-0 pulled 3641 msgs ...
> > > > > > >>> >> balance-0 pulled 3640 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > > >>> >> balance-0 pulled 3630 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> ......
> > > > > > >>> >>
> > > > > > >>> >> `balance-0` and `balance-1` are the names of the consumer
> > > > thread.
> > > > > So
> > > > > > >>> after
> > > > > > >>> >> the rebalancing, thread `balance-1` continues to poll but
> no
> > > > > message
> > > > > > >>> >> arrive, given that it has got 3 partitions assigned to
> after
> > > the
> > > > > > >>> >> rebalancing.
> > > > > > >>> >>
> > > > > > >>> >> Finally other 3 consumers pulls all their partitions'
> > message,
> > > > the
> > > > > > >>> >> situation is like
> > > > > > >>> >>
> > > > > > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET,
> > LAG,
> > > > > OWNER
> > > > > > >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 6, 180269, 417467, 237198,
> > consumer-2_/
> > > > > > >>> 127.0.0.1
> > > > > > >>> >> balance-test, balance, 7, 180036, 417468, 237432,
> > consumer-2_/
> > > > > > >>> 127.0.0.1
> > > > > > >>> >> balance-test, balance, 8, 180197, 417467, 237270,
> > consumer-2_/
> > > > > > >>> 127.0.0.1
> > > > > > >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >>
> > > > > > >>> >> So you can see, partition [6, 7, 8] still has messages,
> but
> > > the
> > > > > > >>> consumer
> > > > > > >>> >> can't pull them after the rebalancing.
> > > > > > >>> >>
> > > > > > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for
> both
> > > > > > >>> server/broker
> > > > > > >>> >> and client.
> > > > > > >>> >>
> > > > > > >>> >> I hope the code is clear enough to illustrate/reproduce
> the
> > > > > problem.
> > > > > > >>> It's
> > > > > > >>> >> quite a surprise for me because this is the main feature
> of
> > > the
> > > > > new
> > > > > > >>> >> consumer API, but it does not seem to work properly.
> > > > > > >>> >> Feel free to talk to me for any details.
> > > > > > >>> >> --
> > > > > > >>> >> *JU Han*
> > > > > > >>> >>
> > > > > > >>> >> Software Engineer @ Teads.tv
> > > > > > >>> >>
> > > > > > >>> >> +33 0619608888
> > > > > > >>> >>
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> *JU Han*
> > > > > > >>
> > > > > > >> Software Engineer @ Teads.tv
> > > > > > >>
> > > > > > >> +33 0619608888
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > *JU Han*
> > > > > > >
> > > > > > > Software Engineer @ Teads.tv
> > > > > > >
> > > > > > > +33 0619608888
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *JU Han*
> > > > > >
> > > > > > Software Engineer @ Teads.tv
> > > > > >
> > > > > > +33 0619608888
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Stuck consumer with new consumer API in 0.9

Posted by Han JU <ju...@gmail.com>.
Hi Guozhang,

Sorry for that example. They does not come from the same run, just paste
that to illustrate the problem.
I'll try out what Jason suggests tomorrow and also retry the 0.9.0 branch.

2016-01-25 21:03 GMT+01:00 Rajiv Kurian <ra...@signalfx.com>:

> Thanks Jason. We are using an affected client I guess.
>
> Is there a 0.9.0 client available on maven? My search at
> http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
> the 0.9.0.0 client which seems to have this issue.
>
>
> Thanks,
> Rajiv
>
> On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-2978.
> >
> > -Jason
> >
> > On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com>
> wrote:
> >
> > > Hi Jason,
> > >
> > > Was this a server bug or a client bug?
> > >
> > > Thanks,
> > > Rajiv
> > >
> > > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Apologies for the late arrival to this thread. There was a bug in the
> > > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> > fetching
> > > > from a partition after a rebalance. If you're seeing this, please
> > > checkout
> > > > the 0.9.0 branch of Kafka and see if you can reproduce this problem.
> If
> > > you
> > > > can, then it would be really helpful if you file a JIRA with the
> steps
> > to
> > > > reproduce.
> > > >
> > > > From Han's initial example, it kind of looks like the problem might
> be
> > in
> > > > the usage. The consumer lag as shown by the kafka-consumer-groups
> > script
> > > > relies on the last committed position to determine lag. To update
> > > progress,
> > > > you need to commit offsets regularly. In the gist, offsets are only
> > > > committed on shutdown or when a rebalance occurs. When the group is
> > > stable,
> > > > no progress will be seen because there are no commits to update the
> > > > position.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >
> > > > > Thanks!
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > > > >
> > > > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > > >
> > > > > > > Hi Bruno,
> > > > > > >
> > > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > > `onPartitionAssigned`?
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > > > >
> > > > > > >> Ok I'll create a JIRA issue on this.
> > > > > > >>
> > > > > > >> Thanks!
> > > > > > >>
> > > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > > bruno.rassaerts@novazone.be
> > > > > > >:
> > > > > > >>
> > > > > > >>> +1 here
> > > > > > >>>
> > > > > > >>> As a workaround we seek to the current offset which resets
> the
> > > > > current
> > > > > > >>> clients internal states and everything continues.
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > > > >>>
> > > > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > > > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > > > > > >>>
> > > > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > > > >>> >
> > > > > > >>> > Hi,
> > > > > > >>> >
> > > > > > >>> > Can you please file an issue in JIRA so that we make sure
> > this
> > > is
> > > > > > >>> > investigated?
> > > > > > >>> >
> > > > > > >>> > Ismael
> > > > > > >>> >
> > > > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> > > ju.han.felix@gmail.com
> > > > >
> > > > > > >>> wrote:
> > > > > > >>> >>
> > > > > > >>> >> Hi,
> > > > > > >>> >>
> > > > > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and
> > I'm
> > > > > > >>> particularly
> > > > > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > > > > >>> >>
> > > > > > >>> >> My test setup is like the following:
> > > > > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > > > > >>> >>  - 12 partitions, auto offset commit set to false
> > > > > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the
> > local
> > > > > state
> > > > > > >>> >>
> > > > > > >>> >> The test run is like the following:
> > > > > > >>> >>  - launch one process with 2 consumers and let it consume
> > for
> > > a
> > > > > > while
> > > > > > >>> >>  - launch another process with 2 consumers, this triggers
> a
> > > > > > >>> rebalancing,
> > > > > > >>> >> and let these 2 processes run until messages are all
> > consumed
> > > > > > >>> >>
> > > > > > >>> >> The code is here:
> > > > > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > > > > >>> >>
> > > > > > >>> >> So at first, the 2 consumers of the first process each
> got 6
> > > > > > >>> partitions.
> > > > > > >>> >> And after the rebalancing, each consumer got 3 partitions.
> > > It's
> > > > > > >>> confirmed
> > > > > > >>> >> by logging inside the `onPartitionAssigned` callback.
> > > > > > >>> >>
> > > > > > >>> >> But after the rebalancing, one of the 2 consumers of the
> > first
> > > > > > >>> process stop
> > > > > > >>> >> receiving messages, even if it has partitions assigned to:
> > > > > > >>> >>
> > > > > > >>> >> balance-1 pulled 7237 msgs ...
> > > > > > >>> >> balance-0 pulled 7263 msgs ...
> > > > > > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> > failed
> > > > > since
> > > > > > >>> the
> > > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > > >>> >> balance-1 flush @ 536637
> > > > > > >>> >> balance-1 committed offset for List(balance-11,
> balance-10,
> > > > > > balance-9,
> > > > > > >>> >> balance-8, balance-7, balance-6)
> > > > > > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> > failed
> > > > > since
> > > > > > >>> the
> > > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > > >>> >> balance-0 flush @ 543845
> > > > > > >>> >> balance-0 committed offset for List(balance-5, balance-4,
> > > > > balance-3,
> > > > > > >>> >> balance-2, balance-1, balance-0)
> > > > > > >>> >> balance-0 got assigned List(balance-5, balance-4,
> balance-3)
> > > > > > >>> >> balance-1 got assigned List(balance-11, balance-10,
> > balance-9)
> > > > > > >>> >> balance-1 pulled 3625 msgs ...
> > > > > > >>> >> balance-0 pulled 3621 msgs ...
> > > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3622 msgs ...
> > > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3637 msgs ...
> > > > > > >>> >> balance-0 pulled 3641 msgs ...
> > > > > > >>> >> balance-0 pulled 3640 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > > >>> >> balance-0 pulled 3630 msgs ...
> > > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > > >>> >> ......
> > > > > > >>> >>
> > > > > > >>> >> `balance-0` and `balance-1` are the names of the consumer
> > > > thread.
> > > > > So
> > > > > > >>> after
> > > > > > >>> >> the rebalancing, thread `balance-1` continues to poll but
> no
> > > > > message
> > > > > > >>> >> arrive, given that it has got 3 partitions assigned to
> after
> > > the
> > > > > > >>> >> rebalancing.
> > > > > > >>> >>
> > > > > > >>> >> Finally other 3 consumers pulls all their partitions'
> > message,
> > > > the
> > > > > > >>> >> situation is like
> > > > > > >>> >>
> > > > > > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET,
> > LAG,
> > > > > OWNER
> > > > > > >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 6, 180269, 417467, 237198,
> > consumer-2_/
> > > > > > >>> 127.0.0.1
> > > > > > >>> >> balance-test, balance, 7, 180036, 417468, 237432,
> > consumer-2_/
> > > > > > >>> 127.0.0.1
> > > > > > >>> >> balance-test, balance, 8, 180197, 417467, 237270,
> > consumer-2_/
> > > > > > >>> 127.0.0.1
> > > > > > >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/
> > > > > 127.0.0.1
> > > > > > >>> >>
> > > > > > >>> >> So you can see, partition [6, 7, 8] still has messages,
> but
> > > the
> > > > > > >>> consumer
> > > > > > >>> >> can't pull them after the rebalancing.
> > > > > > >>> >>
> > > > > > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for
> both
> > > > > > >>> server/broker
> > > > > > >>> >> and client.
> > > > > > >>> >>
> > > > > > >>> >> I hope the code is clear enough to illustrate/reproduce
> the
> > > > > problem.
> > > > > > >>> It's
> > > > > > >>> >> quite a surprise for me because this is the main feature
> of
> > > the
> > > > > new
> > > > > > >>> >> consumer API, but it does not seem to work properly.
> > > > > > >>> >> Feel free to talk to me for any details.
> > > > > > >>> >> --
> > > > > > >>> >> *JU Han*
> > > > > > >>> >>
> > > > > > >>> >> Software Engineer @ Teads.tv
> > > > > > >>> >>
> > > > > > >>> >> +33 0619608888
> > > > > > >>> >>
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> *JU Han*
> > > > > > >>
> > > > > > >> Software Engineer @ Teads.tv
> > > > > > >>
> > > > > > >> +33 0619608888
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > *JU Han*
> > > > > > >
> > > > > > > Software Engineer @ Teads.tv
> > > > > > >
> > > > > > > +33 0619608888
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *JU Han*
> > > > > >
> > > > > > Software Engineer @ Teads.tv
> > > > > >
> > > > > > +33 0619608888
> > > > > >
> > > > >
> > > >
> > >
> >
>



-- 
*JU Han*

Software Engineer @ Teads.tv

+33 0619608888

Re: Stuck consumer with new consumer API in 0.9

Posted by Rajiv Kurian <ra...@signalfx.com>.
Thanks Jason. We are using an affected client I guess.

Is there a 0.9.0 client available on maven? My search at
http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
the 0.9.0.0 client which seems to have this issue.


Thanks,
Rajiv

On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> https://issues.apache.org/jira/browse/KAFKA-2978.
>
> -Jason
>
> On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com> wrote:
>
> > Hi Jason,
> >
> > Was this a server bug or a client bug?
> >
> > Thanks,
> > Rajiv
> >
> > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Apologies for the late arrival to this thread. There was a bug in the
> > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> fetching
> > > from a partition after a rebalance. If you're seeing this, please
> > checkout
> > > the 0.9.0 branch of Kafka and see if you can reproduce this problem. If
> > you
> > > can, then it would be really helpful if you file a JIRA with the steps
> to
> > > reproduce.
> > >
> > > From Han's initial example, it kind of looks like the problem might be
> in
> > > the usage. The consumer lag as shown by the kafka-consumer-groups
> script
> > > relies on the last committed position to determine lag. To update
> > progress,
> > > you need to commit offsets regularly. In the gist, offsets are only
> > > committed on shutdown or when a rebalance occurs. When the group is
> > stable,
> > > no progress will be seen because there are no commits to update the
> > > position.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk>
> wrote:
> > >
> > > > Thanks!
> > > >
> > > > Ismael
> > > >
> > > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com>
> > wrote:
> > > >
> > > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > > >
> > > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > >
> > > > > > Hi Bruno,
> > > > > >
> > > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > > `onPartitionAssigned`?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > > >
> > > > > >> Ok I'll create a JIRA issue on this.
> > > > > >>
> > > > > >> Thanks!
> > > > > >>
> > > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > > bruno.rassaerts@novazone.be
> > > > > >:
> > > > > >>
> > > > > >>> +1 here
> > > > > >>>
> > > > > >>> As a workaround we seek to the current offset which resets the
> > > > current
> > > > > >>> clients internal states and everything continues.
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > > >>>
> > > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > > > > >>>
> > > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > > > >>> >
> > > > > >>> > Hi,
> > > > > >>> >
> > > > > >>> > Can you please file an issue in JIRA so that we make sure
> this
> > is
> > > > > >>> > investigated?
> > > > > >>> >
> > > > > >>> > Ismael
> > > > > >>> >
> > > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> > ju.han.felix@gmail.com
> > > >
> > > > > >>> wrote:
> > > > > >>> >>
> > > > > >>> >> Hi,
> > > > > >>> >>
> > > > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and
> I'm
> > > > > >>> particularly
> > > > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > > > >>> >>
> > > > > >>> >> My test setup is like the following:
> > > > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > > > >>> >>  - 12 partitions, auto offset commit set to false
> > > > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the
> local
> > > > state
> > > > > >>> >>
> > > > > >>> >> The test run is like the following:
> > > > > >>> >>  - launch one process with 2 consumers and let it consume
> for
> > a
> > > > > while
> > > > > >>> >>  - launch another process with 2 consumers, this triggers a
> > > > > >>> rebalancing,
> > > > > >>> >> and let these 2 processes run until messages are all
> consumed
> > > > > >>> >>
> > > > > >>> >> The code is here:
> > > > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > > > >>> >>
> > > > > >>> >> So at first, the 2 consumers of the first process each got 6
> > > > > >>> partitions.
> > > > > >>> >> And after the rebalancing, each consumer got 3 partitions.
> > It's
> > > > > >>> confirmed
> > > > > >>> >> by logging inside the `onPartitionAssigned` callback.
> > > > > >>> >>
> > > > > >>> >> But after the rebalancing, one of the 2 consumers of the
> first
> > > > > >>> process stop
> > > > > >>> >> receiving messages, even if it has partitions assigned to:
> > > > > >>> >>
> > > > > >>> >> balance-1 pulled 7237 msgs ...
> > > > > >>> >> balance-0 pulled 7263 msgs ...
> > > > > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> failed
> > > > since
> > > > > >>> the
> > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > >>> >> balance-1 flush @ 536637
> > > > > >>> >> balance-1 committed offset for List(balance-11, balance-10,
> > > > > balance-9,
> > > > > >>> >> balance-8, balance-7, balance-6)
> > > > > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat
> failed
> > > > since
> > > > > >>> the
> > > > > >>> >> group is rebalancing, try to re-join group.
> > > > > >>> >> balance-0 flush @ 543845
> > > > > >>> >> balance-0 committed offset for List(balance-5, balance-4,
> > > > balance-3,
> > > > > >>> >> balance-2, balance-1, balance-0)
> > > > > >>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> > > > > >>> >> balance-1 got assigned List(balance-11, balance-10,
> balance-9)
> > > > > >>> >> balance-1 pulled 3625 msgs ...
> > > > > >>> >> balance-0 pulled 3621 msgs ...
> > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > >>> >> balance-0 pulled 3622 msgs ...
> > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > >>> >> balance-0 pulled 3637 msgs ...
> > > > > >>> >> balance-0 pulled 3641 msgs ...
> > > > > >>> >> balance-0 pulled 3640 msgs ...
> > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > > >>> >> balance-0 pulled 3630 msgs ...
> > > > > >>> >> balance-1 pulled 0 msgs ...
> > > > > >>> >> ......
> > > > > >>> >>
> > > > > >>> >> `balance-0` and `balance-1` are the names of the consumer
> > > thread.
> > > > So
> > > > > >>> after
> > > > > >>> >> the rebalancing, thread `balance-1` continues to poll but no
> > > > message
> > > > > >>> >> arrive, given that it has got 3 partitions assigned to after
> > the
> > > > > >>> >> rebalancing.
> > > > > >>> >>
> > > > > >>> >> Finally other 3 consumers pulls all their partitions'
> message,
> > > the
> > > > > >>> >> situation is like
> > > > > >>> >>
> > > > > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET,
> LAG,
> > > > OWNER
> > > > > >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 6, 180269, 417467, 237198,
> consumer-2_/
> > > > > >>> 127.0.0.1
> > > > > >>> >> balance-test, balance, 7, 180036, 417468, 237432,
> consumer-2_/
> > > > > >>> 127.0.0.1
> > > > > >>> >> balance-test, balance, 8, 180197, 417467, 237270,
> consumer-2_/
> > > > > >>> 127.0.0.1
> > > > > >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/
> > > > 127.0.0.1
> > > > > >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/
> > > > 127.0.0.1
> > > > > >>> >>
> > > > > >>> >> So you can see, partition [6, 7, 8] still has messages, but
> > the
> > > > > >>> consumer
> > > > > >>> >> can't pull them after the rebalancing.
> > > > > >>> >>
> > > > > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
> > > > > >>> server/broker
> > > > > >>> >> and client.
> > > > > >>> >>
> > > > > >>> >> I hope the code is clear enough to illustrate/reproduce the
> > > > problem.
> > > > > >>> It's
> > > > > >>> >> quite a surprise for me because this is the main feature of
> > the
> > > > new
> > > > > >>> >> consumer API, but it does not seem to work properly.
> > > > > >>> >> Feel free to talk to me for any details.
> > > > > >>> >> --
> > > > > >>> >> *JU Han*
> > > > > >>> >>
> > > > > >>> >> Software Engineer @ Teads.tv
> > > > > >>> >>
> > > > > >>> >> +33 0619608888
> > > > > >>> >>
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> *JU Han*
> > > > > >>
> > > > > >> Software Engineer @ Teads.tv
> > > > > >>
> > > > > >> +33 0619608888
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *JU Han*
> > > > > >
> > > > > > Software Engineer @ Teads.tv
> > > > > >
> > > > > > +33 0619608888
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *JU Han*
> > > > >
> > > > > Software Engineer @ Teads.tv
> > > > >
> > > > > +33 0619608888
> > > > >
> > > >
> > >
> >
>

Re: Stuck consumer with new consumer API in 0.9

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
https://issues.apache.org/jira/browse/KAFKA-2978.

-Jason

On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian <ra...@signalfx.com> wrote:

> Hi Jason,
>
> Was this a server bug or a client bug?
>
> Thanks,
> Rajiv
>
> On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Apologies for the late arrival to this thread. There was a bug in the
> > 0.9.0.0 release of Kafka which could cause the consumer to stop fetching
> > from a partition after a rebalance. If you're seeing this, please
> checkout
> > the 0.9.0 branch of Kafka and see if you can reproduce this problem. If
> you
> > can, then it would be really helpful if you file a JIRA with the steps to
> > reproduce.
> >
> > From Han's initial example, it kind of looks like the problem might be in
> > the usage. The consumer lag as shown by the kafka-consumer-groups script
> > relies on the last committed position to determine lag. To update
> progress,
> > you need to commit offsets regularly. In the gist, offsets are only
> > committed on shutdown or when a rebalance occurs. When the group is
> stable,
> > no progress will be seen because there are no commits to update the
> > position.
> >
> > Thanks,
> > Jason
> >
> > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Thanks!
> > >
> > > Ismael
> > >
> > > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com>
> wrote:
> > >
> > > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > > >
> > > > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> > > >
> > > > > Hi Bruno,
> > > > >
> > > > > Can you tell me a little bit more about that? A seek() in the
> > > > > `onPartitionAssigned`?
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > > > >
> > > > >> Ok I'll create a JIRA issue on this.
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > > bruno.rassaerts@novazone.be
> > > > >:
> > > > >>
> > > > >>> +1 here
> > > > >>>
> > > > >>> As a workaround we seek to the current offset which resets the
> > > current
> > > > >>> clients internal states and everything continues.
> > > > >>>
> > > > >>> Regards,
> > > > >>> Bruno Rassaerts | Freelance Java Developer
> > > > >>>
> > > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > > > >>>
> > > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk>
> wrote:
> > > > >>> >
> > > > >>> > Hi,
> > > > >>> >
> > > > >>> > Can you please file an issue in JIRA so that we make sure this
> is
> > > > >>> > investigated?
> > > > >>> >
> > > > >>> > Ismael
> > > > >>> >
> > > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <
> ju.han.felix@gmail.com
> > >
> > > > >>> wrote:
> > > > >>> >>
> > > > >>> >> Hi,
> > > > >>> >>
> > > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > > > >>> particularly
> > > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > > >>> >>
> > > > >>> >> My test setup is like the following:
> > > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > > >>> >>  - 12 partitions, auto offset commit set to false
> > > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> > > state
> > > > >>> >>
> > > > >>> >> The test run is like the following:
> > > > >>> >>  - launch one process with 2 consumers and let it consume for
> a
> > > > while
> > > > >>> >>  - launch another process with 2 consumers, this triggers a
> > > > >>> rebalancing,
> > > > >>> >> and let these 2 processes run until messages are all consumed
> > > > >>> >>
> > > > >>> >> The code is here:
> > > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > > >>> >>
> > > > >>> >> So at first, the 2 consumers of the first process each got 6
> > > > >>> partitions.
> > > > >>> >> And after the rebalancing, each consumer got 3 partitions.
> It's
> > > > >>> confirmed
> > > > >>> >> by logging inside the `onPartitionAssigned` callback.
> > > > >>> >>
> > > > >>> >> But after the rebalancing, one of the 2 consumers of the first
> > > > >>> process stop
> > > > >>> >> receiving messages, even if it has partitions assigned to:
> > > > >>> >>
> > > > >>> >> balance-1 pulled 7237 msgs ...
> > > > >>> >> balance-0 pulled 7263 msgs ...
> > > > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> > > since
> > > > >>> the
> > > > >>> >> group is rebalancing, try to re-join group.
> > > > >>> >> balance-1 flush @ 536637
> > > > >>> >> balance-1 committed offset for List(balance-11, balance-10,
> > > > balance-9,
> > > > >>> >> balance-8, balance-7, balance-6)
> > > > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> > > since
> > > > >>> the
> > > > >>> >> group is rebalancing, try to re-join group.
> > > > >>> >> balance-0 flush @ 543845
> > > > >>> >> balance-0 committed offset for List(balance-5, balance-4,
> > > balance-3,
> > > > >>> >> balance-2, balance-1, balance-0)
> > > > >>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> > > > >>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> > > > >>> >> balance-1 pulled 3625 msgs ...
> > > > >>> >> balance-0 pulled 3621 msgs ...
> > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > >>> >> balance-0 pulled 3631 msgs ...
> > > > >>> >> balance-1 pulled 0 msgs ...
> > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > >>> >> balance-0 pulled 3643 msgs ...
> > > > >>> >> balance-1 pulled 0 msgs ...
> > > > >>> >> balance-0 pulled 3622 msgs ...
> > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > >>> >> balance-1 pulled 0 msgs ...
> > > > >>> >> balance-0 pulled 3637 msgs ...
> > > > >>> >> balance-0 pulled 3641 msgs ...
> > > > >>> >> balance-0 pulled 3640 msgs ...
> > > > >>> >> balance-1 pulled 0 msgs ...
> > > > >>> >> balance-0 pulled 3632 msgs ...
> > > > >>> >> balance-0 pulled 3630 msgs ...
> > > > >>> >> balance-1 pulled 0 msgs ...
> > > > >>> >> ......
> > > > >>> >>
> > > > >>> >> `balance-0` and `balance-1` are the names of the consumer
> > thread.
> > > So
> > > > >>> after
> > > > >>> >> the rebalancing, thread `balance-1` continues to poll but no
> > > message
> > > > >>> >> arrive, given that it has got 3 partitions assigned to after
> the
> > > > >>> >> rebalancing.
> > > > >>> >>
> > > > >>> >> Finally other 3 consumers pulls all their partitions' message,
> > the
> > > > >>> >> situation is like
> > > > >>> >>
> > > > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG,
> > > OWNER
> > > > >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
> > > > >>> 127.0.0.1
> > > > >>> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
> > > > >>> 127.0.0.1
> > > > >>> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
> > > > >>> 127.0.0.1
> > > > >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/
> > > 127.0.0.1
> > > > >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/
> > > 127.0.0.1
> > > > >>> >>
> > > > >>> >> So you can see, partition [6, 7, 8] still has messages, but
> the
> > > > >>> consumer
> > > > >>> >> can't pull them after the rebalancing.
> > > > >>> >>
> > > > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
> > > > >>> server/broker
> > > > >>> >> and client.
> > > > >>> >>
> > > > >>> >> I hope the code is clear enough to illustrate/reproduce the
> > > problem.
> > > > >>> It's
> > > > >>> >> quite a surprise for me because this is the main feature of
> the
> > > new
> > > > >>> >> consumer API, but it does not seem to work properly.
> > > > >>> >> Feel free to talk to me for any details.
> > > > >>> >> --
> > > > >>> >> *JU Han*
> > > > >>> >>
> > > > >>> >> Software Engineer @ Teads.tv
> > > > >>> >>
> > > > >>> >> +33 0619608888
> > > > >>> >>
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> *JU Han*
> > > > >>
> > > > >> Software Engineer @ Teads.tv
> > > > >>
> > > > >> +33 0619608888
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *JU Han*
> > > > >
> > > > > Software Engineer @ Teads.tv
> > > > >
> > > > > +33 0619608888
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > *JU Han*
> > > >
> > > > Software Engineer @ Teads.tv
> > > >
> > > > +33 0619608888
> > > >
> > >
> >
>

Re: Stuck consumer with new consumer API in 0.9

Posted by Rajiv Kurian <ra...@signalfx.com>.
Hi Jason,

Was this a server bug or a client bug?

Thanks,
Rajiv

On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson <ja...@confluent.io>
wrote:

> Apologies for the late arrival to this thread. There was a bug in the
> 0.9.0.0 release of Kafka which could cause the consumer to stop fetching
> from a partition after a rebalance. If you're seeing this, please checkout
> the 0.9.0 branch of Kafka and see if you can reproduce this problem. If you
> can, then it would be really helpful if you file a JIRA with the steps to
> reproduce.
>
> From Han's initial example, it kind of looks like the problem might be in
> the usage. The consumer lag as shown by the kafka-consumer-groups script
> relies on the last committed position to determine lag. To update progress,
> you need to commit offsets regularly. In the gist, offsets are only
> committed on shutdown or when a rebalance occurs. When the group is stable,
> no progress will be seen because there are no commits to update the
> position.
>
> Thanks,
> Jason
>
> On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk> wrote:
>
> > Thanks!
> >
> > Ismael
> >
> > On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com> wrote:
> >
> > > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> > >
> > > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> > >
> > > > Hi Bruno,
> > > >
> > > > Can you tell me a little bit more about that? A seek() in the
> > > > `onPartitionAssigned`?
> > > >
> > > > Thanks.
> > > >
> > > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > > >
> > > >> Ok I'll create a JIRA issue on this.
> > > >>
> > > >> Thanks!
> > > >>
> > > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> > bruno.rassaerts@novazone.be
> > > >:
> > > >>
> > > >>> +1 here
> > > >>>
> > > >>> As a workaround we seek to the current offset which resets the
> > current
> > > >>> clients internal states and everything continues.
> > > >>>
> > > >>> Regards,
> > > >>> Bruno Rassaerts | Freelance Java Developer
> > > >>>
> > > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > > >>>
> > > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
> > > >>> >
> > > >>> > Hi,
> > > >>> >
> > > >>> > Can you please file an issue in JIRA so that we make sure this is
> > > >>> > investigated?
> > > >>> >
> > > >>> > Ismael
> > > >>> >
> > > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju.han.felix@gmail.com
> >
> > > >>> wrote:
> > > >>> >>
> > > >>> >> Hi,
> > > >>> >>
> > > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > > >>> particularly
> > > >>> >> interested in the `ConsumerRebalanceListener`.
> > > >>> >>
> > > >>> >> My test setup is like the following:
> > > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > > >>> >>  - 12 partitions, auto offset commit set to false
> > > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> > state
> > > >>> >>
> > > >>> >> The test run is like the following:
> > > >>> >>  - launch one process with 2 consumers and let it consume for a
> > > while
> > > >>> >>  - launch another process with 2 consumers, this triggers a
> > > >>> rebalancing,
> > > >>> >> and let these 2 processes run until messages are all consumed
> > > >>> >>
> > > >>> >> The code is here:
> > > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > > >>> >>
> > > >>> >> So at first, the 2 consumers of the first process each got 6
> > > >>> partitions.
> > > >>> >> And after the rebalancing, each consumer got 3 partitions. It's
> > > >>> confirmed
> > > >>> >> by logging inside the `onPartitionAssigned` callback.
> > > >>> >>
> > > >>> >> But after the rebalancing, one of the 2 consumers of the first
> > > >>> process stop
> > > >>> >> receiving messages, even if it has partitions assigned to:
> > > >>> >>
> > > >>> >> balance-1 pulled 7237 msgs ...
> > > >>> >> balance-0 pulled 7263 msgs ...
> > > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> > since
> > > >>> the
> > > >>> >> group is rebalancing, try to re-join group.
> > > >>> >> balance-1 flush @ 536637
> > > >>> >> balance-1 committed offset for List(balance-11, balance-10,
> > > balance-9,
> > > >>> >> balance-8, balance-7, balance-6)
> > > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> > since
> > > >>> the
> > > >>> >> group is rebalancing, try to re-join group.
> > > >>> >> balance-0 flush @ 543845
> > > >>> >> balance-0 committed offset for List(balance-5, balance-4,
> > balance-3,
> > > >>> >> balance-2, balance-1, balance-0)
> > > >>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> > > >>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> > > >>> >> balance-1 pulled 3625 msgs ...
> > > >>> >> balance-0 pulled 3621 msgs ...
> > > >>> >> balance-0 pulled 3631 msgs ...
> > > >>> >> balance-0 pulled 3631 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3643 msgs ...
> > > >>> >> balance-0 pulled 3643 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3622 msgs ...
> > > >>> >> balance-0 pulled 3632 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3637 msgs ...
> > > >>> >> balance-0 pulled 3641 msgs ...
> > > >>> >> balance-0 pulled 3640 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> balance-0 pulled 3632 msgs ...
> > > >>> >> balance-0 pulled 3630 msgs ...
> > > >>> >> balance-1 pulled 0 msgs ...
> > > >>> >> ......
> > > >>> >>
> > > >>> >> `balance-0` and `balance-1` are the names of the consumer
> thread.
> > So
> > > >>> after
> > > >>> >> the rebalancing, thread `balance-1` continues to poll but no
> > message
> > > >>> >> arrive, given that it has got 3 partitions assigned to after the
> > > >>> >> rebalancing.
> > > >>> >>
> > > >>> >> Finally other 3 consumers pulls all their partitions' message,
> the
> > > >>> >> situation is like
> > > >>> >>
> > > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG,
> > OWNER
> > > >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
> > > >>> 127.0.0.1
> > > >>> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
> > > >>> 127.0.0.1
> > > >>> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
> > > >>> 127.0.0.1
> > > >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/
> > 127.0.0.1
> > > >>> >>
> > > >>> >> So you can see, partition [6, 7, 8] still has messages, but the
> > > >>> consumer
> > > >>> >> can't pull them after the rebalancing.
> > > >>> >>
> > > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
> > > >>> server/broker
> > > >>> >> and client.
> > > >>> >>
> > > >>> >> I hope the code is clear enough to illustrate/reproduce the
> > problem.
> > > >>> It's
> > > >>> >> quite a surprise for me because this is the main feature of the
> > new
> > > >>> >> consumer API, but it does not seem to work properly.
> > > >>> >> Feel free to talk to me for any details.
> > > >>> >> --
> > > >>> >> *JU Han*
> > > >>> >>
> > > >>> >> Software Engineer @ Teads.tv
> > > >>> >>
> > > >>> >> +33 0619608888
> > > >>> >>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> *JU Han*
> > > >>
> > > >> Software Engineer @ Teads.tv
> > > >>
> > > >> +33 0619608888
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > *JU Han*
> > > >
> > > > Software Engineer @ Teads.tv
> > > >
> > > > +33 0619608888
> > > >
> > >
> > >
> > >
> > > --
> > > *JU Han*
> > >
> > > Software Engineer @ Teads.tv
> > >
> > > +33 0619608888
> > >
> >
>

Re: Stuck consumer with new consumer API in 0.9

Posted by Jason Gustafson <ja...@confluent.io>.
Apologies for the late arrival to this thread. There was a bug in the
0.9.0.0 release of Kafka which could cause the consumer to stop fetching
from a partition after a rebalance. If you're seeing this, please checkout
the 0.9.0 branch of Kafka and see if you can reproduce this problem. If you
can, then it would be really helpful if you file a JIRA with the steps to
reproduce.

>From Han's initial example, it kind of looks like the problem might be in
the usage. The consumer lag as shown by the kafka-consumer-groups script
relies on the last committed position to determine lag. To update progress,
you need to commit offsets regularly. In the gist, offsets are only
committed on shutdown or when a rebalance occurs. When the group is stable,
no progress will be seen because there are no commits to update the
position.

Thanks,
Jason

On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma <is...@juma.me.uk> wrote:

> Thanks!
>
> Ismael
>
> On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com> wrote:
>
> > Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
> >
> > 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
> >
> > > Hi Bruno,
> > >
> > > Can you tell me a little bit more about that? A seek() in the
> > > `onPartitionAssigned`?
> > >
> > > Thanks.
> > >
> > > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> > >
> > >> Ok I'll create a JIRA issue on this.
> > >>
> > >> Thanks!
> > >>
> > >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <
> bruno.rassaerts@novazone.be
> > >:
> > >>
> > >>> +1 here
> > >>>
> > >>> As a workaround we seek to the current offset which resets the
> current
> > >>> clients internal states and everything continues.
> > >>>
> > >>> Regards,
> > >>> Bruno Rassaerts | Freelance Java Developer
> > >>>
> > >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> > >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> > >>> bruno.rassaerts@novazone.be -www.novazone.be
> > >>>
> > >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
> > >>> >
> > >>> > Hi,
> > >>> >
> > >>> > Can you please file an issue in JIRA so that we make sure this is
> > >>> > investigated?
> > >>> >
> > >>> > Ismael
> > >>> >
> > >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com>
> > >>> wrote:
> > >>> >>
> > >>> >> Hi,
> > >>> >>
> > >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> > >>> particularly
> > >>> >> interested in the `ConsumerRebalanceListener`.
> > >>> >>
> > >>> >> My test setup is like the following:
> > >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> > >>> >>  - 12 partitions, auto offset commit set to false
> > >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local
> state
> > >>> >>
> > >>> >> The test run is like the following:
> > >>> >>  - launch one process with 2 consumers and let it consume for a
> > while
> > >>> >>  - launch another process with 2 consumers, this triggers a
> > >>> rebalancing,
> > >>> >> and let these 2 processes run until messages are all consumed
> > >>> >>
> > >>> >> The code is here:
> > https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> > >>> >>
> > >>> >> So at first, the 2 consumers of the first process each got 6
> > >>> partitions.
> > >>> >> And after the rebalancing, each consumer got 3 partitions. It's
> > >>> confirmed
> > >>> >> by logging inside the `onPartitionAssigned` callback.
> > >>> >>
> > >>> >> But after the rebalancing, one of the 2 consumers of the first
> > >>> process stop
> > >>> >> receiving messages, even if it has partitions assigned to:
> > >>> >>
> > >>> >> balance-1 pulled 7237 msgs ...
> > >>> >> balance-0 pulled 7263 msgs ...
> > >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> since
> > >>> the
> > >>> >> group is rebalancing, try to re-join group.
> > >>> >> balance-1 flush @ 536637
> > >>> >> balance-1 committed offset for List(balance-11, balance-10,
> > balance-9,
> > >>> >> balance-8, balance-7, balance-6)
> > >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> > >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed
> since
> > >>> the
> > >>> >> group is rebalancing, try to re-join group.
> > >>> >> balance-0 flush @ 543845
> > >>> >> balance-0 committed offset for List(balance-5, balance-4,
> balance-3,
> > >>> >> balance-2, balance-1, balance-0)
> > >>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> > >>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> > >>> >> balance-1 pulled 3625 msgs ...
> > >>> >> balance-0 pulled 3621 msgs ...
> > >>> >> balance-0 pulled 3631 msgs ...
> > >>> >> balance-0 pulled 3631 msgs ...
> > >>> >> balance-1 pulled 0 msgs ...
> > >>> >> balance-0 pulled 3643 msgs ...
> > >>> >> balance-0 pulled 3643 msgs ...
> > >>> >> balance-1 pulled 0 msgs ...
> > >>> >> balance-0 pulled 3622 msgs ...
> > >>> >> balance-0 pulled 3632 msgs ...
> > >>> >> balance-1 pulled 0 msgs ...
> > >>> >> balance-0 pulled 3637 msgs ...
> > >>> >> balance-0 pulled 3641 msgs ...
> > >>> >> balance-0 pulled 3640 msgs ...
> > >>> >> balance-1 pulled 0 msgs ...
> > >>> >> balance-0 pulled 3632 msgs ...
> > >>> >> balance-0 pulled 3630 msgs ...
> > >>> >> balance-1 pulled 0 msgs ...
> > >>> >> ......
> > >>> >>
> > >>> >> `balance-0` and `balance-1` are the names of the consumer thread.
> So
> > >>> after
> > >>> >> the rebalancing, thread `balance-1` continues to poll but no
> message
> > >>> >> arrive, given that it has got 3 partitions assigned to after the
> > >>> >> rebalancing.
> > >>> >>
> > >>> >> Finally other 3 consumers pulls all their partitions' message, the
> > >>> >> situation is like
> > >>> >>
> > >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG,
> OWNER
> > >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/
> 127.0.0.1
> > >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/
> 127.0.0.1
> > >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/
> 127.0.0.1
> > >>> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
> > >>> 127.0.0.1
> > >>> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
> > >>> 127.0.0.1
> > >>> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
> > >>> 127.0.0.1
> > >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/
> 127.0.0.1
> > >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/
> 127.0.0.1
> > >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/
> 127.0.0.1
> > >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/
> 127.0.0.1
> > >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/
> 127.0.0.1
> > >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/
> 127.0.0.1
> > >>> >>
> > >>> >> So you can see, partition [6, 7, 8] still has messages, but the
> > >>> consumer
> > >>> >> can't pull them after the rebalancing.
> > >>> >>
> > >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
> > >>> server/broker
> > >>> >> and client.
> > >>> >>
> > >>> >> I hope the code is clear enough to illustrate/reproduce the
> problem.
> > >>> It's
> > >>> >> quite a surprise for me because this is the main feature of the
> new
> > >>> >> consumer API, but it does not seem to work properly.
> > >>> >> Feel free to talk to me for any details.
> > >>> >> --
> > >>> >> *JU Han*
> > >>> >>
> > >>> >> Software Engineer @ Teads.tv
> > >>> >>
> > >>> >> +33 0619608888
> > >>> >>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> *JU Han*
> > >>
> > >> Software Engineer @ Teads.tv
> > >>
> > >> +33 0619608888
> > >>
> > >
> > >
> > >
> > > --
> > > *JU Han*
> > >
> > > Software Engineer @ Teads.tv
> > >
> > > +33 0619608888
> > >
> >
> >
> >
> > --
> > *JU Han*
> >
> > Software Engineer @ Teads.tv
> >
> > +33 0619608888
> >
>

Re: Stuck consumer with new consumer API in 0.9

Posted by Ismael Juma <is...@juma.me.uk>.
Thanks!

Ismael

On Mon, Jan 25, 2016 at 4:03 PM, Han JU <ju...@gmail.com> wrote:

> Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
>
> 2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:
>
> > Hi Bruno,
> >
> > Can you tell me a little bit more about that? A seek() in the
> > `onPartitionAssigned`?
> >
> > Thanks.
> >
> > 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
> >
> >> Ok I'll create a JIRA issue on this.
> >>
> >> Thanks!
> >>
> >> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <bruno.rassaerts@novazone.be
> >:
> >>
> >>> +1 here
> >>>
> >>> As a workaround we seek to the current offset which resets the current
> >>> clients internal states and everything continues.
> >>>
> >>> Regards,
> >>> Bruno Rassaerts | Freelance Java Developer
> >>>
> >>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> >>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> >>> bruno.rassaerts@novazone.be -www.novazone.be
> >>>
> >>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
> >>> >
> >>> > Hi,
> >>> >
> >>> > Can you please file an issue in JIRA so that we make sure this is
> >>> > investigated?
> >>> >
> >>> > Ismael
> >>> >
> >>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com>
> >>> wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> >>> particularly
> >>> >> interested in the `ConsumerRebalanceListener`.
> >>> >>
> >>> >> My test setup is like the following:
> >>> >>  - 5M messages pre-loaded in one node kafka 0.9
> >>> >>  - 12 partitions, auto offset commit set to false
> >>> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
> >>> >>
> >>> >> The test run is like the following:
> >>> >>  - launch one process with 2 consumers and let it consume for a
> while
> >>> >>  - launch another process with 2 consumers, this triggers a
> >>> rebalancing,
> >>> >> and let these 2 processes run until messages are all consumed
> >>> >>
> >>> >> The code is here:
> https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> >>> >>
> >>> >> So at first, the 2 consumers of the first process each got 6
> >>> partitions.
> >>> >> And after the rebalancing, each consumer got 3 partitions. It's
> >>> confirmed
> >>> >> by logging inside the `onPartitionAssigned` callback.
> >>> >>
> >>> >> But after the rebalancing, one of the 2 consumers of the first
> >>> process stop
> >>> >> receiving messages, even if it has partitions assigned to:
> >>> >>
> >>> >> balance-1 pulled 7237 msgs ...
> >>> >> balance-0 pulled 7263 msgs ...
> >>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
> >>> the
> >>> >> group is rebalancing, try to re-join group.
> >>> >> balance-1 flush @ 536637
> >>> >> balance-1 committed offset for List(balance-11, balance-10,
> balance-9,
> >>> >> balance-8, balance-7, balance-6)
> >>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> >>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
> >>> the
> >>> >> group is rebalancing, try to re-join group.
> >>> >> balance-0 flush @ 543845
> >>> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> >>> >> balance-2, balance-1, balance-0)
> >>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> >>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> >>> >> balance-1 pulled 3625 msgs ...
> >>> >> balance-0 pulled 3621 msgs ...
> >>> >> balance-0 pulled 3631 msgs ...
> >>> >> balance-0 pulled 3631 msgs ...
> >>> >> balance-1 pulled 0 msgs ...
> >>> >> balance-0 pulled 3643 msgs ...
> >>> >> balance-0 pulled 3643 msgs ...
> >>> >> balance-1 pulled 0 msgs ...
> >>> >> balance-0 pulled 3622 msgs ...
> >>> >> balance-0 pulled 3632 msgs ...
> >>> >> balance-1 pulled 0 msgs ...
> >>> >> balance-0 pulled 3637 msgs ...
> >>> >> balance-0 pulled 3641 msgs ...
> >>> >> balance-0 pulled 3640 msgs ...
> >>> >> balance-1 pulled 0 msgs ...
> >>> >> balance-0 pulled 3632 msgs ...
> >>> >> balance-0 pulled 3630 msgs ...
> >>> >> balance-1 pulled 0 msgs ...
> >>> >> ......
> >>> >>
> >>> >> `balance-0` and `balance-1` are the names of the consumer thread. So
> >>> after
> >>> >> the rebalancing, thread `balance-1` continues to poll but no message
> >>> >> arrive, given that it has got 3 partitions assigned to after the
> >>> >> rebalancing.
> >>> >>
> >>> >> Finally other 3 consumers pulls all their partitions' message, the
> >>> >> situation is like
> >>> >>
> >>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> >>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
> >>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
> >>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
> >>> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
> >>> 127.0.0.1
> >>> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
> >>> 127.0.0.1
> >>> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
> >>> 127.0.0.1
> >>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
> >>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
> >>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
> >>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
> >>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
> >>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
> >>> >>
> >>> >> So you can see, partition [6, 7, 8] still has messages, but the
> >>> consumer
> >>> >> can't pull them after the rebalancing.
> >>> >>
> >>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
> >>> server/broker
> >>> >> and client.
> >>> >>
> >>> >> I hope the code is clear enough to illustrate/reproduce the problem.
> >>> It's
> >>> >> quite a surprise for me because this is the main feature of the new
> >>> >> consumer API, but it does not seem to work properly.
> >>> >> Feel free to talk to me for any details.
> >>> >> --
> >>> >> *JU Han*
> >>> >>
> >>> >> Software Engineer @ Teads.tv
> >>> >>
> >>> >> +33 0619608888
> >>> >>
> >>>
> >>
> >>
> >>
> >> --
> >> *JU Han*
> >>
> >> Software Engineer @ Teads.tv
> >>
> >> +33 0619608888
> >>
> >
> >
> >
> > --
> > *JU Han*
> >
> > Software Engineer @ Teads.tv
> >
> > +33 0619608888
> >
>
>
>
> --
> *JU Han*
>
> Software Engineer @ Teads.tv
>
> +33 0619608888
>

Re: Stuck consumer with new consumer API in 0.9

Posted by Han JU <ju...@gmail.com>.
Issue created: https://issues.apache.org/jira/browse/KAFKA-3146

2016-01-25 16:07 GMT+01:00 Han JU <ju...@gmail.com>:

> Hi Bruno,
>
> Can you tell me a little bit more about that? A seek() in the
> `onPartitionAssigned`?
>
> Thanks.
>
> 2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:
>
>> Ok I'll create a JIRA issue on this.
>>
>> Thanks!
>>
>> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <br...@novazone.be>:
>>
>>> +1 here
>>>
>>> As a workaround we seek to the current offset which resets the current
>>> clients internal states and everything continues.
>>>
>>> Regards,
>>> Bruno Rassaerts | Freelance Java Developer
>>>
>>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
>>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
>>> bruno.rassaerts@novazone.be -www.novazone.be
>>>
>>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Can you please file an issue in JIRA so that we make sure this is
>>> > investigated?
>>> >
>>> > Ismael
>>> >
>>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
>>> particularly
>>> >> interested in the `ConsumerRebalanceListener`.
>>> >>
>>> >> My test setup is like the following:
>>> >>  - 5M messages pre-loaded in one node kafka 0.9
>>> >>  - 12 partitions, auto offset commit set to false
>>> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
>>> >>
>>> >> The test run is like the following:
>>> >>  - launch one process with 2 consumers and let it consume for a while
>>> >>  - launch another process with 2 consumers, this triggers a
>>> rebalancing,
>>> >> and let these 2 processes run until messages are all consumed
>>> >>
>>> >> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>>> >>
>>> >> So at first, the 2 consumers of the first process each got 6
>>> partitions.
>>> >> And after the rebalancing, each consumer got 3 partitions. It's
>>> confirmed
>>> >> by logging inside the `onPartitionAssigned` callback.
>>> >>
>>> >> But after the rebalancing, one of the 2 consumers of the first
>>> process stop
>>> >> receiving messages, even if it has partitions assigned to:
>>> >>
>>> >> balance-1 pulled 7237 msgs ...
>>> >> balance-0 pulled 7263 msgs ...
>>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
>>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>> >> group is rebalancing, try to re-join group.
>>> >> balance-1 flush @ 536637
>>> >> balance-1 committed offset for List(balance-11, balance-10, balance-9,
>>> >> balance-8, balance-7, balance-6)
>>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
>>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>>> the
>>> >> group is rebalancing, try to re-join group.
>>> >> balance-0 flush @ 543845
>>> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
>>> >> balance-2, balance-1, balance-0)
>>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
>>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
>>> >> balance-1 pulled 3625 msgs ...
>>> >> balance-0 pulled 3621 msgs ...
>>> >> balance-0 pulled 3631 msgs ...
>>> >> balance-0 pulled 3631 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> balance-0 pulled 3643 msgs ...
>>> >> balance-0 pulled 3643 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> balance-0 pulled 3622 msgs ...
>>> >> balance-0 pulled 3632 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> balance-0 pulled 3637 msgs ...
>>> >> balance-0 pulled 3641 msgs ...
>>> >> balance-0 pulled 3640 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> balance-0 pulled 3632 msgs ...
>>> >> balance-0 pulled 3630 msgs ...
>>> >> balance-1 pulled 0 msgs ...
>>> >> ......
>>> >>
>>> >> `balance-0` and `balance-1` are the names of the consumer thread. So
>>> after
>>> >> the rebalancing, thread `balance-1` continues to poll but no message
>>> >> arrive, given that it has got 3 partitions assigned to after the
>>> >> rebalancing.
>>> >>
>>> >> Finally other 3 consumers pulls all their partitions' message, the
>>> >> situation is like
>>> >>
>>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
>>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
>>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
>>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
>>> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
>>> 127.0.0.1
>>> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
>>> 127.0.0.1
>>> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
>>> 127.0.0.1
>>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
>>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
>>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
>>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
>>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
>>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
>>> >>
>>> >> So you can see, partition [6, 7, 8] still has messages, but the
>>> consumer
>>> >> can't pull them after the rebalancing.
>>> >>
>>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
>>> server/broker
>>> >> and client.
>>> >>
>>> >> I hope the code is clear enough to illustrate/reproduce the problem.
>>> It's
>>> >> quite a surprise for me because this is the main feature of the new
>>> >> consumer API, but it does not seem to work properly.
>>> >> Feel free to talk to me for any details.
>>> >> --
>>> >> *JU Han*
>>> >>
>>> >> Software Engineer @ Teads.tv
>>> >>
>>> >> +33 0619608888
>>> >>
>>>
>>
>>
>>
>> --
>> *JU Han*
>>
>> Software Engineer @ Teads.tv
>>
>> +33 0619608888
>>
>
>
>
> --
> *JU Han*
>
> Software Engineer @ Teads.tv
>
> +33 0619608888
>



-- 
*JU Han*

Software Engineer @ Teads.tv

+33 0619608888

Re: Stuck consumer with new consumer API in 0.9

Posted by Han JU <ju...@gmail.com>.
Hi Bruno,

Can you tell me a little bit more about that? A seek() in the
`onPartitionAssigned`?

Thanks.

2016-01-25 10:51 GMT+01:00 Han JU <ju...@gmail.com>:

> Ok I'll create a JIRA issue on this.
>
> Thanks!
>
> 2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <br...@novazone.be>:
>
>> +1 here
>>
>> As a workaround we seek to the current offset which resets the current
>> clients internal states and everything continues.
>>
>> Regards,
>> Bruno Rassaerts | Freelance Java Developer
>>
>> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
>> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
>> bruno.rassaerts@novazone.be -www.novazone.be
>>
>> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
>> >
>> > Hi,
>> >
>> > Can you please file an issue in JIRA so that we make sure this is
>> > investigated?
>> >
>> > Ismael
>> >
>> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com>
>> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
>> particularly
>> >> interested in the `ConsumerRebalanceListener`.
>> >>
>> >> My test setup is like the following:
>> >>  - 5M messages pre-loaded in one node kafka 0.9
>> >>  - 12 partitions, auto offset commit set to false
>> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
>> >>
>> >> The test run is like the following:
>> >>  - launch one process with 2 consumers and let it consume for a while
>> >>  - launch another process with 2 consumers, this triggers a
>> rebalancing,
>> >> and let these 2 processes run until messages are all consumed
>> >>
>> >> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>> >>
>> >> So at first, the 2 consumers of the first process each got 6
>> partitions.
>> >> And after the rebalancing, each consumer got 3 partitions. It's
>> confirmed
>> >> by logging inside the `onPartitionAssigned` callback.
>> >>
>> >> But after the rebalancing, one of the 2 consumers of the first process
>> stop
>> >> receiving messages, even if it has partitions assigned to:
>> >>
>> >> balance-1 pulled 7237 msgs ...
>> >> balance-0 pulled 7263 msgs ...
>> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>> the
>> >> group is rebalancing, try to re-join group.
>> >> balance-1 flush @ 536637
>> >> balance-1 committed offset for List(balance-11, balance-10, balance-9,
>> >> balance-8, balance-7, balance-6)
>> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
>> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since
>> the
>> >> group is rebalancing, try to re-join group.
>> >> balance-0 flush @ 543845
>> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
>> >> balance-2, balance-1, balance-0)
>> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
>> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
>> >> balance-1 pulled 3625 msgs ...
>> >> balance-0 pulled 3621 msgs ...
>> >> balance-0 pulled 3631 msgs ...
>> >> balance-0 pulled 3631 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3643 msgs ...
>> >> balance-0 pulled 3643 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3622 msgs ...
>> >> balance-0 pulled 3632 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3637 msgs ...
>> >> balance-0 pulled 3641 msgs ...
>> >> balance-0 pulled 3640 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> balance-0 pulled 3632 msgs ...
>> >> balance-0 pulled 3630 msgs ...
>> >> balance-1 pulled 0 msgs ...
>> >> ......
>> >>
>> >> `balance-0` and `balance-1` are the names of the consumer thread. So
>> after
>> >> the rebalancing, thread `balance-1` continues to poll but no message
>> >> arrive, given that it has got 3 partitions assigned to after the
>> >> rebalancing.
>> >>
>> >> Finally other 3 consumers pulls all their partitions' message, the
>> >> situation is like
>> >>
>> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
>> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
>> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
>> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
>> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/
>> 127.0.0.1
>> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/
>> 127.0.0.1
>> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/
>> 127.0.0.1
>> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
>> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
>> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
>> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
>> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
>> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
>> >>
>> >> So you can see, partition [6, 7, 8] still has messages, but the
>> consumer
>> >> can't pull them after the rebalancing.
>> >>
>> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
>> server/broker
>> >> and client.
>> >>
>> >> I hope the code is clear enough to illustrate/reproduce the problem.
>> It's
>> >> quite a surprise for me because this is the main feature of the new
>> >> consumer API, but it does not seem to work properly.
>> >> Feel free to talk to me for any details.
>> >> --
>> >> *JU Han*
>> >>
>> >> Software Engineer @ Teads.tv
>> >>
>> >> +33 0619608888
>> >>
>>
>
>
>
> --
> *JU Han*
>
> Software Engineer @ Teads.tv
>
> +33 0619608888
>



-- 
*JU Han*

Software Engineer @ Teads.tv

+33 0619608888

Re: Stuck consumer with new consumer API in 0.9

Posted by Han JU <ju...@gmail.com>.
Ok I'll create a JIRA issue on this.

Thanks!

2016-01-23 21:47 GMT+01:00 Bruno Rassaerts <br...@novazone.be>:

> +1 here
>
> As a workaround we seek to the current offset which resets the current
> clients internal states and everything continues.
>
> Regards,
> Bruno Rassaerts | Freelance Java Developer
>
> Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
> T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
> bruno.rassaerts@novazone.be -www.novazone.be
>
> > On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
> >
> > Hi,
> >
> > Can you please file an issue in JIRA so that we make sure this is
> > investigated?
> >
> > Ismael
> >
> >> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm prototyping with the new consumer API of kafka 0.9 and I'm
> particularly
> >> interested in the `ConsumerRebalanceListener`.
> >>
> >> My test setup is like the following:
> >>  - 5M messages pre-loaded in one node kafka 0.9
> >>  - 12 partitions, auto offset commit set to false
> >>  - in `onPartitionsRevoked`, commit offset and flush the local state
> >>
> >> The test run is like the following:
> >>  - launch one process with 2 consumers and let it consume for a while
> >>  - launch another process with 2 consumers, this triggers a rebalancing,
> >> and let these 2 processes run until messages are all consumed
> >>
> >> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> >>
> >> So at first, the 2 consumers of the first process each got 6 partitions.
> >> And after the rebalancing, each consumer got 3 partitions. It's
> confirmed
> >> by logging inside the `onPartitionAssigned` callback.
> >>
> >> But after the rebalancing, one of the 2 consumers of the first process
> stop
> >> receiving messages, even if it has partitions assigned to:
> >>
> >> balance-1 pulled 7237 msgs ...
> >> balance-0 pulled 7263 msgs ...
> >> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> >> group is rebalancing, try to re-join group.
> >> balance-1 flush @ 536637
> >> balance-1 committed offset for List(balance-11, balance-10, balance-9,
> >> balance-8, balance-7, balance-6)
> >> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> >> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> >> group is rebalancing, try to re-join group.
> >> balance-0 flush @ 543845
> >> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> >> balance-2, balance-1, balance-0)
> >> balance-0 got assigned List(balance-5, balance-4, balance-3)
> >> balance-1 got assigned List(balance-11, balance-10, balance-9)
> >> balance-1 pulled 3625 msgs ...
> >> balance-0 pulled 3621 msgs ...
> >> balance-0 pulled 3631 msgs ...
> >> balance-0 pulled 3631 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3643 msgs ...
> >> balance-0 pulled 3643 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3622 msgs ...
> >> balance-0 pulled 3632 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3637 msgs ...
> >> balance-0 pulled 3641 msgs ...
> >> balance-0 pulled 3640 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> balance-0 pulled 3632 msgs ...
> >> balance-0 pulled 3630 msgs ...
> >> balance-1 pulled 0 msgs ...
> >> ......
> >>
> >> `balance-0` and `balance-1` are the names of the consumer thread. So
> after
> >> the rebalancing, thread `balance-1` continues to poll but no message
> >> arrive, given that it has got 3 partitions assigned to after the
> >> rebalancing.
> >>
> >> Finally other 3 consumers pulls all their partitions' message, the
> >> situation is like
> >>
> >> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> >> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
> >> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
> >> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
> >> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
> >> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
> >> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
> >> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
> >> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
> >> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
> >> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
> >> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
> >> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
> >>
> >> So you can see, partition [6, 7, 8] still has messages, but the consumer
> >> can't pull them after the rebalancing.
> >>
> >> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both
> server/broker
> >> and client.
> >>
> >> I hope the code is clear enough to illustrate/reproduce the problem.
> It's
> >> quite a surprise for me because this is the main feature of the new
> >> consumer API, but it does not seem to work properly.
> >> Feel free to talk to me for any details.
> >> --
> >> *JU Han*
> >>
> >> Software Engineer @ Teads.tv
> >>
> >> +33 0619608888
> >>
>



-- 
*JU Han*

Software Engineer @ Teads.tv

+33 0619608888

Re: Stuck consumer with new consumer API in 0.9

Posted by Bruno Rassaerts <br...@novazone.be>.
+1 here

As a workaround we seek to the current offset which resets the current clients internal states and everything continues. 

Regards,
Bruno Rassaerts | Freelance Java Developer

Novazone, Edingsesteenweg 302, B-1755 Gooik, Belgium
T: +32(0)54/26.02.03 - M:+32(0)477/39.01.15
bruno.rassaerts@novazone.be -www.novazone.be

> On 23 Jan 2016, at 17:52, Ismael Juma <is...@juma.me.uk> wrote:
> 
> Hi,
> 
> Can you please file an issue in JIRA so that we make sure this is
> investigated?
> 
> Ismael
> 
>> On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I'm prototyping with the new consumer API of kafka 0.9 and I'm particularly
>> interested in the `ConsumerRebalanceListener`.
>> 
>> My test setup is like the following:
>>  - 5M messages pre-loaded in one node kafka 0.9
>>  - 12 partitions, auto offset commit set to false
>>  - in `onPartitionsRevoked`, commit offset and flush the local state
>> 
>> The test run is like the following:
>>  - launch one process with 2 consumers and let it consume for a while
>>  - launch another process with 2 consumers, this triggers a rebalancing,
>> and let these 2 processes run until messages are all consumed
>> 
>> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>> 
>> So at first, the 2 consumers of the first process each got 6 partitions.
>> And after the rebalancing, each consumer got 3 partitions. It's confirmed
>> by logging inside the `onPartitionAssigned` callback.
>> 
>> But after the rebalancing, one of the 2 consumers of the first process stop
>> receiving messages, even if it has partitions assigned to:
>> 
>> balance-1 pulled 7237 msgs ...
>> balance-0 pulled 7263 msgs ...
>> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
>> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
>> group is rebalancing, try to re-join group.
>> balance-1 flush @ 536637
>> balance-1 committed offset for List(balance-11, balance-10, balance-9,
>> balance-8, balance-7, balance-6)
>> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
>> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
>> group is rebalancing, try to re-join group.
>> balance-0 flush @ 543845
>> balance-0 committed offset for List(balance-5, balance-4, balance-3,
>> balance-2, balance-1, balance-0)
>> balance-0 got assigned List(balance-5, balance-4, balance-3)
>> balance-1 got assigned List(balance-11, balance-10, balance-9)
>> balance-1 pulled 3625 msgs ...
>> balance-0 pulled 3621 msgs ...
>> balance-0 pulled 3631 msgs ...
>> balance-0 pulled 3631 msgs ...
>> balance-1 pulled 0 msgs ...
>> balance-0 pulled 3643 msgs ...
>> balance-0 pulled 3643 msgs ...
>> balance-1 pulled 0 msgs ...
>> balance-0 pulled 3622 msgs ...
>> balance-0 pulled 3632 msgs ...
>> balance-1 pulled 0 msgs ...
>> balance-0 pulled 3637 msgs ...
>> balance-0 pulled 3641 msgs ...
>> balance-0 pulled 3640 msgs ...
>> balance-1 pulled 0 msgs ...
>> balance-0 pulled 3632 msgs ...
>> balance-0 pulled 3630 msgs ...
>> balance-1 pulled 0 msgs ...
>> ......
>> 
>> `balance-0` and `balance-1` are the names of the consumer thread. So after
>> the rebalancing, thread `balance-1` continues to poll but no message
>> arrive, given that it has got 3 partitions assigned to after the
>> rebalancing.
>> 
>> Finally other 3 consumers pulls all their partitions' message, the
>> situation is like
>> 
>> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
>> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
>> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
>> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
>> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
>> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
>> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
>> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
>> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
>> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
>> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
>> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
>> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
>> 
>> So you can see, partition [6, 7, 8] still has messages, but the consumer
>> can't pull them after the rebalancing.
>> 
>> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both server/broker
>> and client.
>> 
>> I hope the code is clear enough to illustrate/reproduce the problem. It's
>> quite a surprise for me because this is the main feature of the new
>> consumer API, but it does not seem to work properly.
>> Feel free to talk to me for any details.
>> --
>> *JU Han*
>> 
>> Software Engineer @ Teads.tv
>> 
>> +33 0619608888
>> 

Re: Stuck consumer with new consumer API in 0.9

Posted by Ismael Juma <is...@juma.me.uk>.
Hi,

Can you please file an issue in JIRA so that we make sure this is
investigated?

Ismael

On Fri, Jan 22, 2016 at 3:13 PM, Han JU <ju...@gmail.com> wrote:

> Hi,
>
> I'm prototyping with the new consumer API of kafka 0.9 and I'm particularly
> interested in the `ConsumerRebalanceListener`.
>
> My test setup is like the following:
>   - 5M messages pre-loaded in one node kafka 0.9
>   - 12 partitions, auto offset commit set to false
>   - in `onPartitionsRevoked`, commit offset and flush the local state
>
> The test run is like the following:
>   - launch one process with 2 consumers and let it consume for a while
>   - launch another process with 2 consumers, this triggers a rebalancing,
> and let these 2 processes run until messages are all consumed
>
> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
>
> So at first, the 2 consumers of the first process each got 6 partitions.
> And after the rebalancing, each consumer got 3 partitions. It's confirmed
> by logging inside the `onPartitionAssigned` callback.
>
> But after the rebalancing, one of the 2 consumers of the first process stop
> receiving messages, even if it has partitions assigned to:
>
> balance-1 pulled 7237 msgs ...
> balance-0 pulled 7263 msgs ...
> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-1 flush @ 536637
> balance-1 committed offset for List(balance-11, balance-10, balance-9,
> balance-8, balance-7, balance-6)
> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-0 flush @ 543845
> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> balance-2, balance-1, balance-0)
> balance-0 got assigned List(balance-5, balance-4, balance-3)
> balance-1 got assigned List(balance-11, balance-10, balance-9)
> balance-1 pulled 3625 msgs ...
> balance-0 pulled 3621 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3622 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3637 msgs ...
> balance-0 pulled 3641 msgs ...
> balance-0 pulled 3640 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-0 pulled 3630 msgs ...
> balance-1 pulled 0 msgs ...
> ......
>
> `balance-0` and `balance-1` are the names of the consumer thread. So after
> the rebalancing, thread `balance-1` continues to poll but no message
> arrive, given that it has got 3 partitions assigned to after the
> rebalancing.
>
> Finally other 3 consumers pulls all their partitions' message, the
> situation is like
>
> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
>
> So you can see, partition [6, 7, 8] still has messages, but the consumer
> can't pull them after the rebalancing.
>
> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both server/broker
> and client.
>
> I hope the code is clear enough to illustrate/reproduce the problem. It's
> quite a surprise for me because this is the main feature of the new
> consumer API, but it does not seem to work properly.
> Feel free to talk to me for any details.
> --
> *JU Han*
>
> Software Engineer @ Teads.tv
>
> +33 0619608888
>

Re: Stuck consumer with new consumer API in 0.9

Posted by 何伟昌 <36...@qq.com>.
+1, facing same issue.

> 在 2016年1月22日,下午11:13,Han JU <ju...@gmail.com> 写道:
> 
> Hi,
> 
> I'm prototyping with the new consumer API of kafka 0.9 and I'm particularly
> interested in the `ConsumerRebalanceListener`.
> 
> My test setup is like the following:
>  - 5M messages pre-loaded in one node kafka 0.9
>  - 12 partitions, auto offset commit set to false
>  - in `onPartitionsRevoked`, commit offset and flush the local state
> 
> The test run is like the following:
>  - launch one process with 2 consumers and let it consume for a while
>  - launch another process with 2 consumers, this triggers a rebalancing,
> and let these 2 processes run until messages are all consumed
> 
> The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd
> 
> So at first, the 2 consumers of the first process each got 6 partitions.
> And after the rebalancing, each consumer got 3 partitions. It's confirmed
> by logging inside the `onPartitionAssigned` callback.
> 
> But after the rebalancing, one of the 2 consumers of the first process stop
> receiving messages, even if it has partitions assigned to:
> 
> balance-1 pulled 7237 msgs ...
> balance-0 pulled 7263 msgs ...
> 2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-1 flush @ 536637
> balance-1 committed offset for List(balance-11, balance-10, balance-9,
> balance-8, balance-7, balance-6)
> 2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1]
> o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the
> group is rebalancing, try to re-join group.
> balance-0 flush @ 543845
> balance-0 committed offset for List(balance-5, balance-4, balance-3,
> balance-2, balance-1, balance-0)
> balance-0 got assigned List(balance-5, balance-4, balance-3)
> balance-1 got assigned List(balance-11, balance-10, balance-9)
> balance-1 pulled 3625 msgs ...
> balance-0 pulled 3621 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-0 pulled 3631 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-0 pulled 3643 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3622 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3637 msgs ...
> balance-0 pulled 3641 msgs ...
> balance-0 pulled 3640 msgs ...
> balance-1 pulled 0 msgs ...
> balance-0 pulled 3632 msgs ...
> balance-0 pulled 3630 msgs ...
> balance-1 pulled 0 msgs ...
> ......
> 
> `balance-0` and `balance-1` are the names of the consumer thread. So after
> the rebalancing, thread `balance-1` continues to poll but no message
> arrive, given that it has got 3 partitions assigned to after the
> rebalancing.
> 
> Finally other 3 consumers pulls all their partitions' message, the
> situation is like
> 
> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
> balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
> balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
> balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
> balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
> balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
> balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
> 
> So you can see, partition [6, 7, 8] still has messages, but the consumer
> can't pull them after the rebalancing.
> 
> I've tried 0.9.0.0 release, trunk and 0.9.0 branch, for both server/broker
> and client.
> 
> I hope the code is clear enough to illustrate/reproduce the problem. It's
> quite a surprise for me because this is the main feature of the new
> consumer API, but it does not seem to work properly.
> Feel free to talk to me for any details.
> -- 
> *JU Han*
> 
> Software Engineer @ Teads.tv
> 
> +33 0619608888