Posted to users@kafka.apache.org by Guy Doulberg <gu...@conduit.com> on 2012/07/05 15:48:10 UTC

A consumer that keeps stopping

Hi guys,

I am running a Kafka cluster with 3 brokers (0.7.0).

I have 2 consumer groups on the same topic.

One consumer group is working fine (meaning it never stops consuming).

Unfortunately, the other consumer group, which contains a single consumer,
consumes for a while and then suddenly stops.

In the logs of that consumer and of the brokers, I can't find anything that
would indicate why it stopped consuming.

As far as I know, there is no re-balancing in that group (it has only
one consumer).
I read about bug https://issues.apache.org/jira/browse/KAFKA-256 that
was fixed in 0.7.1, but I am not sure it is relevant to my case, since
there is no re-balancing.


Any ideas what I can do here?

Thanks
Guy Doulberg

Re: A consumer that keeps stopping

Posted by Guy Doulberg <gu...@conduit.com>.
Neha,

No, nothing was caught in the try/catch. I worked around the problem by
switching to another consumer group.

Using the consumer offset checker, I think some of the partitions didn't
have any owner (the owner was null).


Thanks,


On 07/21/2012 02:27 AM, Neha Narkhede wrote:
> Guy,
>
> Were you able to diagnose the root cause of your consumer issue using
> Jun's suggestion?
>
> Thanks,
> Neha



Re: A consumer that keeps stopping

Posted by Neha Narkhede <ne...@gmail.com>.
Guy,

Were you able to diagnose the root cause of your consumer issue using
Jun's suggestion?

Thanks,
Neha

On Sun, Jul 8, 2012 at 8:56 PM, Jun Rao <ju...@gmail.com> wrote:

> It's now in the FAQ on the Kafka site. Could you add a try/catch clause to
> log any Throwable thrown in the consumer logic?
>
> Thanks,
>
> Jun

Re: A consumer that keeps stopping

Posted by Jun Rao <ju...@gmail.com>.
It's now in the FAQ on the Kafka site. Could you add a try/catch clause to
log any Throwable thrown in the consumer logic?
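
A minimal sketch of what such a try/catch might look like, in Java. It does
not use the Kafka API: the Iterator<String> stands in for the consumer's
message stream, and LoggingConsumerLoop, its handler argument, and
lastFailure() are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a consumer thread. Iterator<String> replaces the
// real Kafka message stream; nothing here is a Kafka class.
final class LoggingConsumerLoop implements Runnable {
    private static final Logger LOG =
            Logger.getLogger(LoggingConsumerLoop.class.getName());

    private final Iterator<String> messages;
    private final Consumer<String> handler;
    private volatile Throwable lastFailure;

    LoggingConsumerLoop(Iterator<String> messages, Consumer<String> handler) {
        this.messages = messages;
        this.handler = handler;
    }

    @Override
    public void run() {
        try {
            while (messages.hasNext()) {
                handler.accept(messages.next());
            }
        } catch (Throwable t) {
            // Catch Throwable, not just Exception, so that Errors are also
            // written to the application log before the loop ends.
            lastFailure = t;
            LOG.log(Level.SEVERE, "Consumer loop stopped on a Throwable", t);
        }
    }

    // The Throwable that ended the loop, or null if it finished normally.
    Throwable lastFailure() {
        return lastFailure;
    }

    public static void main(String[] args) throws InterruptedException {
        LoggingConsumerLoop loop = new LoggingConsumerLoop(
                Arrays.asList("a", "b", "c").iterator(),
                msg -> {
                    // Simulated application bug on the second message.
                    if (msg.equals("b")) {
                        throw new IllegalStateException("cannot process " + msg);
                    }
                    System.out.println("processed " + msg);
                });
        Thread thread = new Thread(loop, "consumer-1");
        thread.start();
        thread.join();
        System.out.println("failure recorded: " + loop.lastFailure());
    }
}
```

In this sketch the loop still ends after the failure; the point is only that
the cause reaches the log. Whether to skip the bad message and continue
instead is an application decision.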

Thanks,

Jun

On Sat, Jul 7, 2012 at 11:59 PM, Guy Doulberg <gu...@conduit.com> wrote:

> Thanks Jun,
>
> I think it is a really good idea to have what you wrote in the FAQ.
>
>
> Regarding my issue:
>
> Using your method, I now know for sure that my consumer has actually
> stopped.
>
> What should be my next step in diagnosing the problem?
>
>
> Thanks, Guy

Re: A consumer that keeps stopping

Posted by Guy Doulberg <gu...@conduit.com>.
Thanks Jun,

I think it is a really good idea to have what you wrote in the FAQ.


Regarding my issue:

Using your method, I now know for sure that my consumer has actually
stopped.

What should be my next step in diagnosing the problem?


Thanks, Guy

On 07/05/2012 06:24 PM, Jun Rao wrote:
> Guy,
>
> I am adding a FAQ to the website. Here is the content.
>
> My consumer seems to have stopped, why?
>
> First, try to figure out if the consumer has really stopped or is just
> slow, using our tool ConsumerOffsetChecker.
>
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group \
>     consumer-group1 --zkconnect zkhost:zkport --topic topic1
>
> consumer-group1,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
>              Owner = consumer-group1-consumer1
>    Consumer offset = 70121994703
>                    = 70,121,994,703 (65.31G)
>           Log size = 70122018287
>                    = 70,122,018,287 (65.31G)
>       Consumer lag = 23584
>                    = 23,584 (0.00G)
>
> If the consumer offset is not moving after some time, then the consumer is
> likely to have stopped. If the consumer offset is moving, but the consumer
> lag (the difference between the end of the log and the consumer offset) is
> increasing, the consumer is slower than the producer. If the consumer is
> slow, the typical solution is to increase the degree of parallelism in the
> consumer. This may require increasing the number of partitions of the
> topic. If a consumer has stopped, one of the typical causes is that the
> application code that consumes messages somehow died and therefore killed
> the consumer thread. We recommend using a try/catch clause to log any
> Throwable thrown in the consumer logic.
>
> Thanks,
>
> Jun



Re: A consumer that keeps stopping

Posted by Jun Rao <ju...@gmail.com>.
Guy,

I am adding a FAQ to the website. Here is the content.

My consumer seems to have stopped, why?

First, try to figure out if the consumer has really stopped or is just
slow, using our tool ConsumerOffsetChecker.

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group \
    consumer-group1 --zkconnect zkhost:zkport --topic topic1

consumer-group1,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
            Owner = consumer-group1-consumer1
  Consumer offset = 70121994703
                  = 70,121,994,703 (65.31G)
         Log size = 70122018287
                  = 70,122,018,287 (65.31G)
     Consumer lag = 23584
                  = 23,584 (0.00G)

If the consumer offset is not moving after some time, then the consumer is
likely to have stopped. If the consumer offset is moving, but the consumer
lag (the difference between the end of the log and the consumer offset) is
increasing, the consumer is slower than the producer. If the consumer is
slow, the typical solution is to increase the degree of parallelism in the
consumer. This may require increasing the number of partitions of the
topic. If a consumer has stopped, one of the typical causes is that the
application code that consumes messages somehow died and therefore killed
the consumer thread. We recommend using a try/catch clause to log any
Throwable thrown in the consumer logic.
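
As a rough illustration of the check described above, the following Java
sketch compares two ConsumerOffsetChecker readings taken some time apart.
ConsumerLagCheck, Status, lag, and classify are hypothetical names, not part
of Kafka. The first reading uses the numbers from the sample output; the
second readings are invented for the example. Treating an unchanged offset
with zero lag as idle rather than stopped is an added assumption.

```java
// Illustrative only: none of these names are Kafka APIs.
final class ConsumerLagCheck {

    enum Status { STOPPED, FALLING_BEHIND, KEEPING_UP }

    // Lag as reported by the checker: end of the log minus consumer offset.
    static long lag(long consumerOffset, long logSize) {
        return logSize - consumerOffset;
    }

    // Compares two (offset, log size) readings taken some time apart.
    static Status classify(long offsetBefore, long logSizeBefore,
                           long offsetAfter, long logSizeAfter) {
        long lagBefore = lag(offsetBefore, logSizeBefore);
        long lagAfter = lag(offsetAfter, logSizeAfter);
        // Offset did not move although data is waiting: likely stopped.
        if (offsetAfter == offsetBefore && lagAfter > 0) {
            return Status.STOPPED;
        }
        // Offset moved but the gap to the end of the log grew:
        // the consumer is slower than the producer.
        if (lagAfter > lagBefore) {
            return Status.FALLING_BEHIND;
        }
        return Status.KEEPING_UP;
    }

    public static void main(String[] args) {
        long offset = 70121994703L;   // "Consumer offset" in the sample output
        long logSize = 70122018287L;  // "Log size" in the sample output
        System.out.println("lag = " + lag(offset, logSize));

        // Invented second readings: the log grew by 5000 in each case.
        System.out.println(classify(offset, logSize, offset, logSize + 5000L));
        System.out.println(classify(offset, logSize, offset + 1000L, logSize + 5000L));
        System.out.println(classify(offset, logSize, offset + 20000L, logSize + 5000L));
    }
}
```

How long to wait between the two readings depends on the topic's traffic; on
a quiet topic a short interval can make a healthy consumer look stopped.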

Thanks,

Jun

On Thu, Jul 5, 2012 at 6:48 AM, Guy Doulberg <gu...@conduit.com> wrote:

> Hi guys,
>
> I am running a Kafka cluster with 3 brokers (0.7.0).
>
> I have 2 consumer groups on the same topic.
>
> One consumer group is working fine (meaning it never stops consuming).
>
> Unfortunately, the other consumer group, which contains a single consumer,
> consumes for a while and then suddenly stops.
>
> In the logs of that consumer and of the brokers, I can't find anything that
> would indicate why it stopped consuming.
>
> As far as I know, there is no re-balancing in that group (it has only
> one consumer).
> I read about bug https://issues.apache.org/jira/browse/KAFKA-256 that
> was fixed in 0.7.1, but I am not sure it is relevant to my case, since
> there is no re-balancing.
>
>
> Any ideas what I can do here?
>
> Thanks
> Guy Doulberg
>