Posted to users@kafka.apache.org by Jagadish Bihani <ja...@pubmatic.com> on 2014/06/07 09:17:12 UTC

About peculiar scenario in kafka camus consumer

Hi
I have observed a peculiar scenario in a production environment in which a 
mapper task for a particular topic-partition combination always fails 
with the exception 'Task attempt failed to report status for 600 seconds'.

When I dug deeper, I found that it gets stuck in either the fetch() or 
the getNext() method of KafkaReader.
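
The 600-second figure is Hadoop's default task timeout 
(mapred.task.timeout=600000 ms in Hadoop 1.x, renamed 
mapreduce.task.timeout in Hadoop 2). A quick way to tell a slow mapper 
from a wedged one is to raise it for a test run; a minimal sketch, set 
in the Camus job configuration:

    # value in milliseconds; default is 600000 (10 minutes)
    mapred.task.timeout=1800000

If the task still dies after 30 minutes at the same offset, it is 
genuinely stuck rather than slow.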

Things which I tried:
-------------------------
1. Network and /etc/hosts entries were checked. They are fine.
2. The machine hosting that particular partition also hosts other 
partitions, and there is no problem reading those partitions, so it is 
not a machine-specific or network-specific issue.
3. Tried increasing timeout parameters and changing buffering parameters.
4. Records are zlib compressed. I tried the Kafka console consumer but 
couldn't verify with it, as the data was large (see the sketch below 
this list).
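
To look at the stuck partition without pulling the whole topic, the 
tools shipped with Kafka 0.8 can dump offsets and a handful of 
messages. A rough sketch, assuming the 0.8.x tool names and flags 
(these vary between releases; <broker>, <topic>, <p> and <stuck-offset> 
are placeholders):

    # Latest available offset per partition of the topic
    bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
        --broker-list <broker>:9092 --topic <topic> --time -1

    # Fetch a few messages starting at the offset where the mapper hangs
    bin/kafka-run-class.sh kafka.tools.SimpleConsumerShell \
        --broker-list <broker>:9092 --topic <topic> --partition <p> \
        --offset <stuck-offset> --max-messages 5

If a small fetch at that offset also hangs or returns nothing, the 
problem is reproducible outside Camus.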

Here are relevant configs:
-----------------------------------
kafka.client.name=camus1
# Fetch Request Parameters
kafka.fetch.buffer.size=1048576
#kafka.fetch.request.correlationid=
kafka.fetch.request.max.wait=100000
#kafka.fetch.request.min.bytes=
socket.receive.buffer.bytes=1048576
fetch.message.max.bytes=10485760
# Connection parameters.
kafka.brokers=<list of ips>
kafka.timeout.value=30000
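
For scale, the same values in human units (assuming bytes and 
milliseconds, the usual units for these keys):

    kafka.fetch.buffer.size=1048576      # 1 MiB per-fetch buffer
    fetch.message.max.bytes=10485760     # 10 MiB max message accepted by the consumer
    kafka.fetch.request.max.wait=100000  # 100 s max wait per fetch request
    kafka.timeout.value=30000            # 30 s timeout, shorter than the 100 s fetch wait

Note that the 1 MiB fetch buffer is an order of magnitude smaller than 
the 10 MiB max message size.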


Re: About peculiar scenario in kafka camus consumer

Posted by Jun Rao <ju...@gmail.com>.
Are you saying that the consumer is stuck fetching data at the same
offset again and again without returning any messages? If so, what's the
max message size on the broker? You need to make sure that the consumer
fetch size is larger than the max message size.
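
For example, if the broker's message.max.bytes is around 10 MB, every 
consumer-side size needs to be at least that. A minimal sketch of the 
alignment, assuming Camus uses kafka.fetch.buffer.size as the fetch 
size it sends in each request (property names are from the configs 
quoted above; values are placeholders):

    # broker: server.properties -- largest message the broker will accept
    message.max.bytes=10485760

    # Camus job properties -- each fetch must be able to hold one max-size message
    kafka.fetch.buffer.size=10485760     # was 1048576 (1 MiB), below the 10 MiB cap
    fetch.message.max.bytes=10485760
    socket.receive.buffer.bytes=10485760

With a fetch size smaller than the largest message, a SimpleConsumer 
fetch can return a payload containing no complete message, so the 
reader never advances past that offset, which would match the stuck 
fetch()/getNext() symptom.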

Thanks,

Jun

