Posted to users@kafka.apache.org by Maximiliano Patricio Méndez <mm...@despegar.com> on 2016/02/10 18:39:20 UTC

Connection to kafka stalls

Hi,

I'm having trouble with connections to Kafka that stall repeatedly. The
symptom I see is that some consumers lag behind, and most of the time
restarting the consumer doesn't help. (Occasionally, when another consumer
takes over the problematic partition, it stops failing, but usually the new
consumer stalls shortly after taking over.)

A thread dump taken in this situation shows that the call stalls in the
hasNext() method of the ConsumerIterator, even though there are many
messages to consume and that particular partition of the topic is lagging.

"hermes-consumer-thread-1" #75 prio=5 os_prio=0 tid=0x00007fe430fde000 nid=0x7c01 waiting on condition [0x00007fe428ce1000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000070932c870> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:65)
        at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
        at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
        at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
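
For anyone reading the dump above: a TIMED_WAITING frame inside
LinkedBlockingQueue.poll is what any thread looks like while it waits, with a
timeout, on a queue that nothing is being written to. The sketch below
reproduces that state using only the JDK. It does not use Kafka's classes;
the class name, thread name, and timeouts are made up for illustration. If
the analogy holds, the consumer thread itself is idle rather than stuck, and
the thing to investigate would be whatever is supposed to fill its queue.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Kafka.
class EmptyQueuePollDemo {

    // Returns what poll() yields on a queue that nobody writes to.
    static String pollEmptyQueue(long timeoutMs) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Starts a worker that polls an empty queue and samples the worker's
    // thread state while it is waiting.
    static Thread.State stateWhilePolling(long timeoutMs) {
        Thread worker = new Thread(() -> pollEmptyQueue(timeoutMs), "demo-consumer-thread");
        worker.start();
        try {
            Thread.State state = worker.getState();
            // Give the worker up to half the timeout to reach the parked state.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs / 2);
            while (state != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
                Thread.sleep(5);
                state = worker.getState();
            }
            worker.join(); // let the poll time out on its own
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("state while polling: " + stateWhilePolling(500));
        System.out.println("poll result: " + pollEmptyQueue(100));
    }
}
```

Taking a thread dump of this program while the worker is waiting should show
the same LockSupport.parkNanos and LinkedBlockingQueue.poll frames as the
dump above, without the Kafka frames.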

Reading through the mailing list I've come across older suggestions for
this problem, including setting consumer.timeout.ms (which I've added, with
no effect) and checking the size of the messages (a message bigger than
fetch.message.max.bytes will stall the consumer like this), but my messages
are all under 300 bytes.
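
For reference, a minimal sketch of where those two settings go in the
properties passed to the old (0.8.x) high-level consumer. Only the property
keys come from the discussion above and the Kafka consumer configuration;
the class name, ZooKeeper address, group id, and the 5000 ms timeout are
placeholders, not values from this setup. The sketch builds a plain
java.util.Properties object and does not connect to anything.

```java
import java.util.Properties;

// Hypothetical helper; values are placeholders for illustration only.
class OldConsumerConfigSketch {

    static Properties consumerProps(String zookeeperConnect, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zookeeperConnect);
        props.put("group.id", groupId);
        // Default is -1 (block indefinitely). With a non-negative value the
        // iterator throws a ConsumerTimeoutException when no message arrives
        // within the interval, so a stall becomes visible to the caller.
        props.put("consumer.timeout.ms", "5000");
        // Per-partition fetch size; a single message larger than this cannot
        // be fetched. 1024 * 1024 is the documented default.
        props.put("fetch.message.max.bytes", String.valueOf(1024 * 1024));
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("zk1.example.com:2181", "example-group");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With messages under 300 bytes, the default fetch size leaves a very wide
margin, so the message-size explanation looks unlikely here. The timeout
setting only turns a silent wait into an exception and does not address the
cause of the stall.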

Has anyone had this problem? Any help would be appreciated.

Thanks

Re: Connection to kafka stalls

Posted by Maximiliano Patricio Méndez <mm...@despegar.com>.
Another update.

The problem appeared again. The consumer is stalling at certain offsets.

Does anyone have an idea of what could be happening?

If there is anything I could add that might help, let me know.


Re: Connection to kafka stalls

Posted by Maximiliano Patricio Méndez <mm...@despegar.com> on 2016-02-12 10:29 GMT-03:00.
Hi,

An update about this.

I've recreated the topic with a different configuration and the problem
doesn't seem to be happening anymore. I have 8 brokers. The topic was
previously created (when the connections were stalling, as shown in the
thread dump I attached) with 5 partitions, a retention policy of 5 days and
a replication factor of 2. The new configuration (which no longer causes
this issue) has 8 partitions, with the same retention policy and the same
replication factor.

What could have caused the connections to hang in the previous
configuration?


Re: Connection to kafka stalls

Posted by Maximiliano Patricio Méndez <mm...@despegar.com> on 2016-02-10 15:19 GMT-03:00.
Sorry, I'm using Kafka 0.8.2 and a ConsumerGroup similar to the one in
the documentation.
