Posted to user@storm.apache.org by pradeep s <sr...@gmail.com> on 2017/09/06 23:06:44 UTC

Storm offset rewind issue

Hi,
Can you please confirm whether the bug below is fixed in Storm 1.1.0:
https://issues.apache.org/jira/browse/STORM-1455

We are seeing the consumer offset getting reset to the earliest offset for a
few topics in a group.

This was observed in the production environment, and there were only
INFO-level logs, so we could not figure out much from them.
Any suggestions on how to reproduce the issue?

One issue noticed with the Kafka cluster is that we were getting producer
errors like the one below:

ERROR 2017-09-05 12:39:49,735 [kafka-producer-network-thread | producer-1]
A failure occurred sending a message to Kafka.
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server
is not the leader for that topic-partition.

Thanks
Pradeep

Re: Storm offset rewind issue

Posted by Stig Rohde Døssing <sr...@apache.org>.
Hi,

Aren't you using storm-kafka-client rather than storm-kafka? If so,
STORM-1455 isn't relevant, because it concerns a different component than
the one you're using.
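For what it's worth, one common way to end up back at the earliest offset (an
assumption on my part, not something your logs confirm): if a partition has no
committed offset on the broker, or the committed offset has expired, the
consumer falls back to its auto.offset.reset policy. A minimal sketch of the
relevant consumer properties, with placeholder broker and group names:

```java
import java.util.Properties;

public class ConsumerResetSketch {
    // Hypothetical consumer configuration; the broker address and group id
    // are placeholders, not values from your cluster.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "kafkaSpoutTestGroup");
        // When a partition has no valid committed offset, the consumer
        // rewinds according to this policy; "earliest" replays the
        // partition from the beginning, which matches your symptom.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("auto.offset.reset"));
    }
}
```

If your committed offsets are valid and this still happens, the reset is
coming from somewhere else, which is where the trace logging below helps.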

You might consider enabling trace logging for the spout and KafkaConsumer.
It'll produce a lot of logging, but it will also let you see which requests
the consumer sends. More logging is probably your best bet for figuring out
what's going wrong.

For example, after adding the following to storm/log4j2/worker.xml:

<Logger name="org.apache.kafka" level="TRACE">
    <appender-ref ref="A1"/>
</Logger>
I get these logs:
2017-09-07 19:02:21.772 o.a.k.c.c.i.ConsumerCoordinator
Thread-16-kafka_spout-executor[4, 4] [DEBUG] Group kafkaSpoutTestGroup
committed offset 2446 for partition kafka-spout-test-0
2017-09-07 19:02:21.779 o.a.k.c.c.i.Fetcher
Thread-16-kafka_spout-executor[4, 4] [TRACE] Returning fetched records at
offset 2447 for assigned partition kafka-spout-test-0 and update position
to 2448
2017-09-07 19:02:21.779 o.a.k.c.c.i.Fetcher
Thread-16-kafka_spout-executor[4, 4] [TRACE] Added fetch request for
partition kafka-spout-test-1-0 at offset 2452
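If editing worker.xml and redeploying is inconvenient in production, Storm can
also change log levels on a running topology for a limited time. A sketch,
assuming a topology named my_topology (a placeholder name):

```shell
# Raise org.apache.kafka to TRACE on the running topology "my_topology"
# for 300 seconds; the level reverts automatically after the timeout.
storm set_log_level my_topology -l org.apache.kafka=TRACE:300
```

That keeps the noisy logging bounded, which matters since TRACE on the
consumer is very verbose.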

Regarding the NotLeaderForPartitionException, each partition has a leader
node, which is the broker responsible for reads and writes for that
partition. That exception is telling you that the producer sent a request
to the wrong Kafka node. This can happen if leadership for a partition
passed from one Kafka broker to another, e.g. because one of the brokers
was temporarily down/unavailable. The producer will try to figure out who
the new leader is after that exception. It's probably not a concern unless
the exceptions happen very frequently. Then you'd want to investigate why
leadership is changing so often.
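NotLeaderForPartitionException is a retriable error, so the producer can
recover from it on its own if it is configured to retry. A sketch of the
relevant producer properties, with a placeholder broker address and values
chosen only for illustration:

```java
import java.util.Properties;

public class ProducerRetrySketch {
    // Hypothetical producer configuration; the broker address and the
    // specific numbers here are placeholders, not recommendations.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        // With retries > 0, the producer refreshes its metadata after a
        // NotLeaderForPartitionException and resends to the new leader
        // instead of surfacing the error to your callback.
        props.put("retries", "5");
        // Wait between attempts so a leader election has time to finish.
        props.put("retry.backoff.ms", "500");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("retries"));
    }
}
```

Note that retries only mask the symptom; if leadership keeps moving, the
broker side is still worth investigating.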

2017-09-07 1:06 GMT+02:00 pradeep s <sr...@gmail.com>:

> Hi,
> Can you please confirm whether the bug below is fixed in Storm 1.1.0:
> https://issues.apache.org/jira/browse/STORM-1455
>
> We are seeing the consumer offset getting reset to the earliest offset for
> a few topics in a group.
>
> This was observed in the production environment, and there were only
> INFO-level logs, so we could not figure out much from them.
> Any suggestions on how to reproduce the issue?
>
> One issue noticed with the Kafka cluster is that we were getting producer
> errors like the one below:
>
> ERROR 2017-09-05 12:39:49,735 [kafka-producer-network-thread | producer-1]
> A failure occurred sending a message to Kafka.
> org.apache.kafka.common.errors.NotLeaderForPartitionException: This
> server is not the leader for that topic-partition.
>
> Thanks
> Pradeep