Posted to users@kafka.apache.org by Steven Wu <st...@gmail.com> on 2015/02/04 18:55:01 UTC

error handling with high-level consumer

Hi,

We have recently observed the following two exceptions from the consumer's
*iterator.next()* and want to ask how we should handle them properly.

*1) CRC corruption*
Message is corrupt (stored crc = 433657556, computed crc = 3265543163)

I assume that in this case we should just catch the exception and move on to the
next message? Are there any other iterator/consumer exceptions we should catch and handle?
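
To be concrete, here is roughly what I mean by catch-and-skip. It is only a
sketch against the 0.8 high-level consumer API; the ZK address, group id and
topic are placeholders, and I am assuming the CRC failure surfaces from
next() as kafka.message.InvalidMessageException:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.InvalidMessageException;

public class SkippingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk-host:2181"); // placeholder
        props.put("group.id", "example-group");         // placeholder
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // one stream for the topic; "topic_name" is a placeholder
        Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put("topic_name", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(topicCount);
        ConsumerIterator<byte[], byte[]> it =
            streams.get("topic_name").get(0).iterator();

        while (true) {
            try {
                byte[] payload = it.next().message();
                // process(payload) ...
            } catch (InvalidMessageException e) {
                // CRC mismatch on one message: log it and try the next one.
                // Whether the iterator is still usable after this is exactly
                // what I am asking about.
            } catch (IllegalStateException e) {
                // "Iterator is in failed state" (issue 2 below): the stream
                // is unusable; so far we have only recovered by restarting.
                break;
            }
        }
    }
}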


*2) Unrecoverable consumer error with "Iterator is in failed state"*

Yesterday, one of our Kafka consumers got stuck with a very large max lag and
was throwing the following exception.

2015-02-04 08:35:19,841 ERROR KafkaConsumer-0 KafkaConsumer - Exception on
consuming kafka with topic: <topic_name>
java.lang.IllegalStateException: Iterator is in failed state
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:54)
at kafka.utils.IteratorTemplate.next(IteratorTemplate.scala:38)
at kafka.consumer.ConsumerIterator.next(ConsumerIterator.scala:46)
at com.netflix.suro.input.kafka.KafkaConsumer$1.run(KafkaConsumer.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

We had a surge of traffic on <topic_name>, so I guess the traffic storm caused
the problem. I tried to restart a few consumer instances, but after rebalancing
another instance got assigned the problematic partitions and got stuck with the
same errors.

We decided to drop messages: we stopped all consumer instances, reset all
offsets by deleting the ZooKeeper entries, and restarted them. After that the
problem went away.
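
For the record, the offset reset was along these lines: stop every consumer in
the group, delete the group's offset nodes in ZooKeeper, then restart the
consumers. The path below is the standard 0.8 high-level consumer layout in ZK;
<zk_host> and <group_id> are placeholders.

bin/zookeeper-shell.sh <zk_host>:2181
rmr /consumers/<group_id>/offsets/<topic_name>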

Producer version is kafka_2.8.2-0.8.1.1 with snappy-java-1.0.5
Consumer version is kafka_2.9.2-0.8.2-beta with snappy-java-1.1.1.6

We googled this issue, but it was reportedly fixed a long time ago in 0.7.x.
Any idea? Is the mismatched snappy version the culprit? Is it a bug in
0.8.2-beta?


Thanks,
Steven

Re: error handling with high-level consumer

Posted by Steven Wu <st...@gmail.com>.
Jun,

We are already past the retention period, so we can't go back and run
DumpLogSegments.
There are also other factors that make this exercise difficult:
1) this topic has a very high traffic volume
2) we don't know the offset of the corrupted message

Anyhow, it doesn't happen often. But can you advise on the proper
action/handling in this case? Are there any other exceptions from
iterator.next() that we should handle?

Thanks,
Steven



On Wed, Feb 4, 2015 at 9:33 PM, Jun Rao <ju...@confluent.io> wrote:

> 1) Does the corruption happen with the console consumer as well? If so, could
> you run the DumpLogSegments tool to see if the data is corrupted on disk?
>
> Thanks,
>
> Jun

Re: error handling with high-level consumer

Posted by Jun Rao <ju...@confluent.io>.
1) Does the corruption happen with the console consumer as well? If so, could
you run the DumpLogSegments tool to see if the data is corrupted on disk?
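
For example, something along these lines (the log directory and segment file
are placeholders; --deep-iteration also walks the messages inside compressed
message sets):

bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration \
  --files /data/kafka-logs/<topic_name>-<partition>/00000000000000000000.log

The output includes per-message metadata such as an isvalid flag, which should
show whether a message on disk fails its CRC check.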

Thanks,

Jun

