Posted to dev@kafka.apache.org by "James Ranson (JIRA)" <ji...@apache.org> on 2016/05/31 15:30:12 UTC

[jira] [Updated] (KAFKA-3772) MirrorMaker crashes on Corrupted Message

     [ https://issues.apache.org/jira/browse/KAFKA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Ranson updated KAFKA-3772:
--------------------------------
    Description: 
We recently came across an issue where a message on our source Kafka cluster became corrupted. When MirrorMaker tried to consume this message, the thread crashed and took the entire process down with it. Each time we attempted to restart MM, it crashed on the same message. There is no information in the MM logs about which message it was trying to consume (topic, offset, etc.). So the only way we were able to get past the issue was to go into the ZooKeeper tree for our mirror consumer group and increment the offset for every partition on every topic until the MM process could start without crashing. This is not a tenable operational solution. MirrorMaker should gracefully skip corrupt messages, since they can never be replicated anyway.

{noformat}2016-05-26 20:02:26,787 FATAL  MirrorMaker$MirrorMakerThread - [{}] [mirrormaker-thread-3] Mirror maker thread failure due to
kafka.message.InvalidMessageException: Message is corrupt (stored crc = 33747148, computed crc = 3550736267)
	at kafka.message.Message.ensureValid(Message.scala:167)
	at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:101)
	at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
	at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
	at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
	at kafka.tools.MirrorMaker$MirrorMakerOldConsumer.hasData(MirrorMaker.scala:483)
	at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:394)

2016-05-26 20:02:27,580 FATAL  MirrorMaker$MirrorMakerThread - [{}] [mirrormaker-thread-3] Mirror maker thread exited abnormally, stopping the whole mirror maker.{noformat}
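
For reference, the manual workaround described above (walking the ZooKeeper tree and bumping the stored offsets for the mirror consumer group) amounts to something like the following minimal sketch. It assumes the old, ZooKeeper-based consumer, which stores committed offsets under /consumers/<group>/offsets/<topic>/<partition>; the connect string, group name, topic and partition below are placeholders, and MirrorMaker should not be running while offsets are edited this way.

{noformat}import org.apache.zookeeper.ZooKeeper
import org.apache.zookeeper.data.Stat

object BumpMirrorOffset {
  def main(args: Array[String]): Unit = {
    // Placeholders for illustration: ZooKeeper connect string, consumer group,
    // topic and the partition holding the corrupt message.
    val zk = new ZooKeeper("zk1:2181", 30000, null)
    try {
      val path = "/consumers/my-mirror-group/offsets/my-topic/3"
      val stat = new Stat()
      // The old consumer stores the committed offset as a plain decimal string.
      val current = new String(zk.getData(path, false, stat)).trim.toLong
      // Skip exactly one message; the znode version guards against a concurrent write.
      zk.setData(path, (current + 1).toString.getBytes, stat.getVersion)
      println(s"$path: $current -> ${current + 1}")
    } finally {
      zk.close()
    }
  }
}{noformat}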

  was:
We recently came across an issue where a message on our source Kafka cluster became corrupted. When MirrorMaker tried to consume this message, the thread crashed and took the entire process down with it. Each time we attempted to restart MM, it crashed on the same message. There is no information in the MM logs about which message it was trying to consume (topic, offset, etc.). So the only way we were able to get past the issue was to go into the ZooKeeper tree for our mirror consumer group and increment the offset for every partition on every topic until the MM process could start without crashing. This is not a tenable operational solution. MirrorMaker should gracefully skip corrupt messages, since they can never be replicated anyway.

```2016-05-26 20:02:26,787 FATAL  MirrorMaker$MirrorMakerThread - [{}] [mirrormaker-thread-3] Mirror maker thread failure due to
kafka.message.InvalidMessageException: Message is corrupt (stored crc = 33747148, computed crc = 3550736267)
	at kafka.message.Message.ensureValid(Message.scala:167)
	at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:101)
	at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
	at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
	at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
	at kafka.tools.MirrorMaker$MirrorMakerOldConsumer.hasData(MirrorMaker.scala:483)
	at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:394)

2016-05-26 20:02:27,580 FATAL  MirrorMaker$MirrorMakerThread - [{}] [mirrormaker-thread-3] Mirror maker thread exited abnormally, stopping the whole mirror maker.```


> MirrorMaker crashes on Corrupted Message
> ----------------------------------------
>
>                 Key: KAFKA-3772
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3772
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.1
>            Reporter: James Ranson
>              Labels: mirror-maker
>
> We recently came across an issue where a message on our source Kafka cluster became corrupted. When MirrorMaker tried to consume this message, the thread crashed and took the entire process down with it. Each time we attempted to restart MM, it crashed on the same message. There is no information in the MM logs about which message it was trying to consume (topic, offset, etc.). So the only way we were able to get past the issue was to go into the ZooKeeper tree for our mirror consumer group and increment the offset for every partition on every topic until the MM process could start without crashing. This is not a tenable operational solution. MirrorMaker should gracefully skip corrupt messages, since they can never be replicated anyway.
> {noformat}2016-05-26 20:02:26,787 FATAL  MirrorMaker$MirrorMakerThread - [{}] [mirrormaker-thread-3] Mirror maker thread failure due to
> kafka.message.InvalidMessageException: Message is corrupt (stored crc = 33747148, computed crc = 3550736267)
> 	at kafka.message.Message.ensureValid(Message.scala:167)
> 	at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:101)
> 	at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
> 	at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
> 	at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
> 	at kafka.tools.MirrorMaker$MirrorMakerOldConsumer.hasData(MirrorMaker.scala:483)
> 	at kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:394)
> 2016-05-26 20:02:27,580 FATAL  MirrorMaker$MirrorMakerThread - [{}] [mirrormaker-thread-3] Mirror maker thread exited abnormally, stopping the whole mirror maker.{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [jira] [Updated] (KAFKA-3772) MirrorMaker crashes on Corrupted Message

Posted by Gerard Klijs <ge...@dizzit.com>.
I just had a look at the code, as I would like some way to prevent such a
scenario from happening to us. It seems you can't keep the mirror maker from
exiting.
In the 0.10 mirror maker, any exception that is not a ConsumerTimeoutException
or a WakeupException will cause the whole mirror maker to shut down, because
the finally block in the MirrorMakerThread is executed. I would, however,
expect the offset to be committed in the commitOffsets(mirrorMakerConsumer)
part, so that on the next start it would resume just after the corrupted
record.
I think it would be better to at least add the InvalidMessageException to the
exceptions that are only logged, but this could possibly lead to other
problems: if there is a solvable cause for the InvalidMessageException, you
would be losing those records, because they are just skipped.
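
To make that concrete, below is a simplified, self-contained sketch of the
exception handling being described; it is not the actual MirrorMaker source.
The hasData parameter stands in for MirrorMaker's "is there a record to
mirror?" check, and the helper name and logging are illustrative only: the
timeout and wakeup cases are tolerated as described above, and the proposal is
to also tolerate InvalidMessageException, log something useful, and move on
instead of shutting the whole process down.

import kafka.consumer.ConsumerTimeoutException
import kafka.message.InvalidMessageException
import org.apache.kafka.common.errors.WakeupException

object SkipCorruptSketch {
  // Returns false when there is nothing (safe) to mirror right now.
  def hasDataOrSkip(hasData: () => Boolean): Boolean =
    try hasData()
    catch {
      case _: ConsumerTimeoutException => false // tolerated per the description above
      case _: WakeupException          => false // tolerated per the description above
      case e: InvalidMessageException  =>
        // Proposed addition: log and skip instead of letting the thread die
        // and the finally block shut down the whole mirror maker. Logging the
        // topic/partition/offset here would also address the "no information
        // in the MM logs" complaint from the issue.
        System.err.println(s"Skipping corrupt message: ${e.getMessage}")
        false
    }
}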
