Posted to users@kafka.apache.org by Oleg Ruchovets <or...@gmail.com> on 2013/07/31 17:22:25 UTC

reprocessing messages in kafka

Hi,

I am not sure which mailing list is the right one for this question (Storm
or Kafka), so apologies for the cross-post.

I just read the documentation that describes guaranteed message processing
with Storm:
https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing

My question is what happens to a message that Storm has consumed but failed
to process. If I use the anchoring technique and Storm tries to process the
message a second time, will the message still be available in Kafka? (I am
using the storm-kafka spout.)

In other words, is it possible to consume the same message from Kafka more
than once with the same consumer?

Thanks
Oleg.

Re: reprocessing messages in kafka

Posted by Tejas Patil <te...@gmail.com>.
As @Milind said, it is possible for a consumer to consume the same message
more than once.
This happens when the consumer shuts down uncleanly and cannot commit its
latest offset to ZooKeeper. When the failed consumer comes back up, it
fetches the stale offset from ZooKeeper and so re-consumes a small set of
messages.
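
A minimal sketch of that failure mode, for illustration only. It is an
in-memory model, not the Kafka or ZooKeeper API: the class RedeliveryDemo, the
committedOffset field, and the commit-every-N policy are all assumptions made
up for this example. It shows a consumer that crashes after reading past its
last commit and then re-reads the uncommitted messages on restart.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; not the Kafka or ZooKeeper API.
class RedeliveryDemo {
    // Stands in for the consumer offset stored in ZooKeeper.
    static long committedOffset = 0;

    // Consume offsets [committedOffset, logEnd), committing every
    // `commitEvery` messages. If crashAt >= 0, stop abruptly just before
    // reading that offset, without committing (an unclean shutdown).
    static List<Long> run(long logEnd, int commitEvery, long crashAt) {
        List<Long> seen = new ArrayList<>();
        long position = committedOffset;
        while (position < logEnd) {
            if (position == crashAt) {
                return seen; // progress since the last commit is lost
            }
            seen.add(position);
            position++;
            if (position % commitEvery == 0) {
                committedOffset = position; // periodic commit
            }
        }
        committedOffset = position; // clean shutdown commits the final position
        return seen;
    }

    public static void main(String[] args) {
        List<Long> first = run(10, 4, 6);   // crashes before offset 6; last commit was 4
        List<Long> second = run(10, 4, -1); // restarts from the stale offset 4
        System.out.println(first + " then " + second);
    }
}
```

Offsets 4 and 5 appear in both runs, which is the "small set of messages"
that gets re-consumed.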


On Wed, Jul 31, 2013 at 3:31 PM, Milind Parikh <mi...@gmail.com> wrote:

> It is possible to consume the same message more than once with the same
> consumer. However, WHAT you actually do with the message (such as making
> writes idempotent) is the trickier part.
>
> Regards
> Milind

Re: reprocessing messages in kafka

Posted by Oleg Ruchovets <or...@gmail.com>.
Hi,

I found this capability in the storm-kafka spout:
https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka

Its README says:

Another very useful config in the spout is the ability to force the spout
to rewind to a previous offset. You do forceStartOffsetTime on the spout
config, like so:

// -2 starts from the earliest available offset; -1 starts from the latest
spoutConfig.forceStartOffsetTime(-2);



Thanks

Oleg.



On Thu, Aug 1, 2013 at 6:08 PM, Jun Rao <ju...@gmail.com> wrote:

> Kafka lets a consumer rewind its consumption because the broker keeps
> messages for a retention period (7 days by default). I am not exactly
> sure how Storm works. My guess is that it only checkpoints the consumer
> offset after all messages before that offset have been processed
> successfully. Could you confirm this with the Storm guys?
>
> Thanks,
>
> Jun

Re: reprocessing messages in kafka

Posted by Jun Rao <ju...@gmail.com>.
Kafka lets a consumer rewind its consumption because the broker keeps
messages for a retention period (7 days by default). I am not exactly
sure how Storm works. My guess is that it only checkpoints the consumer
offset after all messages before that offset have been processed
successfully. Could you confirm this with the Storm guys?
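
A minimal sketch of the checkpointing rule guessed at above, for
illustration only. It is not taken from storm-kafka: the class OffsetTracker
and its methods are hypothetical names. The idea is to remember which offsets
are still in flight and to checkpoint only the lowest un-acked offset, so
that a restart never skips a message that was not fully processed.

```java
import java.util.TreeSet;

// Hypothetical helper; it models the guess above and is not storm-kafka code.
class OffsetTracker {
    // Offsets handed out for processing but not yet acknowledged.
    private final TreeSet<Long> pending = new TreeSet<>();
    private long nextToEmit;

    OffsetTracker(long startOffset) {
        nextToEmit = startOffset;
    }

    // Hand out the next offset for processing.
    long emit() {
        long offset = nextToEmit++;
        pending.add(offset);
        return offset;
    }

    // Mark one offset as fully processed.
    void ack(long offset) {
        pending.remove(offset);
    }

    // Safe checkpoint: every offset below this value has been acked.
    long committable() {
        return pending.isEmpty() ? nextToEmit : pending.first();
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker(0);
        tracker.emit();
        tracker.emit();
        tracker.emit();                            // offsets 0, 1, 2 in flight
        tracker.ack(1);
        tracker.ack(2);
        System.out.println(tracker.committable()); // 0, because offset 0 is not acked
        tracker.ack(0);
        System.out.println(tracker.committable()); // 3, everything so far is acked
    }
}
```

With this rule, acks that arrive out of order never advance the checkpoint
past a message that is still being processed. The cost is that messages 1 and
2 would be replayed if the process died before offset 0 was acked.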

Thanks,

Jun


On Thu, Aug 1, 2013 at 4:31 AM, Oleg Ruchovets <or...@gmail.com> wrote:

> I am trying to handle the following case:
>     Suppose Storm consumes messages from Kafka, and some of its consumers
> crash for whatever reason and so fail to process the messages they have
> already consumed. If those messages cannot be reprocessed after recovery,
> the system is not robust and has data integrity issues.
>
> That is why I am trying to understand what Kafka's capabilities are. I just
> don't know what the best practice is here.
> Maybe it is a matter of configuration?
>
> Please advise.
> Thanks
> Oleg.

Re: reprocessing messages in kafka

Posted by Oleg Ruchovets <or...@gmail.com>.
I am trying to handle the following case:
    Suppose Storm consumes messages from Kafka, and some of its consumers
crash for whatever reason and so fail to process the messages they have
already consumed. If those messages cannot be reprocessed after recovery,
the system is not robust and has data integrity issues.

That is why I am trying to understand what Kafka's capabilities are. I just
don't know what the best practice is here.
Maybe it is a matter of configuration?

Please advise.
Thanks
Oleg.


On Thu, Aug 1, 2013 at 1:31 AM, Milind Parikh <mi...@gmail.com> wrote:

> It is possible to consume the same message more than once with the same
> consumer. However, WHAT you actually do with the message (such as making
> writes idempotent) is the trickier part.
>
> Regards
> Milind

Re: reprocessing messages in kafka

Posted by Milind Parikh <mi...@gmail.com>.
It is possible to consume the same message more than once with the same
consumer. However, WHAT you actually do with the message (such as making
writes idempotent) is the trickier part.
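
A minimal sketch of one way to make a write idempotent, for illustration
only. It assumes each message can be identified by its partition and offset,
and the class IdempotentCounter is a hypothetical in-memory stand-in for a
real datastore. A message that is delivered twice is applied once.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory store; a real system would persist both structures.
class IdempotentCounter {
    // Identifiers of messages that have already been applied.
    private final Set<String> applied = new HashSet<>();
    private final Map<String, Integer> counts = new HashMap<>();

    // Returns true if the message was applied, false if it was a duplicate.
    boolean apply(int partition, long offset, String word) {
        String id = partition + ":" + offset;
        if (!applied.add(id)) {
            return false; // seen before, so skip the write
        }
        counts.merge(word, 1, Integer::sum);
        return true;
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        IdempotentCounter counter = new IdempotentCounter();
        counter.apply(0, 5L, "kafka");
        counter.apply(0, 5L, "kafka"); // redelivery of the same message
        System.out.println(counter.count("kafka")); // 1
    }
}
```

In a real deployment the duplicate check and the write would need to happen
atomically in the same store; this sketch ignores that.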

Regards
Milind



On Wed, Jul 31, 2013 at 8:22 AM, Oleg Ruchovets <or...@gmail.com> wrote:

> Hi,
>
> I am not sure which mailing list is the right one for this question (Storm
> or Kafka), so apologies for the cross-post.
>
> I just read the documentation that describes guaranteed message processing
> with Storm:
> https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing
>
> My question is what happens to a message that Storm has consumed but failed
> to process. If I use the anchoring technique and Storm tries to process the
> message a second time, will the message still be available in Kafka? (I am
> using the storm-kafka spout.)
>
> In other words, is it possible to consume the same message from Kafka more
> than once with the same consumer?
>
> Thanks
> Oleg.
>