Posted to user@spark.apache.org by "sagarcasual ." <sa...@gmail.com> on 2016/05/24 19:07:17 UTC

Maintain kafka offset externally as Spark streaming processes records.

When consuming Kafka in Spark streaming using KafkaUtils.createDirectStream,
there are examples of working with Kafka offset ranges. However:
1. I would like to periodically record the offsets so that, if needed, I can
reprocess items from a given offset. Is there any way I can retrieve the
offset of a message in the RDD while I am processing each message (see the
sketch below)?
2. Also, with offsetRanges I have the start and end offset for the RDD, but
what if, while processing a record of the RDD, the system encounters an error
and the job ends? If I then want to resume processing from the record that
failed, how do I save the last successful offset so that I can start from it
the next time the job starts?

Appreciate your help.
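
A minimal sketch of one way to get at per-message offsets (question 1),
following the Spark 1.6 / Kafka 0.8 direct-stream pattern; ssc, kafkaParams,
topics and process(...) are assumed placeholders for the job's existing
pieces, not Spark API:

import kafka.serializer.StringDecoder
import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

// ssc, kafkaParams and topics stand in for the job's existing setup.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD { rdd =>
  // Each RDD produced by the direct stream knows the offset ranges it covers.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  rdd.foreachPartition { iter =>
    // RDD partitions map 1:1 to Kafka partitions, so the task's partition id
    // indexes straight into offsetRanges.
    val o: OffsetRange = offsetRanges(TaskContext.get.partitionId)
    var offset = o.fromOffset
    iter.foreach { case (_, value) =>
      // For a normal (non-compacted) topic, offsets within one partition of a
      // direct-stream batch are consecutive, so this is the message's offset.
      process(value, offset)   // process(...) is a placeholder for your logic
      offset += 1
    }
  }
}

If only the per-partition from/until offsets need to be recorded, offsetRanges
can instead be captured on the driver (for example in a transform before other
operations, as the Spark docs show) and written out in foreachRDD.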

Re: Maintain kafka offset externally as Spark streaming processes records.

Posted by Cody Koeninger <co...@koeninger.org>.
Have you looked at everything linked from

https://github.com/koeninger/kafka-exactly-once
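
The broad pattern covered there for question 2 is: load the last committed
offsets at startup, pass them as fromOffsets, and persist each batch's
untilOffsets only after the batch has fully succeeded. A rough sketch under
those assumptions, with hypothetical loadOffsets()/saveOffsets() helpers
backed by whatever external store is chosen, and the same
ssc/kafkaParams/process(...) placeholders as above:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

// loadOffsets()/saveOffsets() are hypothetical helpers backed by an external
// store of your choosing (a database table, ZooKeeper, HBase, ...).
val fromOffsets: Map[TopicAndPartition, Long] = loadOffsets()

// The messageHandler gives access to each message's own offset.
val messageHandler =
  (mmd: MessageAndMetadata[String, String]) => (mmd.offset, mmd.message)

val stream = KafkaUtils
  .createDirectStream[String, String, StringDecoder, StringDecoder, (Long, String)](
    ssc, kafkaParams, fromOffsets, messageHandler)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // Process the whole batch first...
  rdd.foreach { case (offset, message) => process(message, offset) }

  // ...and only then persist the new high-water marks. If the batch fails,
  // the old offsets stay in the store and the batch is re-read on restart
  // (at-least-once unless results and offsets are stored in one transaction).
  saveOffsets(offsetRanges.map(o =>
    (TopicAndPartition(o.topic, o.partition), o.untilOffset)).toMap)
}

Resuming from the exact failed record rather than the batch boundary would
mean committing offsets more frequently (for example per partition inside
foreachPartition), and exactly-once still requires either storing results and
offsets in the same transaction or making the output idempotent.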


