Posted to user@spark.apache.org by Michael Campbell <mi...@gmail.com> on 2014/06/11 22:53:07 UTC

Kafka client - specify offsets?

Is there a way in the Apache Spark KafkaUtils to specify an offset to
start reading from? Specifically, from the start of the queue or, failing
that, from a specific point?

Re: Kafka client - specify offsets?

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Michael,

apparently, the parameter "auto.offset.reset" has a different meaning
in Spark's Kafka implementation than the one described in the Kafka
documentation.

The Kafka docs at <https://kafka.apache.org/documentation.html>
specify the effect of "auto.offset.reset" as:
> What to do when there is no initial offset in ZooKeeper or if an offset is out of range:
> * smallest : automatically reset the offset to the smallest offset
> * largest : automatically reset the offset to the largest offset
> * anything else: throw exception to the consumer
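
To make the documented rule concrete, here is a small model of it. It is a hypothetical sketch, not Kafka's source code: the object and method names are made up, and it assumes the broker currently retains offsets earliest..latest, where latest is the log end (the next offset to be written).

```scala
// Hypothetical model (not Kafka source code) of the documented auto.offset.reset rule.
object DocumentedReset {
  /** Start offset for one partition, given the offset stored in ZooKeeper (if any),
    * the range of offsets the broker still retains, and the reset policy. */
  def startOffset(stored: Option[Long], earliest: Long, latest: Long, reset: String): Long =
    stored match {
      // An in-range offset from ZooKeeper wins; the reset policy is never consulted.
      case Some(o) if o >= earliest && o <= latest => o
      // No stored offset, or it is out of range: fall back to the policy.
      case _ =>
        reset match {
          case "smallest" => earliest
          case "largest"  => latest
          case other =>
            throw new IllegalArgumentException(s"auto.offset.reset=$other and no valid offset")
        }
    }
}
```

For example, startOffset(Some(42L), 0L, 100L, "smallest") returns 42: under the documented semantics a stored offset is kept even though "smallest" was requested, so the setting alone cannot rewind an existing consumer group.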

However, Spark's implementation seems to drop the "when there is no
initial offset" condition, as can be seen in
https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaInputDStream.scala#L102
If "auto.offset.reset" is set at all, the receiver simply wipes the
consumer group's stored offsets from ZooKeeper before it starts. I guess
that is actually a bug, because the parameter's effect differs from what
is documented. But it is good for you (and me), because it lets you say
"I want all that I can get" (smallest) or "I want to start reading right
now" (largest), even if an offset is already stored in ZooKeeper.
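
The receiver behaviour described above can be modelled the same way. This is a hypothetical sketch, not Spark's source: it only encodes the claim that the mere presence of "auto.offset.reset" in kafkaParams causes the group's stored offset to be deleted first, and it assumes Kafka 0.8's default of "largest" when the key is absent.

```scala
// Hypothetical model (not Spark source code) of the receiver behaviour described
// in this thread: if "auto.offset.reset" is present in kafkaParams at all, the
// consumer group's offsets in ZooKeeper are deleted before consuming starts.
object SparkReceiverReset {
  def startOffset(kafkaParams: Map[String, String], stored: Option[Long],
                  earliest: Long, latest: Long): Long = {
    // Setting the key wipes the stored offset, so it can never win.
    val effective = if (kafkaParams.contains("auto.offset.reset")) None else stored
    effective.filter(o => o >= earliest && o <= latest).getOrElse {
      // Kafka 0.8's default for auto.offset.reset is "largest".
      kafkaParams.getOrElse("auto.offset.reset", "largest") match {
        case "smallest" => earliest
        case "largest"  => latest
        case other      => throw new IllegalArgumentException(s"auto.offset.reset=$other")
      }
    }
  }
}
```

With "auto.offset.reset" -> "smallest" the model returns the earliest offset even when 42 is stored ("all that I can get"), and "largest" gives "start reading right now". In Spark 1.0 the same kind of map is what you pass as the kafkaParams argument of KafkaUtils.createStream, alongside "zookeeper.connect" and "group.id".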

Tobias

On Sun, Jun 15, 2014 at 11:27 PM, Tobias Pfeiffer <tg...@preferred.jp> wrote:
> Hi,
>
> there are apparently helpers that tell you the available offsets
> <https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example#id-0.8.0SimpleConsumerExample-FindingStartingOffsetforReads>,
> but I have no idea how to pass those offsets to the Kafka stream
> consumer. I am interested in that as well.
>
> Tobias
>
> On Thu, Jun 12, 2014 at 5:53 AM, Michael Campbell
> <mi...@gmail.com> wrote:
>> Is there a way in the Apache Spark Kafka Utils to specify an offset to start
>> reading?  Specifically, from the start of the queue, or failing that, a
>> specific point?