Posted to users@kafka.apache.org by Pradeep Bhattiprolu <pb...@gmail.com> on 2016/04/09 07:44:46 UTC

Kafka Newbie question

Hi all,

I am new to Kafka. I am using the new Consumer API in a thread that acts
as a consumer for a Kafka topic.
For testing and other purposes I have read the topic multiple times
using Kafka's console-consumer.sh script.

To start reading messages from the beginning in my Java code, I have
set auto.offset.reset to "earliest".

However, that property does not guarantee that I begin reading messages
from the start; it seems to go by the most recently committed offset for
the consumer group.

Here are my questions:
Is there a reliable way to start reading messages from the beginning in a
Java-based Kafka consumer?
Once I reset one of my consumers to offset zero, do I have to manage offsets
myself for the other consumer threads, or does Kafka automatically lower
their offsets to the first thread's read offset?

Any information or material pointing to a solution is highly appreciated.

Thanks
Pradeep

Re: Kafka Newbie question

Posted by R Krishna <kr...@gmail.com>.
I am also a newbie, using 0.9.0.1. I think you meant
auto.offset.reset=earliest. Did the OP intend to use his own
commit strategy/management by setting enable.auto.commit=false?

With auto.offset.reset=earliest, a "new" consumer will get the earliest
partition offsets, commit them, and start consuming. Going forward, it
will use whatever was committed to continue where it left off.
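
One way to make every run look like a "new" consumer is to give it a group id that has never committed anything, so that auto.offset.reset=earliest actually applies. The following is only a sketch under that assumption: the class name, the broker address, and the "indexer" prefix are placeholders invented for illustration, while the property keys are the standard consumer settings. It builds the configuration with java.util.Properties only and does not create a consumer.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper: builds consumer settings for a group id that has no
// committed offsets yet, so the "earliest" reset policy takes effect.
class ReplayConsumerConfig {

    static Properties forFreshGroup(String bootstrapServers, String groupPrefix) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A random suffix makes this a group Kafka has not seen before.
        props.setProperty("group.id", groupPrefix + "-" + UUID.randomUUID());
        // Only consulted when the group has no valid committed offset.
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and prefix.
        Properties props = forFreshGroup("localhost:9092", "indexer");
        System.out.println(props.getProperty("group.id"));
    }
}
```

All threads that should share the work would need to be handed the same Properties object (and therefore the same group id). The trade-off is that each run leaves a new group id behind on the broker.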

You could also do this programmatically:

        consumer.subscribe(Collections.singletonList(KafkaProperties.topic));
        consumer.poll(0);    // 1. this will update its current subscriptions first

        // 2. then seek the assigned partitions to the beginning
        consumer.seekToBeginning();    // no arguments: all currently assigned partitions
        consumer.poll(0);

Feel free to correct me.

On Tue, Apr 12, 2016 at 10:10 PM, Pradeep Bhattiprolu <pb...@gmail.com>
wrote:

> I tried both of the approaches stated above, with no luck :(.
> Let me give a concrete example of what I am trying to achieve here:
>
> 1) A Kafka producer adds multiple JSON messages to a particular topic on the
> message broker (done, this part works).
> 2) I want multiple consumers in a single consumer group to read these
> messages and index them into a search engine. To have multiple consumers,
> I have created a process which spawns multiple Java threads.
> 3) While creating these threads and adding them to the pool, I select one
> thread as a "leader" thread. The only difference between the leader thread
> and the other threads is that the leader sets the offset to the beginning
> by calling Consumer.seekToBeginning() (with no parameters).
> 4) The other threads pick up messages based on the last committed offsets of
> the leader thread and the other consumer threads.
>
> The idea is that every group of consumer threads reading a single
> topic should always read from the start of the topic. It is a fair
> assumption within my application that each consumer group should
> read from the start.
>
> PS: I am using the new KafkaConsumer class and the latest 0.9.0.0 version
> of the kafka-clients dependency in my project.
>
> Any help or code samples to help me move forward would be highly appreciated.
>
> Thanks
> Pradeep



-- 
Radha Krishna, Proddaturi
253-234-5657

Re: Kafka Newbie question

Posted by Pradeep Bhattiprolu <pb...@gmail.com>.
I tried both of the approaches stated above, with no luck :(.
Let me give a concrete example of what I am trying to achieve here:

1) A Kafka producer adds multiple JSON messages to a particular topic on the
message broker (done, this part works).
2) I want multiple consumers in a single consumer group to read these
messages and index them into a search engine. To have multiple consumers,
I have created a process which spawns multiple Java threads.
3) While creating these threads and adding them to the pool, I select one
thread as a "leader" thread. The only difference between the leader thread
and the other threads is that the leader sets the offset to the beginning
by calling Consumer.seekToBeginning() (with no parameters).
4) The other threads pick up messages based on the last committed offsets of
the leader thread and the other consumer threads.

The idea is that every group of consumer threads reading a single
topic should always read from the start of the topic. It is a fair
assumption within my application that each consumer group should
read from the start.

PS: I am using the new KafkaConsumer class and the latest 0.9.0.0 version
of the kafka-clients dependency in my project.

Any help or code samples to help me move forward would be highly appreciated.

Thanks
Pradeep

On Sun, Apr 10, 2016 at 2:43 AM, Liquan Pei <li...@gmail.com> wrote:

> Hi Pradeep,
>
> I think I misinterpreted your question. Are you trying to consume a topic
> multiple times with each consumer instance, or to consume one topic with
> multiple consumer instances?
>
> If you want to consume a topic multiple times with one consumer
> instance, `seekToBeginning` will reset the offset to the beginning on the
> next `poll`.
>
> If you want each thread to consume the same topic multiple times,
> you need to use multiple consumer groups. Otherwise, each message will be
> consumed by only one consumer instance in the group.
>
> Thanks,
> Liquan

Re: Kafka Newbie question

Posted by Liquan Pei <li...@gmail.com>.
Hi Pradeep,

I think I misinterpreted your question. Are you trying to consume a topic
multiple times with each consumer instance, or to consume one topic with
multiple consumer instances?

If you want to consume a topic multiple times with one consumer
instance, `seekToBeginning` will reset the offset to the beginning on the
next `poll`.

If you want each thread to consume the same topic multiple times,
you need to use multiple consumer groups. Otherwise, each message will be
consumed by only one consumer instance in the group.
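
To make the difference between one group and several groups concrete, here is a toy model in plain Java. It is not Kafka code and not Kafka's real assignment algorithm: the class name, the thread names, and the round-robin rule are assumptions made for illustration. It only shows the general idea that, within one group, each partition is owned by a single member, so with a single-partition topic the extra threads sit idle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of partition ownership inside ONE consumer group.
// Kafka's actual assignors differ in detail; the invariant illustrated
// here is "each partition has exactly one owner per group".
class GroupAssignmentModel {

    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String member : members) {
            owned.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return owned; // nobody to assign partitions to
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            owned.get(owner).add(partition);
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> threads = Arrays.asList("t1", "t2", "t3");
        // One partition: only one thread in the group receives data.
        System.out.println(assign(threads, 1));
        // Three partitions: the work is split, nothing is read twice.
        System.out.println(assign(threads, 3));
    }
}
```

A second group would get its own, independent assignment over the same partitions, which is why reading the whole topic more than once calls for more than one group.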

Thanks,
Liquan



On Sun, Apr 10, 2016 at 12:21 AM, Harsha <ka...@harsha.io> wrote:

> Pradeep,
>             How about
>
> https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning%28org.apache.kafka.common.TopicPartition...%29
>
> -Harsha



-- 
Liquan Pei
Software Engineer, Confluent Inc

Re: Kafka Newbie question

Posted by Harsha <ka...@harsha.io>.
Pradeep,
            How about
            https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#seekToBeginning%28org.apache.kafka.common.TopicPartition...%29

-Harsha

On Sat, Apr 9, 2016, at 09:48 PM, Pradeep Bhattiprolu wrote:
> Liquan, thanks for the response.
> If I set auto commit to false, do I have to manage offsets
> manually?
> I am running multiple threads, each one being a consumer. It would
> be complicated to manage offsets across threads if I don't use Kafka's
> automatic consumer group abstraction.
> 
> Thanks
> Pradeep

Re: Kafka Newbie question

Posted by Pradeep Bhattiprolu <pb...@gmail.com>.
Liquan, thanks for the response.
If I set auto commit to false, do I have to manage offsets
manually?
I am running multiple threads, each one being a consumer. It would
be complicated to manage offsets across threads if I don't use Kafka's
automatic consumer group abstraction.

Thanks
Pradeep

On Sat, Apr 9, 2016 at 3:12 AM, Liquan Pei <li...@gmail.com> wrote:

> Hi Pradeep,
>
> Can you try setting enable.auto.commit = false if you want to read from the
> earliest offset? According to the documentation, auto.offset.reset controls
> what to do when there is no initial offset in Kafka or if the current
> offset does not exist any more on the server (e.g. because that data has
> been deleted). If auto commit is enabled, a committed offset is
> already available on the server, so auto.offset.reset is not applied.
>
> Thanks,
> Liquan

Re: Kafka Newbie question

Posted by Liquan Pei <li...@gmail.com>.
Hi Pradeep,

Can you try setting enable.auto.commit = false if you want to read from the
earliest offset? According to the documentation, auto.offset.reset controls
what to do when there is no initial offset in Kafka or if the current
offset does not exist any more on the server (e.g. because that data has
been deleted). If auto commit is enabled, a committed offset is
already available on the server, so auto.offset.reset is not applied.
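
The rule described above can be written down as a small decision function. This is a toy model in plain Java, not the client's actual implementation; the class, enum, and parameter names are invented for illustration. It captures the documented behaviour: a valid committed offset always wins, and the reset policy is consulted only when there is no committed offset or it is out of range.

```java
import java.util.Optional;

// Toy model (not Kafka code) of how a consumer's starting position is chosen.
class StartPositionModel {

    enum ResetPolicy { EARLIEST, LATEST }

    /**
     * @param committed the group's committed offset for the partition, if any
     * @param logStart  first offset still retained on the broker
     * @param logEnd    offset the next produced message would get
     * @param policy    stands in for the auto.offset.reset setting
     */
    static long startingOffset(Optional<Long> committed, long logStart,
                               long logEnd, ResetPolicy policy) {
        if (committed.isPresent()) {
            long offset = committed.get();
            if (offset >= logStart && offset <= logEnd) {
                return offset; // a valid committed offset wins over the policy
            }
        }
        // No committed offset, or it points at data that no longer exists.
        return policy == ResetPolicy.EARLIEST ? logStart : logEnd;
    }

    public static void main(String[] args) {
        // Brand-new group: the policy applies.
        System.out.println(startingOffset(Optional.empty(), 0, 100, ResetPolicy.EARLIEST)); // 0
        // Group that already committed offset 42: "earliest" is ignored.
        System.out.println(startingOffset(Optional.of(42L), 0, 100, ResetPolicy.EARLIEST)); // 42
        // Committed offset 5 was deleted by retention: the policy applies again.
        System.out.println(startingOffset(Optional.of(5L), 10, 100, ResetPolicy.EARLIEST)); // 10
    }
}
```

The second case is the one from the original question: a group that has consumed before resumes from its committed offset regardless of the "earliest" setting.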

Thanks,
Liquan

On Fri, Apr 8, 2016 at 10:44 PM, Pradeep Bhattiprolu <pb...@gmail.com>
wrote:

> Hi all,
>
> I am new to Kafka. I am using the new Consumer API in a thread that acts
> as a consumer for a Kafka topic.
> For testing and other purposes I have read the topic multiple times
> using Kafka's console-consumer.sh script.
>
> To start reading messages from the beginning in my Java code, I have
> set auto.offset.reset to "earliest".
>
> However, that property does not guarantee that I begin reading messages
> from the start; it seems to go by the most recently committed offset for
> the consumer group.
>
> Here are my questions:
> Is there a reliable way to start reading messages from the beginning in a
> Java-based Kafka consumer?
> Once I reset one of my consumers to offset zero, do I have to manage offsets
> myself for the other consumer threads, or does Kafka automatically lower
> their offsets to the first thread's read offset?
>
> Any information or material pointing to a solution is highly appreciated.
>
> Thanks
> Pradeep
>



-- 
Liquan Pei
Software Engineer, Confluent Inc