Posted to users@kafka.apache.org by Rajib Deb <Ra...@infosys.com> on 2020/05/12 01:51:36 UTC

Offset Management...

Hi, I wanted to know if it is good practice to develop a custom offset management method while consuming from Kafka. I am thinking of developing it as below.


  1.  Create a PartitionInfo named tuple as below

PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])

  2.  Then populate the tuple with the header, writer and last offset details
  3.  Write the tuple to a file/database once the consumer commits the message
  4.  Next time the consumer starts, it checks the last offset and reads from there
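The four steps above can be sketched in Python; everything here is illustrative (the field names, the checkpoint file path, and the commented seek call, which assumes a kafka-python style consumer):

```python
import json
from collections import namedtuple
from pathlib import Path

# Checkpoint record for one partition; field names follow the proposal above.
PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])

def save_checkpoint(path, info):
    """Persist the last committed offset after the consumer commits (step 3)."""
    Path(path).write_text(json.dumps(info._asdict()))

def load_checkpoint(path):
    """On startup, return the stored record, or None on a first run (step 4)."""
    p = Path(path)
    return PartitionInfo(**json.loads(p.read_text())) if p.exists() else None

# Round trip: store the offset on commit, then restore it on the next start.
save_checkpoint("checkpoint.json", PartitionInfo("orders/0", "snowflake", 1041))
restored = load_checkpoint("checkpoint.json")
# consumer.seek(TopicPartition("orders", 0), restored.offset + 1)
```

On restart the consumer would seek to restored.offset + 1, since the stored offset was already processed.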

Thanks
Rajib


RE: Offset Management...

Posted by Rajib Deb <Ra...@infosys.com>.
Thanks Bill, my apologies for not elaborating my use case earlier.

In my use case, data from Cassandra is pushed to Kafka, and we then consume from Kafka into Snowflake. Once the data is in Snowflake, we do not want to go back to the source (Cassandra) to pull it again. There are occasions when we are asked to pull the data for a certain date and time, and I thought storing the offset would help with that. The other item is our validation framework: we need to validate that we are processing all the rows that Cassandra pushes to Kafka. So the validation program needs to look at the number of rows in Cassandra for a particular key and check whether we have that many messages in Kafka and Snowflake for that key.
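Worth noting for the "certain date and time" replay: Kafka can resolve a timestamp to an offset natively via offsets_for_times in kafka-python (offsetsForTimes in the Java client), so stored offsets may only be needed for the validation counts. A sketch; the broker address, topic name, and date are assumptions:

```python
from datetime import datetime, timezone

def to_epoch_ms(dt):
    """Kafka timestamp lookups take epoch milliseconds."""
    return int(dt.timestamp() * 1000)

# Sketch (requires a running broker; all names below are illustrative):
# from kafka import KafkaConsumer, TopicPartition
# consumer = KafkaConsumer(bootstrap_servers="localhost:9092", enable_auto_commit=False)
# tp = TopicPartition("cassandra-feed", 0)
# consumer.assign([tp])
# target = to_epoch_ms(datetime(2020, 5, 1, tzinfo=timezone.utc))
# hit = consumer.offsets_for_times({tp: target})[tp]
# if hit is not None:
#     consumer.seek(tp, hit.offset)  # first offset with timestamp >= target
```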


Thanks
Rajib

-----Original Message-----
From: Bill Bejeck <bi...@confluent.io> 
Sent: Tuesday, May 12, 2020 7:41 AM
To: users@kafka.apache.org
Subject: Re: Offset Management...

Hi Rajib,

Generally, it's best to let Kafka handle the offset management.
Under normal circumstances, when you restart a consumer, it will start reading records from the last committed offset; there's no need for you to manage that process yourself.
If you need to manually commit records vs. using auto-commit, then you can use one of the commit API methods, commitSync
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitSync-->
or commitAsync
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitAsync-org.apache.kafka.clients.consumer.OffsetCommitCallback->.

-Bill


On Mon, May 11, 2020 at 9:52 PM Rajib Deb <Ra...@infosys.com> wrote:

> Hi, I wanted to know if it is a good practice to develop a custom 
> offset management method while consuming from Kafka. I am thinking to 
> develop it as below.
>
>
>   1.  Create a PartitionInfo named tuple as below
>
> PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])
>
>   2.  Then populate the tuple with the header, writer and last offset
> details
>   3.  Write the tuple to a file/database once the consumer commits the
> message
>   4.  Next time the consumer starts, it checks the last offset and
> reads from there
>
> Thanks
> Rajib
>
>

Re: Offset Management...

Posted by Bill Bejeck <bi...@confluent.io>.
Hi Rajib,

Generally, it's best to let Kafka handle the offset management.
Under normal circumstances, when you restart a consumer, it will start
reading records from the last committed offset; there's no need for you to
manage that process yourself.
If you need to manually commit records vs. using auto-commit, then you can
use one of the commit API methods, commitSync
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitSync-->
or commitAsync
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#commitAsync-org.apache.kafka.clients.consumer.OffsetCommitCallback->.
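For a Python consumer (the original proposal uses a Python namedtuple), kafka-python exposes the same pair as commit() and commit_async(). A minimal at-least-once loop as a sketch; the broker, topic, group id, and sink function are all assumptions:

```python
def should_commit(records_since_commit, batch_size=100):
    """Committing every batch_size records bounds how much is replayed on restart."""
    return records_since_commit >= batch_size

# Sketch (requires a running broker; all names below are illustrative):
# from kafka import KafkaConsumer
# consumer = KafkaConsumer(
#     "cassandra-feed",
#     bootstrap_servers="localhost:9092",
#     group_id="snowflake-loader",
#     enable_auto_commit=False,   # commit only after a successful write
# )
# seen = 0
# for record in consumer:
#     write_to_snowflake(record)  # hypothetical sink
#     seen += 1
#     if should_commit(seen):
#         consumer.commit()       # synchronous commit: at-least-once delivery
#         seen = 0
```

Committing only after the write succeeds means a crash replays at most one batch into Snowflake, rather than losing records.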

-Bill


On Mon, May 11, 2020 at 9:52 PM Rajib Deb <Ra...@infosys.com> wrote:

> Hi, I wanted to know if it is a good practice to develop a custom offset
> management method while consuming from Kafka. I am thinking to develop it
> as below.
>
>
>   1.  Create a PartitionInfo named tuple as below
>
> PartitionInfo = namedtuple("PartitionInfo", ["header", "custom_writer", "offset"])
>
>   2.  Then populate the tuple with the header, writer and last offset
> details
>   3.  Write the tuple to a file/database once the consumer commits the
> message
>   4.  Next time the consumer starts, it checks the last offset and reads
> from there
>
> Thanks
> Rajib
>
>