Posted to users@kafka.apache.org by John Bickerstaff <jo...@johnbickerstaff.com> on 2016/02/17 20:16:40 UTC

Resetting Kafka Offsets -- and What are offsets.... exactly?

*Use Case: Disaster Recovery & Re-indexing SOLR*

I'm using Kafka to hold messages from a service that prepares "documents"
for SOLR.

A second microservice (a consumer) requests these messages, does any final
processing, and fires them into SOLR.

The whole thing is (in part) designed to be used for disaster recovery -
allowing the rebuild of the SOLR index in the shortest possible time.

To do this (and to be able to use it for re-indexing SOLR while testing
relevancy) I need to be able to "play all messages from the beginning" at
will.

I find I can use the zkCli.sh tool to delete the Consumer Group Name like
this:
     rmr /kafka/consumers/myGroupName

After which my microservice will get all the messages again when it runs.

I was trying to find a way to do this programmatically without actually
using the "low level" consumer API, since the high-level one is so simple
and my code already works.  So I started playing with the Zookeeper API to
duplicate "rmr /kafka/consumers/myGroupName".

*The Question: What does that offset actually represent?*

It was at this point that I discovered the offset must represent something
other than what I thought it would.  Things obviously work, but I'm
wondering what, exactly, the offsets represent.

To clarify - if I run this command on a zookeeper node, after the
microservice has run:
     get /kafka/consumers/myGroupName/offsets/myTopicName/0

I get the following:

30024
cZxid = 0x3600000355
ctime = Fri Feb 12 07:27:50 MST 2016
mZxid = 0x3600000357
mtime = Fri Feb 12 07:29:50 MST 2016
pZxid = 0x3600000355
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 0
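
(As an aside, that znode's payload is just the offset written out as an ASCII
string -- which is why dataLength is 5 for "30024" -- so it can be read back
with a plain getData call. A minimal sketch, assuming an already-connected
handle like the one in the earlier snippet:)

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Reads the committed offset for one partition of a consumer group.
    static long committedOffset(ZooKeeper zk, String group, String topic, int partition)
            throws Exception {
        Stat stat = new Stat();
        byte[] data = zk.getData(
            "/consumers/" + group + "/offsets/" + topic + "/" + partition, false, stat);
        // The payload is the offset as text, e.g. the 5 bytes "30024".
        return Long.parseLong(new String(data, "UTF-8"));
    }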

Now - I have exactly 3500 messages in this Kafka topic.  I verify that by
running this command:
     bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka
--topic myTopicName --from-beginning

When I hit Ctrl-C, it tells me it consumed 3500 messages.

So - what does that 30024 actually represent?  If I reset that number to 1
or 0 and re-run my consumer microservice, I get all the messages again -
and the number again goes to 30024.  However, I'm not comfortable trusting
that, because my assumption that the number represents a simple count of
messages sent to this consumer is obviously wrong.

(I reset the number like this -- to 1 -- and assume there's an API command
that will do it too.)
     set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
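
(The Zookeeper API call behind that zkCli "set" is setData; a minimal sketch
of the same reset, again assuming a connected org.apache.zookeeper.ZooKeeper
handle and the same imports as the read helper above:)

    // Overwrites the committed offset for one partition; version -1 means
    // "regardless of the znode's current version".
    static void setCommittedOffset(ZooKeeper zk, String group, String topic,
                                   int partition, long offset) throws Exception {
        zk.setData("/consumers/" + group + "/offsets/" + topic + "/" + partition,
                String.valueOf(offset).getBytes("UTF-8"), -1);
    }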

Can someone help me clarify, or point me at a doc that explains, what is
getting counted here?  You can shoot me if you like for attempting the
hack-ish solution of resetting the offset through the Zookeeper API, but I
would still like to understand what, exactly, is represented by that number
30024.

I need to hand off to IT for the Disaster Recovery portion and saying
"trust me, it just works" isn't going to fly very far...

Thanks.

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

Posted by Leo Lin <le...@brigade.com>.
Hi John,

Glad to help :)  I ran into similar issues recently, being confused by what
the offsets mean as well, so I understand your pain haha.

Best of luck,
Leo

On Tue, Feb 23, 2016 at 1:53 PM, John Bickerstaff <jo...@johnbickerstaff.com>
wrote:

> Thanks Leo!
>
> =========
> TL;DR summary:
>
> You're correct - I didn't absolutely need the offset.
> I had to provide Disaster Recovery advice and couldn't explain the offset
> numbers, which wouldn't fly
> Explanation for how I got myself confused in the text below -- in case it
> helps someone else later.
> Thanks for your reply!
> =========
>
> You're right.  Strictly speaking, I don't need the offset.  In my testing
> I've been issuing the rmr /kafka/consumers command from the Zookeeper
> zkCli.sh.
> I'm adding it to my microservice using the Zookeeper API this week - since
> that seems a lot easier than figuring out the low level Kafka API code and
> it works just as well.
>
> Being a developer, I just couldn't help trying to change the least
> significant thing required to get the job done - and the Zookeeper API does
> allow me to change that offset number...  Which led me to try to understand
> why that number wasn't matching my expectations...
>
> In addition, I'm building a SOLR / Kafka / Zookeeper infrastructure from
> scratch and part of my mandate is to provide a handoff to our (very capable
> and very careful) IT manager.  The handoff is to include plans and
> documentation for disaster recovery as well as how to build and manage the
> cluster.
>
> For both of those reasons, my curiosity was piqued and I wanted to find out
> exactly what was going on.  I could just imagine the look on our IT
> manager's face when I said "Trust me, the numbers don't line up, but it
> won't affect disaster recovery."
>
> In hindsight, I understand what I did that confused me.  Since I'm still in
> development "mode" I sent messages to the same topic repeatedly for weeks.
> Then instead of deleting the topic, I issued the following command to reset
> the retention of the messages like this:
>
> bin/kafka-topics.sh --zookeeper 192.168.56.5:2181/kafka --alter --topic
> topicName --config retention.ms=1000
>
> Then I reset it once the messages were deleted, thus:
>
> bin/kafka-topics.sh --zookeeper 192.168.56.5:2181/kafka --alter --topic
> topicName --delete-config retention.ms
>
> What I didn't realize is that (not unreasonably) the offset count isn't
> reset by changing the config retention setting.  As you said, it won't
> necessarily be 0.
>
> Sending the same set of messages repeatedly resulted in having a very large
> count in the offset - a count that bore no relation to the number of
> messages in the topic - which worried me because I couldn't explain it --
> and things I can't explain make me nervous in the context of disaster
> recovery...
>
> I appreciate your confirmation of my theory about what is going on.
>
> --JohnB (aka solrJohn)
>
> On Thu, Feb 18, 2016 at 12:19 PM, Leo Lin <le...@brigade.com> wrote:
>
> > Hi John,
> >
> > Kafka offsets are sequential id numbers that identify messages in each
> > partition. It might not be sequential within a topic (which can have
> > multiple partition).
> >
> > Offsets don't necessarily start at 0 since messages are deleted.
> >
> > .bin/kafka-run-class.sh kafka.tools.GetOffsetShell is pretty neat to look
> > at offsets in your topic
> >
> > I'm not sure why resetting offset is needed in your case. If you need to
> > read from the beginning using the high level consumer,
> > you just need to delete that consumer group in zookeeper and set
> > "auto.offset.reset"  to "smallest". (this will direct the consumer to
> look
> > for smallest offset if it doesnt find one in zookeeper)
> >
> > On Wed, Feb 17, 2016 at 1:06 PM, John Bickerstaff <
> > john@johnbickerstaff.com>
> > wrote:
> >
> > > Hmmm...  more info.
> > >
> > > So, inside /var/log/kafka-logs/myTopicName-0 I find two files
> > >
> > > 00000000000000026524.index  00000000000000026524.log
> > >
> > > Interestingly, they both bear the number of the "lowest" offset
> returned
> > by
> > > the command I mention above.
> > >
> > > If I "cat" the 000.....26524.log file, I get all my messages on the
> > > commandline as if I'd issued the --from-beginning command
> > >
> > > I'm not sure what the index has, it's unreadable by the simple tools
> I've
> > > tried....
> > >
> > > I'm still scratching my head a bit - as the link you sent for Kafka
> > > introduction says this:
> > >
> > > The messages in the partitions are each assigned a sequential id number
> > > called the *offset* that uniquely identifies each message within the
> > > partition.
> > > I see how that could be exactly what you said (the previous message(s)
> > byte
> > > count) -- but the picture implies that it's a linear progression -
> 1,2,3
> > > etc...  (and that could be an oversimplification for purposes of the
> > > introduction - I get that...)
> > >
> > > Feel free to comment or not - I'm going to keep digging into it as
> best I
> > > can - any clarifications will be gratefully accepted...
> > >
> > >
> > >
> > > On Wed, Feb 17, 2016 at 1:50 PM, John Bickerstaff <
> > > john@johnbickerstaff.com>
> > > wrote:
> > >
> > > > Thank you Christian -- I appreciate your taking the time to help me
> out
> > > on
> > > > this.
> > > >
> > > > Here's what I found while continuing to dig into this.
> > > >
> > > > If I take 30024 and subtract the number of messages I know I have in
> > > Kafka
> > > > (3500) I get 26524.
> > > >
> > > > If I reset thus:  set
> > /kafka/consumers/myGroupName/offsets/myTopicName/0
> > > > 26524
> > > >
> > > > ... and then re-run my consumer - I get all 3500 messages again.
> > > >
> > > > If I do this: set /kafka/consumers/myGroupName/offsets/myTopicName/0
> > > 26624
> > > >
> > > > In other words, I increase the offset number by 100 -- then I get
> > exactly
> > > > 3400 messages on my consumer --  exactly 100 less than before which I
> > > think
> > > > makes sense, since I started the offset 100 higher...
> > > >
> > > > This seems to suggest that each number between 26624 and 30024 in the
> > log
> > > > represents one of my 3500 messages on this topic, but what you say
> > > suggests
> > > > that they represent byte count of the actual messages and not "one
> > number
> > > > per message"...
> > > >
> > > > I also find that if I issue this command:
> > > >
> > > > bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName
> > > > --broker-list=192.168.56.3:9092  --time=-2
> > > >
> > > > I get back that same number -- 26524...
> > > >
> > > > Hmmmm....  A little confused still...  These messages are literally
> > > stored
> > > > in the Kafka logs, yes?  I think I'll go digging in there and see...
> > > >
> > > > Thanks again!
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta <
> > > > christian.posta@gmail.com> wrote:
> > > >
> > > >> The number is the log-ordered number of bytes. So really, the offset
> > is
> > > >> kinda like the "number of bytes" to begin reading from. 0 means read
> > the
> > > >> log from the beginning. The second message is 0 + size of message.
> So
> > > the
> > > >> message "ids" are really just the offset of the previous message
> > sizes.
> > > >>
> > > >> For example, if I have three messages of 10 bytes each, and set the
> > > >> consumer offset to 0, i'll read everything. If you set the offset to
> > 10,
> > > >> I'll read the second and third messages, and so on.
> > > >>
> > > >> see more here:
> > > >>
> > > >>
> > >
> >
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > > >> and here: http://kafka.apache.org/documentation.html#introduction
> > > >>
> > > >> HTH!
> > > >>
> > > >> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <
> > > >> john@johnbickerstaff.com
> > > >> > wrote:
> > > >>
> > > >> > *Use Case: Disaster Recovery & Re-indexing SOLR*
> > > >> >
> > > >> > I'm using Kafka to hold messages from a service that prepares
> > > >> "documents"
> > > >> > for SOLR.
> > > >> >
> > > >> > A second micro service (a consumer) requests these messages, does
> > any
> > > >> final
> > > >> > processing, and fires them into SOLR.
> > > >> >
> > > >> > The whole thing is (in part) designed to be used for disaster
> > > recovery -
> > > >> > allowing the rebuild of the SOLR index in the shortest possible
> > time.
> > > >> >
> > > >> > To do this (and to be able to use it for re-indexing SOLR while
> > > testing
> > > >> > relevancy) I need to be able to "play all messages from the
> > beginning"
> > > >> at
> > > >> > will.
> > > >> >
> > > >> > I find I can use the zkCli.sh tool to delete the Consumer Group
> Name
> > > >> like
> > > >> > this:
> > > >> >      rmr /kafka/consumers/myGroupName
> > > >> >
> > > >> > After which my microservice will get all the messages again when
> it
> > > >> runs.
> > > >> >
> > > >> > I was trying to find a way to do this programmatically without
> > > actually
> > > >> > using the "low level" consumer api since the high level one is so
> > > simple
> > > >> > and my code already works.  So I started playing with Zookeeper
> api
> > > for
> > > >> > duplicating "rmr /kafka/consumers/myGroupName"
> > > >> >
> > > >> > *The Question: What does that offset actually represent?*
> > > >> >
> > > >> > It was at this point that I discovered the offset must represent
> > > >> something
> > > >> > other than what I thought it would.  Things obviously work, but
> I'm
> > > >> > wondering what - exactly do the offsets represent?
> > > >> >
> > > >> > To clarify - if I run this command on a zookeeper node, after the
> > > >> > microservice has run:
> > > >> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
> > > >> >
> > > >> > I get the following:
> > > >> >
> > > >> > 30024
> > > >> > cZxid = 0x3600000355
> > > >> > ctime = Fri Feb 12 07:27:50 MST 2016
> > > >> > mZxid = 0x3600000357
> > > >> > mtime = Fri Feb 12 07:29:50 MST 2016
> > > >> > pZxid = 0x3600000355
> > > >> > cversion = 0
> > > >> > dataVersion = 2
> > > >> > aclVersion = 0
> > > >> > ephemeralOwner = 0x0
> > > >> > dataLength = 5
> > > >> > numChildren = 0
> > > >> >
> > > >> > Now - I have exactly 3500 messages in this Kafka topic.  I verify
> > that
> > > >> by
> > > >> > running this command:
> > > >> >      bin/kafka-console-consumer.sh --zookeeper
> > > 192.168.56.5:2181/kafka
> > > >> > --topic myTopicName --from-beginning
> > > >> >
> > > >> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
> > > >> >
> > > >> > So - what does that 30024 actually represent?  If I reset that
> > number
> > > >> to 1
> > > >> > or 0 and re-run my consumer microservice, I get all the messages
> > > again -
> > > >> > and the number again goes to 30024.  However, I'm not comfortable
> to
> > > >> trust
> > > >> > that because my assumption that the number represents a simple
> count
> > > of
> > > >> > messages that have been sent to this consumer is obviously wrong.
> > > >> >
> > > >> > (I reset the number like this -- to 1 -- and assume there's an API
> > > >> command
> > > >> > that will do it too.)
> > > >> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
> > > >> >
> > > >> > Can someone help me clarify or point me at a doc that explains
> what
> > is
> > > >> > getting counted here?  You can shoot me if you like for attempting
> > the
> > > >> > hack-ish solution of re-setting the offset through the Zookeeper
> > API,
> > > >> but I
> > > >> > would still like to understand what, exactly, is represented by
> that
> > > >> number
> > > >> > 30024.
> > > >> >
> > > >> > I need to hand off to IT for the Disaster Recovery portion and
> > saying
> > > >> > "trust me, it just works" isn't going to fly very far...
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> *Christian Posta*
> > > >> twitter: @christianposta
> > > >> http://www.christianposta.com/blog
> > > >> http://fabric8.io
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > "Dream no small dreams for they have no power to move the hearts of men."
> >
> > Johann Wolfgang von Goethe
> >
>



-- 
"Dream no small dreams for they have no power to move the hearts of men."

Johann Wolfgang von Goethe

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Thanks Leo!

=========
TL;DR summary:

You're correct - I didn't absolutely need the offset.
I had to provide Disaster Recovery advice and couldn't explain the offset
numbers, which wouldn't fly.
An explanation of how I got myself confused is in the text below -- in case
it helps someone else later.
Thanks for your reply!
=========

You're right.  Strictly speaking, I don't need the offset.  In my testing
I've been issuing the rmr /kafka/consumers command from the Zookeeper
zkCli.sh.
I'm adding it to my microservice using the Zookeeper API this week, since
that seems a lot easier than figuring out the low-level Kafka API code and
it works just as well.

Being a developer, I just couldn't help trying to change the least
significant thing required to get the job done - and the Zookeeper API does
allow me to change that offset number...  Which led me to try to understand
why that number wasn't matching my expectations...

In addition, I'm building a SOLR / Kafka / Zookeeper infrastructure from
scratch and part of my mandate is to provide a handoff to our (very capable
and very careful) IT manager.  The handoff is to include plans and
documentation for disaster recovery as well as how to build and manage the
cluster.

For both of those reasons, my curiosity was piqued and I wanted to find out
exactly what was going on.  I could just imagine the look on our IT
manager's face when I said "Trust me, the numbers don't line up, but it
won't affect disaster recovery."

In hindsight, I understand what I did that confused me.  Since I'm still in
development "mode" I sent messages to the same topic repeatedly for weeks.
Then, instead of deleting the topic, I issued the following command to
shorten the retention of the messages:

bin/kafka-topics.sh --zookeeper 192.168.56.5:2181/kafka --alter --topic
topicName --config retention.ms=1000

Then I removed the override once the messages were deleted, thus:

bin/kafka-topics.sh --zookeeper 192.168.56.5:2181/kafka --alter --topic
topicName --delete-config retention.ms

What I didn't realize is that (not unreasonably) the offset counter isn't
reset by changing the retention setting.  As you said, the offsets won't
necessarily start at 0.

Sending the same set of messages repeatedly resulted in a very large
offset value - a number that bore no relation to the count of messages
currently in the topic - which worried me because I couldn't explain it,
and things I can't explain make me nervous in the context of disaster
recovery...

I appreciate your confirmation of my theory about what is going on.

--JohnB (aka solrJohn)

On Thu, Feb 18, 2016 at 12:19 PM, Leo Lin <le...@brigade.com> wrote:

> Hi John,
>
> Kafka offsets are sequential id numbers that identify messages in each
> partition. It might not be sequential within a topic (which can have
> multiple partition).
>
> Offsets don't necessarily start at 0 since messages are deleted.
>
> .bin/kafka-run-class.sh kafka.tools.GetOffsetShell is pretty neat to look
> at offsets in your topic
>
> I'm not sure why resetting offset is needed in your case. If you need to
> read from the beginning using the high level consumer,
> you just need to delete that consumer group in zookeeper and set
> "auto.offset.reset"  to "smallest". (this will direct the consumer to look
> for smallest offset if it doesnt find one in zookeeper)
>
> On Wed, Feb 17, 2016 at 1:06 PM, John Bickerstaff <
> john@johnbickerstaff.com>
> wrote:
>
> > Hmmm...  more info.
> >
> > So, inside /var/log/kafka-logs/myTopicName-0 I find two files
> >
> > 00000000000000026524.index  00000000000000026524.log
> >
> > Interestingly, they both bear the number of the "lowest" offset returned
> by
> > the command I mention above.
> >
> > If I "cat" the 000.....26524.log file, I get all my messages on the
> > commandline as if I'd issued the --from-beginning command
> >
> > I'm not sure what the index has, it's unreadable by the simple tools I've
> > tried....
> >
> > I'm still scratching my head a bit - as the link you sent for Kafka
> > introduction says this:
> >
> > The messages in the partitions are each assigned a sequential id number
> > called the *offset* that uniquely identifies each message within the
> > partition.
> > I see how that could be exactly what you said (the previous message(s)
> byte
> > count) -- but the picture implies that it's a linear progression - 1,2,3
> > etc...  (and that could be an oversimplification for purposes of the
> > introduction - I get that...)
> >
> > Feel free to comment or not - I'm going to keep digging into it as best I
> > can - any clarifications will be gratefully accepted...
> >
> >
> >
> > On Wed, Feb 17, 2016 at 1:50 PM, John Bickerstaff <
> > john@johnbickerstaff.com>
> > wrote:
> >
> > > Thank you Christian -- I appreciate your taking the time to help me out
> > on
> > > this.
> > >
> > > Here's what I found while continuing to dig into this.
> > >
> > > If I take 30024 and subtract the number of messages I know I have in
> > Kafka
> > > (3500) I get 26524.
> > >
> > > If I reset thus:  set
> /kafka/consumers/myGroupName/offsets/myTopicName/0
> > > 26524
> > >
> > > ... and then re-run my consumer - I get all 3500 messages again.
> > >
> > > If I do this: set /kafka/consumers/myGroupName/offsets/myTopicName/0
> > 26624
> > >
> > > In other words, I increase the offset number by 100 -- then I get
> exactly
> > > 3400 messages on my consumer --  exactly 100 less than before which I
> > think
> > > makes sense, since I started the offset 100 higher...
> > >
> > > This seems to suggest that each number between 26624 and 30024 in the
> log
> > > represents one of my 3500 messages on this topic, but what you say
> > suggests
> > > that they represent byte count of the actual messages and not "one
> number
> > > per message"...
> > >
> > > I also find that if I issue this command:
> > >
> > > bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName
> > > --broker-list=192.168.56.3:9092  --time=-2
> > >
> > > I get back that same number -- 26524...
> > >
> > > Hmmmm....  A little confused still...  These messages are literally
> > stored
> > > in the Kafka logs, yes?  I think I'll go digging in there and see...
> > >
> > > Thanks again!
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta <
> > > christian.posta@gmail.com> wrote:
> > >
> > >> The number is the log-ordered number of bytes. So really, the offset
> is
> > >> kinda like the "number of bytes" to begin reading from. 0 means read
> the
> > >> log from the beginning. The second message is 0 + size of message. So
> > the
> > >> message "ids" are really just the offset of the previous message
> sizes.
> > >>
> > >> For example, if I have three messages of 10 bytes each, and set the
> > >> consumer offset to 0, i'll read everything. If you set the offset to
> 10,
> > >> I'll read the second and third messages, and so on.
> > >>
> > >> see more here:
> > >>
> > >>
> >
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > >> and here: http://kafka.apache.org/documentation.html#introduction
> > >>
> > >> HTH!
> > >>
> > >> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <
> > >> john@johnbickerstaff.com
> > >> > wrote:
> > >>
> > >> > *Use Case: Disaster Recovery & Re-indexing SOLR*
> > >> >
> > >> > I'm using Kafka to hold messages from a service that prepares
> > >> "documents"
> > >> > for SOLR.
> > >> >
> > >> > A second micro service (a consumer) requests these messages, does
> any
> > >> final
> > >> > processing, and fires them into SOLR.
> > >> >
> > >> > The whole thing is (in part) designed to be used for disaster
> > recovery -
> > >> > allowing the rebuild of the SOLR index in the shortest possible
> time.
> > >> >
> > >> > To do this (and to be able to use it for re-indexing SOLR while
> > testing
> > >> > relevancy) I need to be able to "play all messages from the
> beginning"
> > >> at
> > >> > will.
> > >> >
> > >> > I find I can use the zkCli.sh tool to delete the Consumer Group Name
> > >> like
> > >> > this:
> > >> >      rmr /kafka/consumers/myGroupName
> > >> >
> > >> > After which my microservice will get all the messages again when it
> > >> runs.
> > >> >
> > >> > I was trying to find a way to do this programmatically without
> > actually
> > >> > using the "low level" consumer api since the high level one is so
> > simple
> > >> > and my code already works.  So I started playing with Zookeeper api
> > for
> > >> > duplicating "rmr /kafka/consumers/myGroupName"
> > >> >
> > >> > *The Question: What does that offset actually represent?*
> > >> >
> > >> > It was at this point that I discovered the offset must represent
> > >> something
> > >> > other than what I thought it would.  Things obviously work, but I'm
> > >> > wondering what - exactly do the offsets represent?
> > >> >
> > >> > To clarify - if I run this command on a zookeeper node, after the
> > >> > microservice has run:
> > >> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
> > >> >
> > >> > I get the following:
> > >> >
> > >> > 30024
> > >> > cZxid = 0x3600000355
> > >> > ctime = Fri Feb 12 07:27:50 MST 2016
> > >> > mZxid = 0x3600000357
> > >> > mtime = Fri Feb 12 07:29:50 MST 2016
> > >> > pZxid = 0x3600000355
> > >> > cversion = 0
> > >> > dataVersion = 2
> > >> > aclVersion = 0
> > >> > ephemeralOwner = 0x0
> > >> > dataLength = 5
> > >> > numChildren = 0
> > >> >
> > >> > Now - I have exactly 3500 messages in this Kafka topic.  I verify
> that
> > >> by
> > >> > running this command:
> > >> >      bin/kafka-console-consumer.sh --zookeeper
> > 192.168.56.5:2181/kafka
> > >> > --topic myTopicName --from-beginning
> > >> >
> > >> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
> > >> >
> > >> > So - what does that 30024 actually represent?  If I reset that
> number
> > >> to 1
> > >> > or 0 and re-run my consumer microservice, I get all the messages
> > again -
> > >> > and the number again goes to 30024.  However, I'm not comfortable to
> > >> trust
> > >> > that because my assumption that the number represents a simple count
> > of
> > >> > messages that have been sent to this consumer is obviously wrong.
> > >> >
> > >> > (I reset the number like this -- to 1 -- and assume there's an API
> > >> command
> > >> > that will do it too.)
> > >> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
> > >> >
> > >> > Can someone help me clarify or point me at a doc that explains what
> is
> > >> > getting counted here?  You can shoot me if you like for attempting
> the
> > >> > hack-ish solution of re-setting the offset through the Zookeeper
> API,
> > >> but I
> > >> > would still like to understand what, exactly, is represented by that
> > >> number
> > >> > 30024.
> > >> >
> > >> > I need to hand off to IT for the Disaster Recovery portion and
> saying
> > >> > "trust me, it just works" isn't going to fly very far...
> > >> >
> > >> > Thanks.
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> *Christian Posta*
> > >> twitter: @christianposta
> > >> http://www.christianposta.com/blog
> > >> http://fabric8.io
> > >>
> > >
> > >
> >
>
>
>
> --
> "Dream no small dreams for they have no power to move the hearts of men."
>
> Johann Wolfgang von Goethe
>

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

Posted by Leo Lin <le...@brigade.com>.
Hi John,

Kafka offsets are sequential id numbers that identify messages within each
partition. They might not be sequential across a topic (which can have
multiple partitions).

Offsets don't necessarily start at 0, since older messages get deleted.

bin/kafka-run-class.sh kafka.tools.GetOffsetShell is a pretty neat way to
look at the offsets in your topic.

I'm not sure why resetting the offset is needed in your case. If you need
to read from the beginning using the high-level consumer, you just need to
delete that consumer group in Zookeeper and set "auto.offset.reset" to
"smallest". (This will direct the consumer to look for the smallest offset
if it doesn't find one in Zookeeper.)
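
In case it's useful, a minimal sketch of that setup with the old high-level
(Zookeeper-based) consumer -- the property names are the standard ones, and
the group, topic and connect string are just the ones from this thread:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class ReplayFromBeginning {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "192.168.56.5:2181/kafka");
            props.put("group.id", "myGroupName");
            // With no committed offset in Zookeeper (e.g. right after rmr-ing
            // the group), "smallest" starts from the earliest offset on disk.
            props.put("auto.offset.reset", "smallest");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("myTopicName", 1));

            ConsumerIterator<byte[], byte[]> it =
                streams.get("myTopicName").get(0).iterator();
            while (it.hasNext()) {
                byte[] message = it.next().message();
                // hand the message off to the indexing code here
            }
            connector.shutdown();
        }
    }

Note that hasNext() blocks waiting for more messages unless
consumer.timeout.ms is set, so a real replay job needs its own stopping
condition.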

On Wed, Feb 17, 2016 at 1:06 PM, John Bickerstaff <jo...@johnbickerstaff.com>
wrote:

> Hmmm...  more info.
>
> So, inside /var/log/kafka-logs/myTopicName-0 I find two files
>
> 00000000000000026524.index  00000000000000026524.log
>
> Interestingly, they both bear the number of the "lowest" offset returned by
> the command I mention above.
>
> If I "cat" the 000.....26524.log file, I get all my messages on the
> commandline as if I'd issued the --from-beginning command
>
> I'm not sure what the index has, it's unreadable by the simple tools I've
> tried....
>
> I'm still scratching my head a bit - as the link you sent for Kafka
> introduction says this:
>
> The messages in the partitions are each assigned a sequential id number
> called the *offset* that uniquely identifies each message within the
> partition.
> I see how that could be exactly what you said (the previous message(s) byte
> count) -- but the picture implies that it's a linear progression - 1,2,3
> etc...  (and that could be an oversimplification for purposes of the
> introduction - I get that...)
>
> Feel free to comment or not - I'm going to keep digging into it as best I
> can - any clarifications will be gratefully accepted...
>
>
>
> On Wed, Feb 17, 2016 at 1:50 PM, John Bickerstaff <
> john@johnbickerstaff.com>
> wrote:
>
> > Thank you Christian -- I appreciate your taking the time to help me out
> on
> > this.
> >
> > Here's what I found while continuing to dig into this.
> >
> > If I take 30024 and subtract the number of messages I know I have in
> Kafka
> > (3500) I get 26524.
> >
> > If I reset thus:  set /kafka/consumers/myGroupName/offsets/myTopicName/0
> > 26524
> >
> > ... and then re-run my consumer - I get all 3500 messages again.
> >
> > If I do this: set /kafka/consumers/myGroupName/offsets/myTopicName/0
> 26624
> >
> > In other words, I increase the offset number by 100 -- then I get exactly
> > 3400 messages on my consumer --  exactly 100 less than before which I
> think
> > makes sense, since I started the offset 100 higher...
> >
> > This seems to suggest that each number between 26624 and 30024 in the log
> > represents one of my 3500 messages on this topic, but what you say
> suggests
> > that they represent byte count of the actual messages and not "one number
> > per message"...
> >
> > I also find that if I issue this command:
> >
> > bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName
> > --broker-list=192.168.56.3:9092  --time=-2
> >
> > I get back that same number -- 26524...
> >
> > Hmmmm....  A little confused still...  These messages are literally
> stored
> > in the Kafka logs, yes?  I think I'll go digging in there and see...
> >
> > Thanks again!
> >
> >
> >
> >
> >
> > On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta <
> > christian.posta@gmail.com> wrote:
> >
> >> The number is the log-ordered number of bytes. So really, the offset is
> >> kinda like the "number of bytes" to begin reading from. 0 means read the
> >> log from the beginning. The second message is 0 + size of message. So
> the
> >> message "ids" are really just the offset of the previous message sizes.
> >>
> >> For example, if I have three messages of 10 bytes each, and set the
> >> consumer offset to 0, i'll read everything. If you set the offset to 10,
> >> I'll read the second and third messages, and so on.
> >>
> >> see more here:
> >>
> >>
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> >> and here: http://kafka.apache.org/documentation.html#introduction
> >>
> >> HTH!
> >>
> >> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <
> >> john@johnbickerstaff.com
> >> > wrote:
> >>
> >> > *Use Case: Disaster Recovery & Re-indexing SOLR*
> >> >
> >> > I'm using Kafka to hold messages from a service that prepares
> >> "documents"
> >> > for SOLR.
> >> >
> >> > A second micro service (a consumer) requests these messages, does any
> >> final
> >> > processing, and fires them into SOLR.
> >> >
> >> > The whole thing is (in part) designed to be used for disaster
> recovery -
> >> > allowing the rebuild of the SOLR index in the shortest possible time.
> >> >
> >> > To do this (and to be able to use it for re-indexing SOLR while
> testing
> >> > relevancy) I need to be able to "play all messages from the beginning"
> >> at
> >> > will.
> >> >
> >> > I find I can use the zkCli.sh tool to delete the Consumer Group Name
> >> like
> >> > this:
> >> >      rmr /kafka/consumers/myGroupName
> >> >
> >> > After which my microservice will get all the messages again when it
> >> runs.
> >> >
> >> > I was trying to find a way to do this programmatically without
> actually
> >> > using the "low level" consumer api since the high level one is so
> simple
> >> > and my code already works.  So I started playing with Zookeeper api
> for
> >> > duplicating "rmr /kafka/consumers/myGroupName"
> >> >
> >> > *The Question: What does that offset actually represent?*
> >> >
> >> > It was at this point that I discovered the offset must represent
> >> something
> >> > other than what I thought it would.  Things obviously work, but I'm
> >> > wondering what - exactly do the offsets represent?
> >> >
> >> > To clarify - if I run this command on a zookeeper node, after the
> >> > microservice has run:
> >> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
> >> >
> >> > I get the following:
> >> >
> >> > 30024
> >> > cZxid = 0x3600000355
> >> > ctime = Fri Feb 12 07:27:50 MST 2016
> >> > mZxid = 0x3600000357
> >> > mtime = Fri Feb 12 07:29:50 MST 2016
> >> > pZxid = 0x3600000355
> >> > cversion = 0
> >> > dataVersion = 2
> >> > aclVersion = 0
> >> > ephemeralOwner = 0x0
> >> > dataLength = 5
> >> > numChildren = 0
> >> >
> >> > Now - I have exactly 3500 messages in this Kafka topic.  I verify that
> >> by
> >> > running this command:
> >> >      bin/kafka-console-consumer.sh --zookeeper
> 192.168.56.5:2181/kafka
> >> > --topic myTopicName --from-beginning
> >> >
> >> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
> >> >
> >> > So - what does that 30024 actually represent?  If I reset that number
> >> to 1
> >> > or 0 and re-run my consumer microservice, I get all the messages
> again -
> >> > and the number again goes to 30024.  However, I'm not comfortable to
> >> trust
> >> > that because my assumption that the number represents a simple count
> of
> >> > messages that have been sent to this consumer is obviously wrong.
> >> >
> >> > (I reset the number like this -- to 1 -- and assume there's an API
> >> command
> >> > that will do it too.)
> >> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
> >> >
> >> > Can someone help me clarify or point me at a doc that explains what is
> >> > getting counted here?  You can shoot me if you like for attempting the
> >> > hack-ish solution of re-setting the offset through the Zookeeper API,
> >> but I
> >> > would still like to understand what, exactly, is represented by that
> >> number
> >> > 30024.
> >> >
> >> > I need to hand off to IT for the Disaster Recovery portion and saying
> >> > "trust me, it just works" isn't going to fly very far...
> >> >
> >> > Thanks.
> >> >
> >>
> >>
> >>
> >> --
> >> *Christian Posta*
> >> twitter: @christianposta
> >> http://www.christianposta.com/blog
> >> http://fabric8.io
> >>
> >
> >
>



-- 
"Dream no small dreams for they have no power to move the hearts of men."

Johann Wolfgang von Goethe

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Hmmm...  more info.

So, inside /var/log/kafka-logs/myTopicName-0 I find two files

00000000000000026524.index  00000000000000026524.log

Interestingly, they both bear the number of the "lowest" offset returned by
the command I mention above.

If I "cat" the 000.....26524.log file, I get all my messages on the
commandline as if I'd issued the --from-beginning command

I'm not sure what the index has, it's unreadable by the simple tools I've
tried....

I'm still scratching my head a bit - the link you sent for the Kafka
introduction says this:

The messages in the partitions are each assigned a sequential id number
called the *offset* that uniquely identifies each message within the
partition.

I see how that could be exactly what you said (the previous message(s) byte
count) -- but the picture implies that it's a linear progression - 1, 2, 3
etc...  (and that could be an oversimplification for purposes of the
introduction - I get that...)

Feel free to comment or not - I'm going to keep digging into it as best I
can - any clarifications will be gratefully accepted...
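
(For anyone who finds this later: the 20-digit number in those file names is
the offset of the first message stored in that segment, zero-padded, so the
base offset can be read straight off the directory listing. A throwaway
sketch, using the path from this thread:)

    import java.io.File;

    public class PrintSegmentBaseOffsets {
        public static void main(String[] args) {
            File partitionDir = new File("/var/log/kafka-logs/myTopicName-0");
            File[] files = partitionDir.listFiles();
            if (files == null) return; // directory missing or unreadable
            for (File f : files) {
                if (f.getName().endsWith(".log")) {
                    // e.g. "00000000000000026524.log" -> 26524
                    long baseOffset = Long.parseLong(f.getName().replace(".log", ""));
                    System.out.println(f.getName() + " starts at offset " + baseOffset);
                }
            }
        }
    }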



On Wed, Feb 17, 2016 at 1:50 PM, John Bickerstaff <jo...@johnbickerstaff.com>
wrote:

> Thank you Christian -- I appreciate your taking the time to help me out on
> this.
>
> Here's what I found while continuing to dig into this.
>
> If I take 30024 and subtract the number of messages I know I have in Kafka
> (3500) I get 26524.
>
> If I reset thus:  set /kafka/consumers/myGroupName/offsets/myTopicName/0
> 26524
>
> ... and then re-run my consumer - I get all 3500 messages again.
>
> If I do this: set /kafka/consumers/myGroupName/offsets/myTopicName/0 26624
>
> In other words, I increase the offset number by 100 -- then I get exactly
> 3400 messages on my consumer --  exactly 100 less than before which I think
> makes sense, since I started the offset 100 higher...
>
> This seems to suggest that each number between 26624 and 30024 in the log
> represents one of my 3500 messages on this topic, but what you say suggests
> that they represent byte count of the actual messages and not "one number
> per message"...
>
> I also find that if I issue this command:
>
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName
> --broker-list=192.168.56.3:9092  --time=-2
>
> I get back that same number -- 26524...
>
> Hmmmm....  A little confused still...  These messages are literally stored
> in the Kafka logs, yes?  I think I'll go digging in there and see...
>
> Thanks again!
>
>
>
>
>
> On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta <
> christian.posta@gmail.com> wrote:
>
>> The number is the log-ordered number of bytes. So really, the offset is
>> kinda like the "number of bytes" to begin reading from. 0 means read the
>> log from the beginning. The second message is 0 + size of message. So the
>> message "ids" are really just the offset of the previous message sizes.
>>
>> For example, if I have three messages of 10 bytes each, and set the
>> consumer offset to 0, i'll read everything. If you set the offset to 10,
>> I'll read the second and third messages, and so on.
>>
>> see more here:
>>
>> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
>> and here: http://kafka.apache.org/documentation.html#introduction
>>
>> HTH!
>>
>> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <
>> john@johnbickerstaff.com
>> > wrote:
>>
>> > *Use Case: Disaster Recovery & Re-indexing SOLR*
>> >
>> > I'm using Kafka to hold messages from a service that prepares
>> "documents"
>> > for SOLR.
>> >
>> > A second micro service (a consumer) requests these messages, does any
>> final
>> > processing, and fires them into SOLR.
>> >
>> > The whole thing is (in part) designed to be used for disaster recovery -
>> > allowing the rebuild of the SOLR index in the shortest possible time.
>> >
>> > To do this (and to be able to use it for re-indexing SOLR while testing
>> > relevancy) I need to be able to "play all messages from the beginning"
>> at
>> > will.
>> >
>> > I find I can use the zkCli.sh tool to delete the Consumer Group Name
>> like
>> > this:
>> >      rmr /kafka/consumers/myGroupName
>> >
>> > After which my microservice will get all the messages again when it
>> runs.
>> >
>> > I was trying to find a way to do this programmatically without actually
>> > using the "low level" consumer api since the high level one is so simple
>> > and my code already works.  So I started playing with Zookeeper api for
>> > duplicating "rmr /kafka/consumers/myGroupName"
>> >
>> > *The Question: What does that offset actually represent?*
>> >
>> > It was at this point that I discovered the offset must represent
>> something
>> > other than what I thought it would.  Things obviously work, but I'm
>> > wondering what - exactly do the offsets represent?
>> >
>> > To clarify - if I run this command on a zookeeper node, after the
>> > microservice has run:
>> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
>> >
>> > I get the following:
>> >
>> > 30024
>> > cZxid = 0x3600000355
>> > ctime = Fri Feb 12 07:27:50 MST 2016
>> > mZxid = 0x3600000357
>> > mtime = Fri Feb 12 07:29:50 MST 2016
>> > pZxid = 0x3600000355
>> > cversion = 0
>> > dataVersion = 2
>> > aclVersion = 0
>> > ephemeralOwner = 0x0
>> > dataLength = 5
>> > numChildren = 0
>> >
>> > Now - I have exactly 3500 messages in this Kafka topic.  I verify that
>> by
>> > running this command:
>> >      bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka
>> > --topic myTopicName --from-beginning
>> >
>> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
>> >
>> > So - what does that 30024 actually represent?  If I reset that number
>> to 1
>> > or 0 and re-run my consumer microservice, I get all the messages again -
>> > and the number again goes to 30024.  However, I'm not comfortable to
>> trust
>> > that because my assumption that the number represents a simple count of
>> > messages that have been sent to this consumer is obviously wrong.
>> >
>> > (I reset the number like this -- to 1 -- and assume there's an API
>> command
>> > that will do it too.)
>> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
>> >
>> > Can someone help me clarify or point me at a doc that explains what is
>> > getting counted here?  You can shoot me if you like for attempting the
>> > hack-ish solution of re-setting the offset through the Zookeeper API,
>> but I
>> > would still like to understand what, exactly, is represented by that
>> number
>> > 30024.
>> >
>> > I need to hand off to IT for the Disaster Recovery portion and saying
>> > "trust me, it just works" isn't going to fly very far...
>> >
>> > Thanks.
>> >
>>
>>
>>
>> --
>> *Christian Posta*
>> twitter: @christianposta
>> http://www.christianposta.com/blog
>> http://fabric8.io
>>
>
>

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Thank you Christian -- I appreciate your taking the time to help me out on
this.

Here's what I found while continuing to dig into this.

If I take 30024 and subtract the number of messages I know I have in Kafka
(3500) I get 26524.

If I reset thus:  set /kafka/consumers/myGroupName/offsets/myTopicName/0
26524

... and then re-run my consumer - I get all 3500 messages again.

If I do this: set /kafka/consumers/myGroupName/offsets/myTopicName/0 26624

In other words, when I increase the offset number by 100, I get exactly
3400 messages on my consumer -- exactly 100 fewer than before, which I
think makes sense, since I started the offset 100 higher...

This seems to suggest that each number between 26524 and 30024 in the log
represents one of my 3500 messages on this topic, but what you say suggests
that they represent the byte count of the actual messages and not "one
number per message"...

I also find that if I issue this command:

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName
--broker-list=192.168.56.3:9092  --time=-2

I get back that same number -- 26524...

Hmmmm....  A little confused still...  These messages are literally stored
in the Kafka logs, yes?  I think I'll go digging in there and see...

Thanks again!





On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta <christian.posta@gmail.com
> wrote:

> The number is the log-ordered number of bytes. So really, the offset is
> kinda like the "number of bytes" to begin reading from. 0 means read the
> log from the beginning. The second message is 0 + size of message. So the
> message "ids" are really just the offset of the previous message sizes.
>
> For example, if I have three messages of 10 bytes each, and set the
> consumer offset to 0, i'll read everything. If you set the offset to 10,
> I'll read the second and third messages, and so on.
>
> see more here:
>
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> and here: http://kafka.apache.org/documentation.html#introduction
>
> HTH!
>
> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <
> john@johnbickerstaff.com
> > wrote:
>
> > *Use Case: Disaster Recovery & Re-indexing SOLR*
> >
> > I'm using Kafka to hold messages from a service that prepares "documents"
> > for SOLR.
> >
> > A second micro service (a consumer) requests these messages, does any
> final
> > processing, and fires them into SOLR.
> >
> > The whole thing is (in part) designed to be used for disaster recovery -
> > allowing the rebuild of the SOLR index in the shortest possible time.
> >
> > To do this (and to be able to use it for re-indexing SOLR while testing
> > relevancy) I need to be able to "play all messages from the beginning" at
> > will.
> >
> > I find I can use the zkCli.sh tool to delete the Consumer Group Name like
> > this:
> >      rmr /kafka/consumers/myGroupName
> >
> > After which my microservice will get all the messages again when it runs.
> >
> > I was trying to find a way to do this programmatically without actually
> > using the "low level" consumer api since the high level one is so simple
> > and my code already works.  So I started playing with Zookeeper api for
> > duplicating "rmr /kafka/consumers/myGroupName"
> >
> > *The Question: What does that offset actually represent?*
> >
> > It was at this point that I discovered the offset must represent
> something
> > other than what I thought it would.  Things obviously work, but I'm
> > wondering what - exactly do the offsets represent?
> >
> > To clarify - if I run this command on a zookeeper node, after the
> > microservice has run:
> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
> >
> > I get the following:
> >
> > 30024
> > cZxid = 0x3600000355
> > ctime = Fri Feb 12 07:27:50 MST 2016
> > mZxid = 0x3600000357
> > mtime = Fri Feb 12 07:29:50 MST 2016
> > pZxid = 0x3600000355
> > cversion = 0
> > dataVersion = 2
> > aclVersion = 0
> > ephemeralOwner = 0x0
> > dataLength = 5
> > numChildren = 0
> >
> > Now - I have exactly 3500 messages in this Kafka topic.  I verify that by
> > running this command:
> >      bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka
> > --topic myTopicName --from-beginning
> >
> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
> >
> > So - what does that 30024 actually represent?  If I reset that number to
> 1
> > or 0 and re-run my consumer microservice, I get all the messages again -
> > and the number again goes to 30024.  However, I'm not comfortable to
> trust
> > that because my assumption that the number represents a simple count of
> > messages that have been sent to this consumer is obviously wrong.
> >
> > (I reset the number like this -- to 1 -- and assume there's an API
> command
> > that will do it too.)
> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
> >
> > Can someone help me clarify or point me at a doc that explains what is
> > getting counted here?  You can shoot me if you like for attempting the
> > hack-ish solution of re-setting the offset through the Zookeeper API,
> but I
> > would still like to understand what, exactly, is represented by that
> number
> > 30024.
> >
> > I need to hand off to IT for the Disaster Recovery portion and saying
> > "trust me, it just works" isn't going to fly very far...
> >
> > Thanks.
> >
>
>
>
> --
> *Christian Posta*
> twitter: @christianposta
> http://www.christianposta.com/blog
> http://fabric8.io
>

Re: Resetting Kafka Offsets -- and What are offsets.... exactly?

Posted by Christian Posta <ch...@gmail.com>.
The number is the log-ordered number of bytes. So really, the offset is
kinda like the "number of bytes" to begin reading from. 0 means read the
log from the beginning. The second message is at 0 + the size of the first
message. So the message "ids" are really just the running sum of the
previous message sizes.

For example, if I have three messages of 10 bytes each and set the
consumer offset to 0, I'll read everything. If you set the offset to 10,
I'll read the second and third messages, and so on.

see more here:
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
and here: http://kafka.apache.org/documentation.html#introduction

HTH!

On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <john@johnbickerstaff.com
> wrote:

> *Use Case: Disaster Recovery & Re-indexing SOLR*
>
> I'm using Kafka to hold messages from a service that prepares "documents"
> for SOLR.
>
> A second micro service (a consumer) requests these messages, does any final
> processing, and fires them into SOLR.
>
> The whole thing is (in part) designed to be used for disaster recovery -
> allowing the rebuild of the SOLR index in the shortest possible time.
>
> To do this (and to be able to use it for re-indexing SOLR while testing
> relevancy) I need to be able to "play all messages from the beginning" at
> will.
>
> I find I can use the zkCli.sh tool to delete the Consumer Group Name like
> this:
>      rmr /kafka/consumers/myGroupName
>
> After which my microservice will get all the messages again when it runs.
>
> I was trying to find a way to do this programmatically without actually
> using the "low level" consumer api since the high level one is so simple
> and my code already works.  So I started playing with Zookeeper api for
> duplicating "rmr /kafka/consumers/myGroupName"
>
> *The Question: What does that offset actually represent?*
>
> It was at this point that I discovered the offset must represent something
> other than what I thought it would.  Things obviously work, but I'm
> wondering what - exactly do the offsets represent?
>
> To clarify - if I run this command on a zookeeper node, after the
> microservice has run:
>      get /kafka/consumers/myGroupName/offsets/myTopicName/0
>
> I get the following:
>
> 30024
> cZxid = 0x3600000355
> ctime = Fri Feb 12 07:27:50 MST 2016
> mZxid = 0x3600000357
> mtime = Fri Feb 12 07:29:50 MST 2016
> pZxid = 0x3600000355
> cversion = 0
> dataVersion = 2
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 5
> numChildren = 0
>
> Now - I have exactly 3500 messages in this Kafka topic.  I verify that by
> running this command:
>      bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka
> --topic myTopicName --from-beginning
>
> When I hit Ctrl-C, it tells me it consumed 3500 messages.
>
> So - what does that 30024 actually represent?  If I reset that number to 1
> or 0 and re-run my consumer microservice, I get all the messages again -
> and the number again goes to 30024.  However, I'm not comfortable to trust
> that because my assumption that the number represents a simple count of
> messages that have been sent to this consumer is obviously wrong.
>
> (I reset the number like this -- to 1 -- and assume there's an API command
> that will do it too.)
>      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
>
> Can someone help me clarify or point me at a doc that explains what is
> getting counted here?  You can shoot me if you like for attempting the
> hack-ish solution of re-setting the offset through the Zookeeper API, but I
> would still like to understand what, exactly, is represented by that number
> 30024.
>
> I need to hand off to IT for the Disaster Recovery portion and saying
> "trust me, it just works" isn't going to fly very far...
>
> Thanks.
>



-- 
*Christian Posta*
twitter: @christianposta
http://www.christianposta.com/blog
http://fabric8.io