Posted to users@kafka.apache.org by Fabio Pardi <f....@portavita.eu> on 2020/06/08 08:42:36 UTC

reliable way to count number of messages

Hi there,

I have one topic with one partition, and I want to know how many messages are in the topic.

I noticed that if I run:

kafka-console-consumer --topic mytopic --bootstrap-server [..]:9092 --from-beginning

[..]
Processed a total of 23626 messages


If I instead run:

kafka-run-class kafka.tools.GetOffsetShell --broker-list [..]:9092 --topic mytopic --time -1

mytopic:0:47252


So the two commands return different numbers, and the first returns exactly half of what the second does.

Why don't the two commands return the same number, and which one is right?


kafka-console-consumer --version
5.4.1-ccs (Commit:fd1e543386b47352)

kafka-run-class -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (Zulu 8.38.0.13-CA-linux64) (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (Zulu 8.38.0.13-CA-linux64) (build 25.212-b04, mixed mode)

regards,

fabio pardi

Re: reliable way to count number of messages

Posted by Fabio Pardi <f....@portavita.eu>.
Solved.

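For the future us: the reason why the offset is twice the number of messages lies in how (our) producer works.

Our producer is transactional and commits a transaction for every message it sends. Each commit writes a transaction marker (a control record) to the partition; the marker takes up an offset of its own but is never delivered to consumers, so the offset is incremented by 2 for each sent message.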
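To make the arithmetic concrete, here is a minimal Python sketch of a single partition, assuming one message per transaction as described above. The class PartitionLog and its methods are hypothetical, a toy model rather than Kafka's broker code: every entry, whether data record or commit marker, takes exactly one offset, and consumers only see the data records.

```python
from typing import List


class PartitionLog:
    """Toy model of one partition: each entry occupies exactly one offset."""

    def __init__(self) -> None:
        # True marks a control record (transaction commit marker), False a data record.
        self._is_control: List[bool] = []

    def append_transaction(self, n_messages: int = 1) -> None:
        # The data records of the transaction...
        self._is_control.extend([False] * n_messages)
        # ...followed by the commit marker, which also consumes an offset.
        self._is_control.append(True)

    def log_end_offset(self) -> int:
        # Offset the next record would get (what GetOffsetShell --time -1 prints,
        # assuming the log starts at offset 0 and nothing has been deleted).
        return len(self._is_control)

    def consumed_count(self) -> int:
        # Consumers skip control records, so only data records are delivered.
        return self._is_control.count(False)


log = PartitionLog()
for _ in range(23626):
    log.append_transaction(n_messages=1)  # one message per transaction

print(log.consumed_count())  # 23626
print(log.log_end_offset())  # 47252
```

Under the same toy model, a transaction carrying N messages on this partition uses N + 1 offsets, so putting several messages in each transaction makes the gap between offsets and consumed messages proportionally smaller.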

regards,

fabio pardi

On 08/06/2020 13:45, Fabio Pardi wrote:
> [...]


Re: reliable way to count number of messages

Posted by Fabio Pardi <f....@portavita.eu>.
Hello Liam,

thanks for your reply.

I'm still in the process of consolidating my Kafka knowledge, so I might have overlooked something in the current configuration or in my investigation of this problem.


About the problem: the strange thing is that the earliest offset is actually 0. My question was prompted by the fact that I disabled log compaction by passing 'log.cleaner.enable: "false"' to the brokers. Sorry for not mentioning it before.


kafka-run-class kafka.tools.GetOffsetShell --broker-list [...]:9092 --topic pgo.fhir3.resource --time -2
mytopic:0:0

What looks suspicious to me, besides the offset and the number of messages not being identical, is that the former is exactly twice the latter.
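Combining the --time -1 and --time -2 outputs gives the offset span of the partition. A minimal Python sketch, with hypothetical helper names parse_offsets and offset_span, that parses GetOffsetShell's topic:partition:offset lines and computes latest minus earliest per partition, assuming every line carries a numeric offset. The span is only an upper bound on what a consumer reads, because it also counts offsets that hold no consumable message, such as transaction markers.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def parse_offsets(output: str) -> Dict[TopicPartition, int]:
    """Parse GetOffsetShell output lines of the form topic:partition:offset."""
    offsets: Dict[TopicPartition, int] = {}
    for line in output.strip().splitlines():
        # Kafka topic names cannot contain ':', so splitting from the right is safe.
        topic, partition, offset = line.strip().rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def offset_span(latest_output: str, earliest_output: str) -> Dict[TopicPartition, int]:
    """Latest (--time -1) minus earliest (--time -2) offset for each partition."""
    latest = parse_offsets(latest_output)
    earliest = parse_offsets(earliest_output)
    return {tp: latest[tp] - earliest[tp] for tp in latest}


# The two outputs quoted in this thread.
span = offset_span("mytopic:0:47252", "mytopic:0:0")
print(span)                # {('mytopic', 0): 47252}
print(sum(span.values()))  # 47252, versus 23626 messages actually consumed
```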


regards,

fabio pardi




On 08/06/2020 12:26, Liam Clarke-Hutchinson wrote:
> [...]


Re: reliable way to count number of messages

Posted by Liam Clarke-Hutchinson <li...@adscale.co.nz>.
Hi Fabio,

-1 is shorthand for latest when passed as --time to GetOffsetShell (-2 is
earliest), so the output is telling you that the latest offset of partition
0 of the topic is 47252.

However, the earliest offset in the topic may not be zero: when retention
limits are hit and old messages are removed, the remaining messages keep
their offsets (they aren't renumbered).
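A minimal Python sketch of this point, using a hypothetical toy class RetainedLog: deleting old records moves the log start offset forward while the surviving records keep their offsets. Real brokers delete whole log segments rather than individual records; the model ignores that detail.

```python
from typing import List


class RetainedLog:
    """Toy single-partition log: an offset is assigned once and never renumbered."""

    def __init__(self) -> None:
        self.log_start_offset = 0  # roughly what GetOffsetShell --time -2 reports
        self._records: List[str] = []

    @property
    def log_end_offset(self) -> int:
        # Offset the next appended record will get (roughly --time -1).
        return self.log_start_offset + len(self._records)

    def append(self, value: str) -> int:
        offset = self.log_end_offset
        self._records.append(value)
        return offset

    def delete_oldest(self, n: int) -> None:
        # Retention drops records from the head of the log; the survivors keep
        # their offsets and only the log start offset moves forward.
        n = min(n, len(self._records))
        del self._records[:n]
        self.log_start_offset += n


log = RetainedLog()
for i in range(10):
    log.append(f"msg-{i}")  # offsets 0..9
log.delete_oldest(4)        # e.g. retention.ms expired for the oldest 4
print(log.log_start_offset, log.log_end_offset)  # 4 10
```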

So likely you'll find the earliest offset is 23626 or similar if you run
GetOffsetShell with --time -2.

Cheers,

Liam Clarke-Hutchinson

On Mon, 8 Jun. 2020, 8:42 pm Fabio Pardi, <f....@portavita.eu> wrote:

> [...]