Posted to dev@kafka.apache.org by Jay Kreps <ja...@gmail.com> on 2014/02/08 23:11:29 UTC

The consumer performance test seems to be totally broken

It loops on
  for (messageAndMetadata <- stream if messagesRead < config.numMessages) {
   // count message
}

That means it loops over *all* messages in the stream (eventually blocking
when all messages are exhausted) but only counts the first N as determined
by the config.
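
A minimal, self-contained sketch of the same pattern (generic names, not the
ConsumerPerformance source) shows why: the guard in a for-comprehension only
filters elements; it never stops the underlying iterator from being drained.

  // Stand-in for the consumer stream; imagine it blocks once it is exhausted.
  val stream = Iterator.from(1).take(1000000)
  var messagesRead = 0

  // Desugars to stream.withFilter(m => messagesRead < 10).foreach(m => ...):
  // every element is still pulled, the guard only decides if the body runs.
  for (m <- stream if messagesRead < 10) {
    messagesRead += 1
  }
  // messagesRead == 10, but all 1,000,000 elements were consumed to get here.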

As a result this command seems to always hang forever. Seems like someone
"improved" this without ever trying to run it :-(

-Jay

Re: The consumer performance test seems to be totally broken

Posted by Neha Narkhede <ne...@gmail.com>.
Synced up offline about this. Makes more sense now.


On Thu, Feb 13, 2014 at 10:15 AM, Jay Kreps <ja...@gmail.com> wrote:

> Read my email again :-)
>
> -Jay
>
>
> On Thu, Feb 13, 2014 at 9:43 AM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> >  Did you try this on an otherwise empty topic?
> >
> > Yes and it blocks until --messages are read. But again, my interpretation
> > of that config has been - "read exactly --messages # of messages, not
> less
> > not more".
> >
> > Thanks,
> > Neha
> >
> >
> > On Mon, Feb 10, 2014 at 9:31 AM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > Hmm, I could be wrong... Did you try this on an otherwise empty topic?
> > Try
> > > this:
> > > 1. Produce 100GB of messages
> > > 2. Run consumer performance test with --messages 1
> > > 3. Go get some coffee
> > > 4. Observe performance tests results with a time taken to consume 100GB
> > of
> > > messages but only recording a single message consumed
> > >
> > > :-)
> > >
> > > Here is what I mean:
> > > scala> val l = List(1,2,3,4,5,6,7,8,9,10)
> > > scala> val iter = l.iterator
> > > scala> var count = 0
> > > scala> for(i <- iter if count < 5) {
> > > scala> iter.next
> > > java.util.NoSuchElementException: next on empty iterator
> > > ...
> > >
> > > This is the exact equivalent of
> > > val l = List(1,2,3,4,5,6,7,8,9,10)
> > > val iter = l.iterator
> > > var count = 0
> > > while(iter.hasNext) {
> > >   if(iter.next < 5)
> > >     count += 1
> > > }
> > >
> > > -Jay
> > >
> > >
> > > On Sun, Feb 9, 2014 at 9:43 PM, Neha Narkhede <neha.narkhede@gmail.com
> > > >wrote:
> > >
> > > > I ran ConsumerPerformance with different number of messages config
> for
> > a
> > > > topic that has 100s of messages in the log. It did not hang in any of
> > > these
> > > > tests -
> > > >
> > > > nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> > > > kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic
> test
> > > > *--messages
> > > > 10* --threads 1
> > > > start.time, end.time, fetch.size, data.consumed.in.MB, MB.sec,
> > > > data.consumed.in.nMsg, nMsg.sec
> > > > 1 messages read
> > > > 2 messages read
> > > > 3 messages read
> > > > 4 messages read
> > > > 5 messages read
> > > > 6 messages read
> > > > 7 messages read
> > > > 8 messages read
> > > > 9 messages read
> > > > 10 messages read
> > > > *2014-02-09 21:39:47:385, 2014-02-09 21:39:52:641, 1048576, 0.0015,
> > > 0.0060,
> > > > 10, 39.0625*  (test ends with this)
> > > > nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> > > > kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic
> test
> > > > *--messages
> > > > 2* --threads 1
> > > > 1 messages read
> > > > 2 messages read
> > > > *2014-02-09 21:39:59:853, 2014-02-09 21:40:05:100, 1048576, 0.0003,
> > > 0.0011,
> > > > 2, 8.0972*      (test ends with this)
> > > >
> > > > The way I read the --messages config is "read exactly --messages # of
> > > > messages, not less not more". If the topic has less than --messages
> > > > records, it will hang until it gets more data, else it exits
> properly.
> > > > Seems like expected behavior.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > >
> > > > On Sat, Feb 8, 2014 at 2:11 PM, Jay Kreps <ja...@gmail.com>
> wrote:
> > > >
> > > > > It loops on
> > > > >   for (messageAndMetadata <- stream if messagesRead <
> > > > config.numMessages) {
> > > > >    // count message
> > > > > }
> > > > >
> > > > > That means it loops over *all* messages in the stream (eventually
> > > > blocking
> > > > > when all messages are exhausted) but only counts the first N as
> > > > determined
> > > > > by the config.
> > > > >
> > > > > As a result this command seems to always hang forever. Seems like
> > > someone
> > > > > "improved" this without ever trying to run it :-(
> > > > >
> > > > > -Jay
> > > > >
> > > >
> > >
> >
>

Re: The consumer performance test seems to be totally broken

Posted by Jay Kreps <ja...@gmail.com>.
Read my email again :-)

-Jay


On Thu, Feb 13, 2014 at 9:43 AM, Neha Narkhede <ne...@gmail.com> wrote:

>  Did you try this on an otherwise empty topic?
>
> Yes and it blocks until --messages are read. But again, my interpretation
> of that config has been - "read exactly --messages # of messages, not less
> not more".
>
> Thanks,
> Neha
>
>
> On Mon, Feb 10, 2014 at 9:31 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Hmm, I could be wrong... Did you try this on an otherwise empty topic?
> Try
> > this:
> > 1. Produce 100GB of messages
> > 2. Run consumer performance test with --messages 1
> > 3. Go get some coffee
> > 4. Observe performance tests results with a time taken to consume 100GB
> of
> > messages but only recording a single message consumed
> >
> > :-)
> >
> > Here is what I mean:
> > scala> val l = List(1,2,3,4,5,6,7,8,9,10)
> > scala> val iter = l.iterator
> > scala> var count = 0
> > scala> for(i <- iter if count < 5) {
> > scala> iter.next
> > java.util.NoSuchElementException: next on empty iterator
> > ...
> >
> > This is the exact equivalent of
> > val l = List(1,2,3,4,5,6,7,8,9,10)
> > val iter = l.iterator
> > var count = 0
> > while(iter.hasNext) {
> >   if(iter.next < 5)
> >     count += 1
> > }
> >
> > -Jay
> >
> >
> > On Sun, Feb 9, 2014 at 9:43 PM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > I ran ConsumerPerformance with different number of messages config for
> a
> > > topic that has 100s of messages in the log. It did not hang in any of
> > these
> > > tests -
> > >
> > > nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> > > kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test
> > > *--messages
> > > 10* --threads 1
> > > start.time, end.time, fetch.size, data.consumed.in.MB, MB.sec,
> > > data.consumed.in.nMsg, nMsg.sec
> > > 1 messages read
> > > 2 messages read
> > > 3 messages read
> > > 4 messages read
> > > 5 messages read
> > > 6 messages read
> > > 7 messages read
> > > 8 messages read
> > > 9 messages read
> > > 10 messages read
> > > *2014-02-09 21:39:47:385, 2014-02-09 21:39:52:641, 1048576, 0.0015,
> > 0.0060,
> > > 10, 39.0625*  (test ends with this)
> > > nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> > > kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test
> > > *--messages
> > > 2* --threads 1
> > > 1 messages read
> > > 2 messages read
> > > *2014-02-09 21:39:59:853, 2014-02-09 21:40:05:100, 1048576, 0.0003,
> > 0.0011,
> > > 2, 8.0972*      (test ends with this)
> > >
> > > The way I read the --messages config is "read exactly --messages # of
> > > messages, not less not more". If the topic has less than --messages
> > > records, it will hang until it gets more data, else it exits properly.
> > > Seems like expected behavior.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > >
> > > On Sat, Feb 8, 2014 at 2:11 PM, Jay Kreps <ja...@gmail.com> wrote:
> > >
> > > > It loops on
> > > >   for (messageAndMetadata <- stream if messagesRead <
> > > config.numMessages) {
> > > >    // count message
> > > > }
> > > >
> > > > That means it loops over *all* messages in the stream (eventually
> > > blocking
> > > > when all messages are exhausted) but only counts the first N as
> > > determined
> > > > by the config.
> > > >
> > > > As a result this command seems to always hang forever. Seems like
> > someone
> > > > "improved" this without ever trying to run it :-(
> > > >
> > > > -Jay
> > > >
> > >
> >
>

Re: The consumer performance test seems to be totally broken

Posted by Neha Narkhede <ne...@gmail.com>.
> Did you try this on an otherwise empty topic?

Yes, and it blocks until --messages are read. But again, my interpretation
of that config has been: "read exactly --messages # of messages, not less,
not more".

Thanks,
Neha


On Mon, Feb 10, 2014 at 9:31 AM, Jay Kreps <ja...@gmail.com> wrote:

> Hmm, I could be wrong... Did you try this on an otherwise empty topic? Try
> this:
> 1. Produce 100GB of messages
> 2. Run consumer performance test with --messages 1
> 3. Go get some coffee
> 4. Observe performance tests results with a time taken to consume 100GB of
> messages but only recording a single message consumed
>
> :-)
>
> Here is what I mean:
> scala> val l = List(1,2,3,4,5,6,7,8,9,10)
> scala> val iter = l.iterator
> scala> var count = 0
> scala> for(i <- iter if count < 5) {
> scala> iter.next
> java.util.NoSuchElementException: next on empty iterator
> ...
>
> This is the exact equivalent of
> val l = List(1,2,3,4,5,6,7,8,9,10)
> val iter = l.iterator
> var count = 0
> while(iter.hasNext) {
>   if(iter.next < 5)
>     count += 1
> }
>
> -Jay
>
>
> On Sun, Feb 9, 2014 at 9:43 PM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > I ran ConsumerPerformance with different number of messages config for a
> > topic that has 100s of messages in the log. It did not hang in any of
> these
> > tests -
> >
> > nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> > kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test
> > *--messages
> > 10* --threads 1
> > start.time, end.time, fetch.size, data.consumed.in.MB, MB.sec,
> > data.consumed.in.nMsg, nMsg.sec
> > 1 messages read
> > 2 messages read
> > 3 messages read
> > 4 messages read
> > 5 messages read
> > 6 messages read
> > 7 messages read
> > 8 messages read
> > 9 messages read
> > 10 messages read
> > *2014-02-09 21:39:47:385, 2014-02-09 21:39:52:641, 1048576, 0.0015,
> 0.0060,
> > 10, 39.0625*  (test ends with this)
> > nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> > kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test
> > *--messages
> > 2* --threads 1
> > 1 messages read
> > 2 messages read
> > *2014-02-09 21:39:59:853, 2014-02-09 21:40:05:100, 1048576, 0.0003,
> 0.0011,
> > 2, 8.0972*      (test ends with this)
> >
> > The way I read the --messages config is "read exactly --messages # of
> > messages, not less not more". If the topic has less than --messages
> > records, it will hang until it gets more data, else it exits properly.
> > Seems like expected behavior.
> >
> > Thanks,
> > Neha
> >
> >
> >
> > On Sat, Feb 8, 2014 at 2:11 PM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > It loops on
> > >   for (messageAndMetadata <- stream if messagesRead <
> > config.numMessages) {
> > >    // count message
> > > }
> > >
> > > That means it loops over *all* messages in the stream (eventually
> > blocking
> > > when all messages are exhausted) but only counts the first N as
> > determined
> > > by the config.
> > >
> > > As a result this command seems to always hang forever. Seems like
> someone
> > > "improved" this without ever trying to run it :-(
> > >
> > > -Jay
> > >
> >
>

Re: The consumer performance test seems to be totally broken

Posted by Jay Kreps <ja...@gmail.com>.
Hmm, I could be wrong... Did you try this on an otherwise empty topic? Try
this:
1. Produce 100GB of messages
2. Run consumer performance test with --messages 1
3. Go get some coffee
4. Observe the performance test results: the reported time covers consuming
all 100GB of messages, even though only a single consumed message is recorded

:-)

Here is what I mean:
scala> val l = List(1,2,3,4,5,6,7,8,9,10)
scala> val iter = l.iterator
scala> var count = 0
scala> for(i <- iter if count < 5) {
     |   iter.next
     | }
java.util.NoSuchElementException: next on empty iterator
...

This is the exact equivalent of
val l = List(1,2,3,4,5,6,7,8,9,10)
val iter = l.iterator
var count = 0
while(iter.hasNext) {
  val i = iter.next   // every element is still pulled from the iterator
  if(count < 5)
    count += 1        // the condition only gates the body, never the iteration
}
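
By contrast, a loop that is meant to stop after N elements has to put the
bound in the loop condition itself. A minimal sketch (just the shape of it,
not the perf-test source or an actual fix):

  val iter = List(1,2,3,4,5,6,7,8,9,10).iterator
  var count = 0
  // The bound is checked before each pull, so once it is reached no further
  // element is requested (on a blocking consumer stream even hasNext can
  // block, so the short-circuit in the condition matters).
  while (count < 5 && iter.hasNext) {
    iter.next()
    count += 1
  }
  // count == 5 and the remaining elements were never consumed.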

-Jay


On Sun, Feb 9, 2014 at 9:43 PM, Neha Narkhede <ne...@gmail.com> wrote:

> I ran ConsumerPerformance with different number of messages config for a
> topic that has 100s of messages in the log. It did not hang in any of these
> tests -
>
> nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test
> *--messages
> 10* --threads 1
> start.time, end.time, fetch.size, data.consumed.in.MB, MB.sec,
> data.consumed.in.nMsg, nMsg.sec
> 1 messages read
> 2 messages read
> 3 messages read
> 4 messages read
> 5 messages read
> 6 messages read
> 7 messages read
> 8 messages read
> 9 messages read
> 10 messages read
> *2014-02-09 21:39:47:385, 2014-02-09 21:39:52:641, 1048576, 0.0015, 0.0060,
> 10, 39.0625*  (test ends with this)
> nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh
> kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test
> *--messages
> 2* --threads 1
> 1 messages read
> 2 messages read
> *2014-02-09 21:39:59:853, 2014-02-09 21:40:05:100, 1048576, 0.0003, 0.0011,
> 2, 8.0972*      (test ends with this)
>
> The way I read the --messages config is "read exactly --messages # of
> messages, not less not more". If the topic has less than --messages
> records, it will hang until it gets more data, else it exits properly.
> Seems like expected behavior.
>
> Thanks,
> Neha
>
>
>
> On Sat, Feb 8, 2014 at 2:11 PM, Jay Kreps <ja...@gmail.com> wrote:
>
> > It loops on
> >   for (messageAndMetadata <- stream if messagesRead <
> config.numMessages) {
> >    // count message
> > }
> >
> > That means it loops over *all* messages in the stream (eventually
> blocking
> > when all messages are exhausted) but only counts the first N as
> determined
> > by the config.
> >
> > As a result this command seems to always hang forever. Seems like someone
> > "improved" this without ever trying to run it :-(
> >
> > -Jay
> >
>

Re: The consumer performance test seems to be totally broken

Posted by Neha Narkhede <ne...@gmail.com>.
I ran ConsumerPerformance with different values for the --messages config
against a topic that has 100s of messages in the log. It did not hang in any
of these tests -

nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test *--messages 10* --threads 1
start.time, end.time, fetch.size, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
1 messages read
2 messages read
3 messages read
4 messages read
5 messages read
6 messages read
7 messages read
8 messages read
9 messages read
10 messages read
*2014-02-09 21:39:47:385, 2014-02-09 21:39:52:641, 1048576, 0.0015, 0.0060, 10, 39.0625*  (test ends with this)
nnarkhed-mn1:kafka-git-idea nnarkhed$ ./bin/kafka-run-class.sh kafka.perf.ConsumerPerformance --zookeeper localhost:2181 --topic test *--messages 2* --threads 1
1 messages read
2 messages read
*2014-02-09 21:39:59:853, 2014-02-09 21:40:05:100, 1048576, 0.0003, 0.0011, 2, 8.0972*      (test ends with this)

The way I read the --messages config is "read exactly --messages # of
messages, not less, not more". If the topic has fewer than --messages
records, it will hang until it gets more data; otherwise it exits properly.
Seems like expected behavior.

Thanks,
Neha



On Sat, Feb 8, 2014 at 2:11 PM, Jay Kreps <ja...@gmail.com> wrote:

> It loops on
>   for (messageAndMetadata <- stream if messagesRead < config.numMessages) {
>    // count message
> }
>
> That means it loops over *all* messages in the stream (eventually blocking
> when all messages are exhausted) but only counts the first N as determined
> by the config.
>
> As a result this command seems to always hang forever. Seems like someone
> "improved" this without ever trying to run it :-(
>
> -Jay
>