Posted to users@kafka.apache.org by Prabhjot Bharaj <pr...@gmail.com> on 2015/09/03 16:56:12 UTC

Slow ISR catch-up

Hi Folks,

I'd appreciate your expertise on a question I have.

*My setup:-*

5-node Kafka cluster (4 cores, 8GB RAM) on RAID-6 (500 GB)
Kafka 0.8.2.1 with a modified ProducerPerformance.scala that sends custom
ASCII data instead of a byte array of zeroes

*server.properties:-*

broker.id=0

log.cleaner.enable=false

log.dirs=/tmp/kafka-logs

log.retention.check.interval.ms=300000

log.retention.hours=168

log.segment.bytes=1073741824

num.io.threads=8

num.network.threads=3

num.partitions=1

num.recovery.threads.per.data.dir=1

*num.replica.fetchers=4*

port=9092

socket.receive.buffer.bytes=1048576

socket.request.max.bytes=104857600

socket.send.buffer.bytes=1048576

zookeeper.connect=localhost:2181

zookeeper.connection.timeout.ms=6000


*This is how I run the producer perf test:-*

kafka-producer-perf-test.sh --broker-list
a.a.a.a:9092,b.b.b.b:9092,c.c.c.c:9092,d.d.d.d:9092,e.e.e.e:9092 --messages
100000 --message-size 500 --topics temp --show-detailed-stats  --threads 5
--request-num-acks -1 --batch-size 200 --request-timeout-ms 10000
--compression-codec 0

*Problem:-*

This test completes in under 15 seconds for me.

But after this test, if I try writing to another topic that has 2
partitions and 3 replicas, it is dead slow, and the same script never
seems to finish because the slow ISR catch-up is still going on.

*My inference:-*
I have noticed that for a topic with 1 partition and 3 replicas, the ISR
shows only 1 broker id.

Topic:temp PartitionCount:1 ReplicationFactor:3 Configs:

Topic: temp Partition: 0 Leader: 5 Replicas: 5,1,2 Isr: 5


I think this is because brokers 1 and 2 have not yet received the data
from the leader. The data directory sizes for this topic confirm it: the
leader (broker 5) has 20GB, but replicas 1 and 2 are still at 7GB.
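
For reference, a quick way to confirm which partitions are lagging,
assuming the stock kafka-topics.sh tool shipped with 0.8.2.1 and ZooKeeper
on localhost:2181 (adjust to your own setup):

# list partitions whose ISR is smaller than the assigned replica set
kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions

# describe a single topic to see its leader, replica assignment and current ISR
kafka-topics.sh --zookeeper localhost:2181 --describe --topic temp

The followers' fetch lag is also exposed over JMX via the
ReplicaFetcherManager MaxLag metric (the exact MBean name depends on the
broker version).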

*Doubts:-*
1. I was running kafka-producer-perf-test.sh with acks=-1, which should
mean that all data is committed to all in-sync replicas before a request
is acknowledged. But with the replicas still at 7GB, it doesn't look like
acks=-1 is being honoured by the producer.

Am I missing something?

Regards,
Prabhjot

Re: Slow ISR catch-up

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hello friends,

I'd appreciate your input on this problem I'm facing.

Thanks
On Sep 4, 2015 8:09 PM, "Prabhjot Bharaj" <pr...@gmail.com> wrote:

> [quoted copy of the previous message snipped; it appears in full later
> in this thread]

Re: Slow ISR catch-up

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

I am seeing very slow throughput when using acks=-1.
Some further progress, continuing the test from my previous email:-

*Topic details -*

Topic:temp PartitionCount:1 ReplicationFactor:3 Configs:

Topic: temp Partition: 0 Leader: 5 Replicas: 5,1,2 Isr: 5,2,1
*This is the command I'm running - *

kafka-producer-perf-test.sh --broker-list 96.7.250.122:9092,
96.17.183.53:9092,96.17.183.54:9092,96.7.250.117:9092,96.7.250.118:9092
--messages 100000 --message-size 500 --topics temp --show-detailed-stats
--threads 30 --request-num-acks -1 --batch-size 1000 --request-timeout-ms
10000

*Server.properties:-*

broker.id=0

port=9092

*num.network.threads=6*

num.io.threads=8

*socket.send.buffer.bytes=10485760*

*socket.receive.buffer.bytes=10485760*

socket.request.max.bytes=104857600

log.dirs=/tmp/kafka-logs

num.partitions=1

num.recovery.threads.per.data.dir=1

log.retention.hours=168

log.segment.bytes=1073741824

log.retention.check.interval.ms=300000

log.cleaner.enable=false

zookeeper.connect=localhost:2181

zookeeper.connection.timeout.ms=6000

*num.replica.fetchers=6*
*Observation:-*

I have also noticed that if I run with acks=1 (without --sync) and then
immediately with acks=-1 (without --sync), the test completes very
quickly. But if I describe the topic after running this, the replicas are
still not in sync, which suggests acks=-1 is effectively being treated as
acks=1.
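
For reference, a sketch of the old Scala producer client properties
involved here (the mapping from the perf tool's flags is an assumption on
my part, not taken from the tool's source):

producer.type=sync          # what --sync toggles on; the default is async
request.required.acks=-1    # what --request-num-acks sets; -1 waits for all in-sync replicas

Without producer.type=sync the client only hands messages to a background
thread and returns immediately, so the caller does not wait for the
broker's acknowledgement, regardless of request.required.acks.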

Also, when running against a freshly created topic with acks=-1 and
--sync, it takes about 8 minutes to complete:

time kafka-producer-perf-test.sh --broker-list 96.7.250.122:9092,
96.17.183.53:9092,96.17.183.54:9092,96.7.250.117:9092,96.7.250.118:9092
--messages 100000 --message-size 500 --topics temp --show-detailed-stats
--threads 30 --request-num-acks -1 --batch-size 1000 --request-timeout-ms
10000 --compression-codec 2 --sync

start.time, end.time, compression, message.size, batch.size,
total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec

2015-09-04 12:19:36:223, 2015-09-04 12:27:41:775, 2, 500, 1000, 47.68,
0.0982, 99990, 205.9306

real 8m6.563s

user 0m19.787s

sys 0m5.601s
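
As a sanity check, the reported figures are self-consistent: 99,990
messages x 500 bytes of payload is about 47.68 MB; the run spans
12:19:36.223 to 12:27:41.775, roughly 485.6 seconds; and 47.68 MB / 485.6 s
is about 0.098 MB/s, while 99,990 / 485.6 is about 205.9 messages/s,
matching the MB.sec and nMsg.sec columns. In other words, the --sync,
acks=-1 run really is pushing only ~200 messages per second end to end.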

With --sync, it takes far longer.

Where am I going wrong?

Thanks,
Prabhjot

On Fri, Sep 4, 2015 at 1:45 AM, Gwen Shapira <gw...@confluent.io> wrote:

> Yes, this should work. Expect lower throughput though.
>
> On Thu, Sep 3, 2015 at 12:52 PM, Prabhjot Bharaj <pr...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Can I use sync for acks = -1?
> >
> > Regards,
> > Prabhjot
> > On Sep 3, 2015 11:49 PM, "Gwen Shapira" <gw...@confluent.io> wrote:
> >
> > > The test uses the old producer (we should fix that), and since you
> don't
> > > specify --sync, it runs async.
> > > The old async producer simply sends data and doesn't wait for acks, so
> it
> > > is possible that the messages were never acked...
> > >
> > > On Thu, Sep 3, 2015 at 7:56 AM, Prabhjot Bharaj <prabhbharaj@gmail.com
> >
> > > wrote:
> > >
> > > > [original message snipped; it is quoted in full at the top of
> > > > this thread]
> > >
> >
>



-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"

Re: Slow ISR catch-up

Posted by Gwen Shapira <gw...@confluent.io>.
Yes, this should work. Expect lower throughput though.

On Thu, Sep 3, 2015 at 12:52 PM, Prabhjot Bharaj <pr...@gmail.com>
wrote:

> Hi,
>
> Can I use sync for acks = -1?
>
> Regards,
> Prabhjot
> On Sep 3, 2015 11:49 PM, "Gwen Shapira" <gw...@confluent.io> wrote:
>
> > The test uses the old producer (we should fix that), and since you don't
> > specify --sync, it runs async.
> > The old async producer simply sends data and doesn't wait for acks, so it
> > is possible that the messages were never acked...
> >
> > On Thu, Sep 3, 2015 at 7:56 AM, Prabhjot Bharaj <pr...@gmail.com>
> > wrote:
> >
> > > [original message snipped; it is quoted in full at the top of this
> > > thread]
> >
>

Re: Slow ISR catch-up

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

Can I use sync for acks = -1?

Regards,
Prabhjot
On Sep 3, 2015 11:49 PM, "Gwen Shapira" <gw...@confluent.io> wrote:

> The test uses the old producer (we should fix that), and since you don't
> specify --sync, it runs async.
> The old async producer simply sends data and doesn't wait for acks, so it
> is possible that the messages were never acked...
>
> On Thu, Sep 3, 2015 at 7:56 AM, Prabhjot Bharaj <pr...@gmail.com>
> wrote:
>
> > [original message snipped; it is quoted in full at the top of this
> > thread]
>

Re: Slow ISR catch-up

Posted by Gwen Shapira <gw...@confluent.io>.
The test uses the old producer (we should fix that), and since you don't
specify --sync, it runs async.
The old async producer simply sends data and doesn't wait for acks, so it
is possible that the messages were never acked...
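
To actually see whether writes are acknowledged, the old producer has to
run in sync mode. A minimal sketch with the old Scala producer API (the
broker list and topic below are placeholders, and this is illustrative
rather than taken from ProducerPerformance.scala):

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object SyncAckSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("metadata.broker.list", "a.a.a.a:9092,b.b.b.b:9092") // placeholder brokers
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    props.put("producer.type", "sync")       // block on each request instead of buffering
    props.put("request.required.acks", "-1") // leader waits for all in-sync replicas
    props.put("request.timeout.ms", "10000")

    val producer = new Producer[String, String](new ProducerConfig(props))
    try {
      // returns only once the broker has acknowledged, or throws after retries
      producer.send(new KeyedMessage[String, String]("temp", "hello with acks=-1"))
    } finally {
      producer.close()
    }
  }
}

The new Java producer that also ships with 0.8.2 makes the same thing
explicit: send() returns a Future, and calling get() on it blocks until
the acknowledgement (per the configured acks setting) comes back.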

On Thu, Sep 3, 2015 at 7:56 AM, Prabhjot Bharaj <pr...@gmail.com>
wrote:

> [original message snipped; it is quoted in full at the top of this
> thread]
>