Posted to dev@kafka.apache.org by Json Tu <ka...@126.com> on 2017/11/09 13:17:19 UTC

Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Hi,
    We have a Kafka cluster made up of 6 brokers, each machine with 8 CPUs and 16 GB of memory, and we have about 1600 topics in the cluster, with roughly 1700 partition leaders and 1600 partition replicas on each broker.
    When we restart a normal broker, we find that 500+ partitions shrink and expand their ISR frequently during the restart;
there are many logs like the ones below.

[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
…


    The shrinking and expanding repeats after 30 minutes, which is the default value of leader.imbalance.check.interval.seconds; at that time
we can see the controller's auto-rebalance logs, which can move some partitions' leaders back to this restarted broker.
    We see no shrinking and expanding while the cluster is running normally, only when we restart a broker, so num.replica.fetchers is 1 and it seems to be enough.

    We can reproduce this on every restart. Can someone give some suggestions? Thanks in advance.
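As a first triage step, the churn can be summarized per partition straight from the server log. A minimal sketch; the sample lines are copied from the log excerpt above, and the /tmp path stands in for the broker's real server.log:

```shell
# Count ISR shrink/expand events per partition to see which ones flap most.
# The heredoc stands in for the broker's real server.log.
cat > /tmp/server.log.sample <<'EOF'
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
EOF

grep -oE '(Expanding|Shrinking) ISR for partition \[[^]]+\]' /tmp/server.log.sample \
  | sort | uniq -c | sort -rn
```

Run against the live server.log, the same pipeline quickly shows whether the flapping is cluster-wide or concentrated on partitions involving one broker.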
    
    
	
    




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by John Yost <ho...@gmail.com>.
Yep, the team here, including Ismael, pointed me in the right direction,
which was much appreciated. :)

On Thu, Nov 9, 2017 at 10:02 AM, Viktor Somogyi <vi...@gmail.com>
wrote:

> I'm happy that it's solved :)


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by Viktor Somogyi <vi...@gmail.com>.
I'm happy that it's solved :)

On Thu, Nov 9, 2017 at 3:32 PM, John Yost <ho...@gmail.com> wrote:

> Excellent points Viktor! Also, the reason I mistakenly went > 8 GB memory
> heap was due to OOM errors that were being thrown when I upgraded from
> 0.9.0.1 to 0.10.0.0 and forgot to explicitly set the message format to
> 0.9.0.1 because we needed to support the older clients and the
> corresponding format. once I set the message format to 0.9.0.1, the memory
> requirements went WAY down, I reset the memory heap to 6 GB, and our Kafka
> cluster has been awesome since.
>
> --John


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by John Yost <ho...@gmail.com>.
Excellent points Viktor! Also, the reason I mistakenly went above 8 GB of memory
heap was the OOM errors thrown when I upgraded from
0.9.0.1 to 0.10.0.0 and forgot to explicitly set the message format to
0.9.0.1, which we needed to support the older clients and the
corresponding format. Once I set the message format to 0.9.0.1, the memory
requirements went WAY down, I reset the memory heap to 6 GB, and our Kafka
cluster has been awesome since.

--John
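The fix John describes maps to a single broker setting. A sketch of the relevant server.properties line, assuming a 0.10.0.0 broker that must keep serving 0.9 clients (the /tmp path is only for illustration):

```shell
# Pin the on-disk message format so 0.9 clients keep working; forgetting
# this forces the broker to convert messages for old consumers, which is
# what drove heap usage up in John's case.
cat >> /tmp/server.properties <<'EOF'
log.message.format.version=0.9.0.1
EOF
grep 'message.format' /tmp/server.properties
```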

On Thu, Nov 9, 2017 at 9:09 AM, Viktor Somogyi <vi...@gmail.com>
wrote:

> Hi Json.
>
> John might have a point. It is not reasonable to have more than 6-8GB of
> heap provided for the JVM that's running Kafka. One of the reason is GC
> time and the other is that Kafka relies heavily on the OS' disk read/write
> in-memory caching.
> Also there were a few synchronization bugs in 0.9 which caused similar
> problems. I would recommend you to upgrade to 1.0.0 if that is feasible.
>
> Viktor


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by Viktor Somogyi <vi...@gmail.com>.
Hi Json.

John might have a point. It is not reasonable to give more than 6-8 GB of
heap to the JVM that's running Kafka. One of the reasons is GC
time, and the other is that Kafka relies heavily on the OS' in-memory
caching of disk reads/writes (the page cache).
Also, there were a few synchronization bugs in 0.9 which caused similar
problems. I would recommend upgrading to 1.0.0 if that is feasible.

Viktor
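For reference, the heap cap is usually applied through KAFKA_HEAP_OPTS, which the Kafka startup scripts read. A sketch using the 6 GB figure John ended up with; adjust to your hosts:

```shell
# A modest fixed heap; the rest of the 16 GB stays available to the
# OS page cache that Kafka depends on for fast reads/writes.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"

# Then start the broker as usual (shown commented, paths assumed):
# bin/kafka-server-start.sh -daemon config/server.properties
```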


On Thu, Nov 9, 2017 at 2:59 PM, John Yost <ho...@gmail.com> wrote:

> I've seen this before and it was due to long GC pauses due in large part to
> a memory heap > 8 GB.
>
> --John

Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by Json Tu <ka...@126.com>.
The broker with broker id 4759750 was just restarted, and there are 500+ replica partitions that shrink and expand frequently; their partition leaders are distributed across the other 5 brokers. The log was pulled from one broker, extracting only the entries related to 1 partition.

> On Nov 10, 2017, at 12:06 PM, Hu Xi <hu...@hotmail.com> wrote:
> 
> Seems broker `4759750` was removed from the ISR of partition [Yelp,5] in every round of shrinking. Did you check whether everything works alright on that broker?
> 
> 
> From: Json Tu <ka...@126.com>
> Sent: November 10, 2017, 11:08
> To: users@kafka.apache.org
> Cc: dev@kafka.apache.org; Guozhang Wang
> Subject: Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker
>  
> I'm so sorry for my poor English.
> 
> What I really mean is that the broker machine has 8 cores and 16 GB of memory, but my JVM configuration is as below.
> java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/xx/yy/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError
> 
> We have 30+ clusters with this JVM configuration, all deployed on machines with 8 cores and 16 GB of memory. Compared to the other clusters, this cluster has more than 5 times as many partitions.
> When we restart the other clusters, there is no such phenomenon.
> 
> Maybe some metrics or logs can lead us to the root cause of this phenomenon.
> Looking forward to more suggestions.
> 
> 
> > On Nov 9, 2017, at 9:59 PM, John Yost <ho...@gmail.com> wrote:
> > 
> > I've seen this before and it was due to long GC pauses due in large part to
> > a memory heap > 8 GB.
> > 
> > --John


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Json Tu <ka...@126.com>.
The broker with broker id 4759750 was just restarted; it hosts the 500+ replica partitions that shrink and expand frequently, and their leader partitions are distributed across the other 5 brokers. The log above was pulled from one of those brokers and filtered to a single partition.
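To gauge how widespread the flapping is, the repeated Shrinking/Expanding lines can be tallied per partition straight from the broker log. A minimal sketch; the sample file below is a stand-in for the broker's real server.log:

```shell
# Hypothetical sample; in practice point the pipeline at the broker's server.log.
cat > /tmp/isr_sample.log <<'EOF'
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
EOF

# Count shrink/expand events per partition to find the worst flappers.
grep -Eo '(Shrinking|Expanding) ISR for partition \[[^]]+\]' /tmp/isr_sample.log \
  | sort | uniq -c | sort -rn
```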

> On Nov 10, 2017, at 12:06 PM, Hu Xi <hu...@hotmail.com> wrote:
> 
> Seems broker `4759750` is removed from the ISR of partition [Yelp,5] in every round of shrinking. Did you check whether everything is working properly on that broker?


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Hu Xi <hu...@hotmail.com>.
Seems broker `4759750` is removed from the ISR of partition [Yelp,5] in every round of shrinking. Did you check whether everything is working properly on that broker?
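For the health check suggested here, Kafka ships a CLI that lists partitions whose ISR is currently smaller than the replica set. A sketch only; the ZooKeeper address is a placeholder for your ensemble:

```shell
# List partitions currently missing replicas from the ISR; run it repeatedly
# while the flapping is happening to see whether 4759750 keeps dropping out.
bin/kafka-topics.sh --describe --zookeeper zk1:2181 --under-replicated-partitions
```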




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Json Tu <ka...@126.com>.
Can someone help analyze this?




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Json Tu <ka...@126.com>.
Apologies for my poor English.

What I meant is that each broker machine has 8 cores and 16 GB of memory, but the JVM is configured as below:
java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/xx/yy/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError

We run 30+ clusters with this JVM configuration, all deployed on 8-core / 16 GB machines. Compared with the other clusters, the current cluster has more than 5 times as many partitions, and when we restart the other clusters there is no such phenomenon.

Maybe some metrics or logs can lead us to the root cause of this phenomenon.
Looking forward to more suggestions.
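For reference, in a stock Kafka distribution these heap and GC flags are normally injected through environment variables that bin/kafka-run-class.sh reads at startup. A sketch only, assuming the standard layout; the paths are placeholders:

```shell
# Sketch, assuming the stock Kafka distribution: kafka-run-class.sh picks
# these variables up and falls back to its built-in defaults when unset.
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true"
bin/kafka-server-start.sh config/server.properties
```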




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Viktor Somogyi <vi...@gmail.com>.
Hi Json.

John might have a point. It is not reasonable to give the JVM running Kafka more than 6-8 GB of heap. One reason is GC time; the other is that Kafka relies heavily on the OS's in-memory caching of disk reads and writes.
There were also a few synchronization bugs in 0.9 that caused similar problems. I would recommend upgrading to 1.0.0 if that is feasible.

Viktor
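The GC-pause theory is easy to check against the GC log the brokers already write (the -Xloggc flag above). A minimal sketch; the sample lines are hypothetical stand-ins for a real G1 log:

```shell
# Hypothetical G1 GC log sample; in practice read the broker's kafkaServer-gc.log.
cat > /tmp/gc_sample.log <<'EOF'
2017-11-09T17:05:10.101+0800: 1000.001: [GC pause (G1 Evacuation Pause) (young), 0.0212345 secs]
2017-11-09T17:06:00.123+0800: 1050.023: [GC pause (G1 Evacuation Pause) (young), 1.2345678 secs]
EOF

# Flag any collection that paused the JVM for more than 0.5 s; pauses in
# that range can delay replica fetches and ZooKeeper heartbeats.
awk 'match($0, /, [0-9.]+ secs\]/) {
  t = substr($0, RSTART + 2, RLENGTH - 8)   # the pause duration in seconds
  if (t + 0 > 0.5) print t, $1
}' /tmp/gc_sample.log
```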




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by John Yost <ho...@gmail.com>.
I've seen this before, and it was due to long GC pauses caused in large part by
a memory heap > 8 GB.

--John

