Posted to dev@kafka.apache.org by Json Tu <ka...@126.com> on 2017/11/09 13:17:19 UTC

Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Hi,
    We have a Kafka cluster made up of 6 brokers, each machine with 8 CPUs and 16 GB of memory, and we have about 1600 topics in the cluster, with roughly 1700 partition leaders and 1600 partition replicas on each broker.
    When we restart a normal broker, we find that 500+ partitions shrink and expand their ISR frequently during the restart;
there are many logs like the ones below.

[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
…


    The shrinking and expanding repeats after 30 minutes, which is the default value of leader.imbalance.check.interval.seconds; at that time
we can see the controller's auto-rebalance logs, which can move some partitions' leaders back to this restarted broker.
    We see no shrinking and expanding while the cluster is running normally, only when we restart a broker, so num.replica.fetchers is 1 and it seems to be enough.

    We can reproduce this on every restart. Can someone give some suggestions? Thanks in advance.
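As a first triage step, the churn can be summarized per partition straight from the server log. A minimal sketch; the sample lines are copied from the log excerpt above, and the /tmp path stands in for the broker's real server.log:

```shell
# Count ISR shrink/expand events per partition to see which ones flap most.
# The heredoc stands in for the broker's real server.log.
cat > /tmp/server.log.sample <<'EOF'
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
EOF

grep -oE '(Expanding|Shrinking) ISR for partition \[[^]]+\]' /tmp/server.log.sample \
  | sort | uniq -c | sort -rn
```

Run against the live server.log, the same pipeline quickly shows whether the flapping is cluster-wide or concentrated on partitions involving one broker.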
    
    
	
    




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by John Yost <ho...@gmail.com>.
Yep, the team here, including Ismael, pointed me in the right direction,
which was much appreciated. :)

On Thu, Nov 9, 2017 at 10:02 AM, Viktor Somogyi <vi...@gmail.com>
wrote:

> I'm happy that it's solved :)


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by Viktor Somogyi <vi...@gmail.com>.
I'm happy that it's solved :)

On Thu, Nov 9, 2017 at 3:32 PM, John Yost <ho...@gmail.com> wrote:

> Excellent points Viktor! Also, the reason I mistakenly went > 8 GB memory
> heap was due to OOM errors that were being thrown when I upgraded from
> 0.9.0.1 to 0.10.0.0 and forgot to explicitly set the message format to
> 0.9.0.1 because we needed to support the older clients and the
> corresponding format. once I set the message format to 0.9.0.1, the memory
> requirements went WAY down, I reset the memory heap to 6 GB, and our Kafka
> cluster has been awesome since.
>
> --John


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by John Yost <ho...@gmail.com>.
Excellent points Viktor! Also, the reason I mistakenly went above 8 GB of memory
heap was the OOM errors thrown when I upgraded from
0.9.0.1 to 0.10.0.0 and forgot to explicitly set the message format to
0.9.0.1, which we needed to support the older clients and the
corresponding format. Once I set the message format to 0.9.0.1, the memory
requirements went WAY down, I reset the memory heap to 6 GB, and our Kafka
cluster has been awesome since.

--John
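The fix John describes maps to a single broker setting. A sketch of the relevant server.properties line, assuming a 0.10.0.0 broker that must keep serving 0.9 clients (the /tmp path is only for illustration):

```shell
# Pin the on-disk message format so 0.9 clients keep working; forgetting
# this forces the broker to convert messages for old consumers, which is
# what drove heap usage up in John's case.
cat >> /tmp/server.properties <<'EOF'
log.message.format.version=0.9.0.1
EOF
grep 'message.format' /tmp/server.properties
```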

On Thu, Nov 9, 2017 at 9:09 AM, Viktor Somogyi <vi...@gmail.com>
wrote:

> Hi Json.
>
> John might have a point. It is not reasonable to have more than 6-8GB of
> heap provided for the JVM that's running Kafka. One of the reason is GC
> time and the other is that Kafka relies heavily on the OS' disk read/write
> in-memory caching.
> Also there were a few synchronization bugs in 0.9 which caused similar
> problems. I would recommend you to upgrade to 1.0.0 if that is feasible.
>
> Viktor


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by Viktor Somogyi <vi...@gmail.com>.
Hi Json.

John might have a point. It is not reasonable to give more than 6-8 GB of
heap to the JVM that's running Kafka. One of the reasons is GC
time, and the other is that Kafka relies heavily on the OS' in-memory
caching of disk reads/writes (the page cache).
Also, there were a few synchronization bugs in 0.9 which caused similar
problems. I would recommend upgrading to 1.0.0 if that is feasible.

Viktor
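For reference, the heap cap is usually applied through KAFKA_HEAP_OPTS, which the Kafka startup scripts read. A sketch using the 6 GB figure John ended up with; adjust to your hosts:

```shell
# A modest fixed heap; the rest of the 16 GB stays available to the
# OS page cache that Kafka depends on for fast reads/writes.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"

# Then start the broker as usual (shown commented, paths assumed):
# bin/kafka-server-start.sh -daemon config/server.properties
```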


On Thu, Nov 9, 2017 at 2:59 PM, John Yost <ho...@gmail.com> wrote:

> I've seen this before and it was due to long GC pauses due in large part to
> a memory heap > 8 GB.
>
> --John

Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker

Posted by Json Tu <ka...@126.com>.
The broker with broker id 4759750 was just restarted, and there are 500+ replica partitions that shrink and expand frequently; their partition leaders are distributed across the other 5 brokers. The log was pulled from one broker, extracting only the entries related to 1 partition.

> On Nov 10, 2017, at 12:06 PM, Hu Xi <hu...@hotmail.com> wrote:
> 
> Seems broker `4759750` was removed from the ISR of partition [Yelp,5] in every round of shrinking. Did you check whether everything works alright on that broker?
> 
> 
> From: Json Tu <ka...@126.com>
> Sent: November 10, 2017, 11:08
> To: users@kafka.apache.org
> Cc: dev@kafka.apache.org; Guozhang Wang
> Subject: Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restarting the broker
>  
> I'm so sorry for my poor English.
> 
> What I really mean is that the broker machine has 8 cores and 16 GB of memory, but my JVM configuration is as below.
> java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/xx/yy/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError
> 
> We have 30+ clusters with this JVM configuration, all deployed on machines with 8 cores and 16 GB of memory. Compared to the other clusters, this cluster has more than 5 times as many partitions.
> When we restart the other clusters, there is no such phenomenon.
> 
> Maybe some metrics or logs can lead us to the root cause of this phenomenon.
> Looking forward to more suggestions.
> 
> 
> > On Nov 9, 2017, at 9:59 PM, John Yost <ho...@gmail.com> wrote:
> > 
> > I've seen this before and it was due to long GC pauses due in large part to
> > a memory heap > 8 GB.
> > 
> > --John


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Json Tu <ka...@126.com>.
The broker with broker id 4759750 was just restarted; it hosts the 500+ replica partitions that shrink and expand frequently, and their leader partitions are distributed across the other 5 brokers. The log above was pulled from one of those brokers and filtered to a single partition.
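To gauge how widespread the flapping is, the repeated Shrinking/Expanding lines can be tallied per partition straight from the broker log. A minimal sketch; the sample file below is a stand-in for the broker's real server.log:

```shell
# Hypothetical sample; in practice point the pipeline at the broker's server.log.
cat > /tmp/isr_sample.log <<'EOF'
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
EOF

# Count shrink/expand events per partition to find the worst flappers.
grep -Eo '(Shrinking|Expanding) ISR for partition \[[^]]+\]' /tmp/isr_sample.log \
  | sort | uniq -c | sort -rn
```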

> On Nov 10, 2017, at 12:06 PM, Hu Xi <hu...@hotmail.com> wrote:
> 
> Seems broker `4759750` is removed from the ISR of partition [Yelp,5] in every round of shrinking. Did you check whether everything is working properly on that broker?


Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Hu Xi <hu...@hotmail.com>.
Seems broker `4759750` is removed from the ISR of partition [Yelp,5] in every round of shrinking. Did you check whether everything is working properly on that broker?
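For the health check suggested here, Kafka ships a CLI that lists partitions whose ISR is currently smaller than the replica set. A sketch only; the ZooKeeper address is a placeholder for your ensemble:

```shell
# List partitions currently missing replicas from the ISR; run it repeatedly
# while the flapping is happening to see whether 4759750 keeps dropping out.
bin/kafka-topics.sh --describe --zookeeper zk1:2181 --under-replicated-partitions
```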




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Json Tu <ka...@126.com>.
Can someone help analyze this?




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Json Tu <ka...@126.com>.
Apologies for my poor English.

What I meant is that each broker machine has 8 cores and 16 GB of memory, but the JVM is configured as below:
java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/xx/yy/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError

We run 30+ clusters with this JVM configuration, all deployed on 8-core / 16 GB machines. Compared with the other clusters, the current cluster has more than 5 times as many partitions, and when we restart the other clusters there is no such phenomenon.

Maybe some metrics or logs can lead us to the root cause of this phenomenon.
Looking forward to more suggestions.
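For reference, in a stock Kafka distribution these heap and GC flags are normally injected through environment variables that bin/kafka-run-class.sh reads at startup. A sketch only, assuming the standard layout; the paths are placeholders:

```shell
# Sketch, assuming the stock Kafka distribution: kafka-run-class.sh picks
# these variables up and falls back to its built-in defaults when unset.
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 \
  -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true"
bin/kafka-server-start.sh config/server.properties
```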




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by Viktor Somogyi <vi...@gmail.com>.
Hi Json.

John might have a point. It is not reasonable to give the JVM running Kafka more than 6-8 GB of heap. One reason is GC time; the other is that Kafka relies heavily on the OS's in-memory caching of disk reads and writes.
There were also a few synchronization bugs in 0.9 that caused similar problems. I would recommend upgrading to 1.0.0 if that is feasible.

Viktor
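The GC-pause theory is easy to check against the GC log the brokers already write (the -Xloggc flag above). A minimal sketch; the sample lines are hypothetical stand-ins for a real G1 log:

```shell
# Hypothetical G1 GC log sample; in practice read the broker's kafkaServer-gc.log.
cat > /tmp/gc_sample.log <<'EOF'
2017-11-09T17:05:10.101+0800: 1000.001: [GC pause (G1 Evacuation Pause) (young), 0.0212345 secs]
2017-11-09T17:06:00.123+0800: 1050.023: [GC pause (G1 Evacuation Pause) (young), 1.2345678 secs]
EOF

# Flag any collection that paused the JVM for more than 0.5 s; pauses in
# that range can delay replica fetches and ZooKeeper heartbeats.
awk 'match($0, /, [0-9.]+ secs\]/) {
  t = substr($0, RSTART + 2, RLENGTH - 8)   # the pause duration in seconds
  if (t + 0 > 0.5) print t, $1
}' /tmp/gc_sample.log
```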




Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker

Posted by John Yost <ho...@gmail.com>.
I've seen this before, and it was due to long GC pauses caused in large part by
a memory heap > 8 GB.

--John

