Posted to users@kafka.apache.org by Ashutosh singh <ge...@gmail.com> on 2019/11/13 12:40:30 UTC

Partition Reassignment is getting stuck

Hi,

All of a sudden I see under-replicated partitions in our Kafka cluster, and
they are not catching up.  Replication seems to be stuck somewhere.  The
in-sync replica is missing from only one of the brokers, so there seems to
be some issue with that broker; on the other hand, many other topics on
that node are working fine.  I tried a rolling restart of all the nodes in
the cluster, but that didn't help.
I tried a manual reassignment of that particular topic, but it got stuck
forever, so I had to kill the reassignment by deleting the
/admin/reassign_partitions node.  I restarted ZooKeeper so that the leader
changes and then tried to reassign the partitions again, but it is still
getting stuck.

I would really appreciate it if someone could help me understand the issue.

No of nodes : 8
Version : 2.1.1

-- 
Thanks
Ashu

Re: Partition Reassignment is getting stuck

Posted by Ashutosh singh <ge...@gmail.com>.
Just restarting the broker didn't help.  I deleted a couple of random
partitions from the data directory that were under-replicated.  I also
noticed that their timestamps were 4 days old.  After deleting them and
restarting the broker, all of the other topics got synced up.

Maybe it was a case of an offline directory.  I will check the
offlineLogDirectoryCount metric and probably set up monitoring on it.
Thank you all.


Thanks
Ashu



On Thu, Nov 14, 2019 at 3:07 AM Liam Clarke <li...@adscale.co.nz>
wrote:

> If only one broker isn't in sync, it can be caused by a dead replica fetcher
> thread in my experience. I fixed it by restarting the affected broker, but
> this was on 0.11, so YMMV.


-- 
Thanx & Regard
Ashutosh Singh
08151945559

Re: Partition Reassignment is getting stuck

Posted by Liam Clarke <li...@adscale.co.nz>.
If only one broker isn't in sync, it can be caused by a dead replica fetcher
thread in my experience. I fixed it by restarting the affected broker, but
this was on 0.11, so YMMV.
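
One way to confirm that a single broker is the one dropping out of the ISR is to tally, per broker, how many partitions list it under Replicas but not under Isr. Below is a minimal Python sketch that does this over the text printed by kafka-topics.sh --describe. The line format it expects, the sample topic name, and the broker ids are assumptions for illustration, not data from this cluster.

```python
import re
from collections import Counter

# Assumed shape of a partition line in `kafka-topics.sh --describe` output:
#   Topic: orders  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2
LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)"
    r".*?Replicas:\s*(?P<replicas>[\d,]+)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def missing_from_isr(describe_output):
    """Map (topic, partition) to the assigned brokers that are not in the ISR."""
    result = {}
    for line in describe_output.splitlines():
        m = LINE.search(line)
        if not m:
            continue  # topic summary lines and blank lines do not match
        replicas = [int(b) for b in m["replicas"].split(",") if b]
        isr = {int(b) for b in m["isr"].split(",") if b}
        lagging = [b for b in replicas if b not in isr]
        if lagging:
            result[(m["topic"], int(m["partition"]))] = lagging
    return result


def lagging_broker_counts(describe_output):
    """Count, per broker id, the partitions where it is out of the ISR."""
    counts = Counter()
    for brokers in missing_from_isr(describe_output).values():
        counts.update(brokers)
    return counts


# Hypothetical sample output; topic name and broker ids are made up.
SAMPLE = """Topic:orders\tPartitionCount:2\tReplicationFactor:3\tConfigs:
\tTopic: orders\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2
\tTopic: orders\tPartition: 1\tLeader: 2\tReplicas: 2,3,4\tIsr: 2,4
"""

if __name__ == "__main__":
    for broker, count in lagging_broker_counts(SAMPLE).most_common():
        print(f"broker {broker} is missing from the ISR of {count} partition(s)")
```

If every affected partition points at the same broker id, that supports the "one bad broker" reading; if the ids are spread out, the cause is more likely elsewhere.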



On Thu, Nov 14, 2019 at 9:35 AM Koushik Chitta
<kc...@microsoft.com.invalid> wrote:

> The topic partition having the ISR issue might be on an offline directory.
> Look into the metric "offlineLogDirectoryCount" or use kafka-log-dirs.sh
> to understand the issue with that directory. In most cases, it would be
> a KafkaStorageException.
> The partition reassignment would also be stuck/waiting because of this,
> when the reassignment JSON contains an offline directory.

RE: Partition Reassignment is getting stuck

Posted by Koushik Chitta <kc...@microsoft.com.INVALID>.
The topic partition having the ISR issue might be on an offline directory. Look into the metric "offlineLogDirectoryCount" or use kafka-log-dirs.sh to understand the issue with that directory. In most cases, it would be a KafkaStorageException.
The partition reassignment would also be stuck/waiting because of this, when the reassignment JSON contains an offline directory.
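
As a rough illustration of the kafka-log-dirs.sh check, the Python sketch below scans the tool's output for log directories that report an error. It assumes the tool was run with something like `kafka-log-dirs.sh --describe --bootstrap-server <host:port> --broker-list <id>`, and that it prints a couple of status lines followed by a single JSON line with a `brokers` / `logDirs` / `error` layout. The sample paths, broker id, and error string are hypothetical.

```python
import json


def offline_log_dirs(tool_output):
    """Return (broker_id, log_dir, error) for each log dir that reports an error."""
    # The JSON report is assumed to be the first line that starts with "{".
    json_line = next(
        line for line in tool_output.splitlines() if line.lstrip().startswith("{")
    )
    report = json.loads(json_line)
    offline = []
    for broker in report.get("brokers", []):
        for log_dir in broker.get("logDirs", []):
            if log_dir.get("error"):  # null/absent means the directory is healthy
                offline.append((broker["broker"], log_dir["logDir"], log_dir["error"]))
    return offline


# Hypothetical output for illustration only; not captured from a real cluster.
SAMPLE_OUTPUT = """Querying brokers for log directories information
Received log directory information from brokers 5
{"version":1,"brokers":[{"broker":5,"logDirs":[{"logDir":"/data/kafka-a","error":null,"partitions":[]},{"logDir":"/data/kafka-b","error":"org.apache.kafka.common.errors.KafkaStorageException","partitions":[]}]}]}
"""

if __name__ == "__main__":
    for broker_id, path, error in offline_log_dirs(SAMPLE_OUTPUT):
        print(f"broker {broker_id}: {path} is offline ({error})")
```

A non-empty result for the lagging broker would be consistent with the offline-directory explanation, and would also explain a reassignment that never completes.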


-----Original Message-----
From: M. Manna <ma...@gmail.com> 
Sent: Wednesday, November 13, 2019 5:23 AM
To: Kafka Users <us...@kafka.apache.org>
Subject: Re: Partition Reassignment is getting stuck

On Wed, 13 Nov 2019 at 13:10, Ashutosh singh <ge...@gmail.com> wrote:

> Yeah. Although it wouldn't have any impact, I will have to try this
> tonight as it is peak business hours now.
> Instead of deleting all data, I will try to delete the topic partitions
> that are having issues and then restart the broker.  I believe it should
> catch up, but I will let you know.
>

Since you're doing it during OOB hours, it should be fine. The issue you're mentioning here is not uncommon, but such occurrences should be close to minuscule. As long as you have >=3 replicas you should be able to do this comfortably.

Thanks,


Re: Partition Reassignment is getting stuck

Posted by "M. Manna" <ma...@gmail.com>.
On Wed, 13 Nov 2019 at 13:10, Ashutosh singh <ge...@gmail.com> wrote:

> Yeah. Although it wouldn't have any impact, I will have to try this
> tonight as it is peak business hours now.
> Instead of deleting all data, I will try to delete the topic partitions
> that are having issues and then restart the broker.  I believe it should
> catch up, but I will let you know.
>

Since you're doing it during OOB hours, it should be fine. The issue you're
mentioning here is not uncommon, but such occurrences should be close to
minuscule. As long as you have >=3 replicas you should be able to do this
comfortably.

Thanks,


Re: Partition Reassignment is getting stuck

Posted by Ashutosh singh <ge...@gmail.com>.
Yeah. Although it wouldn't have any impact, I will have to try this
tonight as it is peak business hours now.
Instead of deleting all data, I will try to delete the topic partitions
that are having issues and then restart the broker.  I believe it should
catch up, but I will let you know.



On Wed, Nov 13, 2019 at 6:23 PM M. Manna <ma...@gmail.com> wrote:

> If all you have is 1 broker not in sync - can you please try to stop that
> broker, delete all the data files on that broker, and restart? It should
> catch up.


-- 
Thanx & Regard
Ashutosh Singh
08151945559

Re: Partition Reassignment is getting stuck

Posted by "M. Manna" <ma...@gmail.com>.
On Wed, 13 Nov 2019 at 12:41, Ashutosh singh <ge...@gmail.com> wrote:

> Hi,
>
> All of a sudden I see under-replicated partitions in our Kafka cluster, and
> they are not catching up.  Replication seems to be stuck somewhere.  The
> in-sync replica is missing from only one of the brokers, so there seems to
> be some issue with that broker; on the other hand, many other topics on
> that node are working fine.  I tried a rolling restart of all the nodes in
> the cluster, but that didn't help.
> I tried a manual reassignment of that particular topic, but it got stuck
> forever, so I had to kill the reassignment by deleting the
> /admin/reassign_partitions node.  I restarted ZooKeeper so that the leader
> changes and then tried to reassign the partitions again, but it is still
> getting stuck.
>
> I would really appreciate it if someone could help me understand the issue.
>

If all you have is 1 broker not in sync - can you please try to stop that
broker, delete all the data files on that broker, and restart? It should
catch up.

