Posted to users@kafka.apache.org by "Misra, Rahul" <Ra...@altisource.com> on 2016/06/22 08:10:57 UTC

Kafka broker crash

Hi,

I'm facing a strange issue in my Kafka cluster. Could anybody please help me with it? The issue is as follows:

We have a 3-node Kafka cluster. We installed ZooKeeper separately and pointed the brokers to it. The ZooKeeper ensemble is also 3 nodes, but for our POC setup the ZooKeeper nodes are on the same machines as the Kafka brokers.
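
For reference, the relevant broker settings look roughly like this (hostnames are placeholders; log.dirs is left at its default, which is the path that appears in the errors below):

# server.properties (illustrative excerpt; hostnames are placeholders)
broker.id=1
# ZooKeeper ensemble co-located with the brokers
zookeeper.connect=kafka1:2181,kafka2:2181,kafka3:2181
# default log directory
log.dirs=/tmp/kafka-logs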

While receiving messages from an existing topic using a new groupId, 2 of the brokers crashed with the same FATAL errors:

--------------------------------------------------------
<<<<<<<<<<<<<---- [server 2 logs] ---->>>>>>>>>>>>>>>

[2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment received from leader for group pocTestNew11 for generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting due to unrecoverable I/O error while handling produce request:  (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at scala.Option.foreach(Option.scala:257)
        at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
        at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)

----------------------------------------------
<<<<<<<<<<<<<---- [server 3 logs] ---->>>>>>>>>>>>>>>

[2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data. (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:138)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:122)
        at scala.Option.foreach(Option.scala:257)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:122)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)
        ... 19 more



For the topic "__consumer_offsets", which is used to commit consumer offsets, the default number of partitions is 50 and the replication factor is 3.
So ideally all 3 brokers should have logs for every partition of "__consumer_offsets".
I checked the "/temp/kafka-logs" directory on each server and, except for broker 1, the other 2 brokers (servers 2 and 3) do not contain replicas for all the "__consumer_offsets" partitions. Log directories are missing for many "__consumer_offsets" partitions on brokers 2 and 3 (including partition 4, which caused the crash above).
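
To illustrate how the partition assignment and the on-disk directories can be compared (the ZooKeeper address below is a placeholder):

# describe the offsets topic: 50 partitions, replication factor 3, and the assigned replicas
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets
# on each broker, count which offsets partitions actually exist in the log directory
ls -d /tmp/kafka-logs/__consumer_offsets-* | wc -l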

What could be the cause of this crash? Is there any misconfiguration of the brokers that could cause this?

Regards,
Rahul Misra

Technical Lead
Altisource(tm)
Mobile: 9886141541 | Ext: 298269
Rahul.Misra@Altisource.com | www.Altisource.com


Re: Kafka broker crash

Posted by Radu Radutiu <rr...@gmail.com>.
/tmp is not a good location for storing files. It will get cleaned up periodically, depending on your Linux distribution.
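
For example, on systemd-based distributions the age-based cleanup of /tmp is configured through tmpfiles.d (the 10-day age shown here is only a common default; check the actual file on your machines, and note that older RHEL/CentOS releases use a tmpwatch cron job instead):

cat /usr/lib/tmpfiles.d/tmp.conf
# a line such as the following means entries under /tmp older than 10 days are removed:
# v /tmp 1777 root root 10d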

Radu

On 22 June 2016 at 19:33, Misra, Rahul <Ra...@altisource.com> wrote:

> Hi Madhukar,
>
> Thanks for your quick response. The path is "/tmp/kafka-logs/". But the
> servers have not been restarted any time lately. The uptime for all the 3
> servers is almost 67 days.
>
> Regards,
> Rahul Misra
>
>
> -----Original Message-----
> From: Madhukar Bharti [mailto:bhartimadhukar@gmail.com]
> Sent: Wednesday, June 22, 2016 8:37 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka broker crash
>
> Hi Rahul,
>
> Whether the path is  "/tmp/kafka-logs/" or "/temp/kafka-logs" ?
>
> Mostly if path is set to "/tmp/" then in case machine restart it may
> delete the files. So it is throwing FileNotFoundException.
> you can change the file location to some other path and restart all broker.
> This might fix the issue.
>
> Regrads,
> Madhukar
>
> On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul <Ra...@altisource.com>
> wrote:
>
> > Hi,
> >
> > I'm facing a strange issue in my Kafka cluster. Could anybody please
> > help me with it. The issue is as follows:
> >
> > We have a 3 node kafka cluster. We installed the zookeeper separately
> > and have pointed the brokers to it. The zookeeper is also 3 node, but
> > for our POC setup, the zookeeper nodes are on the same machines as the
> > Kafka brokers.
> >
> > While receiving messages from an existing topic using a new groupId, 2
> > of the brokers crashed with same FATAL errors:
> >
> > --------------------------------------------------------
> > <<<<<<<<<<<<<---- [server 2 logs] ---->>>>>>>>>>>>>>>
> >
> > [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group
> > pocTestNew11 generation 1 (kafka.coordinator.Gro
> > upCoordinator)
> > [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment
> > received from leader for group pocTestNew11 for genera tion 1
> > (kafka.coordinator.GroupCoordinator)
> > [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting
> > due to unrecoverable I/O error while handling p roduce request:
> > (kafka.server.ReplicaManager)
> > kafka.common.KafkaStorageException: I/O exception in append to log
> > '__consumer_offsets-4'
> >         at kafka.log.Log.append(Log.scala:318)
> >         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
> >         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
> >         at
> > kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
> >         at
> >
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
> >         at
> >
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
> >         at
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> >         at
> >
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> >         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
> >         at
> > scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> >         at
> scala.collection.AbstractTraversable.map(Traversable.scala:104)
> >         at
> > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
> >         at
> > kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
> >         at
> >
> kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
> >         at
> >
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
> >         at
> >
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
> >         at scala.Option.foreach(Option.scala:257)
> >         at
> >
> kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
> >         at
> > kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
> >         at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
> >         at
> > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.io.FileNotFoundException:
> > /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No
> > such file or directory)
> >         at java.io.RandomAccessFile.open0(Native Method)
> >         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> >         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> >         at
> > kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
> >         at
> > kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
> >         at
> >
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
> >         at
> >
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> >         at
> >
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >         at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
> >         at kafka.log.Log.roll(Log.scala:627)
> >         at kafka.log.Log.maybeRoll(Log.scala:602)
> >         at kafka.log.Log.append(Log.scala:357)
> >
> > ----------------------------------------------
> > <<<<<<<<<<<<<---- [server 3 logs] ---->>>>>>>>>>>>>>>
> >
> > [2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error
> > while replicating data. (kafka.server.ReplicaFe
> > tcherThread)
> > kafka.common.KafkaStorageException: I/O exception in append to log
> > '__consumer_offsets-4'
> >         at kafka.log.Log.append(Log.scala:318)
> >         at
> >
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
> >         at
> >
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
> >         at
> >
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.
> > apply(AbstractFetcherThread.scala:138)
> >         at
> >
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.
> > apply(AbstractFetcherThread.scala:122)
> >         at scala.Option.foreach(Option.scala:257)
> >         at
> > kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$ano
> > nfun$apply$mcV$sp$1.apply(AbstractFet
> > cherThread.scala:122)
> >         at
> >
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
> >         at
> >
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >         at
> >
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >         at
> >
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> >         at
> scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> >         at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> >         at
> >
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
> >         at
> >
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
> >         at
> >
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >         at
> >
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> >         at
> > kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
> >         at
> > kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> > Caused by: java.io.FileNotFoundException:
> > /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No
> > such file or directory)
> >         at java.io.RandomAccessFile.open0(Native Method)
> >         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> >         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> >         at
> > kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
> >         at
> > kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
> >         at
> >
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
> >         at
> >
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> >         at
> >
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
> >         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >         at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
> >         at kafka.log.Log.roll(Log.scala:627)
> >         at kafka.log.Log.maybeRoll(Log.scala:602)
> >         at kafka.log.Log.append(Log.scala:357)
> >         ... 19 more
> >
> >
> >
> > For the topic "__consumer_offsets" which is used to commit consumer
> > offsets the default number of partitions is 50 and the replication
> > factor is 3.
> > So ideally all the 3 brokers should have logs for all partitions for
> > "__consumer_offsets".
> > I checked the "/temp/kafka-logs" directory for each server and except
> > for the broker 1, the other 2 brokers (server 2 and 3) do not contain
> > replicas for all the partitions for "__consumer_offsets". There are
> > log directories missing for many partitions for "__consumer_offsets"
> > on brokers 2 and 3 (including partition 4 which resulted in the above
> crash).
> >
> > What could be the cause for this crash. Is there any mis-configuration
> > for the broker that can cause this?
> >
> > Regards,
> > Rahul Misra
> >
> > Technical Lead
> > Altisource(tm)
> > Mobile: 9886141541 | Ext: 298269
> > Rahul.Misra@Altisource.com<ma...@Altisource.com> |
> > www.Altisource.com<http://www.altisource.com/>
> >

RE: Kafka broker crash

Posted by "Misra, Rahul" <Ra...@altisource.com>.
Hi Madhukar,

Thanks for your quick response. The path is "/tmp/kafka-logs/", but the servers have not been restarted at any point lately. The uptime for all 3 servers is almost 67 days.
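
For completeness, a quick way to check whether a periodic cleaner (rather than a reboot) could have removed files from /tmp on these machines (assuming a typical RHEL/CentOS or systemd setup):

ls /etc/cron.daily/                      # look for tmpwatch / tmpreaper
systemctl list-timers | grep tmpfiles    # systemd-tmpfiles-clean.timer, if present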

Regards,
Rahul Misra


-----Original Message-----
From: Madhukar Bharti [mailto:bhartimadhukar@gmail.com] 
Sent: Wednesday, June 22, 2016 8:37 PM
To: users@kafka.apache.org
Subject: Re: Kafka broker crash

Hi Rahul,

Whether the path is  "/tmp/kafka-logs/" or "/temp/kafka-logs" ?

Mostly if path is set to "/tmp/" then in case machine restart it may delete the files. So it is throwing FileNotFoundException.
you can change the file location to some other path and restart all broker.
This might fix the issue.

Regrads,
Madhukar

On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul <Ra...@altisource.com>
wrote:

> Hi,
>
> I'm facing a strange issue in my Kafka cluster. Could anybody please 
> help me with it. The issue is as follows:
>
> We have a 3 node kafka cluster. We installed the zookeeper separately 
> and have pointed the brokers to it. The zookeeper is also 3 node, but 
> for our POC setup, the zookeeper nodes are on the same machines as the 
> Kafka brokers.
>
> While receiving messages from an existing topic using a new groupId, 2 
> of the brokers crashed with same FATAL errors:
>
> --------------------------------------------------------
> <<<<<<<<<<<<<---- [server 2 logs] ---->>>>>>>>>>>>>>>
>
> [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group
> pocTestNew11 generation 1 (kafka.coordinator.Gro
> upCoordinator)
> [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment 
> received from leader for group pocTestNew11 for genera tion 1 
> (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting 
> due to unrecoverable I/O error while handling p roduce request:  
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> '__consumer_offsets-4'
>         at kafka.log.Log.append(Log.scala:318)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>         at
> kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>         at
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
>         at
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
>         at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>         at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
>         at
> kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
>         at
> kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
>         at
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at scala.Option.foreach(Option.scala:257)
>         at
> kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
>         at
> kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
>         at
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No 
> such file or directory)
>         at java.io.RandomAccessFile.open0(Native Method)
>         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
>         at kafka.log.Log.roll(Log.scala:627)
>         at kafka.log.Log.maybeRoll(Log.scala:602)
>         at kafka.log.Log.append(Log.scala:357)
>
> ----------------------------------------------
> <<<<<<<<<<<<<---- [server 3 logs] ---->>>>>>>>>>>>>>>
>
> [2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error 
> while replicating data. (kafka.server.ReplicaFe
> tcherThread)
> kafka.common.KafkaStorageException: I/O exception in append to log 
> '__consumer_offsets-4'
>         at kafka.log.Log.append(Log.scala:318)
>         at
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
>         at
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.
> apply(AbstractFetcherThread.scala:138)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.
> apply(AbstractFetcherThread.scala:122)
>         at scala.Option.foreach(Option.scala:257)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$ano
> nfun$apply$mcV$sp$1.apply(AbstractFet
> cherThread.scala:122)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
>         at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>         at
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
>         at 
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No 
> such file or directory)
>         at java.io.RandomAccessFile.open0(Native Method)
>         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
>         at kafka.log.Log.roll(Log.scala:627)
>         at kafka.log.Log.maybeRoll(Log.scala:602)
>         at kafka.log.Log.append(Log.scala:357)
>         ... 19 more
>
>
>
> For the topic "__consumer_offsets" which is used to commit consumer 
> offsets the default number of partitions is 50 and the replication 
> factor is 3.
> So ideally all the 3 brokers should have logs for all partitions for 
> "__consumer_offsets".
> I checked the "/temp/kafka-logs" directory for each server and except 
> for the broker 1, the other 2 brokers (server 2 and 3) do not contain 
> replicas for all the partitions for "__consumer_offsets". There are 
> log directories missing for many partitions for "__consumer_offsets" 
> on brokers 2 and 3 (including partition 4 which resulted in the above crash).
>
> What could be the cause for this crash. Is there any mis-configuration 
> for the broker that can cause this?
>
> Regards,
> Rahul Misra
>
> Technical Lead
> Altisource(tm)
> Mobile: 9886141541 | Ext: 298269
> Rahul.Misra@Altisource.com<ma...@Altisource.com> | 
> www.Altisource.com<http://www.altisource.com/>
>

Re: Kafka broker crash

Posted by Madhukar Bharti <bh...@gmail.com>.
Hi Rahul,

Is the path "/tmp/kafka-logs/" or "/temp/kafka-logs"?

If the path is set under "/tmp/", the files may get deleted when the machine restarts, so it is throwing FileNotFoundException.
You can change the log location to some other path and restart all brokers.
This might fix the issue.
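
For example (the new location is only an illustration; any persistent disk path works, and data already under /tmp/kafka-logs would need to be moved or re-replicated):

# in config/server.properties on each broker
log.dirs=/var/lib/kafka-logs
# then restart the brokers one at a time
bin/kafka-server-stop.sh
bin/kafka-server-start.sh -daemon config/server.properties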

Regards,
Madhukar

On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul <Ra...@altisource.com>
wrote:

> Hi,
>
> I'm facing a strange issue in my Kafka cluster. Could anybody please help
> me with it. The issue is as follows:
>
> We have a 3 node kafka cluster. We installed the zookeeper separately and
> have pointed the brokers to it. The zookeeper is also 3 node, but for our
> POC setup, the zookeeper nodes are on the same machines as the Kafka
> brokers.
>
> While receiving messages from an existing topic using a new groupId, 2 of
> the brokers crashed with same FATAL errors:
>
> --------------------------------------------------------
> <<<<<<<<<<<<<---- [server 2 logs] ---->>>>>>>>>>>>>>>
>
> [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group
> pocTestNew11 generation 1 (kafka.coordinator.Gro
> upCoordinator)
> [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment received
> from leader for group pocTestNew11 for genera
> tion 1 (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting due
> to unrecoverable I/O error while handling p
> roduce request:  (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log
> '__consumer_offsets-4'
>         at kafka.log.Log.append(Log.scala:318)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>         at
> kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>         at
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
>         at
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
>         at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>         at
> scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
>         at
> kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
>         at
> kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
>         at
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at
> kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at scala.Option.foreach(Option.scala:257)
>         at
> kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
>         at
> kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
>         at
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such
> file or directory)
>         at java.io.RandomAccessFile.open0(Native Method)
>         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
>         at kafka.log.Log.roll(Log.scala:627)
>         at kafka.log.Log.maybeRoll(Log.scala:602)
>         at kafka.log.Log.append(Log.scala:357)
>
> ----------------------------------------------
> <<<<<<<<<<<<<---- [server 3 logs] ---->>>>>>>>>>>>>>>
>
> [2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error
> while replicating data. (kafka.server.ReplicaFe
> tcherThread)
> kafka.common.KafkaStorageException: I/O exception in append to log
> '__consumer_offsets-4'
>         at kafka.log.Log.append(Log.scala:318)
>         at
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
>         at
> kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.
> apply(AbstractFetcherThread.scala:138)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.
> apply(AbstractFetcherThread.scala:122)
>         at scala.Option.foreach(Option.scala:257)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFet
> cherThread.scala:122)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:120)
>         at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>         at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:120)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>         at
> kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:120)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>         at
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:93)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka-logs/__consumer_offsets-4/00000000000000000000.index (No such
> file or directory)
>         at java.io.RandomAccessFile.open0(Native Method)
>         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>         at
> kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at
> kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
>         at kafka.log.Log.roll(Log.scala:627)
>         at kafka.log.Log.maybeRoll(Log.scala:602)
>         at kafka.log.Log.append(Log.scala:357)
>         ... 19 more
>
>
>
> For the topic "__consumer_offsets" which is used to commit consumer
> offsets the default number of partitions is 50 and the replication factor
> is 3.
> So ideally all the 3 brokers should have logs for all partitions for
> "__consumer_offsets".
> I checked the "/temp/kafka-logs" directory for each server and except for
> the broker 1, the other 2 brokers (server 2 and 3) do not contain replicas
> for all the partitions for "__consumer_offsets". There are log directories
> missing for many partitions for "__consumer_offsets" on brokers 2 and 3
> (including partition 4 which resulted in the above crash).
>
> What could be the cause for this crash. Is there any mis-configuration for
> the broker that can cause this?
>
> Regards,
> Rahul Misra
>
> Technical Lead
> Altisource(tm)
> Mobile: 9886141541 | Ext: 298269
> Rahul.Misra@Altisource.com<ma...@Altisource.com> |
> www.Altisource.com<http://www.altisource.com/>
>