Posted to dev@kafka.apache.org by Zakee <kz...@netzero.net> on 2016/12/14 20:03:39 UTC
Brokers crashing with OOME Map failed
Recently we have seen our brokers crash with the errors below. Any idea what might be wrong here? The brokers have been running for a long time on the same hosts/configs without this issue. Is this something to do with the new version 0.10.0.1 (which we upgraded to recently), or could it be a hardware issue? Ten hosts are dedicated to one broker per host. Each host has 128 GB RAM and 20 TB of storage mounts. Any pointers will help...
[2016-12-12 02:49:58,134] FATAL [app=broker] [ReplicaFetcherThread-15-15] [ReplicaFetcherThread-15-15], Disk error while replicating data for mytopic-19 (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log 'mytopic-19'
at kafka.log.Log.append(Log.scala:349)
at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:130)
at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:159)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:141)
at scala.Option.foreach(Option.scala:257)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:141)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:138)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:138)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:136)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:907)
at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
at kafka.log.Log.roll(Log.scala:772)
at kafka.log.Log.maybeRoll(Log.scala:742)
at kafka.log.Log.append(Log.scala:405)
... 16 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:904)
... 28 more
Thanks
-Zakee
Re: Brokers crashing with OOME Map failed
Posted by Zakee <kz...@netzero.net>.
Brokers failed repeatedly, leaving page cache behind in memory, which caused broker restarts to fail with OOM every time.
After manually cleaning up the page cache, I was able to restart the broker.
However, still wondering what could have caused this state in the first place.
Any ideas?
-Zakee
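For reference, the manual page-cache cleanup described above is typically done through the kernel's drop_caches interface. A minimal sketch follows; it is Linux-only, needs root, and only frees reclaimable cache rather than fixing whatever exhausted the mmap budget in the first place:

```python
def drop_page_cache(level=1, path="/proc/sys/vm/drop_caches"):
    """Ask the Linux kernel to drop clean page cache (level 1; level 2
    adds dentries/inodes, level 3 both). Needs root privileges.
    Returns True on success, False if not Linux or not permitted."""
    try:
        with open(path, "w") as f:
            f.write(str(level))
        return True
    except OSError:  # no /proc on this platform, or no root
        return False

if drop_page_cache():
    print("page cache dropped")
```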
> On Dec 14, 2016, at 12:03 PM, Zakee <kz...@netzero.net> wrote:
>
> Recently, we have seen our brokers crash with below errors, any idea what might be wrong here? The brokers have been running for long with the same hosts/configs without this issue before. Is this something to do with new version 0.10.0.1 (which we upgraded recently) or could it be a h/w issue? 10 hosts are dedicated for one broker per host. Each host has 128 gb RAM and 20TB of storage mounts. Any pointers will help...
>
>
> [2016-12-12 02:49:58,134] FATAL [app=broker] [ReplicaFetcherThread-15-15] [ReplicaFetcherThread-15-15], Disk error while replicating data for mytopic-19 (kafka.server.ReplicaFetcherThread)
> kafka.common.KafkaStorageException: I/O exception in append to log ’ mytopic-19'
> at kafka.log.Log.append(Log.scala:349)
> at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:130)
> at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:159)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:141)
> at scala.Option.foreach(Option.scala:257)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:141)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:138)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:138)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
> at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:138)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:136)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Caused by: java.io.IOException: Map failed
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:907)
> at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
> at kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
> at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
> at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
> at kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
> at kafka.log.Log.roll(Log.scala:772)
> at kafka.log.Log.maybeRoll(Log.scala:742)
> at kafka.log.Log.append(Log.scala:405)
> ... 16 more
> Caused by: java.lang.OutOfMemoryError: Map failed
> at sun.nio.ch.FileChannelImpl.map0(Native Method)
> at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:904)
> ... 28 more
>
>
> Thanks
> -Zakee
Re: Brokers crashing with OOME Map failed
Posted by Gwen Shapira <gw...@confluent.io>.
Did you recently add topics/partitions? Each partition takes a memory
buffer for replication, so you can get OOME by adding partitions
without resizing memory.
You basically need the Java heap size to be larger than (# partitions
on the broker) × replica.fetch.max.bytes.
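That sizing rule can be sketched as a quick back-of-the-envelope check (the partition count here is hypothetical; replica.fetch.max.bytes defaults to 1 MiB in 0.10):

```python
# Rough lower bound on replication fetch-buffer memory: the broker
# allocates up to one fetch-sized buffer per partition it follows.
def min_fetch_buffer_bytes(num_partitions, fetch_max_bytes=1048576):
    return num_partitions * fetch_max_bytes

# e.g. a broker following 4000 partitions at the default 1 MiB fetch size
needed = min_fetch_buffer_bytes(4000)
print(f"{needed / 2**30:.2f} GiB")  # 3.91 GiB of heap just for fetch buffers
```

So adding a few thousand partitions without raising the heap (or lowering the fetch size) can push the broker over its heap limit.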
Gwen
On Wed, Dec 14, 2016 at 12:03 PM, Zakee <kz...@netzero.net> wrote:
> Recently, we have seen our brokers crash with below errors, any idea what
> might be wrong here? The brokers have been running for long with the same
> hosts/configs without this issue before. Is this something to do with new
> version 0.10.0.1 (which we upgraded recently) or could it be a h/w issue?
> 10 hosts are dedicated for one broker per host. Each host has 128 gb RAM
> and 20TB of storage mounts. Any pointers will help...
--
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
<http://www.confluent.io/blog>
Re: Brokers crashing with OOME Map failed
Posted by Ismael Juma <is...@juma.me.uk>.
Hi,
This is probably not a Kafka bug, but we should improve the information we
report in this case, along the lines of what Lucene did here:
https://issues.apache.org/jira/browse/LUCENE-5673
This error may be caused by a lack of enough unfragmented virtual address
space, or by too-restrictive virtual memory limits enforced by the
operating system, preventing Kafka from mapping a chunk. Kafka could also
be asking for a chunk that is unnecessarily large, but the former seems
more likely.
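On Linux, one such limit worth checking (it is a common culprit behind "Map failed" OOMEs, per the Lucene issue above) is the per-process mapping cap, vm.max_map_count. A minimal sketch for comparing it against a process's current mappings (the /proc paths are Linux-specific; the function returns None elsewhere):

```python
import os

def mmap_headroom(pid="self"):
    """Return (vm.max_map_count, number of mappings held by `pid`)
    on Linux, or None where /proc is unavailable."""
    limit_path = "/proc/sys/vm/max_map_count"
    maps_path = f"/proc/{pid}/maps"
    if not (os.path.exists(limit_path) and os.path.exists(maps_path)):
        return None
    with open(limit_path) as f:
        limit = int(f.read().strip())
    with open(maps_path) as f:
        used = sum(1 for _ in f)  # one line per mapping
    return limit, used

info = mmap_headroom()
if info is not None:
    limit, used = info
    print(f"max_map_count={limit}, mappings in use={used}")
```

Since each log segment mmaps its index files, a broker with many partitions and segments can exhaust this limit; raising it (e.g. sysctl -w vm.max_map_count=262144) is a common remedy.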
Ismael
On Wed, Dec 14, 2016 at 12:03 PM, Zakee <kz...@netzero.net> wrote:
> Recently, we have seen our brokers crash with below errors, any idea what
> might be wrong here? The brokers have been running for long with the same
> hosts/configs without this issue before. Is this something to do with new
> version 0.10.0.1 (which we upgraded recently) or could it be a h/w issue?
> 10 hosts are dedicated for one broker per host. Each host has 128 gb RAM
> and 20TB of storage mounts. Any pointers will help...