Posted to jira@kafka.apache.org by "wang (JIRA)" <ji...@apache.org> on 2018/07/11 03:41:00 UTC
[jira] [Created] (KAFKA-7148) Kafka load log very slow after goes down with outOfMemoryError
wang created KAFKA-7148:
---------------------------
Summary: Kafka load log very slow after goes down with outOfMemoryError
Key: KAFKA-7148
URL: https://issues.apache.org/jira/browse/KAFKA-7148
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.10.1.1
Reporter: wang
Two questions:
1、Is there any way to avoid the "IOException: Map failed"? Is there a limit tied to the VM's memory size?
2、Is it normal for Kafka to take 20+ seconds to load a single partition's log?
Detail Info:
1、Linux Version :
kafka_2.11-0.10.1.1> cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3
2、VM info: 4 cores, 32 GB RAM (4C32G)
3、java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
4、Start command :
java -Xmx16G -Xms16G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/opt/***/kafka_2.11-0.10.1.1/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dkafka.logs.dir=/opt/***/kafka_2.11-0.10.1.1/bin/../logs -Dlog4j.configuration=file:./../config/log4j.properties -cp :………………………………
5、Brokers & topics
we have 3 brokers and 3 ZooKeeper nodes
we have 4 topics in this Kafka cluster
__consumer_offsets: 50 partitions, 3 replicas
topic1: 5 partitions, 3 replicas
topic2: 160 partitions, 3 replicas
topic3: 160 partitions, 3 replicas
Total data disk usage: 32G
du -sh data/
32G data/
6、logs
[2018-07-10 17:23:59,728] FATAL [Replica Manager on Broker 1]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '**************-13'
at kafka.log.Log.append(Log.scala:349)
at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:436)
at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:61)
at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:52)
at kafka.log.LogSegment.<init>(LogSegment.scala:67)
at kafka.log.Log.roll(Log.scala:778)
at kafka.log.Log.maybeRoll(Log.scala:744)
at kafka.log.Log.append(Log.scala:405)
... 22 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
... 28 more
7、Then I followed KAFKA-6165 (https://issues.apache.org/jira/browse/KAFKA-6165?jql=project%20%3D%20KAFKA%20AND%20text%20~%20%22sun.nio.ch.FileChannelImpl.map%20Map%20failed%22)
and raised vm.max_map_count from its default (65536) to 262144
/sbin/sysctl -a |grep map
vm.max_map_count = 262144
cat /proc/2860/maps |wc -l
1195
and changed kafka-run-class.sh, replacing `-XX:+DisableExplicitGC` with `-XX:+ExplicitGCInvokesConcurrent`
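As a sanity check, the number of map entries one broker needs can be roughly estimated from the topic layout above (a back-of-the-envelope sketch, not an official formula): every open log segment mmaps its offset index, and on 0.10.1+ a time index as well.

```shell
# Back-of-the-envelope estimate of mmap entries one broker needs.
# Numbers come from the cluster description above; adjust to your layout.
partitions_per_broker=$(( (50 + 5 + 160 + 160) * 3 / 3 ))  # all partitions x 3 replicas / 3 brokers
segments_per_partition=2       # the recovery logs here show ~2 segments per log
index_files_per_segment=2      # .index + .timeindex on 0.10.1+
maps=$(( partitions_per_broker * segments_per_partition * index_files_per_segment ))
echo "$maps"   # 1500
```

That estimate (~1500) is in the same ballpark as the 1195 entries measured from /proc below and far under the raised limit of 262144, which suggests the map-count ceiling was not actually being hit.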
8、But the "IOException: Map failed" problem still occurred.
We then grew the VM to 4 cores / 64 GB (4C64G) and changed -Xmx16G -Xms16G to -Xmx4G -Xms4G.
9、Logs showing the slow log loading
[2018-07-10 17:35:33,481] INFO Completed load of log ***-37 with 2 log segments and log end offset 2441365 in 15678 ms (kafka.log.Log)
[2018-07-10 17:35:33,484] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/***/data/***-34/00000000000002451611.index) has non-zero size but the last offset is 2451611 which is no larger than the base offset 2451611.}. deleting /opt/***/data/***-34/00000000000002451611.timeindex, /opt/***/data/***-34/00000000000002451611.index and rebuilding index... (kafka.log.Log)
[2018-07-10 17:35:33,485] INFO Recovering unflushed segment 1086746 in log ***-34. (kafka.log.Log)
[2018-07-10 17:35:49,070] INFO Recovering unflushed segment 2451611 in log ***-34. (kafka.log.Log)
[2018-07-10 17:35:49,071] INFO Completed load of log ***-34 with 2 log segments and log end offset 2451613 in 15589 ms (kafka.log.Log)
[2018-07-10 17:35:49,073] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/***/data/***-31/00000000000002437998.index) has non-zero size but the last offset is 2437998 which is no larger than the base offset 2437998.}. deleting /opt/***/data/***-31/00000000000002437998.timeindex, /opt/***/data/***-31/00000000000002437998.index and rebuilding index... (kafka.log.Log)
[2018-07-10 17:35:49,090] INFO Recovering unflushed segment 1080802 in log ***-31. (kafka.log.Log)
[2018-07-10 17:36:04,806] INFO Recovering unflushed segment 2437998 in log ***-31. (kafka.log.Log)
[2018-07-10 17:36:04,807] INFO Completed load of log ***-31 with 2 log segments and log end offset 2438000 in 15735 ms (kafka.log.Log)
[2018-07-10 17:36:04,810] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/***/data/***-28/00000000000002432077.index) has non-zero size but the last offset is 2432077 which is no larger than the base offset 2432077.}. deleting /opt/***/data/***-28/00000000000002432077.timeindex, /opt/***/data/***-28/00000000000002432077.index and rebuilding index... (kafka.log.Log)
[2018-07-10 17:36:04,811] INFO Recovering unflushed segment 1081522 in log ***-28. (kafka.log.Log)
[2018-07-10 17:36:19,537] INFO Recovering unflushed segment 2432077 in log ***-28. (kafka.log.Log)
[2018-07-10 17:36:19,538] INFO Completed load of log ***-28 with 2 log segments and log end offset 2432079 in 14730 ms (kafka.log.Log)
[2018-07-10 17:36:19,539] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/***/data/***-25/00000000000001080046.index) has non-zero size but the last offset is 1080046 which is no larger than the base offset 1080046.}. deleting /opt/***/data/***-25/00000000000001080046.timeindex, /opt/***/data/***-25/00000000000001080046.index and rebuilding index... (kafka.log.Log)
[2018-07-10 17:36:34,232] INFO Recovering unflushed segment 1080046 in log ***-25. (kafka.log.Log)
[2018-07-10 17:36:48,261] INFO Completed load of log ***-25 with 1 log segments and log end offset 2435166 in 28721 ms (kafka.log.Log)
[2018-07-10 17:36:48,266] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/***/data/***-22/00000000000002438178.index) has non-zero size but the last offset is 2438178 which is no larger than the base offset 2438178.}. deleting /opt/***/data/***-22/00000000000002438178.timeindex, /opt/***/data/***-22/00000000000002438178.index and rebuilding index... (kafka.log.Log)
[2018-07-10 17:36:48,267] INFO Recovering unflushed segment 1081500 in log ***-22. (kafka.log.Log)
[2018-07-10 17:37:02,803] INFO Recovering unflushed segment 2438178 in log ***-22. (kafka.log.Log)
[2018-07-10 17:37:02,803] INFO Completed load of log ***-22 with 2 log segments and log end offset 2438180 in 14540 ms (kafka.log.Log)
[2018-07-10 17:37:02,807] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/***/data/***-19/00000000000001081412.index) has non-zero size but the last offset is 1081412 which is no larger than the base offset 1081412.}. deleting /opt/***/data/***-19/00000000000001081412.timeindex, /opt/***/data/***-19/00000000000001081412.index and rebuilding index... (kafka.log.Log)
[2018-07-10 17:37:17,536] INFO Recovering unflushed segment 1081412 in log ***-19. (kafka.log.Log)
[2018-07-10 17:37:31,570] INFO Completed load of log ***-19 with 1 log segments and log end offset 2443668 in 28764 ms (kafka.log.Log)
[2018-07-10 17:37:31,573] WARN Found a corrupted index file due to requirement failed: Corrupt index found, index file (/opt/***/data/***-35/00000000000002440999.index) has non-zero size but the last offset is 2440999 which is no larger than the base offset 2440999.}. deleting /opt/***/data/***-35/00000000000002440999.timeindex, /opt/***/data/***-35/00000000000002440999.index and rebuilding index... (kafka.log.Log)
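The ~15-29 s per partition seen above is consistent with recovery re-reading each unflushed segment end-to-end to rebuild its index, and doing so on a single thread per data directory by default (`num.recovery.threads.per.data.dir=1`). A rough sanity check, assuming ~1 GiB segments (the default `log.segment.bytes`) and ~70 MB/s sequential reads (an assumed figure, not from this report):

```shell
# Rough per-segment recovery time: segment size / sequential read rate.
segment_mb=1024      # default log.segment.bytes is 1 GiB
read_mb_per_s=70     # assumed disk throughput; measure your own disk
echo $(( segment_mb / read_mb_per_s ))   # ~14 seconds per segment
```

With 2 segments per partition needing recovery, that lands right in the observed 15-29 s range; raising `num.recovery.threads.per.data.dir` lets multiple logs recover in parallel after an unclean shutdown.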
10、Then I checked the CPU: it was at 100% while loading logs. Using top -H -p and jstack, I found the busy thread:
"pool-3-thread-1" #20 prio=5 os_prio=0 tid=0x00007f7041537000 nid=0x5c3a runnable [0x00007f703042e000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:741)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
at kafka.log.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:319)
at kafka.log.FileMessageSet$$anon$1.makeNext(FileMessageSet.scala:308)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)
at kafka.log.LogSegment.recover(LogSegment.scala:211)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:218)
at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:179)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at kafka.log.Log.loadSegments(Log.scala:179)
at kafka.log.Log.<init>(Log.scala:108)
at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)