Posted to dev@kafka.apache.org by "Ismael Juma (Jira)" <ji...@apache.org> on 2019/12/02 04:53:00 UTC

[jira] [Resolved] (KAFKA-9213) BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1

     [ https://issues.apache.org/jira/browse/KAFKA-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismael Juma resolved KAFKA-9213.
--------------------------------
    Resolution: Duplicate

Duplicate of KAFKA-9156.

> BufferOverflowException on rolling new segment after upgrading Kafka from 1.1.0 to 2.3.1
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9213
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9213
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 2.3.1
>         Environment: Ubuntu 16.04, AWS instance d2.8xlarge.
> JAVA Options:
> -Xms16G 
> -Xmx16G 
> -XX:G1HeapRegionSize=16M 
> -XX:MetaspaceSize=96m 
> -XX:MinMetaspaceFreeRatio=50 
>            Reporter: Daniyar
>            Priority: Blocker
>
> We upgraded our Kafka cluster from version 1.1.0 to 2.3.1, following the [upgrade instructions|https://kafka.apache.org/documentation/#upgrade] up to step 2.
> The message format and inter-broker protocol versions were left unchanged:
> inter.broker.protocol.version=1.1
> log.message.format.version=1.1
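> (For context, a minimal sketch of how these overrides change across the documented rolling-upgrade steps; the property values are from the Kafka upgrade docs, and the file layout is illustrative rather than the actual broker config:)
> {code:java}
> # Step 2 of the rolling upgrade: brokers already run the 2.3.1 binaries,
> # but both the inter-broker protocol and the message format stay pinned
> # to the old version.
> inter.broker.protocol.version=1.1
> log.message.format.version=1.1
>
> # Later steps (not yet performed here): once the cluster is stable, bump
> # the protocol version and do another rolling restart, and only after
> # that bump the message format version and restart once more.
> # inter.broker.protocol.version=2.3
> # log.message.format.version=2.3
> {code}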
>  
> After upgrading, we started to see occasional exceptions:
> {code:java}
> 2019/11/19 05:30:53 INFO [ProducerStateManager partition=matchmaker_retry_clicks_15m-2] Writing producer snapshot at offset 788532 (kafka.log.ProducerStateManager)
> 2019/11/19 05:30:53 INFO [Log partition=matchmaker_retry_clicks_15m-2, dir=/mnt/kafka] Rolled new log segment at offset 788532 in 1 ms. (kafka.log.Log)
> 2019/11/19 05:31:01 ERROR [ReplicaManager broker=0] Error processing append operation on partition matchmaker_retry_clicks_15m-2 (kafka.server.ReplicaManager)
> 2019/11/19 05:31:01 java.nio.BufferOverflowException
> 2019/11/19 05:31:01     at java.nio.Buffer.nextPutIndex(Buffer.java:527)
> 2019/11/19 05:31:01     at java.nio.DirectByteBuffer.putLong(DirectByteBuffer.java:797)
> 2019/11/19 05:31:01     at kafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:134)
> 2019/11/19 05:31:01     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 2019/11/19 05:31:01     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> 2019/11/19 05:31:01     at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)
> 2019/11/19 05:31:01     at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:520)
> 2019/11/19 05:31:01     at kafka.log.Log.$anonfun$roll$8(Log.scala:1690)
> 2019/11/19 05:31:01     at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1690)
> 2019/11/19 05:31:01     at scala.Option.foreach(Option.scala:407)
> 2019/11/19 05:31:01     at kafka.log.Log.$anonfun$roll$2(Log.scala:1690)
> 2019/11/19 05:31:01     at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
> 2019/11/19 05:31:01     at kafka.log.Log.roll(Log.scala:1654)
> 2019/11/19 05:31:01     at kafka.log.Log.maybeRoll(Log.scala:1639)
> 2019/11/19 05:31:01     at kafka.log.Log.$anonfun$append$2(Log.scala:966)
> 2019/11/19 05:31:01     at kafka.log.Log.maybeHandleIOException(Log.scala:2085)
> 2019/11/19 05:31:01     at kafka.log.Log.append(Log.scala:850)
> 2019/11/19 05:31:01     at kafka.log.Log.appendAsLeader(Log.scala:819)
> 2019/11/19 05:31:01     at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
> 2019/11/19 05:31:01     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> 2019/11/19 05:31:01     at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
> 2019/11/19 05:31:01     at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
> 2019/11/19 05:31:01     at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
> 2019/11/19 05:31:01     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
> 2019/11/19 05:31:01     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> 2019/11/19 05:31:01     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> 2019/11/19 05:31:01     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> 2019/11/19 05:31:01     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> 2019/11/19 05:31:01     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> 2019/11/19 05:31:01     at scala.collection.TraversableLike.map(TraversableLike.scala:238)
> 2019/11/19 05:31:01     at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
> 2019/11/19 05:31:01     at scala.collection.AbstractTraversable.map(Traversable.scala:108)
> 2019/11/19 05:31:01     at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:751)
> 2019/11/19 05:31:01     at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:492)
> 2019/11/19 05:31:01     at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:544)
> 2019/11/19 05:31:01     at kafka.server.KafkaApis.handle(KafkaApis.scala:113)
> 2019/11/19 05:31:01     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
> 2019/11/19 05:31:01     at java.lang.Thread.run(Thread.java:748)
> {code}
> The error persists until the broker is restarted (or leadership is moved to another broker).
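> (A minimal sketch of the restart workaround, assuming a standard tarball install under /opt/kafka and the bundled scripts; actual paths differ per deployment:)
> {code:java}
> # Bounce the affected broker; per the report above, the append errors
> # stop once the broker has been restarted.
> /opt/kafka/bin/kafka-server-stop.sh
> /opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
> {code}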
>  
> Broker config:
> {code:java}
> advertised.host.name={{ hostname }}
> port=9092
> # Default number of partitions if a value isn't set when the topic is created.
> num.partitions=3
> auto.create.topics.enable=false
> delete.topic.enable=false
> # Prevent a not-in-sync replica from becoming the leader.
> unclean.leader.election.enable=false
> # The number of threads per data directory to be used for log recovery at
> # startup and flushing at shutdown.
> num.recovery.threads.per.data.dir=36
> log.flush.interval.messages=10000
> log.flush.interval.ms=2000
> # 1 week
> log.retention.hours=168
> log.retention.check.interval.ms=300000
> log.cleaner.enable=false
> # Use broker time for message timestamps.
> log.message.timestamp.type=LogAppendTime
> zookeeper.connect={{zookeeper_host }}:2181
> zookeeper.connection.timeout.ms=6000
> controller.socket.timeout.ms=30000
> controller.message.queue.size=10
> # Replication configuration
> num.replica.fetchers=10
> # Socket server configuration
> num.io.threads=32
> num.network.threads=16
> socket.request.max.bytes=104857600
> socket.receive.buffer.bytes=1048576
> socket.send.buffer.bytes=1048576
> queued.max.requests=32
> fetch.purgatory.purge.interval.requests=100
> producer.purgatory.purge.interval.requests=100
> inter.broker.protocol.version=1.1
> log.message.format.version=1.1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)