Posted to dev@kafka.apache.org by "swathi mocharla (Jira)" <ji...@apache.org> on 2022/07/21 06:29:00 UTC

[jira] [Created] (KAFKA-14092) Unsafe memory access operation exception leading to Error processing append operation on partition

swathi mocharla created KAFKA-14092:
---------------------------------------

             Summary: Unsafe memory access operation exception leading to Error processing append operation on partition
                 Key: KAFKA-14092
                 URL: https://issues.apache.org/jira/browse/KAFKA-14092
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.0.0
         Environment: K8S
            Reporter: swathi mocharla


Hi,

We are frequently seeing an "Unsafe memory access operation" exception leading to "Error processing append operation on partition" on a certain partition.

Here are some logs:
{code:java}
{"process":"kafka.Kafka","service":"eventqueue","host":"hubudgen10-ncs-1-workerbm-2","container":"7efcf75afb20","system":"vtas250","neid":"123456","timezone":"UTC","type":"log","level":"info","time":"2022-07-20T11:15:10.624000000Z","log":{"msg":"[ProducerStateManager partition=cdr-group-000-ccf-000-mgmt-0] Wrote producer snapshot at offset 3114 with 0 producer ids in 47 ms.","class":"kafka.log.ProducerStateManager"}}
{"process":"kafka.Kafka","service":"eventqueue","host":"hubudgen10-ncs-1-workerbm-2","container":"7efcf75afb20","system":"vtas250","neid":"123456","timezone":"UTC","type":"log","level":"error","time":"2022-07-20T11:15:10.628000000Z","log":{"msg":"[ReplicaManager broker=1] Error processing append operation on partition cdr-group-000-ccf-000-mgmt-0java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code\njava.io.UnixFileSystem.getBooleanAttributes0(Native Method)\njava.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)\njava.io.File.exists(File.java:830)\nkafka.log.TransactionIndex.<init>(TransactionIndex.scala:50)\nkafka.log.LogSegment$.open(LogSegment.scala:663)\nkafka.log.Log.$anonfun$roll$2(Log.scala:1692)\nkafka.log.Log.roll(Log.scala:2487)\nkafka.log.Log.maybeRoll(Log.scala:1632)\nkafka.log.Log.append(Log.scala:900)\nkafka.log.Log.appendAsLeader(Log.scala:741)\nkafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1042)\nkafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1030)\nkafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:931)\nscala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)\nscala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)\nscala.collection.mutable.HashMap.map(HashMap.scala:35)\nkafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:919)\nkafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:591)\nkafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:658)\nkafka.server.KafkaApis.handle(KafkaApis.scala:169)\nkafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)\njava.lang.Thread.run(Thread.java:750)","class":"kafka.server.ReplicaManager"}}{code}
{code:java}
{"process":"kafka.Kafka","service":"eventqueue","host":"hubudgen10-ncs-1-workerbm-2","container":"7efcf75afb20","system":"vtas250","neid":"123456","timezone":"UTC","type":"log","level":"warning","time":"2022-07-20T11:15:11.632000000Z","log":{"msg":"[Log partition=cdr-group-000-ccf-000-mgmt-0, dir=/data] Newly rolled segment file /data/cdr-group-000-ccf-000-mgmt-0/00000000000000003114.log already exists; deleting it first","class":"kafka.log.Log"}}
{"process":"kafka.Kafka","service":"eventqueue","host":"hubudgen10-ncs-1-workerbm-2","container":"7efcf75afb20","system":"vtas250","neid":"123456","timezone":"UTC","type":"log","level":"error","time":"2022-07-20T11:15:11.633000000Z","log":{"msg":"[ReplicaManager broker=1] Error processing append operation on partition cdr-group-000-ccf-000-mgmt-0java.lang.IllegalStateException: Attempt to append a timestamp (1658315708834) to slot 1 no larger than the last timestamp appended (8694993961132949504) to /data/cdr-group-000-ccf-000-mgmt-0/00000000000000003108.timeindex.\nkafka.log.TimeIndex.$anonfun$maybeAppend$1(TimeIndex.scala:128)\nkafka.log.TimeIndex.maybeAppend(TimeIndex.scala:114)\nkafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:503)\nkafka.log.Log.$anonfun$roll$8(Log.scala:1681)\nkafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1681)\nscala.Option.foreach(Option.scala:437)\nkafka.log.Log.$anonfun$roll$2(Log.scala:1681)\nkafka.log.Log.roll(Log.scala:2487)\nkafka.log.Log.maybeRoll(Log.scala:1632)\nkafka.log.Log.append(Log.scala:900)\nkafka.log.Log.appendAsLeader(Log.scala:741)\nkafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1042)\nkafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1030)\nkafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:931)\nscala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)\nscala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)\nscala.collection.mutable.HashMap.map(HashMap.scala:35)\nkafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:919)\nkafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:591)\nkafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:658)\nkafka.server.KafkaApis.handle(KafkaApis.scala:169)\nkafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)\njava.lang.Thread.run(Thread.java:750)","class":"kafka.server.ReplicaManager"}}{code}
Notice that Kafka picks up a "last timestamp appended" (8694993961132949504) that is wildly out of range and unrelated to any of the actual record timestamps:
{code:java}
Attempt to append a timestamp (1658315708834) to slot 1 no larger than the last timestamp appended (8694993961132949504){code}
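Converting both values makes the mismatch obvious (a quick illustrative Python check, using the two values from the error above):
{code:python}
from datetime import datetime, timezone

# Timestamps from the error message, in milliseconds since the Unix epoch.
new_ts_ms = 1658315708834           # the record being appended
last_ts_ms = 8694993961132949504    # the cached "last timestamp appended"

# The incoming timestamp matches the incident time in the logs:
print(datetime.fromtimestamp(new_ts_ms / 1000, tz=timezone.utc))
# 2022-07-20 11:15:08.834000+00:00

# The cached "last" timestamp is not a plausible wall-clock time at all --
# it lies hundreds of millions of years past the epoch:
years = last_ts_ms / 1000 / 86400 / 365.25
print(f"~{years / 1e6:.0f} million years after 1970")
{code}
So the cached value looks like corrupt bytes rather than a real produce-time.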
These are the files that have been generated and the dumps of a few of them:
{code:java}
bash-4.2$ ls -altr
total 8
-rw-r--r--.  1 logging logging  43 Jul 20 06:34 partition.metadata
-rw-r--r--.  1 logging logging  10 Jul 20 07:01 00000000000000000001.snapshot
-rw-r--r--.  1 logging logging  10 Jul 20 11:14 00000000000000003096.snapshot
-rw-r--r--.  1 logging logging 978 Jul 20 11:14 00000000000000003096.log
-rw-r--r--.  1 logging logging   0 Jul 20 11:14 00000000000000003096.index
-rw-r--r--.  1 logging logging  12 Jul 20 11:14 00000000000000003096.timeindex
-rw-r--r--.  1 logging logging  10 Jul 20 11:14 00000000000000003102.snapshot
-rw-r--r--.  1 logging logging 978 Jul 20 11:14 00000000000000003102.log
-rw-r--r--.  1 logging logging   0 Jul 20 11:15 00000000000000003102.index
-rw-r--r--.  1 logging logging  12 Jul 20 11:15 00000000000000003102.timeindex
-rw-r--r--.  1 logging logging  10 Jul 20 11:15 00000000000000003108.snapshot
-rw-r--r--.  1 logging logging 978 Jul 20 11:15 00000000000000003108.log
-rw-r--r--.  1 logging logging   0 Jul 20 11:15 00000000000000003108.index
-rw-r--r--.  1 logging logging  12 Jul 20 11:15 00000000000000003108.timeindex
-rw-r--r--.  1 logging logging  10 Jul 20 11:15 00000000000000003114.snapshot
-rw-r--r--.  1 logging logging  11 Jul 20 11:15 leader-epoch-checkpoint
drwxr-xr-x.  2 logging logging  16 Jul 20 11:15 .
drwxrwxrwx. 63 root    root     68 Jul 21 04:49 ..

bash-4.2$ ls -la --time-style=full-iso 00000000000000003114.snapshot
-rw-r--r--. 1 logging logging 10 2022-07-20 11:15:10.580041566 +0000 00000000000000003114.snapshot

bash-4.2$ ls -la --time-style=full-iso 00000000000000003108.timeindex
-rw-r--r--. 1 logging logging 12 2022-07-20 11:15:10.522041021 +0000 00000000000000003108.timeindex

bash-4.2$ kafka-run-class.sh kafka.tools.DumpLogSegments  --deep-iteration --print-data-log --files 00000000000000003108.timeindex
Dumping 00000000000000003108.timeindex
timestamp: 1658315708834 offset: 3113{code}
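The on-disk time index therefore holds the sane value, so the bogus 8694993961132949504 apparently lived only in the broker's in-memory last-entry state. Since a time index only accepts entries with strictly increasing timestamps (per the error message above), every subsequent roll then fails. A rough sketch of that invariant (illustrative Python, not Kafka's actual Scala code):
{code:python}
class TimeIndexSketch:
    """Toy model of a time index: (timestamp, offset) entries must be
    appended with strictly increasing timestamps."""

    def __init__(self):
        self.entries = []
        self.last_timestamp = -1  # sentinel for an empty index

    def maybe_append(self, timestamp, offset):
        # Mirrors the check reported in the log: a timestamp no larger
        # than the last one appended is rejected.
        if self.entries and timestamp <= self.last_timestamp:
            raise ValueError(
                f"Attempt to append a timestamp ({timestamp}) no larger "
                f"than the last timestamp appended ({self.last_timestamp})"
            )
        self.entries.append((timestamp, offset))
        self.last_timestamp = timestamp

# Once a corrupt, absurdly large timestamp is cached,
# every genuine append after it fails:
idx = TimeIndexSketch()
idx.maybe_append(8694993961132949504, 3108)  # corrupt in-memory value
try:
    idx.maybe_append(1658315708834, 3113)    # genuine record timestamp
except ValueError as e:
    print(e)
{code}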
I've also checked the DumpLogSegments output of the log files, and they look clean.

We tried deleting the affected files, but the issue re-occurs almost every time.

Could this be related to KAFKA-9156 ?

cc: [~ijuma], [~junrao]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)