You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2022/02/23 18:49:00 UTC

[jira] [Commented] (KAFKA-13684) KStream rebalance can lead to JVM process crash when network issues occure

    [ https://issues.apache.org/jira/browse/KAFKA-13684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496995#comment-17496995 ] 

Guozhang Wang commented on KAFKA-13684:
---------------------------------------

Thanks [~petercipov] for reporting this, we will look into the files you provided.

At the mean time, if there are more related information you can collect upon the seg fault happenings it would be highly appreciated too.

> KStream rebalance can lead to JVM process crash when network issues occure
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-13684
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13684
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.8.1
>            Reporter: Peter Cipov
>            Priority: Critical
>         Attachments: crash-dump.log, crash-logs.csv
>
>
> Hello,
> Sporadically KStream rebalance leads to segmentation fault
> {code:java}
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000 {code}
> I have spotted it occuring when:
> 1) there some intermittent connection issues. I have found org.apache.kafka.common.errors.DisconnectException:  in logs during rebalance
> 2) a lot of partitions are shifted due to ks cluster re-balance
>  
> crash stack:
> {code:java}
> Current thread (0x00007f5bf407a000):  JavaThread "app-blue-v6-StreamThread-2" [_thread_in_native, id=231, stack(0x00007f5bdc2ed000,0x00007f5bdc3ee000)]
> Stack: [0x00007f5bdc2ed000,0x00007f5bdc3ee000],  sp=0x00007f5bdc3ebe30,  free space=1019kNative frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)C  [libc.so.6+0x37ab7]  abort+0x297
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)J 8080  org.rocksdb.WriteBatch.put(J[BI[BIJ)V (0 bytes) @ 0x00007f5c857ca520 [0x00007f5c857ca4a0+0x0000000000000080]J 8835 c2 
> org.apache.kafka.streams.state.internals.RocksDBStore$SingleColumnFamilyAccessor.prepareBatchForRestore(Ljava/util/Collection;Lorg/rocksdb/WriteBatch;)V (52 bytes) @ 0x00007f5c858dccb4 [0x00007f5c858dcb60+0x0000000000000154]J 9779 c1 
> org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.restoreAll(Ljava/util/Collection;)V (147 bytes) @ 0x00007f5c7ef7b7e4 [0x00007f5c7ef7b360+0x0000000000000484]J 8857 c2 
> org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$0(Lorg/apache/kafka/streams/processor/StateRestoreCallback;Ljava/util/Collection;)V (73 bytes) @ 0x00007f5c858f86dc [0x00007f5c858f8500+0x00000000000001dc]J 9686 c1 
> org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter$$Lambda$937.restoreBatch(Ljava/util/Collection;)V (9 bytes) @ 0x00007f5c7dff7bb4 [0x00007f5c7dff7b40+0x0000000000000074]J 9683 c1 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.restore(Lorg/apache/kafka/streams/processor/internals/ProcessorStateManager$StateStoreMetadata;Ljava/util/List;)V (176 bytes) @ 0x00007f5c7e71af4c [0x00007f5c7e719740+0x000000000000180c]J 8882 c2 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restoreChangelog(Lorg/apache/kafka/streams/processor/internals/StoreChangelogReader$ChangelogMetadata;)Z (334 bytes) @ 0x00007f5c859052ec [0x00007f5c85905140+0x00000000000001ac]J 12689 c2 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(Ljava/util/Map;)V (412 bytes) @ 0x00007f5c85ce98d4 [0x00007f5c85ce8420+0x00000000000014b4]J 12688 c2 
> org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase()V (214 bytes) @ 0x00007f5c85ce580c [0x00007f5c85ce5540+0x00000000000002cc]J 17654 c2 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce()V (725 bytes) @ 0x00007f5c859960e8 [0x00007f5c85995fa0+0x0000000000000148]j  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop()Z+61j
> org.apache.kafka.streams.processor.internals.StreamThread.run()V+36v  
> ~StubRoutines::call_stub 
> siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000{code}
> I attached whole java cash-dump and digest from our logs. 
> It is executed on azul jdk11
> KS 2.8.1
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)