Posted to dev@kafka.apache.org by "Markus Bergman (JIRA)" <ji...@apache.org> on 2017/06/04 00:22:04 UTC

[jira] [Created] (KAFKA-5377) Kafka process crashing due to access violation

Markus Bergman created KAFKA-5377:
-------------------------------------

             Summary: Kafka process crashing due to access violation
                 Key: KAFKA-5377
                 URL: https://issues.apache.org/jira/browse/KAFKA-5377
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.10.2.1, 0.10.2.0
         Environment: Windows 2008 R2, Intel Xeon CPU, 64 GB RAM
4 Disk Drives (C for software, D for log files, E/F for Kafka/Zookeeper data)
2 broker cluster
            Reporter: Markus Bergman
         Attachments: hs_err_pid15944.log, hs_err_pid6304.log, hs_err_pid7356.log, hs_err_pid9056.log, hs_err_pid9276.log, java_error7192.log, server.1.properties

We are running Kafka in a 2-broker cluster configuration on Windows, and overall it has been working well for us. However, we have been seeing occasional issues where the broker crashes first on one node, and then almost immediately on the second. When we then try to restart, the broker continues to crash during startup.

I eventually found that the startup failures were caused by a bad set of files in __consumer_offsets-2 (in this latest case). Once I deleted the bad files, the broker started up correctly again.

From what I can tell from looking at both the code and the crash dump files, it is all happening because of the log cleaner, and in most (if not all) cases I can pin it down to TimeIndex. The Java dump file indicates some kind of access violation, but I am not sure when or how that is happening.

I am attaching dump files from two separate incidents, covering both the initial crash and the subsequent restart attempts. I am also including the broker config settings that we are using.

I'm not sure what additional information to provide, but I can add more if needed.

Any help, suggestions, or input would be much appreciated.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)