You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Ben Corlett (JIRA)" <ji...@apache.org> on 2017/11/09 16:03:00 UTC

[jira] [Created] (KAFKA-6194) Server crash

Ben Corlett created KAFKA-6194:
----------------------------------

             Summary: Server crash
                 Key: KAFKA-6194
                 URL: https://issues.apache.org/jira/browse/KAFKA-6194
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 1.0.0
         Environment: kafka version: 1.0
client versions: 0.8.2.1-0.10.2.1
platform: aws (eu-west-1a)
nodes: 36 x r4.xlarge
disk storage: 2.5 tb per node (~73% usage per node)
topics: 250
number of partitions: 48k (approx)
os: ubuntu 14.04
jvm: Java(TM) SE Runtime Environment (build 1.8.0_131-b11), Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
            Reporter: Ben Corlett


We upgraded our R+D cluster to 1.0 in the hope that it would fix the deadlock from 0.11.0.*. Sadly our cluster has the memory leak issues with 1.0 most likely from https://issues.apache.org/jira/browse/KAFKA-6185. We are running one server on a patched version of 1.0 with the pull request from that.

However today we have had two different servers fall over for non-heap related reasons. The exceptions in the kafka log are :

{code}
[2017-11-09 15:32:04,037] ERROR Error while deleting segments for xxxxxxxxxx-49 in dir /mnt/secure/kafka/datalog (kafka.server.LogDirFailureChannel)
java.io.IOException: Delete of log 00000000000000000000.log.deleted failed.
        at kafka.log.LogSegment.delete(LogSegment.scala:496)
        at kafka.log.Log.$anonfun$asyncDeleteSegment$3(Log.scala:1596)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
        at kafka.log.Log.deleteSeg$1(Log.scala:1596)
        at kafka.log.Log.$anonfun$asyncDeleteSegment$4(Log.scala:1599)
        at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
[2017-11-09 15:32:04,040] INFO [ReplicaManager broker=122] Stopping serving replicas in dir /mnt/secure/kafka/datalog (kafka.server.ReplicaManager)
[2017-11-09 15:32:04,041] ERROR Uncaught exception in scheduled task 'delete-file' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for xxxxxxxxxxxxxx-49 in dir /mnt/secure/kafka/datalog
Caused by: java.io.IOException: Delete of log 00000000000000000000.log.deleted failed.
        at kafka.log.LogSegment.delete(LogSegment.scala:496)
        at kafka.log.Log.$anonfun$asyncDeleteSegment$3(Log.scala:1596)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
        at kafka.log.Log.maybeHandleIOException(Log.scala:1669)
        at kafka.log.Log.deleteSeg$1(Log.scala:1596)
        at kafka.log.Log.$anonfun$asyncDeleteSegment$4(Log.scala:1599)
        at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:110)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:61)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
{code}





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)