Posted to jira@kafka.apache.org by "David Cheung (JIRA)" <ji...@apache.org> on 2017/12/20 09:55:00 UTC

[jira] [Comment Edited] (KAFKA-6188) Broker fails with FATAL Shutdown - log dirs have failed

    [ https://issues.apache.org/jira/browse/KAFKA-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298188#comment-16298188 ] 

David Cheung edited comment on KAFKA-6188 at 12/20/17 9:54 AM:
---------------------------------------------------------------

Hi, I am facing exactly the same problem. My stack: Kafka (the 4.0 tag of https://hub.docker.com/r/confluentinc/cp-enterprise-kafka/) running on Docker Swarm on Amazon EC2 instances, with Amazon EFS as the storage. In my case, some log files cannot be deleted, which triggers this bug:
{code:xml}
Caused by: java.nio.file.FileSystemException: /var/lib/kafka/data/ksql_transient_8376289768731246768_1513675960541-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog-1.a9edc755278d425e9227bb03eb0cd55f-delete/.nfs937861751206a94a00000fa2: Device or resource busy
...
...
[2017-12-19 10:56:37,681] INFO Stopping serving logs in dir /var/lib/kafka/data (kafka.log.LogManager)
[2017-12-19 10:56:37,682] FATAL Shutdown broker because all log dirs in /var/lib/kafka/data have failed (kafka.log.LogManager)
{code}
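For anyone hitting this on EFS (which is mounted over NFS): the `.nfs937861751206a94a00000fa2` entry looks like an NFS "silly rename" placeholder. The NFS client creates such a file when a file that some process still holds open is deleted, and trying to remove the placeholder while the file is still open fails with "Device or resource busy", which matches the exception above. Below is a minimal sketch for spotting such leftovers in the data directory, assuming Java 8+. The class `NfsLeftoverScan` and its `findNfsSillyRenames` helper are hypothetical diagnostics, not part of Kafka.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic, not part of Kafka: lists NFS "silly rename"
// placeholders (".nfs*" files) left behind under a Kafka data directory.
public class NfsLeftoverScan {

    // Walks `root` recursively and returns every regular file whose name
    // starts with ".nfs", in sorted order.
    static List<Path> findNfsSillyRenames(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().startsWith(".nfs"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the data dir from the log above; pass another path to override.
        Path root = Paths.get(args.length > 0 ? args[0] : "/var/lib/kafka/data");
        for (Path p : findNfsSillyRenames(root)) {
            System.out.println(p);
        }
    }
}
```

With Java 11+ it can be run directly as `java NfsLeftoverScan.java /var/lib/kafka/data`. An empty result means no placeholders are left in that directory.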



> Broker fails with FATAL Shutdown - log dirs have failed
> -------------------------------------------------------
>
>                 Key: KAFKA-6188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6188
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, log
>    Affects Versions: 1.0.0
>         Environment: Windows 10
>            Reporter: Valentina Baljak
>            Priority: Blocker
>
> I just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The test environment is very simple, with only one producer and one consumer. Initially everything started fine and standalone tests worked as expected. However, when running my code, the Kafka clients fail after approximately 10 minutes. After that, Kafka won't start and fails with the same error every time.
> Deleting the logs lets it start again, but then the same problem occurs.
> Here is the error traceback:
> [2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
> [2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor threads (kafka.network.SocketServer)
> [2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions  are offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions  (kafka.server.ReplicaFetcherManager)
> [2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions  because they are in the failed log dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,892] INFO Stopping serving logs in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
> [2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)
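The last lines of the trace follow one rule: a log directory that hits an I/O error is taken offline, and once no live log directory is left the broker shuts down. With a single entry in `log.dirs`, as in this setup, the first failure is fatal. Below is a minimal sketch of that decision in Java (9+ for `Set.of`). `LogDirTracker` and `handleLogDirFailure` are hypothetical names used only for illustration. This is not Kafka's actual LogManager or ReplicaManager code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of the shutdown rule seen in the log above;
// this is not Kafka's LogManager/ReplicaManager implementation.
public class LogDirTracker {
    // Directories from log.dirs that are still usable.
    private final Set<String> liveDirs;

    LogDirTracker(Set<String> configuredDirs) {
        this.liveDirs = new LinkedHashSet<>(configuredDirs);
    }

    // Called after an I/O error in `dir`: takes the directory offline and
    // returns true when no live directory is left, i.e. the broker must stop.
    boolean handleLogDirFailure(String dir) {
        if (liveDirs.remove(dir)) {
            System.out.println("offline: " + dir + " (live dirs left: " + liveDirs.size() + ")");
        }
        return liveDirs.isEmpty();
    }

    public static void main(String[] args) {
        // A single log directory, as in the Windows setup above, so one
        // failure is enough to trigger the shutdown.
        String dir = "C:\\Kafka\\kafka_2.12-1.0.0\\kafka-logs";
        LogDirTracker tracker = new LogDirTracker(Set.of(dir));
        if (tracker.handleLogDirFailure(dir)) {
            System.out.println("all log dirs failed -> shut down broker");
        }
    }
}
```

Listing several directories in `log.dirs` only changes when the rule fires: the broker keeps running until every one of them has failed.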



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)