Posted to jira@kafka.apache.org by "Valentina Baljak (JIRA)" <ji...@apache.org> on 2017/11/08 13:55:00 UTC

[jira] [Updated] (KAFKA-6188) Broker fails with FATAL Shutdown - log dirs have failed

     [ https://issues.apache.org/jira/browse/KAFKA-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentina Baljak updated KAFKA-6188:
------------------------------------
    Description: 
I just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The test environment is very simple, with only one producer and one consumer. Initially everything started fine, and standalone tests worked as expected. However, when running my code, the Kafka clients fail after approximately 10 minutes. Kafka won't start after that; it fails with the same error.

Deleting the logs lets Kafka start again, but then the same problem occurs.

Here is the error traceback:

[2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor threads (kafka.network.SocketServer)
[2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
[2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions  are offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
[2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions  (kafka.server.ReplicaFetcherManager)
[2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions  because they are in the failed log dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
[2017-11-08 08:21:57,892] INFO Stopping serving logs in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
[2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)


  was:
I just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The test environment is very simple, with only one producer and one consumer. Initially everything started fine, and standalone tests worked as expected. However, when running my code, the Kafka clients fail after approximately 10 minutes. Kafka won't start after that; it fails with the same error.

Deleting the logs lets Kafka start again, but then the same problem occurs.

Here is the error traceback.

bq. [2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor threads (kafka.network.SocketServer)
[2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
[2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions  are offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
[2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions  (kafka.server.ReplicaFetcherManager)
[2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions  because they are in the failed log dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
[2017-11-08 08:21:57,892] INFO Stopping serving logs in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
[2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)



> Broker fails with FATAL Shutdown - log dirs have failed
> -------------------------------------------------------
>
>                 Key: KAFKA-6188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6188
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, log
>    Affects Versions: 1.0.0
>         Environment: Windows 10
>            Reporter: Valentina Baljak
>            Priority: Blocker
>
> I just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The test environment is very simple, with only one producer and one consumer. Initially everything started fine, and standalone tests worked as expected. However, when running my code, the Kafka clients fail after approximately 10 minutes. Kafka won't start after that; it fails with the same error.
> Deleting the logs lets Kafka start again, but then the same problem occurs.
> Here is the error traceback:
> [2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
> [2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor threads (kafka.network.SocketServer)
> [2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions  are offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions  (kafka.server.ReplicaFetcherManager)
> [2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions  because they are in the failed log dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,892] INFO Stopping serving logs in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
> [2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)
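
For anyone triaging a longer server.log with the same symptom, the following is a minimal sketch of pulling the first FATAL entry out of broker output. It assumes every line follows the "[timestamp] LEVEL message (logger)" layout seen in the excerpt above. The names LINE_RE, LogEntry, parse_line and first_fatal are hypothetical helpers written for this illustration; they are not part of Kafka, and the sample lines are copied from the report.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional

# Assumed layout, taken from the excerpt: [timestamp] LEVEL message (logger)
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"(?P<level>[A-Z]+) (?P<message>.*) \((?P<logger>[\w.$]+)\)$"
)


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    message: str
    logger: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one broker log line; return None if it does not fit the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # The three digits after the comma are milliseconds; %f accepts them.
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return LogEntry(
        timestamp,
        match.group("level"),
        match.group("message"),
        match.group("logger"),
    )


def first_fatal(lines: Iterable[str]) -> Optional[LogEntry]:
    """Return the first FATAL entry, or None if there is none."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "FATAL":
            return entry
    return None


# Sample lines copied from the log excerpt in this report.
SAMPLE = [
    r"[2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)",
    r"[2017-11-08 08:21:57,892] INFO Stopping serving logs in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)",
    r"[2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)",
]

if __name__ == "__main__":
    fatal = first_fatal(SAMPLE)
    if fatal is not None:
        print(fatal.timestamp.isoformat(), fatal.logger, fatal.message)
```

This only locates the shutdown entry and the logger that emitted it. It does not diagnose why the log directory was marked as failed; any ERROR or WARN lines earlier in the full log would be the place to look for that.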