You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Peter Bukowinski (Jira)" <ji...@apache.org> on 2019/10/15 17:51:00 UTC

[jira] [Created] (KAFKA-9044) Brokers occasionally (randomly?) dropping out of clusters

Peter Bukowinski created KAFKA-9044:
---------------------------------------

             Summary: Brokers occasionally (randomly?) dropping out of clusters
                 Key: KAFKA-9044
                 URL: https://issues.apache.org/jira/browse/KAFKA-9044
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 2.3.1
         Environment: Ubuntu 14.04
            Reporter: Peter Bukowinski


I have several cluster running kafka 2.3.1 and this issue has affected all of them. Because of replication and the size of the clusters (30 brokers), this bug is not causing any data loss, but it is nevertheless concerning. When a broker drops out, the log gives no indication that there are any zookeeper issues (and indeed the zookeepers are healthy when this occurs. Here's snippet from a broker log when it occurs.

 

{{[2019-10-07 11:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl] Found deletable segments with base offsets [1975332] due to retention time 3600000ms breach (kafka.log.Log)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl] Scheduling log segment [baseOffset 1975332, size 92076008] for deletion. (kafka.log.Log)
[2019-10-07 11:02:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl] Incrementing log start offset to 2000317 (kafka.log.Log)
[2019-10-07 11:03:56,936] INFO [Log partition=internal_test-52, dir=/data/3/kl] Deleting segment 1975332 (kafka.log.Log)
[2019-10-07 11:03:56,957] INFO Deleted log /data/3/kl/internal_test-52/00000000000001975332.log.deleted. (kafka.log.LogSegment)
[2019-10-07 11:03:56,957] INFO Deleted offset index /data/3/kl/internal_test-52/00000000000001975332.index.deleted. (kafka.log.LogSegment)
[2019-10-07 11:03:56,958] INFO Deleted time index /data/3/kl/internal_test-52/00000000000001975332.timeindex.deleted. (kafka.log.LogSegment)
[2019-10-07 11:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 11:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:12:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:42:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 12:52:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:02:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:22:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 13:52:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:02:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:12:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:22:27,630] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:32:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:42:27,629] INFO [GroupMetadataManager brokerId=14] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2019-10-07 14:46:07,510] INFO [Partition internal_test-33 broker=14] Shrinking ISR from 16,17,14 to 14. Leader: (highWatermark: 2007553, endOffset: 2007555). Out of sync replicas: (brokerId: 16, endOffset: 2007553) (brokerId: 17, endOffset: 2007553). (kafka.cluster.Partition)
[2019-10-07 14:46:07,511] INFO [Partition internal_test-33 broker=14] Cached zkVersion [17] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)}}

 

The controller show the following:

 

{{[2019-10-07 14:45:55,427] INFO [Controller id=24] Newly added brokers: , deleted brokers: 14, bounced brokers: , all live brokers: 1,2,3,4,5,6,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24,25 (kafka.controller.KafkaController)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutting down (kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Shutdown completed (kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,477] INFO [RequestSendThread controllerId=24] Stopped (kafka.controller.RequestSendThread)
[2019-10-07 14:45:55,481] INFO [Controller id=24] Broker failure callback for 14 (kafka.controller.KafkaController)}}

 

The clusters use {{zookeeper.session.timeout.ms=45000}}, and {{zookeeper.connection.timeout.ms=90000}}.

I'm unable to find a cause for this behavior. The only thing I can do to resolve the issue is to restart the broker.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)