Posted to dev@kafka.apache.org by "Gwen Shapira (JIRA)" <ji...@apache.org> on 2015/12/18 00:47:46 UTC

[jira] [Resolved] (KAFKA-3004) Brokers failing over repeatedly

     [ https://issues.apache.org/jira/browse/KAFKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gwen Shapira resolved KAFKA-3004.
---------------------------------
    Resolution: Won't Fix

> Brokers failing over repeatedly
> -------------------------------
>
>                 Key: KAFKA-3004
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3004
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, network
>    Affects Versions: 0.8.2.0
>         Environment: Centos 6.5
> OpenJDK 1.7.0_79
> 6 Kafka nodes
> 3 ZK nodes (cluster mode)
>            Reporter: Sadek
>            Assignee: Neha Narkhede
>
> During load testing we noticed that one or more brokers un-register and re-register almost every hour, with the following entry in the broker log:
> INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
> I noticed an increase in minor GC activity around the same time:
> 2015-12-17T22:00:40.961+0000: 15693.112: [GC2015-12-17T22:00:46.404+0000: 15698.554: [ParNew: 282865K->3922K(314560K), 0.0104700 secs] 576345K->297570K(1013632K), 5.4531250 secs] [Times: user=0.05 sys=0.00, real=5.46 secs]
> There was also a disk I/O spike on the Kafka nodes.
>  
> Here's a snippet of the broker log around that time:
> [2015-12-17 22:00:36,090] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
> 15754934 [main-SendThread(kfk02.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 12203ms for sessionid 0x151b10503e60002, closing socket connection and attempting reconnect
> [2015-12-17 22:01:55,533] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
> 15755399 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kfk01.local/10.124.80.140:2182. Will not attempt to authenticate using SASL (unknown error)
> 15755400 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kfk01.local/10.124.80.140:2182, initiating session
> 15755401 [main-SendThread(kfk01.local:2182)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kfk01.local/10.124.80.140:2182, sessionid = 0x151b10503e60002, negotiated timeout = 12000
> [2015-12-17 22:01:55,902] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
> Any idea what may be causing this?
> Thanks!
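
The GC line quoted in the report shows a wall-clock pause (real=5.46 secs) far larger than the CPU time spent (user=0.05, sys=0.00). A minimal sketch for flagging such lines in a GC log follows, assuming the log uses the same "[Times: ...]" format as the line above. The function names and the thresholds (min_real_secs, ratio) are illustrative assumptions, not anything from the report or from Kafka.

```python
import re

# Matches the trailing "[Times: user=0.05 sys=0.00, real=5.46 secs]" section
# of a HotSpot GC log line in the format quoted in the report.
TIMES_RE = re.compile(
    r"\[Times: user=(?P<user>\d+\.\d+) sys=(?P<sys>\d+\.\d+), "
    r"real=(?P<real>\d+\.\d+) secs\]"
)


def parse_gc_times(line):
    """Return user/sys/real seconds as floats, or None if the line has no Times section."""
    match = TIMES_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


def is_stalled_pause(times, min_real_secs=1.0, ratio=10.0):
    """Flag pauses whose wall-clock time is long and much larger than CPU time.

    min_real_secs and ratio are arbitrary illustrative thresholds.
    """
    cpu_secs = times["user"] + times["sys"]
    return times["real"] >= min_real_secs and times["real"] > ratio * cpu_secs


# The GC line from the report, copied verbatim.
line = (
    "2015-12-17T22:00:40.961+0000: 15693.112: [GC2015-12-17T22:00:46.404+0000: "
    "15698.554: [ParNew: 282865K->3922K(314560K), 0.0104700 secs] "
    "576345K->297570K(1013632K), 5.4531250 secs] "
    "[Times: user=0.05 sys=0.00, real=5.46 secs]"
)
times = parse_gc_times(line)
print(times, is_stalled_pause(times))
```

For the quoted line the sketch flags the pause: 5.46 seconds elapsed while only 0.05 seconds of CPU time was used, which suggests the JVM was mostly waiting rather than collecting. That pause alone is a bit under half of the 12000 ms negotiated ZooKeeper session timeout shown in the broker log.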



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)