You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Anil (JIRA)" <ji...@apache.org> on 2018/06/04 17:01:00 UTC
[jira] [Reopened] (KAFKA-6281) Kafka JavaAPI Producer failed with NotLeaderForPartitionException

     [ https://issues.apache.org/jira/browse/KAFKA-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anil reopened KAFKA-6281:
-------------------------

Thanks [~omkreddy], 

We've got this issue again today, twice. 

I see that both the times, there is GC running in my server1 , which I'm connecting to. But it seems to be of short duration. 

Please suggest, if that could be the cause. Regarding the network connection issue(s), b/w which components there could be issue ? I mean the zoo-keeper of server1, couldnot connect to zoo-keeper of server2 ? . We haven't found any connectivity issues, but still want to make sure on the ports and availablitiy.

 

 
{code:java}

Exception 1 :
-2018-06-04T05:03:35,967 [kafka-producer-network-thread | producer-1] ERROR -org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.

GC 1:
2018-06-04T05:03:24.865-0400: 1465075.225: [GC pause (G1 Evacuation Pause) (young), 0.0158735 secs]
[Parallel Time: 15.3 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 1465075225.5, Avg: 1465075225.5, Max: 1465075225.5, Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.7, Avg: 0.7, Max: 0.7, Diff: 0.1, Sum: 1.4]
[Update RS (ms): Min: 2.1, Avg: 2.3, Max: 2.4, Diff: 0.2, Sum: 4.5]
[Processed Buffers: Min: 19, Avg: 20.5, Max: 22, Diff: 3, Sum: 41]
[Scan RS (ms): Min: 3.6, Avg: 3.7, Max: 3.9, Diff: 0.2, Sum: 7.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 8.5, Avg: 8.5, Max: 8.6, Diff: 0.1, Sum: 17.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 15.2, Avg: 15.2, Max: 15.2, Diff: 0.0, Sum: 30.4]
[GC Worker End (ms): Min: 1465075240.7, Avg: 1465075240.7, Max: 1465075240.7, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 46.0M(46.0M)->0.0B(100.0M) Survivors: 5120.0K->6144.0K Heap: 262.0M(1024.0M)->217.0M(1024.0M)]
[Times: user=0.03 sys=0.00, real=0.02 secs]




--Exception 2 --
-2018-06-04T10:15:15,522 [kafka-producer-network-thread | producer-1] ERROR -org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.

--GC--2

2018-05-18T10:15:11.501-0400: 14981.861: [GC pause (G1 Evacuation Pause) (young), 0.0205628 secs]
[Parallel Time: 20.0 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 14981861.1, Avg: 14981861.1, Max: 14981861.1, Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.6, Avg: 0.7, Max: 0.7, Diff: 0.0, Sum: 1.3]
[Update RS (ms): Min: 1.6, Avg: 1.6, Max: 1.6, Diff: 0.0, Sum: 3.1]
[Processed Buffers: Min: 12, Avg: 18.5, Max: 25, Diff: 13, Sum: 37]
[Scan RS (ms): Min: 2.6, Avg: 2.6, Max: 2.6, Diff: 0.0, Sum: 5.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 15.1, Avg: 15.1, Max: 15.1, Diff: 0.0, Sum: 30.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 19.9, Avg: 19.9, Max: 19.9, Diff: 0.0, Sum: 39.8]
[GC Worker End (ms): Min: 14981881.0, Avg: 14981881.0, Max: 14981881.0, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 63.0M(63.0M)->0.0B(79.0M) Survivors: 8192.0K->6144.0K Heap: 425.0M(1024.0M)->364.5M(1024.0M)]
[Times: user=0.04 sys=0.00, real=0.02 secs]


{code}

> Kafka JavaAPI Producer failed with NotLeaderForPartitionException
> -----------------------------------------------------------------
>
>                 Key: KAFKA-6281
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6281
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Anil
>            Priority: Major
>         Attachments: server1-controller.log, server2-controller.log
>
>
> We are running Kafka (vesion kafka_2.11-0.10.1.0) in a 2 node cluster. We have 2 producers (Java API) acting on different topics. Each topic has single partition. The topic where we had this issue, has one consumer running. This set up has been running fine for 3 months, and we saw this issue. All the suggested cases/solutions for this issue in other forums don't seem to apply for my scenario.
> Exception at producer;
> {code}
> -2017-11-25T17:40:33,035 [kafka-producer-network-thread | producer-1] ERROR client.producer.BingLogProducerCallback - Encountered exception in sending message ; > org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
> {code}
> We haven't enabled retries for the messages, because this is transactional data and we want to maintain the order.
> Producer config:
> {code}
> bootstrap.servers : server1ip:9092
> acks :all
> retries : 0
> linger.ms :0
> buffer.memory :10240000
> max.request.size :1024000
> key.serializer : org.apache.kafka.common.serialization.StringSerializer
> value.serializer : org.apache.kafka.common.serialization.StringSerializer
> {code}
> We are connecting to server1 at both producer and consumer. The controller log at server2 indicates there is some shutdown happened at during sametime, but I dont understand why this happened.
> {color:red}[2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in preferred replica Map() (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 2 is 0.000000 (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in preferred replica Map() (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 1 is 0.000000 (kafka.controller.KafkaController) [2017-11-25 17:34:18,314] INFO [SessionExpirationListener on 2], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener) [2017-11-25 17:34:18,317] DEBUG [Controller 2]: Controller resigning, broker id 2 (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] DEBUG [Controller 2]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] INFO [delete-topics-thread-2], Shutting down (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 17:34:18,317] INFO [delete-topics-thread-2], Stopped (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 17:34:18,318] INFO [delete-topics-thread-2], Shutdown completed (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 17:34:18,318] INFO [Partition state machine on Controller 2]: Stopped partition state machine (kafka.controller.PartitionStateMachine) [2017-11-25 17:34:18,318] INFO [Replica state machine on controller 2]: Stopped replica state machine (kafka.controller.ReplicaStateMachine) [2017-11-25 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Shutting down (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Stopped (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-2-send-thread], Shutdown completed (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-1-send-thread], Shutting down (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-1-send-thread], Stopped (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-1-send-thread], Shutdown completed (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller 2]: Broker 2 resigned as the controller (kafka.controller.KafkaController) [2017-11-25 17:34:18,353] DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,353] DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,354] INFO [BrokerChangeListener on Controller 2]: Broker change listener fired for path /brokers/ids with children 1,2 (kafka.controller.ReplicaStateMachine$BrokerChangeListener) [2017-11-25 17:34:18,355] DEBUG [DeleteTopicsListener on 2]: Delete topics listener fired for topics to be deleted (kafka.controller.PartitionStateMachine$DeleteTopicsListener) [2017-11-25 17:34:18,362] INFO [AddPartitionsListener on 2]: Partition modification triggered {"version":1,"partitions":{"0":[1]}} for path /brokers/topics/ESQ (kafka.controller.PartitionStateMachine$PartitionModificationsListener) [2017-11-25 17:34:18,368] INFO [AddPartitionsListener on 2]: Partition modification triggered {"version":1,"partitions":{"0":[1]}} for path /brokers/topics/Test1 (kafka.controller.PartitionStateMachine$PartitionModificationsListener) [2017-11-25 17:34:18,369] INFO [AddPartitionsListener on 2]: Partition modification triggered {"version":1,"partitions":{"0":[2]}} for path /brokers/topics/ImageQ (kafka.controller.PartitionStateMachine$PartitionModificationsListener) [2017-11-25 17:34:18,374] INFO [AddPartitionsListener on 2]: Partition modification triggered {"version":1,"partitions":{"8":[1,2],"4":[1,2],"9":[2,1],"5":[2,1],"6":[1,2],"1":[2,1],"0":[1,2],"2":[1,2],"7":[2,1],"3":[2,1]}} for path /brokers/topics/NMS_NotifyQ (kafka.controller.PartitionStateMachine$PartitionModificationsListener) [2017-11-25 17:34:18,375] INFO [AddPartitionsListener on 2]: Partition modification triggered {"version":1,"partitions":{"0":[1]}} for path /brokers/topics/TempBinLogReqQ @{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)