Posted to dev@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2016/10/25 16:01:58 UTC

[jira] [Commented] (KAFKA-3536) ReplicaFetcherThread should not log errors when leadership changes

    [ https://issues.apache.org/jira/browse/KAFKA-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605691#comment-15605691 ] 

Guozhang Wang commented on KAFKA-3536:
--------------------------------------

This issue is observed on 0.9.0 as well; copying from the mailing list:

{code}
Hello,

The error log indicates that by the time the broker tries to complete the fetch request, it is no longer the leader for some of the requested topic-partitions. This can happen when brokers are shutting down or restarting, which causes the controller to migrate leaders automatically, or when leaders are migrated manually for operational reasons. Such error logs should be transient, since the follower replicas will refresh their metadata and discover the new leaders.

The confusion comes from the fact that the fetch request from ReplicaFetcherThread-0-2141642 does not actually contain partition [retail.d.ris.spider.request, 1], but it causes the HW of some of the fetched partitions to advance. The broker then checks whether some OTHER delayed fetch requests can now be satisfied and completed because of the advanced HW (in function tryCompleteDelayedRequests), and some of those requests contain the partition whose leader is no longer broker 2141642 (see the sketch after the quoted thread below).

I think the issue you encountered is summarized in KAFKA-3536 and has not been resolved yet; for now you can ignore the spammed error logs.


Guozhang



On Tue, Oct 25, 2016 at 5:32 AM, Json Tu <ka...@126.com> wrote:
Hi all,
        I use Kafka 0.9.0.0, and we have a cluster with 6 nodes. When I restart a broker, we see many logs like the one below:

[2016-10-24 15:29:00,914] ERROR [KafkaApi-2141642] error when handling request Name: FetchRequest; Version: 1; CorrelationId: 4928; ClientId: ReplicaFetcherThread-0-2141642; ReplicaId: 2141386; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [retail.c.order.logistics,0] -> PartitionFetchInfo(215258,1048576),[waimai_c_ugc_msg,1] -> PartitionFetchInfo(12426588,1048576),[waimai_c_ucenter_asyncrelationbind_delay,25] -> PartitionFetchInfo(0,1048576),[waimai_c_order_databus_wmorder,44] -> PartitionFetchInfo(49555913,1048576),[__consumer_offsets,23] -> PartitionFetchInfo(11846051,1048576),[retail.m.sp.sku.update,3] -> PartitionFetchInfo(21563,1048576),[waimai_c_monitor_orderlogisticsstatus,28] -> PartitionFetchInfo(26926356,1048576),[waimai_c_ucenter_loadrelation,0] -> PartitionFetchInfo(54583,1048576),[__consumer_offsets,29] -> PartitionFetchInfo(23479045,1048576),[waimai_c_order_databus_wmorder,14] -> PartitionFetchInfo(49568225,1048576),[waimai_c_ordertag_orderdealremark,31] -> PartitionFetchInfo(1829838,1048576),[retail.d.ris.spider.request,0] -> PartitionFetchInfo(709845,1048576),[__consumer_offsets,13] -> PartitionFetchInfo(9376691,1048576),[waimai_c_ugc_msg_staging,2] -> PartitionFetchInfo(38,1048576),[retail.b.openapi.push.retry.stage,0] -> PartitionFetchInfo(0,1048576),[waimai_c_ucenter_favoritepoi,15] -> PartitionFetchInfo(390045,1048576),[retail.b.order.phonecall,0] -> PartitionFetchInfo(1,1048576),[waimai_c_ucenter_loadrelation,45] -> PartitionFetchInfo(53975,1048576),[waimai_c_ordertag_orderdealremark,1] -> PartitionFetchInfo(1829848,1048576),[retail.d.ris.spider.jddj.request,0] -> PartitionFetchInfo(5116337,1048576),[waimai_c_ucenter_asyncrelationbind_delay,13] -> PartitionFetchInfo(0,1048576),[waimai_c_ucenter_asyncrelationbind_delay,55] -> PartitionFetchInfo(0,1048576),[waimai_push_e_operate_prod,3] -> PartitionFetchInfo(442564,1048576),[waimai_ordersa_topic_user_order_in_poi_count_diff,5] -> PartitionFetchInfo(23791010,1048576),[retail.c.order.create,4] -> PartitionFetchInfo(72902,1048576),[waimai_c_ucenter_asyncrelationbind_staging,2] -> PartitionFetchInfo(66,1048576),[waimai_c_order_orderevent_topic,35] -> PartitionFetchInfo(0,1048576),[waimai_c_ucenter_syncuserrelation,43] -> PartitionFetchInfo(0,1048576),[waimai_c_order_databus_wmorder,48] -> PartitionFetchInfo(49496018,1048576),[waimai_c_monitor_orderstatus,32] -> PartitionFetchInfo(50623699,1048576),[waimai_c_ucenter_loadrelation,15] -> PartitionFetchInfo(54360,1048576),[waimai_c_monitor_orderstatus,2] -> PartitionFetchInfo(50624881,1048576),[waimai_c_order_databus_wmorder,24] -> PartitionFetchInfo(49548334,1048576),[waimai_c_order_databus_wmorder,18] -> PartitionFetchInfo(49489397,1048576),[waimai_c_ucenter_asyncrelationbind,36] -> PartitionFetchInfo(53430,1048576) (kafka.server.KafkaApis)
kafka.common.NotLeaderForPartitionException: Leader not local for partition [retail.d.ris.spider.request,1] on broker 2141642
        at kafka.server.ReplicaManager.getLeaderReplicaIfLocal(ReplicaManager.scala:296)
        at kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:77)
        at kafka.server.DelayedFetch$$anonfun$tryComplete$1.apply(DelayedFetch.scala:72)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at kafka.server.DelayedFetch.tryComplete(DelayedFetch.scala:72)
        at kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:307)
        at kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:227)
        at kafka.server.ReplicaManager.tryCompleteDelayedFetch(ReplicaManager.scala:202)
        at kafka.cluster.Partition.tryCompleteDelayedRequests(Partition.scala:372)
        at kafka.cluster.Partition.maybeExpandIsr(Partition.scala:294)
        at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:243)
        at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:852)
        at kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:849)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
        at kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:849)
        at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:467)
        at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:434)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:69)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)


What confused me is that retail.d.ris.spider.request is not contained in this request, so why is it logged in handleFetchRequest? How can this happen, and how can it be resolved?

{code}
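To make the sequence described above concrete, here is a heavily simplified sketch of the call chain in the stack trace. All types here are illustrative stand-ins, not the real broker classes; only the shape of the flow mirrors 0.9's ReplicaManager/DelayedFetch path:

{code}
// Hypothetical sketch -- not the actual Kafka source.
case class TopicAndPartition(topic: String, partition: Int)

class NotLeaderForPartitionException(msg: String) extends RuntimeException(msg)

class Partition(tp: TopicAndPartition, var leaderIsLocal: Boolean) {
  def getLeaderReplicaIfLocal(): Unit =
    if (!leaderIsLocal)
      throw new NotLeaderForPartitionException(s"Leader not local for $tp")
}

// A fetch parked in purgatory, waiting for enough data to accumulate.
class DelayedFetch(val partitions: Seq[Partition]) {
  def tryComplete(): Unit =
    // Re-checks ALL of its own partitions, including ones whose leader
    // has since moved off this broker -- hence the "unrelated" error.
    partitions.foreach(_.getLeaderReplicaIfLocal())
}

class ReplicaManager(purgatory: Seq[DelayedFetch]) {
  // Called for the partitions a follower fetch DID read: advancing their
  // HW may unblock OTHER delayed fetches watching those partitions.
  def updateFollowerLogReadResults(fetched: Seq[Partition]): Unit =
    purgatory.foreach(_.tryComplete())  // may throw for a partition the
                                        // triggering fetch never named
}
{code}

The key point is that the exception is raised while completing a different, parked fetch, so the partition named in the error need not appear in the request being handled.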

> ReplicaFetcherThread should not log errors when leadership changes
> ------------------------------------------------------------------
>
>                 Key: KAFKA-3536
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3536
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Stig Rohde Døssing
>            Priority: Minor
>
> When there is a leadership change, ReplicaFetcherThread will spam the log with errors similar to the log snippet below.
> {noformat}
> [ReplicaFetcherThread-0-2], Error for partition [ticketupdate,7] to broker 2:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> {noformat}
> ReplicaFetcherThread/AbstractFetcherThread should log those exceptions at a lower log level, since they don't actually indicate an error (a sketch of such a change follows below).
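A minimal sketch of the suggested change, assuming a 0.9-era fetch loop in AbstractFetcherThread (the method and variable names here are illustrative, not the actual source):

{code}
// Hypothetical sketch only -- not the real AbstractFetcherThread code.
// The idea: treat NotLeaderForPartitionException as an expected, transient
// condition during leadership changes and log it below ERROR.
try {
  processPartitionData(topicAndPartition, fetchOffset, partitionData)
} catch {
  case _: kafka.common.NotLeaderForPartitionException =>
    // Leadership moved while the fetch was in flight; the follower will
    // discover the new leader from refreshed metadata, so WARN suffices.
    warn(s"Leadership changed for partition $topicAndPartition, backing off")
  case t: Throwable =>
    error(s"Error for partition $topicAndPartition", t)
}
{code}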


