Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/30 20:09:00 UTC

[jira] [Commented] (KAFKA-7572) Producer should not send requests with negative partition id

    [ https://issues.apache.org/jira/browse/KAFKA-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669268#comment-16669268 ] 

ASF GitHub Bot commented on KAFKA-7572:
---------------------------------------

yaodong66 opened a new pull request #5858: KAFKA-7572: Producer should not send requests with negative partition id
URL: https://github.com/apache/kafka/pull/5858
 
 
   Partition id should never be a negative value.
   This commit makes debugging easier when a custom Partitioner generates an invalid negative partition id.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Producer should not send requests with negative partition id
> ------------------------------------------------------------
>
>                 Key: KAFKA-7572
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7572
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 1.0.1
>            Reporter: Yaodong Yang
>            Priority: Major
>
> h3. Issue:
> In a Kafka producer log from one of our users, we found the following unusual error:
> timestamp="2018-10-09T17:37:41,237-0700",level="ERROR", Message="Write to Kafka failed with: ",exception="java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topicName--2: 30042 ms has passed since batch creation plus linger time
>  at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
>  at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
>  at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topicName--2: 30042 ms has passed since batch creation plus linger time"
> After a few hours of debugging, we finally understood the root cause of this issue:
>  # The producer used a buggy custom Partitioner, which sometimes generates negative partition ids for new records.
>  # The corresponding produce requests were rejected by brokers, because it's illegal to have a partition with a negative id.
>  # The client kept refreshing its local cluster metadata, but could not send produce requests successfully.
>  # In the above log, we noticed a suspicious string, "topicName--2".
>  # According to the source code, the format of this string in the log is TopicName+"-"+PartitionId.
>  # It's easy to miss the two consecutive dashes in the above log.
>  # Eventually, we found that the second dash was a negative sign. Therefore, the partition id was -2, not 2.
>  # The bug was in the custom Partitioner.
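The issue does not say what the custom Partitioner's bug actually was. As a minimal sketch, assume a hypothetical partitioner that computes key.hashCode() % numPartitions, a common way to get this wrong because Java's % operator keeps the sign of the dividend. The example shows how such a partitioner yields a negative id and how that id ends up as "topicName--2" in the log. The class and method names are illustrative only, and describe() merely imitates the TopicName+"-"+PartitionId format described in the root-cause list.

```java
public class NegativePartitionDemo {

    // Hypothetical buggy partitioner logic: hashCode() can be negative, and
    // Java's % keeps the sign of the dividend, so the result can be < 0.
    static int buggyPartition(Object key, int numPartitions) {
        return key.hashCode() % numPartitions;
    }

    // One possible fix: clear the sign bit first, so the result is always
    // in the range [0, numPartitions).
    static int fixedPartition(Object key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Imitates the TopicName + "-" + PartitionId format seen in the log.
    static String describe(String topic, int partition) {
        return topic + "-" + partition;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        Object key = -6; // an Integer, whose hashCode() is its value, -6

        // -6 % 4 == -2, so the record is assigned partition -2
        System.out.println(describe("topicName", buggyPartition(key, numPartitions)));
        // prints topicName--2

        // (-6 & 0x7fffffff) % 4 == 2
        System.out.println(describe("topicName", fixedPartition(key, numPartitions)));
        // prints topicName-2
    }
}
```

Masking the sign bit, as fixedPartition does, is similar in spirit to how Kafka's own DefaultPartitioner passes its murmur2 hash through Utils.toPositive before taking the modulus; Math.floorMod(hash, numPartitions) is another way to keep the result non-negative.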
> h3. Proposal:
>  # Producer code should check the partitionId before sending requests to brokers.
>  # If the partition id is negative, throw an {{IllegalStateException}}.
>  # Such a quick check can save lots of time for people debugging their producer code.
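A minimal sketch of the proposed check, assuming it runs right after the configured Partitioner returns an id and before the record is appended to a batch. The method name validatePartition and the exception message are hypothetical; this is not the code from PR #5858, and exactly where such a check would live inside KafkaProducer is left open here.

```java
public class PartitionValidation {

    // Hypothetical helper illustrating the proposed fail-fast check; it is not
    // taken from PR #5858. It rejects a negative partition id before the record
    // would be batched and sent to a broker.
    static int validatePartition(String topic, int partition) {
        if (partition < 0) {
            throw new IllegalStateException("Partitioner returned invalid partition "
                    + partition + " for topic " + topic
                    + "; partition ids must be non-negative");
        }
        return partition;
    }

    public static void main(String[] args) {
        // A valid id passes through unchanged.
        System.out.println(validatePartition("topicName", 2)); // prints 2

        // A negative id fails immediately with a descriptive message,
        // instead of surfacing later as an expired batch.
        try {
            validatePartition("topicName", -2);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing on the send path puts the offending partition id directly in the error message, rather than leaving it to be spotted as a double dash in a TimeoutException logged about 30 seconds later.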



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)