Posted to jira@kafka.apache.org by "Dong Lin (JIRA)" <ji...@apache.org> on 2018/11/16 23:48:00 UTC

[jira] [Comment Edited] (KAFKA-7648) Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests

    [ https://issues.apache.org/jira/browse/KAFKA-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690140#comment-16690140 ] 

Dong Lin edited comment on KAFKA-7648 at 11/16/18 11:47 PM:
------------------------------------------------------------

Currently TestUtils.createTopic(...) re-sends the znode creation request to the ZooKeeper service if the previous response shows Code.CONNECTIONLOSS. See KafkaZkClient.retryRequestsUntilConnected() for the related logic.

This means that the test will fail in the following sequence: ZooKeeper creates the znode upon the first request, the response to the first request is lost or times out, the second request is sent, and the response to the second request shows Code.NODEEXISTS.

To fix this flaky test, we should probably implement some logic similar to KafkaZkClient.CheckedEphemeral() that checks, after receiving Code.NODEEXISTS, whether the znode was created within the same session (i.e. with the same session id).
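
The failure sequence and the proposed check can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: CreateRetrySketch, FakeZk, naiveCreate and checkedCreate are illustration-only names, not Kafka or ZooKeeper APIs, and FakeZk is an in-memory stand-in that records the creating session for every znode. That is an assumption made for the sketch; real ZooKeeper only reports an owning session (ephemeralOwner) for ephemeral znodes, so an actual fix would need some other way to recognize its own earlier create for persistent znodes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; none of these names exist in Kafka or ZooKeeper.
final class CreateRetrySketch {
    enum Code { OK, CONNECTIONLOSS, NODEEXISTS }

    /** Fake server that remembers which session created each znode (an assumption of this sketch). */
    static final class FakeZk {
        private final Map<String, Long> ownerByPath = new HashMap<>();
        private boolean dropNextResponse;

        FakeZk(boolean dropFirstResponse) {
            this.dropNextResponse = dropFirstResponse;
        }

        Code create(String path, long sessionId) {
            if (ownerByPath.containsKey(path)) {
                return Code.NODEEXISTS;
            }
            ownerByPath.put(path, sessionId);
            if (dropNextResponse) {
                dropNextResponse = false;
                // The znode now exists, but the client never sees the OK response.
                return Code.CONNECTIONLOSS;
            }
            return Code.OK;
        }

        Long ownerOf(String path) {
            return ownerByPath.get(path);
        }
    }

    /** Resend on CONNECTIONLOSS and report whatever the final response says. */
    static Code naiveCreate(FakeZk zk, String path, long sessionId) {
        Code code = zk.create(path, sessionId);
        while (code == Code.CONNECTIONLOSS) {
            code = zk.create(path, sessionId);
        }
        return code;
    }

    /** Same retry loop, but NODEEXISTS after a retry counts as success if this session owns the znode. */
    static Code checkedCreate(FakeZk zk, String path, long sessionId) {
        Code code = zk.create(path, sessionId);
        boolean retried = false;
        while (code == Code.CONNECTIONLOSS) {
            retried = true;
            code = zk.create(path, sessionId);
        }
        if (code == Code.NODEEXISTS && retried) {
            Long owner = zk.ownerOf(path);
            if (owner != null && owner == sessionId) {
                return Code.OK;
            }
        }
        return code;
    }
}
```

With a FakeZk constructed with dropFirstResponse = true, naiveCreate ends in Code.NODEEXISTS even though the caller's own first request created the znode, which corresponds to the TopicExistsException seen in the test. checkedCreate returns Code.OK in the same situation, and still returns Code.NODEEXISTS when the znode was created by a different session.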

Given the above understanding and the fact that the test passes with high probability, this flaky test does not indicate a bug and should not be a blocking issue for the 2.1.0 release.



> Flaky test DeleteTopicsRequestTest.testValidDeleteTopicRequests
> ---------------------------------------------------------------
>
>                 Key: KAFKA-7648
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7648
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Dong Lin
>            Priority: Major
>
> Observed in [https://builds.apache.org/job/kafka-2.1-jdk8/52/testReport/junit/kafka.server/DeleteTopicsRequestTest/testValidDeleteTopicRequests/]
>  
> {code}
> Error Message
> org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists.
> h3. Stacktrace
> org.apache.kafka.common.errors.TopicExistsException: Topic 'topic-4' already exists.
> h3. Standard Output
> [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Error for partition topic-3-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:10,812] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition topic-3-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:14,805] WARN Client session timed out, have not heard from server in 4000ms for sessionid 0x10051eebf480003 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:14,806] WARN Unable to read additional data from client sessionid 0x10051eebf480003, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:14,807] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480002 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:14,807] WARN Unable to read additional data from client sessionid 0x10051eebf480002, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:14,823] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480001 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:14,824] WARN Unable to read additional data from client sessionid 0x10051eebf480001, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:15,423] WARN Client session timed out, have not heard from server in 4002ms for sessionid 0x10051eebf480000 (org.apache.zookeeper.ClientCnxn:1112)
> [2018-11-07 17:53:15,423] WARN Unable to read additional data from client sessionid 0x10051eebf480000, likely client has closed socket (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2018-11-07 17:53:15,879] WARN fsync-ing the write ahead log in SyncThread:0 took 4456ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (org.apache.zookeeper.server.persistence.FileTxnLog:338)
> [2018-11-07 17:53:16,831] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition topic-4-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:23,087] ERROR [ReplicaFetcher replicaId=1, leaderId=0, fetcherId=0] Error for partition invalid-timeout-1 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:23,088] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition invalid-timeout-3 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2018-11-07 17:53:23,137] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition invalid-timeout-0 at offset 0 (kafka.server.ReplicaFetcherThread:76) org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
>   
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)