Posted to issues@flink.apache.org by "Matthias (Jira)" <ji...@apache.org> on 2020/08/07 13:04:00 UTC

[jira] [Commented] (FLINK-18522) ZKCheckpointIDCounterMultiServersTest.testRecoveredAfterConnectionLoss failed with "Address already in use"

    [ https://issues.apache.org/jira/browse/FLINK-18522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173126#comment-17173126 ] 

Matthias commented on FLINK-18522:
----------------------------------

ZooKeeper's TestServer selects a random available port. Unfortunately, the port selection is separate from the instantiation of the ZooKeeper instances. Because the tests run in parallel, there appears to be a race condition in which multiple ZooKeeper instances select the same port.

We investigated changing the default port from {{-1}} to {{0}} so that the port would be selected at startup time. Unfortunately, the [ZooKeeper code|https://github.com/apache/zookeeper/blob/branch-3.4.14/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java#L316] validates the configuration, which results in an {{IllegalArgumentException}}.
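
For illustration, the gap between picking a port and binding to it can be reproduced with plain {{java.net.ServerSocket}} instances. This is only a minimal sketch and not the TestServer or ZooKeeper code: the class {{PortSelectionRace}} and its methods {{pickFreePort}} and {{bindFails}} are hypothetical names introduced here, and the "competitor" socket stands in for a test running in parallel.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration; not part of Flink, Curator, or ZooKeeper.
public class PortSelectionRace {

    /** Racy pattern: find a free port, release it, and hand back only the number. */
    static int pickFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        } // the port is released here, so anyone else may claim it
    }

    /** Returns true if binding to the given port fails with a BindException. */
    static boolean bindFails(int port) throws IOException {
        try (ServerSocket late = new ServerSocket()) {
            late.bind(new InetSocketAddress(port));
            return false;
        } catch (BindException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        int port = pickFreePort();

        // Simulate a parallel test that grabs the same port inside the gap.
        try (ServerSocket competitor = new ServerSocket(port)) {
            System.out.println("late bind fails: " + bindFails(port));
        }

        // With plain sockets, binding to port 0 and keeping the socket open
        // closes the gap: selection and binding happen in a single step.
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println("bound directly to port " + server.getLocalPort());
        }
    }
}
```

While the competitor socket is open, the late bind is expected to fail with the same {{java.net.BindException}} as in the test log. Binding to port 0 directly avoids this for plain sockets, which is the behaviour the change from {{-1}} to {{0}} was meant to trigger.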

We decided to close this issue for now. If this error comes up more often, we might have to consider running the tests sequentially.

> ZKCheckpointIDCounterMultiServersTest.testRecoveredAfterConnectionLoss failed with "Address already in use"
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18522
>                 URL: https://issues.apache.org/jira/browse/FLINK-18522
>             Project: Flink
>          Issue Type: Test
>          Components: Runtime / Checkpointing, Runtime / Coordination, Tests
>    Affects Versions: 1.10.1
>            Reporter: Dian Fu
>            Assignee: Matthias
>            Priority: Major
>              Labels: test-stability
>
> [https://travis-ci.org/github/apache/flink/jobs/705770513]
> {code}
> 15:09:34.674 [ERROR] testRecoveredAfterConnectionLoss(org.apache.flink.runtime.checkpoint.ZKCheckpointIDCounterMultiServersTest)  Time elapsed: 5.74 s  <<< ERROR!
> java.net.BindException: Address already in use
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)