Posted to dev@curator.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/05/23 02:33:00 UTC

[jira] [Work logged] (CURATOR-535) TestServer random port selection has a race condition

     [ https://issues.apache.org/jira/browse/CURATOR-535?focusedWorklogId=773270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773270 ]

ASF GitHub Bot logged work on CURATOR-535:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/May/22 02:32
            Start Date: 23/May/22 02:32
    Worklog Time Spent: 10m 
      Work Description: paul8263 commented on PR #406:
URL: https://github.com/apache/curator/pull/406#issuecomment-1134102562

   > It seems that with this patch it's still possible for a resolved port to be used by another process before the caller uses it.
   
   Yes. Currently, this PR updates InstanceSpec so that the random port is only allocated when it is actually used.
   But there are still more steps before the port is actually bound: wrapping the configuration into a QuorumPeerConfig and starting the server. A race condition can still occur between these steps.
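   ```java
   // Step 1: wrap the instance configuration into a QuorumPeerConfig
   QuorumPeerConfig config = configBuilder.buildConfig(thisInstanceIndex);
   // Step 2: start the server; the port is only bound here, so another process can take it between the two steps
   main.runFromConfig(config);
   ```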
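   For illustration only, lazy allocation of this kind might look roughly like the sketch below. `LazyPort` is a hypothetical class, not the actual InstanceSpec change in PR #406: a requested port of 0 is resolved to a free port only on the first call to `getPort()`, by binding an ephemeral `ServerSocket` and closing it again. The comment in `pickFreePort()` marks the window that remains: once that socket is closed, nothing reserves the port until the server binds it.

   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.net.ServerSocket;

   // Hypothetical sketch of lazy port allocation (not Curator's InstanceSpec).
   final class LazyPort {
       private final int requestedPort; // 0 or negative means "pick a random free port"
       private Integer resolvedPort;    // filled in on the first call to getPort()

       LazyPort(int requestedPort) {
           this.requestedPort = requestedPort;
       }

       // Resolve the port on first use instead of at construction time;
       // later calls return the same value.
       synchronized int getPort() {
           if (resolvedPort == null) {
               resolvedPort = requestedPort > 0 ? requestedPort : pickFreePort();
           }
           return resolvedPort;
       }

       // Bind an ephemeral port, note its number, and release it again.
       // The port is free when this returns, but nothing reserves it afterwards.
       static int pickFreePort() {
           try (ServerSocket socket = new ServerSocket(0)) {
               return socket.getLocalPort();
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
       }

       public static void main(String[] args) {
           LazyPort port = new LazyPort(0);
           System.out.println("resolved port " + port.getPort());
       }
   }
   ```

   Deferring resolution shortens the gap between selection and use, but as noted above it cannot close it.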
   I suggest introducing a file lock to alleviate the race condition among processes, but the implementation might be cumbersome.
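   A minimal sketch of the file-lock idea, assuming every cooperating test process agrees on a shared lock directory (for example one under `java.io.tmpdir`). `PortReservation` and the `port-<n>.lock` naming are hypothetical. A process picks a candidate port, then calls `FileChannel.tryLock()` on that port's lock file; if another process (or this JVM) already holds the lock, it moves on to another port. The caller would keep the reservation open until the server has bound the port.

   ```java
   import java.io.IOException;
   import java.net.ServerSocket;
   import java.nio.channels.FileChannel;
   import java.nio.channels.FileLock;
   import java.nio.channels.OverlappingFileLockException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;

   // Hypothetical helper: holds an exclusive lock on a per-port file so that
   // cooperating processes using the same lock directory skip that port.
   final class PortReservation implements AutoCloseable {
       private final int port;
       private final FileChannel channel;
       private final FileLock lock;

       private PortReservation(int port, FileChannel channel, FileLock lock) {
           this.port = port;
           this.channel = channel;
           this.lock = lock;
       }

       int port() {
           return port;
       }

       static PortReservation reserve(Path lockDir, int attempts) throws IOException {
           Files.createDirectories(lockDir);
           for (int i = 0; i < attempts; i++) {
               // Ask the OS for a currently free port, then release it.
               int candidate;
               try (ServerSocket probe = new ServerSocket(0)) {
                   candidate = probe.getLocalPort();
               }
               Path lockFile = lockDir.resolve("port-" + candidate + ".lock");
               FileChannel ch = FileChannel.open(lockFile,
                       StandardOpenOption.CREATE, StandardOpenOption.WRITE);
               FileLock acquired = null;
               try {
                   // Non-blocking: returns null if another process holds the lock.
                   acquired = ch.tryLock();
               } catch (OverlappingFileLockException e) {
                   // This JVM already holds a lock on this file; treat the port as taken.
               } finally {
                   if (acquired == null) {
                       ch.close();
                   }
               }
               if (acquired != null) {
                   return new PortReservation(candidate, ch, acquired);
               }
           }
           throw new IOException("no port reserved after " + attempts + " attempts");
       }

       // Release the lock once the server has bound (or stopped using) the port.
       @Override
       public void close() throws IOException {
           lock.release();
           channel.close();
       }
   }
   ```

   This only coordinates processes that follow the same convention; an unrelated process can still take the port, so it narrows the race rather than removing it.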




Issue Time Tracking
-------------------

    Worklog Id:     (was: 773270)
    Time Spent: 20m  (was: 10m)

> TestServer random port selection has a race condition
> -----------------------------------------------------
>
>                 Key: CURATOR-535
>                 URL: https://issues.apache.org/jira/browse/CURATOR-535
>             Project: Apache Curator
>          Issue Type: Bug
>    Affects Versions: 4.2.0
>         Environment: Operating System: 
>   Fedora 30 (amd64)
> JVM:
>   openjdk version "1.8.0_212"
>   OpenJDK Runtime Environment (build 1.8.0_212-b04)
>   OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
>            Reporter: Laverne Schrock
>            Priority: Minor
>         Attachments: BugReproducer.java, log4j.properties
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using one of the constructors for org.apache.curator.test.TestingServer that doesn't take a port number, the org.apache.curator.test.InstanceSpec that is constructed will choose random available ports to use. However, InstanceSpec only binds those ports during construction and then unbinds them so that they can be used when TestingServer.start() is called.
> This gap between port selection and port use creates a race condition in which another process (or thread) can bind the port before TestingServer is started.
> I've seen this very rarely in our integration test suite that spins up and tears down TestingServer many times. I've attached a simple class for reproducing the issue. If you run it in an environment with log4j loaded and the attached log4j.properties, you should see output like the following (though it sometimes takes more iterations):
> {{completed iteration: 0}}
>  {{completed iteration: 500}}
>  {{2019-08-02 09:47:06 ERROR TestingZooKeeperServer:162 - From testing server (random state: false) for instance: InstanceSpec\{dataDirectory=/tmp/1564753624792-1, port=34707, electionPort=33621, quorumPort=45995, deleteDataDirectoryOnClose=true, serverId=1286, tickTime=-1, maxClientCnxns=-1, customProperties={}, hostname=127.0.0.1} org.apache.curator.test.InstanceSpec@59c43d10}}
>  {{java.net.BindException: Address already in use}}
>  {{    at sun.nio.ch.Net.bind0(Native Method)}}
>  {{    at sun.nio.ch.Net.bind(Net.java:433)}}
>  {{    at sun.nio.ch.Net.bind(Net.java:425)}}
>  {{    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)}}
>  {{    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)}}
>  {{    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)}}
>  {{    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:687)}}
>  {{    at org.apache.zookeeper.server.ServerCnxnFactory.configure(ServerCnxnFactory.java:76)}}
>  {{    at org.apache.curator.test.TestingZooKeeperMain.internalRunFromConfig(TestingZooKeeperMain.java:239)}}
>  {{    at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:132)}}
>  {{    at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:158)}}
>  {{    at java.lang.Thread.run(Thread.java:748)}}
>  {{java.lang.IllegalStateException: Timed out waiting for watch removal}}
>  {{    at org.apache.curator.test.TestingZooKeeperMain.blockUntilStarted(TestingZooKeeperMain.java:146)}}
>  {{    at org.apache.curator.test.TestingZooKeeperServer.start(TestingZooKeeperServer.java:167)}}
>  {{    at org.apache.curator.test.TestingServer.start(TestingServer.java:148)}}
>  {{    at BugReproducer.main(BugReproducer.java:15)}}
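The race described in the issue can be reproduced deterministically without ZooKeeper. In the sketch below, `PortRaceDemo` is a hypothetical class (not the attached BugReproducer.java): it picks a free port by binding port 0 and closing the socket, lets a second socket standing in for another process or thread bind that port, and then shows that the "server" bind fails with `java.net.BindException`, the same exception type seen in the log above.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Deterministic simulation of the select-then-release race; the "intruder"
// socket stands in for another process or thread.
final class PortRaceDemo {
    // Pick a "random available port" by binding port 0, then release it.
    static int pickAndRelease() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        }
    }

    // Returns true if the server-side bind fails because the port was taken
    // between selection and use.
    static boolean raceLost() throws IOException {
        int port = pickAndRelease();
        // Another party binds the freed port before the server starts.
        try (ServerSocket intruder = new ServerSocket(port)) {
            // The server now tries to bind the port it was handed.
            try (ServerSocket server = new ServerSocket(port)) {
                return false;
            } catch (BindException e) {
                return true; // "Address already in use"
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("race lost: " + raceLost());
    }
}
```

In the real test suite the interleaving is rare, which is why the reproducer needs hundreds of iterations; forcing the intruder bind makes the failure happen on every run.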



--
This message was sent by Atlassian Jira
(v8.20.7#820007)