
Posted to issues@flink.apache.org by zentol <gi...@git.apache.org> on 2018/03/07 09:59:05 UTC

[GitHub] flink pull request #5652: [hotfix][tests] Do not use singleActorSystem in Lo...

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/5652

    [hotfix][tests] Do not use singleActorSystem in LocalFlinkMiniCluster

    ## What is the purpose of the change
    
    The legacy cluster started in `MiniClusterResource` used a single actor system, which rendered the returned `ClusterClient` unusable.
    
    This change will unfortunately cause tests to take longer, but I don't know of another way to fix this.
    
    Without this change, every access to the client fails with the exception below:
    ```
    org.apache.flink.client.program.ProgramInvocationException: Failed to retrieve the JobManager gateway.
        at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:513)
        at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:113)
    Caused by: org.apache.flink.util.FlinkException: Could not find out our own hostname by connecting to the leading JobManager. Please make sure that the Flink cluster has been started.
        at org.apache.flink.client.program.ClusterClient$LazyActorSystemLoader.get(ClusterClient.java:248)
        at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:923)
        at org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:511)
        ... 30 more
    Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not find the connecting address by connecting to the current leader.
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.findConnectingAddress(LeaderRetrievalUtils.java:164)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.findConnectingAddress(LeaderRetrievalUtils.java:145)
        at org.apache.flink.client.program.ClusterClient$LazyActorSystemLoader.get(ClusterClient.java:244)
        ... 32 more
    Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the connecting address to the current leader with the akka URL akka://flink/user/jobmanager_1.
        at org.apache.flink.runtime.net.ConnectionUtils$LeaderConnectingAddressListener.findConnectingAddress(ConnectionUtils.java:472)
        at org.apache.flink.runtime.net.ConnectionUtils$LeaderConnectingAddressListener.findConnectingAddress(ConnectionUtils.java:361)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.findConnectingAddress(LeaderRetrievalUtils.java:162)
        ... 34 more
    Caused by: java.lang.Exception: Could not retrieve InetSocketAddress from Akka URL akka://flink/user/jobmanager_1
        at org.apache.flink.runtime.akka.AkkaUtils$.getInetSocketAddressFromAkkaURL(AkkaUtils.scala:709)
        at org.apache.flink.runtime.akka.AkkaUtils.getInetSocketAddressFromAkkaURL(AkkaUtils.scala)
        at org.apache.flink.runtime.net.ConnectionUtils$LeaderConnectingAddressListener.findConnectingAddress(ConnectionUtils.java:392)
        ... 36 more
    ```
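    
    For illustration only (not part of this change): the innermost cause appears to come down to the form of the Akka URL. A local URL such as `akka://flink/user/jobmanager_1` names only the actor system and the actor path, while a remote URL has the form `akka.tcp://<system>@<host>:<port>/<path>`. The sketch below is a rough approximation of that distinction using `java.net.URI`; the class `AkkaUrlSketch`, the method `remoteAddress`, and the `localhost:6123` address are hypothetical, and this is not the code in Flink's `AkkaUtils`.
    
    ```java
    import java.net.InetSocketAddress;
    import java.net.URI;
    import java.util.Optional;
    
    // Hypothetical helper for illustration; this is NOT Flink's AkkaUtils.
    public class AkkaUrlSketch {
    
        // Returns the host/port carried by an Akka URL, or empty if the URL is local-only.
        public static Optional<InetSocketAddress> remoteAddress(String akkaUrl) {
            URI uri = URI.create(akkaUrl);
            // Remote form: akka.tcp://<system>@<host>:<port>/<path>
            // Local form:  akka://<system>/<path>  (no '@', no port)
            if (uri.getUserInfo() == null || uri.getHost() == null || uri.getPort() == -1) {
                return Optional.empty();
            }
            // Unresolved on purpose: the sketch only inspects the URL, it does no DNS lookup.
            return Optional.of(InetSocketAddress.createUnresolved(uri.getHost(), uri.getPort()));
        }
    
        public static void main(String[] args) {
            // URL from the stack trace above: no host or port to extract.
            System.out.println(remoteAddress("akka://flink/user/jobmanager_1").isPresent());
            // Assumed example of a remote URL; the address is made up.
            System.out.println(remoteAddress("akka.tcp://flink@localhost:6123/user/jobmanager_1").isPresent());
        }
    }
    ```
    
    Under these assumptions, a client that needs a socket address for the leader has nothing to work with when the JobManager is only reachable through a local URL.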


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink hotfix_single

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5652
    
----
commit 6a105cbb194b87dec98224b985ee5ceb9239d492
Author: zentol <ch...@...>
Date:   2018-03-05T12:45:33Z

    [hotfix][tests] Do not use singleActorSystem in LocalFlinkMiniCluster
    
    Using a singleActorSystem rendered the returned client unusable.

----


---

[GitHub] flink issue #5652: [hotfix][tests] Do not use singleActorSystem in LocalFlin...

Posted by zentol <gi...@git.apache.org>.

Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/5652
  
    All legacy tests going through the `MiniClusterResource` will take longer. I don't know by how much, but we now have to start multiple actor systems and the JM<->TM communication is no longer local.


---