Posted to user@flink.apache.org by "sk_acura@yahoo.com" <sk...@yahoo.com> on 2020/06/18 14:15:00 UTC

Submitted Flink Jobs on EMR are failing (Could not start rest endpoint on any port in port range 8081)

I am using EMR 5.30.0 and trying to submit a Flink (1.10.0) job using the following command:

    flink run -m yarn-cluster /home/hadoop/flink--test-0.0.1-SNAPSHOT.jar

and I am getting the following error:

    Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment. 

After going through the logs on the worker nodes and the JobManager logs, it looks like there is a port conflict:

    2020-06-17 21:40:51,199 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Could not start cluster entrypoint YarnJobClusterEntrypoint.
    org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
            at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
            at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
            at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
    Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
            at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
            at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
            at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
            at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
            at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
            ... 2 more
    Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8081
            at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:219)
            at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
            ... 9 more

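The error speaks of a "port range 8081" because the REST bind port is treated as a range, and a single port is just a range of one. The following is a simplified model of that behaviour, written as a sketch only: it is not Flink's code, and `parse_port_range` and `bind_rest_endpoint` are hypothetical helpers (the comma-separated form is this sketch's own assumption). It tries each candidate port in order and gives up only when none can be bound. So a fixed 8081 fails as soon as something else, for example another job's JobManager on the same node, already holds that port, while a range or `0` (let the OS pick) can still succeed.

```python
import socket


def parse_port_range(spec):
    """Expand a port spec such as "8081", "50100-50200" or "8081,0" into a list.

    Hypothetical helper for this sketch: a single port is a range of one.
    """
    ports = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            ports.extend(range(start, end + 1))
        else:
            ports.append(int(part))
    return ports


def bind_rest_endpoint(spec, host="127.0.0.1"):
    """Return a listening socket bound to the first free port in `spec`.

    Port 0 asks the OS for any free (ephemeral) port.
    """
    for port in parse_port_range(spec):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen()
            return sock
        except OSError:
            # Port already in use (or not bindable): try the next candidate.
            sock.close()
    raise OSError(f"Could not start rest endpoint on any port in port range {spec}")


if __name__ == "__main__":
    # Simulate a first JobManager holding a port on this host.
    first = bind_rest_endpoint("0")
    taken = first.getsockname()[1]
    try:
        # A second "JobManager" configured with the same fixed port.
        bind_rest_endpoint(str(taken))
    except OSError as err:
        print(err)
    # With a fallback candidate, the second endpoint still starts.
    second = bind_rest_endpoint(f"{taken},0")
    print("second endpoint bound to port", second.getsockname()[1])
    second.close()
    first.close()
```

In this model, configuring a range such as 50100-50200 or `0` corresponds to the fallback case: the second endpoint moves to another port instead of failing.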
There seems to be a JIRA ticket (https://issues.apache.org/jira/browse/FLINK-15394) open for this (though it is for Flink 1.9), and the suggested solution is to use a port range for **rest.bind-port** in the Flink config file.

However, in Flink 1.10 we only have the following entry in the YARN config YAML file:

    rest.port: 8081

Another issue I am facing: I have submitted multiple Flink jobs (the same job multiple times) using the AWS Console and the Add Step UI. Only one of the jobs succeeded and the rest failed with the error posted above. And when I go to the Flink UI, it doesn't show any jobs at all.

I am wondering whether each of the submitted jobs is trying to create a Flink YARN session instead of using the existing one.

Thanks,
Sateesh

Re: Submitted Flink Jobs on EMR are failing (Could not start rest endpoint on any port in port range 8081)

Posted by Yang Wang <da...@gmail.com>.
Hi Sateesh,

If "rest.port" or "rest.bind-port" is configured explicitly, it will be
used to start the REST server. So you need to either remove them from
flink-conf.yaml or configure them to "0" or a port range (e.g.
50100-50200).
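
A minimal flink-conf.yaml sketch of that change. It places the range on rest.bind-port, as suggested in FLINK-15394, and drops the fixed rest.port entry. It assumes you are editing the configuration file the cluster actually reads (on EMR this is assumed to be /etc/flink/conf/flink-conf.yaml) and that the example range 50100-50200 is free on the YARN nodes; neither is confirmed in this thread.

```yaml
# flink-conf.yaml
# Remove (or comment out) the fixed port; otherwise it is used as the bind port
# and JobManagers that land on the same node all compete for 8081.
# rest.port: 8081

# Let each JobManager's REST server bind to the first free port in this range.
rest.bind-port: 50100-50200

# Alternative: let the OS choose any free port.
# rest.bind-port: 0
```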

By default, "flink run" always starts a dedicated Flink cluster for each
job. If you want to use session mode, you need to start a session with
"yarn-session.sh" first, and then use "flink run ... -yid application_id"
to submit a Flink job to the existing cluster.


Best,
Yang

On Mon, Jun 22, 2020 at 9:58 PM Arvid Heise <ar...@ververica.com> wrote:

> Hi Sateesh,
>
> The solution still applies; not all entries are listed in the conf
> template.
>
> From what you have written, it's almost certain that the first jobs have
> not finished (hence the port is taken). Make sure you don't use detached
> mode when submitting.
> You can see the status of the jobs in the YARN ResourceManager UI, which
> also links to the respective Flink JobManagers.
>
> And yes, by default, each job starts its own Flink cluster on YARN unless
> you explicitly use a YARN session [1].
>
> If you need more help, please post your steps.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html#flink-yarn-session

Re: Submitted Flink Jobs on EMR are failing (Could not start rest endpoint on any port in port range 8081)

Posted by Arvid Heise <ar...@ververica.com>.
Hi Sateesh,

The solution still applies; not all entries are listed in the conf
template.

From what you have written, it's almost certain that the first jobs have
not finished (hence the port is taken). Make sure you don't use detached
mode when submitting.
You can see the status of the jobs in the YARN ResourceManager UI, which
also links to the respective Flink JobManagers.

And yes, by default, each job starts its own Flink cluster on YARN unless
you explicitly use a YARN session [1].

If you need more help, please post your steps.

[1]
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html#flink-yarn-session


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng