Posted to user@flink.apache.org by Gyula Fóra <gy...@gmail.com> on 2016/11/12 12:11:22 UTC

Task managers cant start on YARN cluster

Hi,

I am running into some strange issues on YARN with Flink 1.1.3 and 1.1.4. For
some reason I started getting the error shown in the logs below.
The job manager starts and the application is in the ACCEPTED state, but it
cannot communicate with the scheduler. (The address 0.0.0.0:8030 looks
strange.)

I didn't change anything on the YARN cluster, and this worked
previously, but I just can't get it to work now. The yarn-site.xml contains
the proper ResourceManager addresses.

Does anybody have any ideas on where to go from here?

Cheers,
Gyula

JM log:

2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - The ping interval is 60000 ms.
2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030
2016-11-12 11:56:06,899 DEBUG org.apache.hadoop.ipc.Client - closing ipc connection to 0.0.0.0/0.0.0.0:8030: Connection refused

java.net.ConnectException: Call From splat24.sto.midasplayer.com/172.25.86.166 to 0.0.0.0:8030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1359)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy8.registerApplicationMaster(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy9.registerApplicationMaster(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:196)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
	at org.apache.flink.yarn.YarnFlinkResourceManager.initialize(YarnFlinkResourceManager.java:259)
	at org.apache.flink.runtime.clusterframework.FlinkResourceManager.preStart(FlinkResourceManager.java:185)
	at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
	at akka.actor.UntypedActor.aroundPreStart(UntypedActor.scala:97)
	at akka.actor.ActorCell.create(ActorCell.scala:580)
	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)


Client:

2016-11-12 12:31:31,080 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2016-11-12 12:31:31,080 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2016-11-12 12:31:31,101 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Using values:
2016-11-12 12:31:31,101 INFO  org.apache.flink.yarn.YarnClusterDescriptor - TaskManager count = 1
2016-11-12 12:31:31,101 INFO  org.apache.flink.yarn.YarnClusterDescriptor - JobManager memory = 1024
2016-11-12 12:31:31,102 INFO  org.apache.flink.yarn.YarnClusterDescriptor - TaskManager memory = 11000
2016-11-12 12:31:31,119 INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2016-11-12 12:31:31,394 WARN  org.apache.flink.yarn.YarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2016-11-12 12:31:31,457 INFO  org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/conf/log4j.properties to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/log4j.properties
2016-11-12 12:31:42,321 INFO  org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/lib
2016-11-12 12:32:18,457 INFO  org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/rbea/rbea-on-flink-1.0-SNAPSHOT.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/rbea-on-flink-1.0-SNAPSHOT.jar
2016-11-12 12:32:39,725 INFO  org.apache.flink.yarn.Utils - Copying from file:/fjord/sites/flink-1.1.3/lib/flink-dist_2.10-1.1.4.jar to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-dist_2.10-1.1.4.jar
2016-11-12 12:32:58,154 INFO  org.apache.flink.yarn.Utils - Copying from /fjord/sites/flink-1.1.3/conf/flink-conf.yaml to file:/fjord/splat/flink/yarn/.flink/application_1478896050022_0013/flink-conf.yaml
2016-11-12 12:33:02,218 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Submitting application master application_1478896050022_0013
2016-11-12 12:33:02,256 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1478896050022_0013
2016-11-12 12:33:02,257 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Waiting for the cluster to be allocated
2016-11-12 12:33:02,259 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2016-11-12 12:34:02,485 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
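
A note on the 0.0.0.0 addresses in both logs: ports 8030 and 8032 on 0.0.0.0 are the stock Hadoop defaults for yarn.resourcemanager.scheduler.address and yarn.resourcemanager.address, which suggests the process was not using the values from the cluster's yarn-site.xml. The following is a minimal Python sketch, not a Flink or Hadoop tool, for checking what a given yarn-site.xml resolves to. The function name and the sample XML are hypothetical, and the sketch assumes only the ${yarn.resourcemanager.hostname} substitution matters; it ignores Hadoop's full variable expansion and HA-specific keys.

```python
import xml.etree.ElementTree as ET

# Stock Hadoop defaults (as in yarn-default.xml); the hostname default is 0.0.0.0.
DEFAULTS = {
    "yarn.resourcemanager.hostname": "0.0.0.0",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
    "yarn.resourcemanager.scheduler.address": "${yarn.resourcemanager.hostname}:8030",
}


def resolve_rm_addresses(yarn_site_xml: str) -> dict:
    """Return the effective RM settings for the given yarn-site.xml text.

    Hypothetical helper: starts from the defaults above and overrides them
    with any matching <property> entries found in the XML.
    """
    props = dict(DEFAULTS)
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in props and value:
            props[name] = value
    host = props["yarn.resourcemanager.hostname"]
    # Expand the one placeholder this sketch knows about.
    return {
        key: val.replace("${yarn.resourcemanager.hostname}", host)
        for key, val in props.items()
    }


if __name__ == "__main__":
    # Hypothetical configuration that sets only the RM hostname.
    sample = """<configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>rm.example.com</value>
      </property>
    </configuration>"""
    for key, val in resolve_rm_addresses(sample).items():
        print(key, "=", val)
```

If the file on the machine resolves to real host names while the logs still show 0.0.0.0, the configuration is probably not reaching the process that logs those lines, and the cause lies elsewhere (classpath, HADOOP_CONF_DIR/YARN_CONF_DIR, or the client libraries themselves).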

Re: Task managers cant start on YARN cluster

Posted by Ufuk Celebi <uc...@apache.org>.
Ah, sorry. I thought it was something related to Flink. ;)

On 14 November 2016 at 10:59:44, Gyula Fóra (gyula.fora@gmail.com) wrote:
> What I mean is the logs coming from org.apache.hadoop.ipc.Client if you
> look at my original email (at JM logs)
>  
> Gyula


Re: Task managers cant start on YARN cluster

Posted by Gyula Fóra <gy...@gmail.com>.
What I mean is the log messages coming from org.apache.hadoop.ipc.Client;
see the JM log in my original email.

Gyula
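
For anyone who wants to pull these lines out of a large JobManager log, here is a minimal Python sketch. It assumes the log is in the log4j layout shown in the original email, with one event per line; the function name and the notion of "suspicious" (a 0.0.0.0 address or a refused connection) are illustrative choices, not anything Flink or Hadoop defines.

```python
import re

# Matches log4j-style lines from the Hadoop IPC client, for example:
# "<date> <time> DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030"
IPC_LINE = re.compile(
    r"^(?P<ts>\S+ \S+)\s+(?P<level>[A-Z]+)\s+"
    r"org\.apache\.hadoop\.ipc\.Client\s+-\s+(?P<msg>.*)$"
)
# An IPv4 address immediately followed by ":port".
ADDRESS = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def suspicious_ipc_lines(log_text: str) -> list:
    """Return (timestamp, address, message) tuples for IPC client lines that
    mention the wildcard address 0.0.0.0 or a refused connection."""
    hits = []
    for line in log_text.splitlines():
        m = IPC_LINE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        addr = ADDRESS.search(msg)
        host_port = f"{addr.group(1)}:{addr.group(2)}" if addr else None
        wildcard = addr is not None and addr.group(1) == "0.0.0.0"
        if wildcard or "Connection refused" in msg:
            hits.append((m.group("ts"), host_port, msg))
    return hits
```

These lines are logged at DEBUG level, so the filter only finds something if the JobManager was started with DEBUG logging enabled for the Hadoop client classes.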

On Mon, 14 Nov 2016 at 10:52, Ufuk Celebi <uc...@apache.org> wrote:

> What was the log message shown on DEBUG level?
>
> Maybe it makes sense to promote it to INFO. ;)
>
> I guess there is no easy way to verify the version, right Max or Robert?

Re: Task managers cant start on YARN cluster

Posted by Ufuk Celebi <uc...@apache.org>.
What was the log message shown on DEBUG level?

Maybe it makes sense to promote it to INFO. ;)

I guess there is no easy way to verify the version, right Max or Robert?
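
On verifying the version: one rough manual check is to compare the Hadoop version Flink was built against with what the `hadoop version` command reports on a cluster node. The Python sketch below illustrates that idea only. It assumes the first line of the command's output has the form "Hadoop X.Y.Z", that the build-time version is known to the user and passed in by hand, and that matching major.minor is a good-enough heuristic; all function names are hypothetical, and nothing here is a Flink feature.

```python
import re
import subprocess


def parse_hadoop_version(output: str) -> tuple:
    """Extract (major, minor) from text containing a line like 'Hadoop 2.7.3'."""
    m = re.search(r"^Hadoop\s+(\d+)\.(\d+)", output, re.MULTILINE)
    if not m:
        raise ValueError("no 'Hadoop X.Y' line found")
    return int(m.group(1)), int(m.group(2))


def cluster_hadoop_version() -> tuple:
    """Run the `hadoop version` CLI on this machine and parse its output.

    Requires the `hadoop` executable to be on the PATH.
    """
    out = subprocess.run(
        ["hadoop", "version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_hadoop_version(out)


def versions_match(built_against: str, cluster: tuple) -> bool:
    """Heuristic: compare major.minor of the build-time version with the cluster's."""
    major, minor = (int(part) for part in built_against.split(".")[:2])
    return (major, minor) == cluster
```

A call such as `versions_match("2.3.0", cluster_hadoop_version())` would then flag a mismatch before the job is submitted, assuming the build-time version string is recorded somewhere the user can look it up.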

On 14 November 2016 at 10:45:52, Gyula Fóra (gyula.fora@gmail.com) wrote:
> Hi,
>  
> The main problem was that whatever was going wrong was not apparent in the
> Flink application master runner; it was only shown in the YarnClient
> debug log.
>  
> If you run with the default INFO log level, all you see is that the YARN client
> is trying to fail over again and again, as if something was wrong with the
> resource manager. Setting it to DEBUG actually shows the error.
>  
> It would also be great if there were a way to verify YARN versions and
> detect incompatibilities; I am not sure whether this is easily possible.
>  
> Gyula
>  
> On Mon, 14 Nov 2016 at 9:42, Ufuk Celebi wrote:
>  
> > Good to know that you solved this. :) Do you think there is something we
> > can do to help users notice this situation faster?
> >
> > – Ufuk
> >
> > On 13 November 2016 at 00:23:21, Gyula Fóra (gyula.fora@gmail.com) wrote:
> > > Hi,
> > >
> > > What happened is that I compiled Flink with the wrong Hadoop version...
> > >
> > > Sorry :)
> > > Gyula
> > >
> > > On Sat, 12 Nov 2016 at 13:11, Gyula Fóra wrote:
> > >
> > > > Hi,
> > > >
> > > > I am running into some strange issues on yarn with Flink 1.1.3 & 4. For
> > > > some reason I started getting this error (see under text.)
> > > > The job manager starts and the application is in Accepted state but
> > cannot
> > > > seem to be able to communicate with the scheduler. (0.0.0.0:8030 seems
> > > > strange)
> > > >
> > > > I didn't change anything on the yarn cluster and this seemed to work
> > > > previously (but I just cant get it to work now). The yarn-site.xml
> > contains
> > > > the proper rm addresses.
> > > >
> > > > Anybody has any ideas where to go from here?
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > JM log:
> > > >
> > > > 2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - The ping
> > interval
> > > is 60000 ms.
> > > > 2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client -
> > Connecting to /0.0.0.0:8030
> > > > 2016-11-12 11:56:06,899 DEBUG org.apache.hadoop.ipc.Client - closing
> > ipc connection
> > > to 0.0.0.0/0.0.0.0:8030: Connection refused
> > > >
> > > > java.net.ConnectException: Call From
> > splat24.sto.midasplayer.com/172.25.86.166
> > > to 0.0.0.0:8030 failed on connection exception:
> > java.net.ConnectException: Connection
> > > refused; For more details see:
> > http://wiki.apache.org/hadoop/ConnectionRefused
> > > > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> > Method)
> > > > at
> > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)  
> > > > at
> > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)  
> > > > at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> > > > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> > > > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> > > > at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> > > > at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> > > > at
> > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)  
> > > > at com.sun.proxy.$Proxy8.registerApplicationMaster(Unknown Source)
> > > > at
> > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)  
> > > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > at
> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)  
> > > > at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
> > > > at java.lang.reflect.Method.invoke(Method.java:497)
> > > > at


Re: Task managers cant start on YARN cluster

Posted by Gyula Fóra <gy...@gmail.com>.
Hi,

The main problem was that the underlying error was not visible in the
Flink Application Master runner; it only showed up in the YarnClient
debug log.

If you run with the default INFO log level, all you see is that the YARN
client keeps trying to fail over, as if something were wrong with the
resource manager. Setting the log level to DEBUG actually shows the error.
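
For anyone hitting the same symptom, here is a minimal sketch of how the relevant lines could be pulled out of a log. It assumes the log format quoted in this thread; the function name `find_default_rm_connections` is hypothetical and not part of Flink or Hadoop. 0.0.0.0 is Hadoop's default ResourceManager hostname, and 8030 and 8032 are the default scheduler and client ports, so a connection attempt to them suggests the configured addresses were not used.

```python
import re

# 0.0.0.0 is Hadoop's default ResourceManager hostname; 8030 and 8032 are the
# default scheduler and client ports that appear in the logs in this thread.
# The lookbehind avoids matching addresses such as 10.0.0.0.
DEFAULT_RM_PATTERN = re.compile(r"Connecting to .*(?<![\d.])0\.0\.0\.0:(8030|8032)\b")


def find_default_rm_connections(log_lines):
    """Return the log lines showing a connection attempt to a default RM address."""
    return [line.rstrip("\n") for line in log_lines if DEFAULT_RM_PATTERN.search(line)]


if __name__ == "__main__":
    sample = [
        "2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - The ping interval is 60000 ms.",
        "2016-11-12 11:56:06,894 DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8030",
        "2016-11-12 12:31:31,119 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032",
    ]
    for hit in find_default_rm_connections(sample):
        print(hit)
```

This only flags the symptom; it does not say why the defaults were used.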

It would also be great if there were a way to verify the YARN version and
detect incompatibilities, though I am not sure whether this is easily possible.
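
As a rough illustration of what such a check might look like, the sketch below compares the Hadoop version Flink was built against with the version reported by the cluster (for example by `hadoop version`). The helper names and the rule "major and minor must agree" are assumptions made for illustration, not an official compatibility policy, and the version numbers in the usage lines are made up.

```python
def major_minor(version):
    """Extract (major, minor) from a version string such as '2.7.3'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def hadoop_versions_match(built_against, cluster):
    """Hypothetical rule: versions match when major and minor numbers agree."""
    return major_minor(built_against) == major_minor(cluster)


if __name__ == "__main__":
    # Illustrative version numbers only; the thread does not state the real ones.
    print(hadoop_versions_match("2.7.3", "2.7.1"))
    print(hadoop_versions_match("2.3.0", "2.7.3"))
```

A check like this could run at startup and log a warning before the application master tries to register.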

Gyula

Ufuk Celebi <uc...@apache.org> wrote (on Mon, Nov 14, 2016, 9:42):

> Good to know that you solved this. :) Do you think there is something we
> can do to help users noticing this situation faster?
>
> – Ufuk

Re: Task managers cant start on YARN cluster

Posted by Ufuk Celebi <uc...@apache.org>.
Good to know that you solved this. :) Do you think there is something we can do to help users notice this situation faster?

– Ufuk

On 13 November 2016 at 00:23:21, Gyula Fóra (gyula.fora@gmail.com) wrote:
> Hi,
>  
> What happened is that I compiled Flink with the wrong hadoop version...
>  
> Sorry :)
> Gyula


Re: Task managers cant start on YARN cluster

Posted by Gyula Fóra <gy...@gmail.com>.
Hi,

What happened is that I compiled Flink against the wrong Hadoop version...

Sorry :)
Gyula
