Posted to issues@flink.apache.org by "Robert Metzger (JIRA)" <ji...@apache.org> on 2015/11/09 17:50:11 UTC

[jira] [Created] (FLINK-2990) Scala 2.11 build fails to start on YARN

Robert Metzger created FLINK-2990:
-------------------------------------

             Summary: Scala 2.11 build fails to start on YARN
                 Key: FLINK-2990
                 URL: https://issues.apache.org/jira/browse/FLINK-2990
             Project: Flink
          Issue Type: Bug
          Components: Build System, YARN Client
    Affects Versions: 0.10, 1.0
            Reporter: Robert Metzger
            Assignee: Robert Metzger


Deploying the Scala 2.11 build of Flink on YARN fails:

{code}
robert@hn0-apache:~/flink010-hd22-scala211/flink-0.10.0$ ./bin/yarn-session.sh -n 2
16:36:32,484 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16:36:32,748 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Using values:
16:36:32,750 INFO  org.apache.flink.yarn.FlinkYarnClient                         - 	TaskManager count = 2
16:36:32,750 INFO  org.apache.flink.yarn.FlinkYarnClient                         - 	JobManager memory = 1024
16:36:32,750 INFO  org.apache.flink.yarn.FlinkYarnClient                         - 	TaskManager memory = 1024
16:36:32,874 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing over to rm2
16:36:32,930 WARN  org.apache.flink.yarn.FlinkYarnClient                         - The JobManager or TaskManager memory is below the smallest possible YARN Container size. The value of 'yarn.scheduler.minimum-allocation-mb' is '1536'. Please increase the memory size.YARN will allocate the smaller containers but the scheduler will account for the minimum-allocation-mb, maybe not all instances you requested will start.
16:36:33,448 WARN  org.apache.hadoop.hdfs.BlockReaderLocal                       - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16:36:33,489 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/flink-distabc.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar
16:36:35,367 INFO  org.apache.flink.yarn.Utils                                   - Copying from /home/robert/flink010-hd22-scala211/flink-0.10.0/conf/flink-conf.yaml to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-conf.yaml
16:36:35,695 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/flink-python_2.11-0.10.0.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-python_2.11-0.10.0.jar
16:36:35,882 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/flink-distabc.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar
16:36:37,522 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/slf4j-log4j12-1.7.7.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/slf4j-log4j12-1.7.7.jar
16:36:37,740 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/lib/log4j-1.2.17.jar to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/log4j-1.2.17.jar
16:36:37,960 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/conf/logback.xml to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/logback.xml
16:36:38,397 INFO  org.apache.flink.yarn.Utils                                   - Copying from file:/home/robert/flink010-hd22-scala211/flink-0.10.0/conf/log4j.properties to hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/log4j.properties
16:36:38,840 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Submitting application master application_1447063737177_0017
16:36:39,081 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Submitted application application_1447063737177_0017
16:36:39,081 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Waiting for the cluster to be allocated
16:36:39,084 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
16:36:40,086 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Deploying cluster, current state ACCEPTED
Error while deploying YARN cluster: The YARN application unexpectedly switched to state FAILED during deployment. 
Diagnostics from YARN: Application application_1447063737177_0017 failed 1 times due to AM Container for appattempt_1447063737177_0017_000001 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8088/proxy/application_1447063737177_0017/Then, click on links to logs of each attempt.
Diagnostics: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
java.io.IOException: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1447063737177_0017
org.apache.flink.yarn.FlinkYarnClientBase$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment. 
Diagnostics from YARN: Application application_1447063737177_0017 failed 1 times due to AM Container for appattempt_1447063737177_0017_000001 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8088/proxy/application_1447063737177_0017/Then, click on links to logs of each attempt.
Diagnostics: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
java.io.IOException: Resource hdfs://hn1-apache.vbkocrowebre3dyigxo55soqnb.ax.internal.cloudapp.net:8020/user/robert/.flink/application_1447063737177_0017/flink-distabc.jar changed on src filesystem (expected 1447086995336, was 1447086997508
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
	at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
	at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1447063737177_0017
	at org.apache.flink.yarn.FlinkYarnClientBase.deployInternal(FlinkYarnClientBase.java:646)
	at org.apache.flink.yarn.FlinkYarnClientBase.deploy(FlinkYarnClientBase.java:338)
	at org.apache.flink.client.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:409)
	at org.apache.flink.client.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:351)
{code}

The problem is that flink-dist.jar is uploaded to HDFS twice (the log above shows two "Copying from ... flink-distabc.jar" lines). The second upload changes the file's modification time on HDFS after the resource timestamp has already been recorded for the container. When YARN localizes the resources for the allocated container, the timestamps no longer match and YARN rejects the JAR file ("Resource ... changed on src filesystem").
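For illustration only, a minimal sketch of how a client could avoid this kind of mismatch: ship each file to HDFS exactly once, and read the LocalResource timestamp and size from the uploaded file's status afterwards, so they still match what the NodeManager sees at localization time. The helper name uploadOnce and the caller-provided deduplication set are hypothetical and not the actual FlinkYarnClient code.

{code}
import java.io.IOException;
import java.util.Set;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

public class ShipFileUploader {

    /**
     * Hypothetical sketch: upload a local file to the application's staging
     * directory at most once, then build a LocalResource whose timestamp and
     * size are taken from the uploaded file's FileStatus, so YARN's
     * FSDownload check ("changed on src filesystem") passes.
     */
    public static LocalResource uploadOnce(FileSystem fs, Path localFile, Path targetDir,
                                           Set<Path> alreadyUploaded) throws IOException {
        Path target = new Path(targetDir, localFile.getName());

        // Skip duplicates such as flink-dist.jar appearing twice in the ship list.
        if (alreadyUploaded.add(target)) {
            fs.copyFromLocalFile(false, true, localFile, target);
        }

        // Read the status *after* the (single) upload; a second upload would
        // invalidate a previously recorded timestamp.
        FileStatus status = fs.getFileStatus(target);

        LocalResource resource = Records.newRecord(LocalResource.class);
        resource.setResource(ConverterUtils.getYarnUrlFromPath(target));
        resource.setSize(status.getLen());
        resource.setTimestamp(status.getModificationTime()); // must match HDFS mtime at localization
        resource.setType(LocalResourceType.FILE);
        resource.setVisibility(LocalResourceVisibility.APPLICATION);
        return resource;
    }
}
{code}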





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)