Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2020/01/31 10:31:00 UTC

[jira] [Closed] (FLINK-8068) Failed single-job Flink cluster on YARN shows as SUCCEEDED

     [ https://issues.apache.org/jira/browse/FLINK-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-8068.
--------------------------------
    Resolution: Invalid

Should no longer be a problem.

> Failed single-job Flink cluster on YARN shows as SUCCEEDED
> ----------------------------------------------------------
>
>                 Key: FLINK-8068
>                 URL: https://issues.apache.org/jira/browse/FLINK-8068
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.4.0
>            Reporter: Nico Kruber
>            Priority: Major
>
> A single-job Flink cluster whose program fails, e.g. because of a non-existent input file as in this word count example {{flink run -m yarn-cluster -yn 2 -ys 1 -yjm 768 -ytm 1024 ./examples/batch/WordCount.jar --input foobar}}, is reported by YARN as {{SUCCEEDED}}. This was fixed in the past by FLINK-2226 but has apparently resurfaced.
> FYI: Flink's CLI exits with a non-zero status and also correctly reports the error in its output, for example:
> {code}
> ------------------------------------------------------------
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Failed to submit job eede3007fc880de24e4ad24f8de3d4a6 (Flink Java Job at Tue Nov 14 10:09:27 UTC 2017)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:492)
>         at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:215)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:456)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:444)
>         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>         at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:815)
>         at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
>         at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
>         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:89)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:396)
>         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:802)
>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:282)
>         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1054)
>         at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1101)
>         at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1098)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1098)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to submit job eede3007fc880de24e4ad24f8de3d4a6 (Flink Java Job at Tue Nov 14 10:09:27 UTC 2017)
>         at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1325)
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:447)
>         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at org.apache.flink.runtime.clusterframework.ContaineredJobManager$$anonfun$handleContainerMessage$1.applyOrElse(ContaineredJobManager.scala:107)
>         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>         at org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnShutdown$1.applyOrElse(YarnJobManager.scala:110)
>         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>         at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38)
>         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>         at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>         at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>         at org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>         at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:122)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: File foobar does not exist or the user running Flink ('yarn') has insufficient permissions to access it.
>         at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:262)
>         at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:801)
>         at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:180)
>         at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1277)
>         ... 23 more
> Caused by: java.io.FileNotFoundException: File foobar does not exist or the user running Flink ('yarn') has insufficient permissions to access it.
>         at org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:114)
>         at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:473)
>         at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:62)
>         at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:248)
>         ... 26 more
> {code}
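
The mismatch described above (non-zero CLI exit status, YARN final status SUCCEEDED) could be detected with a small check like the following. This is only a sketch and not part of the original report: the helper names `parse_final_state` and `is_misreported` are hypothetical, the sample report text is illustrative, and the `Final-State : ...` line layout is an assumption about what `yarn application -status <application id>` prints.

```python
import re


def parse_final_state(report):
    """Return the value of the 'Final-State' field in a YARN status report, or None.

    Assumption: the report has a line of the form 'Final-State : <VALUE>'.
    """
    match = re.search(r"^\s*Final-State\s*:\s*(\S+)\s*$", report, re.MULTILINE)
    return match.group(1) if match else None


def is_misreported(cli_exit_code, report):
    """True when the Flink CLI failed (non-zero exit) but YARN reports SUCCEEDED."""
    return cli_exit_code != 0 and parse_final_state(report) == "SUCCEEDED"


# Illustrative report excerpt, not taken from the issue.
SAMPLE_REPORT = """\
Application Report :
\tState : FINISHED
\tFinal-State : SUCCEEDED
"""
```

In practice the report text would come from running `yarn application -status` for the application in question, and the exit code from the `flink run` invocation; `is_misreported(exit_code, report)` returning true would indicate the behaviour this issue describes.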



--
This message was sent by Atlassian Jira
(v8.3.4#803005)