You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2018/11/18 12:44:10 UTC

[jira] [Updated] (FLINK-10381) concurrent submit job get ProgramAbortException

     [ https://issues.apache.org/jira/browse/FLINK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-10381:
----------------------------------
    Fix Version/s: 1.8.0

> concurrent submit job get ProgramAbortException
> -----------------------------------------------
>
>                 Key: FLINK-10381
>                 URL: https://issues.apache.org/jira/browse/FLINK-10381
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.4.0, 1.5.1, 1.6.0
>         Environment: Flink 1.4.0, standardalone cluster.
>            Reporter: Youjun Yuan
>            Priority: Major
>             Fix For: 1.7.0, 1.8.0
>
>         Attachments: image-2018-09-20-22-40-31-846.png
>
>
> if submit multiple jobs concurrently, some the them are likely to fail, and return following exception: 
> _java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not run the jar._ 
> _at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:90)_ 
> _at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler$$Lambda$47/713642705.get(Unknown Source)_ 
> _at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1582)_ 
> _at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)_ 
> _at java.util.concurrent.FutureTask.run(FutureTask.java:266)_ 
> _at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)_ 
> _at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)_ 
> _at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)_ 
> _at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)_ 
> _at java.lang.Thread.run(Thread.java:745)_
> _Caused by: org.apache.flink.util.FlinkException: Could not run the jar. ... 10 more_
> _Caused by: org.apache.flink.client.program.ProgramInvocationException: The program caused an error:_ 
> _at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:93)_ 
> _at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:334)_ 
> _at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:76)_ 
> _at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleJsonRequest$0(JarRunHandler.java:69) ... 9 more_
> _Caused by: org.apache.flink.client.program.OptimizerPlanEnvironment$ProgramAbortException_ 
> _at org.apache.flink.streaming.api.environment.StreamPlanEnvironment.execute(StreamPlanEnvironment.java:72)_ 
> _..._ 
> _at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)_ 
> _at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)_ 
> _at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)_ 
> _at java.lang.reflect.Method.invoke(Method.java:497)_ 
> _at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)_ 
> _at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)_ 
> _at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:83_
>  
> h2. Possible Cause:
> in OptimizerPlanEnvironment.getOptimizerPlan(), setAsContext() will set a static variable named contextEnvironmentFactory in ExecutionEnviroment, which will eventually cause ExecutionEnviroment.getExecutionEnvironment() returns the currently OptimizerPlanEnvironment instance, and capture the optimizerPlan and save to a instance vairable in OptimizerPlanEnvironment.
> However, if multiple jobs are submitted at the same time, the static variable contextEnvironmentFactory in ExecutionEnvironment will be set again by a following job, hence force ExecutionEnviroment.getExecutionEnvironment() return another new instance of OptimizerPlanEnvironment, therefore, the first intance of OptimizerPlanEnvironment will not caputre the optimizerPlan, and throws ProgramInvocationException. The spot is copied below for you convience:
> setAsContext();
>  try {
>  prog.invokeInteractiveModeForExecution();
>  }
>  catch (ProgramInvocationException e) {
>  throw e;
>  }
>  catch (Throwable t) {
>  // the invocation gets aborted with the preview plan
>  if (optimizerPlan != null) {
>  return optimizerPlan;
>  } else {
>  throw new ProgramInvocationException("The program caused an error: ", t);
>  }
>  }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)