Posted to user@flink.apache.org by "Ana M. Martinez" <an...@cs.aau.dk> on 2016/01/20 15:23:31 UTC

Could not upload the jar files to the job manager IOException

Hi all,

I am running some experiments with Flink on an Amazon cluster, and every now and then (seemingly at random) I get the following IOException:
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager.

Sometimes when it fails, I just run it again immediately afterwards and it works fine. Any idea why that might be happening?

Thanks,
Ana
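
Since an immediate re-run often succeeds, one stop-gap is to retry the submitting call a few times with a growing delay. The following is only a minimal sketch of that idea in plain Java: the class RetrySubmit, the method retry, and the simulated failure in main are hypothetical names and not part of Flink. It retries on any exception, which is crude, and it hides the underlying problem rather than fixing it.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySubmit {

    /**
     * Runs the action up to maxAttempts times, doubling the pause after each failure.
     * Hypothetical helper, not a Flink API.
     */
    public static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // simple exponential backoff
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        // Simulated submission that fails twice with a reset and then succeeds.
        final int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("PUT operation failed: Connection reset");
            }
            return "submitted";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real job, the action passed to retry would wrap whatever call triggers the submission, for example the collect() call that appears in the stack trace further down the thread.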

Re: Could not upload the jar files to the job manager IOException

Posted by Robert Metzger <rm...@apache.org>.
Hi,

this is the log file of your local client submitting the job to Flink's
JobManager (master). As you can see from the log, the jar upload failed
because of this issue: "Caused by: java.net.SocketException: Connection
reset". The JobManager is the service at the other end receiving the file.
I'm pretty sure it is logging why the connection was reset. Maybe the
JobManager JVM failed, the disk was full, or there was a network issue.
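
The innermost "Caused by" entry of a Java stack trace is the end of the exception's cause chain. As a minimal sketch of how that entry can be found programmatically, the code below follows Throwable.getCause() to the end. The class name RootCause and the shortened chain built in main are illustrative assumptions; a plain RuntimeException stands in for Flink's own exception classes.

```java
import java.io.IOException;
import java.net.SocketException;

public class RootCause {

    /** Follows getCause() links until the innermost throwable is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the chain shown in the client log.
        Throwable chain = new RuntimeException(
                "Could not upload the jar files to the job manager.",
                new IOException("PUT operation failed: Connection reset",
                        new SocketException("Connection reset")));

        Throwable root = rootCause(chain);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
        // prints "java.net.SocketException: Connection reset"
    }
}
```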

The JobManager log file is located on the machine where the JobManager is
running.
If you are using Flink on YARN, just get the aggregated application logs,
and search for the "jobmanager.log" file.
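
Once the JobManager log has been copied from that machine, a quick first pass is to filter it for warnings, errors, and exception lines around the time of the failed upload. The sketch below is one way to do that in plain Java. The class name JobManagerLogScan is hypothetical, and the matching rules assume the log4j-style layout seen in the client log (time, level, logger, message); adjust them if your log pattern differs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class JobManagerLogScan {

    /** Returns the lines that look like problems: WARN/ERROR entries or exception lines. */
    public static List<String> suspiciousLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(" WARN ") || line.contains(" ERROR ")
                    || line.contains("Exception") || line.startsWith("Caused by:")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobManagerLogScan <path-to-jobmanager-log>");
            return;
        }
        // Read the whole log and print only the suspicious lines.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String hit : suspiciousLines(lines)) {
            System.out.println(hit);
        }
    }
}
```

Comparing the timestamps of the printed lines with the time of the failed upload on the client side (08:16:03 in the log from this thread) narrows down which entries are relevant.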

On Thu, Jan 21, 2016 at 9:27 AM, Ana M. Martinez <an...@cs.aau.dk> wrote:

>
> Hi Robert,
>
> Thanks for your answer. Do you mean the log file in (e.g.)
> flink-0.10.0/log/flink-hadoop-client-ip-172-31-10-193.log? Or do you mean
> another log file?
>
> In this log, the error message is as follows:
>
> [...]
> Caused by: java.net.SocketException: Connection reset
> [...]
>
> That is, it finds the jar files well until almost the end of the
> execution. Actually, if I run it again it may or may not work.
>
> Ana

Re: Could not upload the jar files to the job manager IOException

Posted by "Ana M. Martinez" <an...@cs.aau.dk>.
Hi Robert,

Thanks for your answer. Do you mean the log file in (e.g.) flink-0.10.0/log/flink-hadoop-client-ip-172-31-10-193.log? Or do you mean another log file?

In this log, the error message is as follows:

08:16:03,437 INFO  org.apache.flink.runtime.client.JobClient                     - Job execution complete
08:16:03,438 INFO  org.apache.flink.api.java.ExecutionEnvironment                - The job has 5 registered types and 0 default Kryo serializers
08:16:03,444 INFO  org.apache.flink.runtime.client.JobClientActor                - Received job Flink Java Job at Thu Jan 21 08:16:03 UTC 2016 (73f8ab2fbab61fb72dc4a53fd8dcbb9f).
08:16:03,444 INFO  org.apache.flink.runtime.client.JobClientActor                - Could not submit job Flink Java Job at Thu Jan 21 08:16:03 UTC 2016 (73f8ab2fbab61fb72dc4a53fd8dcbb9f), because there is no connection to a JobManager.
08:16:03,446 INFO  org.apache.flink.runtime.client.JobClientActor                - Connected to new JobManager akka.tcp://flink@172.31.5.123:34614/user/jobmanager.
08:16:03,446 INFO  org.apache.flink.runtime.client.JobClientActor                - Sending message to JobManager akka.tcp://flink@172.31.5.123:34614/user/jobmanager to submit job Flink Java Job at Thu Jan 21 08:16:03 UTC 2016 (73f8ab2fbab61fb72dc4a53fd8dcbb9f) and wait for progress
08:16:03,446 INFO  org.apache.flink.runtime.client.JobClientActor                - Upload jar files to job manager akka.tcp://flink@172.31.5.123:34614/user/jobmanager.
08:16:03,860 INFO  org.apache.flink.runtime.client.JobClientActor                - Submit job to the job manager akka.tcp://flink@172.31.5.123:34614/user/jobmanager.
08:16:03,860 INFO  org.apache.flink.runtime.client.JobClient                     - Job execution failed
08:16:03,860 ERROR org.apache.flink.client.CliFrontend                           - Error while running the command.
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:512)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:675)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:326)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:977)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1027)
Caused by: java.lang.reflect.UndeclaredThrowableException
at eu.amidst.flinklink.core.io.DataFlinkLoader.loadHeaderARFFFolder(DataFlinkLoader.java:176)
at eu.amidst.flinklink.core.io.DataFlinkLoader.loadHeader(DataFlinkLoader.java:137)
at eu.amidst.flinklink.core.io.DataFlinkLoader.access$000(DataFlinkLoader.java:43)
at eu.amidst.flinklink.core.io.DataFlinkLoader$DataFlinkFile.<init>(DataFlinkLoader.java:281)
at eu.amidst.flinklink.core.io.DataFlinkLoader.loadDataFromFolder(DataFlinkLoader.java:80)
at eu.amidst.flinklink.core.io.DataFlinkLoader.loadDynamicDataFromFolder(DataFlinkLoader.java:90)
at eu.amidst.flinklink.examples.reviewMeeting2015.GenerateData.createDataSetsDBN(GenerateData.java:194)
at eu.amidst.flinklink.examples.reviewMeeting2015.GenerateData.main(GenerateData.java:208)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
... 6 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager.
at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:804)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
at eu.amidst.flinklink.core.io.DataFlinkLoader.loadHeaderARFFFolder(DataFlinkLoader.java:156)
... 18 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could not upload the jar files to the job manager.
at org.apache.flink.runtime.client.JobClientActor$2.call(JobClientActor.java:338)
at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: PUT operation failed: Connection reset
at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:465)
at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:327)
at org.apache.flink.runtime.jobgraph.JobGraph.uploadRequiredJarFiles(JobGraph.java:525)
at org.apache.flink.runtime.client.JobClient.uploadJarFiles(JobClient.java:292)
at org.apache.flink.runtime.client.JobClientActor$2.call(JobClientActor.java:332)
... 10 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.flink.runtime.blob.BlobUtils.writeLength(BlobUtils.java:262)
at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:451)
... 14 more

That is, it finds the jar files without problems until almost the end of the execution. Actually, if I run it again, it may or may not work.

Ana

On 20 Jan 2016, at 15:24, Robert Metzger <rm...@apache.org> wrote:

Hi,

can you check the log file of the JobManager you're trying to submit the job to?
Maybe there you can find helpful information why it failed.

On Wed, Jan 20, 2016 at 3:23 PM, Ana M. Martinez <an...@cs.aau.dk> wrote:
Hi all,

I am running some experiments with Flink on an Amazon cluster, and every now and then (seemingly at random) I get the following IOException:
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager.

Sometimes when it fails, I just run it again immediately afterwards and it works fine. Any idea why that might be happening?

Thanks,
Ana




Re: Could not upload the jar files to the job manager IOException

Posted by Robert Metzger <rm...@apache.org>.
Hi,

can you check the log file of the JobManager you're trying to submit the
job to?
Maybe there you can find helpful information why it failed.

On Wed, Jan 20, 2016 at 3:23 PM, Ana M. Martinez <an...@cs.aau.dk> wrote:

> Hi all,
>
> I am running some experiments with Flink on an Amazon cluster, and every
> now and then (seemingly at random) I get the following IOException:
> > org.apache.flink.client.program.ProgramInvocationException: The program
> execution failed: Could not upload the jar files to the job manager.
>
> Sometimes when it fails, I just run it again immediately afterwards
> and it works fine. Any idea why that might be happening?
>
> Thanks,
> Ana