Posted to issues@flink.apache.org by "Ufuk Celebi (JIRA)" <ji...@apache.org> on 2014/07/10 12:16:05 UTC

[jira] [Updated] (FLINK-819) OutOfMemoryError from TaskManager is causing hard to understand exceptions and blocking JobManager

     [ https://issues.apache.org/jira/browse/FLINK-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-819:
------------------------------

    Issue Type: Improvement  (was: Bug)

> OutOfMemoryError from TaskManager is causing hard to understand exceptions and blocking JobManager
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-819
>                 URL: https://issues.apache.org/jira/browse/FLINK-819
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> While doing some pre-0.5-release testing, I saw this exception twice in the JobManager's log.
> It occurred during the setup of a new job.
> (Here is the full log: https://gist.github.com/rmetzger/1a9ed4080eedb4e0c8f1)
> It also seems that the task cancellation of the job does not work. The JobManager has not printed any output for more than 15 minutes now. But I think this is a known issue. Pressing "Cancel" does work, even in this situation.
> ```
> 13:42:00,512 ERROR eu.stratosphere.nephele.jobmanager.JobManager                 - Cannot check library availability: java.io.IOException: Call to /192.168.7.12:38350 failed on local exception: java.io.EOFException
> 	at eu.stratosphere.nephele.ipc.Client.wrapException(Client.java:737)
> 	at eu.stratosphere.nephele.ipc.Client.call(Client.java:706)
> 	at eu.stratosphere.nephele.ipc.RPC$Invoker.invoke(RPC.java:250)
> 	at com.sun.proxy.$Proxy13.updateLibraryCache(Unknown Source)
> 	at eu.stratosphere.nephele.instance.AbstractInstance.checkLibraryAvailability(AbstractInstance.java:174)
> 	at eu.stratosphere.nephele.jobmanager.JobManager$7.run(JobManager.java:1094)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:701)
> Caused by: java.io.EOFException
> 	at java.io.DataInputStream.readInt(DataInputStream.java:392)
> 	at eu.stratosphere.nephele.ipc.Client$Connection.receiveResponse(Client.java:497)
> 	at eu.stratosphere.nephele.ipc.Client$Connection.run(Client.java:443)
> ```
> Machine 192.168.7.12 logged the following exception locally:
> ```
> 13:40:53,120 INFO  eu.stratosphere.nephele.execution.ExecutionStateTransition    - TM: ExecutionState set from FINISHING to FINISHED for task Reduce(<Unnamed Reducer>) (8/8)
> 13:42:00,118 WARN  eu.stratosphere.nephele.ipc.Server                            - Out of Memory in server select
> java.lang.OutOfMemoryError: Java heap space
>         at eu.stratosphere.nephele.execution.librarycache.LibraryCacheManager.readLibraryFromStreamInternal(LibraryCacheManager.java:582)
>         at eu.stratosphere.nephele.execution.librarycache.LibraryCacheManager.readLibraryFromStream(LibraryCacheManager.java:556)
>         at eu.stratosphere.nephele.execution.librarycache.LibraryCacheUpdate.read(LibraryCacheUpdate.java:53)
>         at eu.stratosphere.nephele.ipc.RPC$Invocation.read(RPC.java:136)
>         at eu.stratosphere.nephele.ipc.Server$Connection.processData(Server.java:897)
>         at eu.stratosphere.nephele.ipc.Server$Connection.readAndProcess(Server.java:858)
>         at eu.stratosphere.nephele.ipc.Server$Listener.doRead(Server.java:450)
>         at eu.stratosphere.nephele.ipc.Server$Listener.run(Server.java:353)
> 13:43:00,180 INFO  eu.stratosphere.nephele.ipc.Server                            - IPC Server listener on 38350: readAndProcess threw exception java.io.IOException: dd5ba55f25851ac2f9f0d53971ed92f70cee2afc.jar does not exist in the library cache. Count of bytes read: 0
> java.io.IOException: dd5ba55f25851ac2f9f0d53971ed92f70cee2afc.jar does not exist in the library cache
>         at eu.stratosphere.nephele.execution.librarycache.LibraryCacheManager.registerInternal(LibraryCacheManager.java:316)
>         at eu.stratosphere.nephele.execution.librarycache.LibraryCacheManager.register(LibraryCacheManager.java:277)
>         at eu.stratosphere.nephele.deployment.TaskDeploymentDescriptor.read(TaskDeploymentDescriptor.java:240)
>         at eu.stratosphere.nephele.util.SerializableArrayList.read(SerializableArrayList.java:100)
>         at eu.stratosphere.nephele.ipc.RPC$Invocation.read(RPC.java:136)
>         at eu.stratosphere.nephele.ipc.Server$Connection.processData(Server.java:897)
>         at eu.stratosphere.nephele.ipc.Server$Connection.readAndProcess(Server.java:858)
>         at eu.stratosphere.nephele.ipc.Server$Listener.doRead(Server.java:450)
>         at eu.stratosphere.nephele.ipc.Server$Listener.run(Server.java:353)
> ```
> How can we improve the user experience here?
> (I have to admit, the TaskManager has only 512 MB of heap space.)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/819
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: bug, question, runtime, 
> Created at: Thu May 15 16:04:17 CEST 2014
> State: open
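
The JobManager-side trace ends in `DataInputStream.readInt`, which throws `java.io.EOFException` when the stream ends before four bytes have been read. The self-contained sketch below reproduces only that JDK behavior, using an in-memory stream as a stand-in for a connection that the remote side dropped before answering. The class name `EofDemo` and the truncated byte arrays are illustrative assumptions; they do not model Nephele's actual IPC response format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofDemo {
    // Illustrative only: reads a 4-byte header the way an RPC client waits for a reply,
    // and reports what happened instead of propagating the exception.
    static String readHeader(byte[] received) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(received))) {
            return "header=" + in.readInt();
        } catch (EOFException e) {
            // readInt reached end-of-stream before 4 bytes: the peer went away mid-reply.
            return "EOF";
        } catch (IOException e) {
            return "IO error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readHeader(new byte[] {0, 0, 0, 42})); // complete header
        System.out.println(readHeader(new byte[] {0, 0}));        // truncated reply
        System.out.println(readHeader(new byte[0]));              // closed before any reply
    }
}
```

This is why the JobManager only reports a generic `EOFException`: from the client's point of view the connection simply ended, and the OutOfMemoryError that caused it stays in the TaskManager's log.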

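On the TaskManager, the OutOfMemoryError is raised in `LibraryCacheManager.readLibraryFromStreamInternal` while an RPC argument is being deserialized. One possible direction for the issue's question, sketched here under stated assumptions, is to check a declared payload size against a limit before allocating, and to fail with a descriptive `IOException` that could be reported back instead of an OOM in the server's select loop. The wire format (a 4-byte length followed by the jar bytes), the class `GuardedLibraryReader`, and the half-of-max-heap policy are all hypothetical and do not describe how Nephele actually serializes libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class GuardedLibraryReader {
    // Hypothetical format: 4-byte big-endian length, then that many payload bytes.
    // Rejects oversized or corrupt headers before allocating the buffer.
    static byte[] readLibrary(DataInputStream in, long maxBytes) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Corrupt library header: negative length " + length);
        }
        if (length > maxBytes) {
            throw new IOException("Library of " + length + " bytes exceeds the allowed "
                    + maxBytes + " bytes; consider increasing the TaskManager heap");
        }
        byte[] data = new byte[length];
        in.readFully(data); // throws EOFException if fewer bytes arrive than declared
        return data;
    }

    // Hypothetical policy: allow a single payload to use at most half of the max heap.
    static long defaultLimit() {
        return Runtime.getRuntime().maxMemory() / 2;
    }

    public static void main(String[] args) throws IOException {
        byte[] small = {0, 0, 0, 3, 1, 2, 3};
        byte[] lib = readLibrary(new DataInputStream(new ByteArrayInputStream(small)), defaultLimit());
        System.out.println("read " + lib.length + " bytes");

        byte[] huge = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff}; // declares ~2 GiB
        try {
            readLibrary(new DataInputStream(new ByteArrayInputStream(huge)), 1024);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this does not make a 512 MB TaskManager able to hold a larger jar. It only turns an opaque failure into an error message that names the cause.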


--
This message was sent by Atlassian JIRA
(v6.2#6252)