Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2015/02/06 15:22:51 UTC

[jira] [Commented] (FLINK-1492) Exceptions on shutdown concerning BLOB store cleanup

    [ https://issues.apache.org/jira/browse/FLINK-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309212#comment-14309212 ] 

Stephan Ewen commented on FLINK-1492:
-------------------------------------

The current solution is a bit hacky. Right now we see multiple cleanups: I think one happens on graceful shutdown of the task manager (triggered when it observes the job manager's death), and another then runs through the shutdown hook.

I think the right solution is not to let the shutdown hook simply delete the directory, but to have the shutdown hook call "shutdown" on the blob manager.
The shutdown should also make sure it runs only once, so that it is not executed by both the task manager shutdown and the shutdown hook.

It is also good practice for the blob manager to remove the shutdown hook once shutdown is called, to prevent resource leaks.
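
To illustrate the idea, here is a minimal sketch in plain Java. It is not the actual Flink code: the class name BlobStoreLifecycle, its fields, and the deleteRecursively helper are made up for this example, and it uses only JDK calls instead of commons-io. It shows a shutdown hook that delegates to "shutdown", a guard that lets the cleanup run only once, and deregistration of the hook after an explicit shutdown.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the blob manager; not the real BlobServer/BlobCache.
final class BlobStoreLifecycle {

    private final File storageDir;
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);
    private final Thread shutdownHook;

    BlobStoreLifecycle(File storageDir) {
        this.storageDir = storageDir;
        // The hook does not delete the directory itself; it only calls shutdown().
        this.shutdownHook = new Thread(this::shutdown, "blob-store-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    /** Safe to call from the task manager shutdown and from the hook; cleans up once. */
    void shutdown() {
        // Only the first caller passes this check; later callers return immediately.
        if (!shutdownRequested.compareAndSet(false, true)) {
            return;
        }

        deleteRecursively(storageDir);

        // On an explicit shutdown, deregister the hook so it does not keep
        // this object reachable until the JVM exits.
        if (Thread.currentThread() != shutdownHook) {
            try {
                Runtime.getRuntime().removeShutdownHook(shutdownHook);
            } catch (IllegalStateException e) {
                // The JVM is already shutting down; the hook can no longer be removed.
            }
        }
    }

    boolean isShutdown() {
        return shutdownRequested.get();
    }

    // Best-effort recursive delete that tolerates files that are already gone.
    private static void deleteRecursively(File file) {
        File[] children = file.listFiles(); // null if not a directory or missing
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        file.delete();
    }
}
```

With a guard like this, a second call (for example from the hook after the task manager has already shut the blob manager down) returns without touching the directory, so it would not run into the "does not exist" case.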


> Exceptions on shutdown concerning BLOB store cleanup
> ----------------------------------------------------
>
>                 Key: FLINK-1492
>                 URL: https://issues.apache.org/jira/browse/FLINK-1492
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>            Assignee: Ufuk Celebi
>             Fix For: 0.9
>
>
> The following stack traces occur not every time, but frequently.
> {code}
> java.lang.IllegalArgumentException: /tmp/blobStore-7a89856a-47f9-45d6-b88b-981a3eff1982 does not exist
> 	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637)
> 	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> 	at org.apache.flink.runtime.blob.BlobServer.shutdown(BlobServer.java:213)
> 	at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.shutdown(BlobLibraryCacheManager.java:171)
> 	at org.apache.flink.runtime.jobmanager.JobManager.postStop(JobManager.scala:136)
> 	at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
> 	at org.apache.flink.runtime.jobmanager.JobManager.aroundPostStop(JobManager.scala:80)
> 	at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
> 	at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:292)
> 	at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:369)
> 	at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:63)
> 	at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:369)
> 	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:455)
> 	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> 	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
> 	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:16:15,350 ERROR org.apache.flink.test.util.ForkableFlinkMiniCluster$$anonfun$startTaskManager$1$$anon$1  - LibraryCacheManager did not shutdown properly.
> java.io.IOException: Unable to delete file: /tmp/blobStore-e2619536-fb7c-452a-8639-487a074d1582/cache/blob_ff74895f7bdeeaa3bd70b6932beed143048bb4c7
> 	at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2279)
> 	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
> 	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> 	at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
> 	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
> 	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> 	at org.apache.flink.runtime.blob.BlobCache.shutdown(BlobCache.java:159)
> 	at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager.shutdown(BlobLibraryCacheManager.java:171)
> 	at org.apache.flink.runtime.taskmanager.TaskManager.postStop(TaskManager.scala:173)
> 	at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
> 	at org.apache.flink.runtime.taskmanager.TaskManager.aroundPostStop(TaskManager.scala:86)
> 	at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
> 	at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
> 	at akka.actor.ActorCell.terminate(ActorCell.scala:369)
> 	at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
> 	at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
> 	at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
> 	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 15:16:15,345 ERROR org.apache.flink.runtime.blob.BlobCache                       - Error deleting directory /tmp/blobStore-4313349e-8a58-4683-9fd0-3d2c52be1864 during JVM shutdown: /tmp/blobStore-4313349e-8a58-4683-9fd0-3d2c52be1864 does not exist
> java.lang.IllegalArgumentException: /tmp/blobStore-4313349e-8a58-4683-9fd0-3d2c52be1864 does not exist
> 	at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1637)
> 	at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
> 	at org.apache.flink.runtime.blob.BlobUtils$1.run(BlobUtils.java:210)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)