Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/08/20 03:21:45 UTC

[jira] [Assigned] (SPARK-10125) Fix a potential deadlock in JobGenerator.stop

     [ https://issues.apache.org/jira/browse/SPARK-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-10125:
------------------------------------

    Assignee:     (was: Apache Spark)

> Fix a potential deadlock in JobGenerator.stop
> ---------------------------------------------
>
>                 Key: SPARK-10125
>                 URL: https://issues.apache.org/jira/browse/SPARK-10125
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Shixiong Zhu
>
> Because a `lazy val` synchronizes on `this` during initialization, JobGenerator.stop and JobGenerator.doCheckpoint can deadlock if they run concurrently while JobGenerator.shouldCheckpoint has not yet been initialized.
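> For illustration, here is a minimal standalone sketch of the same pattern (hypothetical class and method names, not the actual Spark code). Scala compiles a `lazy val` into an accessor whose first call synchronizes on `this`, so a thread that holds the `this` monitor while joining another thread can deadlock with that thread's first read of the `lazy val`:
> {code}
> object LazyValDeadlockDemo {
>   class Generator {
>     // The first read of a lazy val synchronizes on `this` to initialize it.
>     lazy val shouldCheckpoint: Boolean = true
>
>     // Like JobGenerator.stop: holds the `this` monitor while joining
>     // the worker thread.
>     def stop(worker: Thread): Unit = synchronized {
>       worker.join()  // never returns: the worker is blocked on `this`
>     }
>
>     // Like doCheckpoint on the event-loop thread: the first read of
>     // `shouldCheckpoint` must acquire the `this` monitor.
>     def doCheckpoint(): Unit = {
>       if (shouldCheckpoint) println("checkpointing")
>     }
>   }
>
>   def main(args: Array[String]): Unit = {
>     val gen = new Generator
>     val worker = new Thread(new Runnable {
>       def run(): Unit = {
>         Thread.sleep(100)   // let stop() grab the monitor first
>         gen.doCheckpoint()  // blocks waiting for the monitor held by stop()
>       }
>     })
>     worker.start()
>     gen.stop(worker)        // deadlock: join() waits for the blocked worker
>   }
> }
> {code}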
> Here are the stack traces for the deadlock: the test thread holds the JobGenerator monitor (<0x00000007b5d8cea0>) inside stop while joining the EventLoop thread, and the JobGenerator thread is blocked trying to acquire that same monitor to initialize shouldCheckpoint:
> {code}
> "pool-1-thread-1-ScalaTest-running-StreamingListenerSuite" #11 prio=5 os_prio=31 tid=0x00007fd35d094800 nid=0x5703 in Object.wait() [0x000000012ecaf000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Thread.join(Thread.java:1245)
>         - locked <0x00000007b5d8d7f8> (a org.apache.spark.util.EventLoop$$anon$1)
>         at java.lang.Thread.join(Thread.java:1319)
>         at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
>         at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:155)
>         - locked <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator)
>         at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:95)
>         - locked <0x00000007b5d8ced8> (a org.apache.spark.streaming.scheduler.JobScheduler)
>         at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:687)
> "JobGenerator" #67 daemon prio=5 os_prio=31 tid=0x00007fd35c3b9800 nid=0x9f03 waiting for monitor entry [0x0000000139e4a000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint$lzycompute(JobGenerator.scala:63)
>         - waiting to lock <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator)
>         at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint(JobGenerator.scala:63)
>         at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:290)
>         at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182)
>         at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:83)
>         at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:82)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> I can use this patch to reproduce this deadlock: https://github.com/zsxwing/spark/commit/8a88f28d1331003a65fabef48ae3d22a7c21f05f
> And here is a Jenkins build that timed out due to this deadlock: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1654/
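> One general way to avoid this class of deadlock (a sketch of the technique, not necessarily the change merged for this ticket) is to force the lazy val's initialization eagerly, or to join the worker thread outside the monitor:
> {code}
> class Generator {
>   lazy val shouldCheckpoint: Boolean = true
>
>   // Option 1: touch the lazy val during startup, before any thread can
>   // hold the `this` monitor while another races to initialize it.
>   def start(): Unit = {
>     shouldCheckpoint
>   }
>
>   // Option 2: do not hold the `this` monitor while joining the worker.
>   def stop(worker: Thread): Unit = {
>     synchronized { /* update internal state, signal shutdown */ }
>     worker.join()  // joining outside the monitor cannot deadlock on it
>   }
> }
> {code}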



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org