Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/04/26 23:24:13 UTC

[jira] [Commented] (SPARK-14930) Race condition in CheckpointWriter.stop()

    [ https://issues.apache.org/jira/browse/SPARK-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258954#comment-15258954 ] 

Apache Spark commented on SPARK-14930:
--------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/12712

> Race condition in CheckpointWriter.stop()
> -----------------------------------------
>
>                 Key: SPARK-14930
>                 URL: https://issues.apache.org/jira/browse/SPARK-14930
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
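> {{CheckpointWriter.stop()}} is prone to a race condition in which the writer thread blocks on the {{CheckpointWriter}} monitor while trying to access {{fs}}. The thread calling {{stop()}} holds that same monitor while it waits for the writer to finish: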
> {code}
> "pool-31-thread-1" #156 prio=5 os_prio=31 tid=0x00007fea02cd2000 nid=0x5c0b waiting for monitor entry [0x000000013bc4c000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.spark.streaming.CheckpointWriter.org$apache$spark$streaming$CheckpointWriter$$fs(Checkpoint.scala:302)
>     - waiting to lock <0x00000007bf53ee78> (a org.apache.spark.streaming.CheckpointWriter)
>     at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:224)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> "pool-1-thread-1-ScalaTest-running-MapWithStateSuite" #11 prio=5 os_prio=31 tid=0x00007fe9ff879800 nid=0x5703 waiting on condition [0x000000012e54c000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x00000007bf564568> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>     at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
>     at org.apache.spark.streaming.CheckpointWriter.stop(Checkpoint.scala:291)
>     - locked <0x00000007bf53ee78> (a org.apache.spark.streaming.CheckpointWriter)
>     at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:159)
>     - locked <0x00000007bf53ea90> (a org.apache.spark.streaming.scheduler.JobGenerator)
>     at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:115)
>     - locked <0x00000007bf53d3f0> (a org.apache.spark.streaming.scheduler.JobScheduler)
>     at org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:680)
>     at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1219)
>     at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:679)
>     - locked <0x00000007bf516a70> (a org.apache.spark.streaming.StreamingContext)
>     at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:644)
>     - locked <0x00000007bf516a70> (a org.apache.spark.streaming.StreamingContext)
>     at org.apache.spark.streaming.MapWithStateSuite.org$apache$spark$streaming$MapWithStateSuite$$getOperationOutput(MapWithStateSuite.scala:570)
>     at org.apache.spark.streaming.MapWithStateSuite.org$apache$spark$streaming$MapWithStateSuite$$testOperation(MapWithStateSuite.scala:539)
>     at org.apache.spark.streaming.MapWithStateSuite$$anonfun$18.apply$mcV$sp(MapWithStateSuite.scala:407)
>     at org.apache.spark.streaming.MapWithStateSuite$$anonfun$18.apply(MapWithStateSuite.scala:359)
>     at org.apache.spark.streaming.MapWithStateSuite$$anonfun$18.apply(MapWithStateSuite.scala:359)
>     [...]
> {code}
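> Because the writer cannot make progress while {{stop()}} holds the monitor, {{awaitTermination()}} can only return by timing out. This causes test flakiness (SPARK-13693) and significantly slows down tests (it makes MapWithStateSuite 10 times slower). We should fix this by refactoring the code to remove the unnecessary synchronization.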
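The lock interaction in the trace can be reproduced in a small program. The following is a minimal Java sketch that assumes a simplified writer with a single-threaded executor and a lazily created, monitor-guarded `fs` field. The class names `BlockingWriter` and `FixedWriter`, the `stopStarted` latch that forces the bad interleaving, and the plain `Object` standing in for the Hadoop `FileSystem` are all hypothetical. This is not Spark's `CheckpointWriter`, and the "fixed" variant is only one way to remove the shared lock, not necessarily the change made in the pull request above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CheckpointWriterSketch {

    // Waits until the latch opens; returns false if the thread was interrupted.
    static boolean awaitLatch(CountDownLatch latch) {
        try {
            latch.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Hypothetical stand-in for the buggy pattern (not Spark's CheckpointWriter).
    static class BlockingWriter {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();
        private final CountDownLatch stopStarted = new CountDownLatch(1);
        private Object fs; // stand-in for a lazily created FileSystem

        // Lazy accessor guarded by this object's monitor.
        synchronized Object fs() {
            if (fs == null) {
                fs = new Object();
            }
            return fs;
        }

        // The queued write only starts once stop() holds the monitor,
        // which makes the race from the issue deterministic.
        void write() {
            executor.execute(() -> {
                if (awaitLatch(stopStarted)) {
                    fs(); // BLOCKED: stop() owns the monitor
                }
            });
        }

        // Waits for the writer while holding the monitor the writer needs,
        // so awaitTermination() can only return false after the timeout.
        synchronized boolean stop(long timeoutMs) throws InterruptedException {
            stopStarted.countDown();
            executor.shutdown();
            return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    // Hypothetical fix: fs is confined to the single writer thread, and the
    // monitor only guards the stopped flag, never the wait itself.
    static class FixedWriter {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();
        private final CountDownLatch stopStarted = new CountDownLatch(1);
        private volatile Object fs; // assigned only by the writer thread
        private boolean stopped; // guarded by this

        private Object fs() {
            if (fs == null) {
                fs = new Object();
            }
            return fs;
        }

        void write() {
            executor.execute(() -> {
                if (awaitLatch(stopStarted)) {
                    fs(); // no lock needed, so the write completes
                }
            });
        }

        boolean stop(long timeoutMs) throws InterruptedException {
            synchronized (this) {
                if (stopped) {
                    return true;
                }
                stopped = true;
            } // monitor released before waiting
            stopStarted.countDown();
            executor.shutdown();
            return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    static boolean runBlocking(long timeoutMs) throws InterruptedException {
        BlockingWriter w = new BlockingWriter();
        w.write();
        return w.stop(timeoutMs);
    }

    static boolean runFixed(long timeoutMs) throws InterruptedException {
        FixedWriter w = new FixedWriter();
        w.write();
        return w.stop(timeoutMs);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocking writer terminated in time: " + runBlocking(500));
        System.out.println("fixed writer terminated in time: " + runFixed(5000));
    }
}
```

Running `main` prints `false` for the blocking writer, because its `awaitTermination()` times out while the writer thread is BLOCKED on the monitor (the same state as "pool-31-thread-1" in the trace), and `true` for the fixed writer. Guarding `fs` with a separate lock object instead of `this` would also keep the writer from waiting on the monitor that `stop()` holds.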



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)