You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2016/11/10 21:12:58 UTC

[jira] [Updated] (SPARK-18406) Race between end-of-task and completion iterator read lock release

     [ https://issues.apache.org/jira/browse/SPARK-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-18406:
-------------------------------
    Description: 
The following log comes from a production streaming job where executors periodically die due to uncaught exceptions during block release:


{code}
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7921
16/11/07 17:11:06 INFO Executor: Running task 0.0 in stage 2390.0 (TID 7921)
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7922
16/11/07 17:11:06 INFO Executor: Running task 1.0 in stage 2390.0 (TID 7922)
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7923
16/11/07 17:11:06 INFO Executor: Running task 2.0 in stage 2390.0 (TID 7923)
16/11/07 17:11:06 INFO TorrentBroadcast: Started reading broadcast variable 2721
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7924
16/11/07 17:11:06 INFO Executor: Running task 3.0 in stage 2390.0 (TID 7924)
16/11/07 17:11:06 INFO MemoryStore: Block broadcast_2721_piece0 stored as bytes in memory (estimated size 5.0 KB, free 4.9 GB)
16/11/07 17:11:06 INFO TorrentBroadcast: Reading broadcast variable 2721 took 3 ms
16/11/07 17:11:06 INFO MemoryStore: Block broadcast_2721 stored as values in memory (estimated size 9.4 KB, free 4.9 GB)
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_1 locally
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_3 locally
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_2 locally
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_4 locally
16/11/07 17:11:06 INFO PythonRunner: Times: total = 2, boot = -566, init = 567, finish = 1
16/11/07 17:11:06 INFO PythonRunner: Times: total = 7, boot = -540, init = 541, finish = 6
16/11/07 17:11:06 INFO Executor: Finished task 2.0 in stage 2390.0 (TID 7923). 1429 bytes result sent to driver
16/11/07 17:11:06 INFO PythonRunner: Times: total = 8, boot = -532, init = 533, finish = 7
16/11/07 17:11:06 INFO Executor: Finished task 3.0 in stage 2390.0 (TID 7924). 1429 bytes result sent to driver
16/11/07 17:11:06 ERROR Executor: Exception in task 0.0 in stage 2390.0 (TID 7921)
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:165)
	at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
	at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:356)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:356)
	at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:646)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7925
16/11/07 17:11:06 INFO Executor: Running task 0.1 in stage 2390.0 (TID 7925)
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_1 locally
16/11/07 17:11:06 INFO PythonRunner: Times: total = 41, boot = -536, init = 576, finish = 1
16/11/07 17:11:06 INFO Executor: Finished task 1.0 in stage 2390.0 (TID 7922). 1429 bytes result sent to driver
16/11/07 17:11:06 ERROR Utils: Uncaught exception in thread stdout writer for /databricks/python/bin/python
java.lang.AssertionError: assertion failed: Block rdd_2741_1 is not locked for reading
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:294)
	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:630)
	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:434)
	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1882)
	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
16/11/07 17:11:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /databricks/python/bin/python,5,main]
java.lang.AssertionError: assertion failed: Block rdd_2741_1 is not locked for reading
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:294)
	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:630)
	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:434)
	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1882)
	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala
{code}

I think that there's some sort of internal race condition between a task finishing (TID 7921) and automatically releasing locks and between some "automatically release locks on hitting the end of an iterator" logic running in a separate thread. The log above came from a production streaming job where executors periodically died with this type of error.

  was:
{code}
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7921
16/11/07 17:11:06 INFO Executor: Running task 0.0 in stage 2390.0 (TID 7921)
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7922
16/11/07 17:11:06 INFO Executor: Running task 1.0 in stage 2390.0 (TID 7922)
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7923
16/11/07 17:11:06 INFO Executor: Running task 2.0 in stage 2390.0 (TID 7923)
16/11/07 17:11:06 INFO TorrentBroadcast: Started reading broadcast variable 2721
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7924
16/11/07 17:11:06 INFO Executor: Running task 3.0 in stage 2390.0 (TID 7924)
16/11/07 17:11:06 INFO MemoryStore: Block broadcast_2721_piece0 stored as bytes in memory (estimated size 5.0 KB, free 4.9 GB)
16/11/07 17:11:06 INFO TorrentBroadcast: Reading broadcast variable 2721 took 3 ms
16/11/07 17:11:06 INFO MemoryStore: Block broadcast_2721 stored as values in memory (estimated size 9.4 KB, free 4.9 GB)
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_1 locally
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_3 locally
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_2 locally
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_4 locally
16/11/07 17:11:06 INFO PythonRunner: Times: total = 2, boot = -566, init = 567, finish = 1
16/11/07 17:11:06 INFO PythonRunner: Times: total = 7, boot = -540, init = 541, finish = 6
16/11/07 17:11:06 INFO Executor: Finished task 2.0 in stage 2390.0 (TID 7923). 1429 bytes result sent to driver
16/11/07 17:11:06 INFO PythonRunner: Times: total = 8, boot = -532, init = 533, finish = 7
16/11/07 17:11:06 INFO Executor: Finished task 3.0 in stage 2390.0 (TID 7924). 1429 bytes result sent to driver
16/11/07 17:11:06 ERROR Executor: Exception in task 0.0 in stage 2390.0 (TID 7921)
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:165)
	at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
	at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:356)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:356)
	at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:646)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7925
16/11/07 17:11:06 INFO Executor: Running task 0.1 in stage 2390.0 (TID 7925)
16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_1 locally
16/11/07 17:11:06 INFO PythonRunner: Times: total = 41, boot = -536, init = 576, finish = 1
16/11/07 17:11:06 INFO Executor: Finished task 1.0 in stage 2390.0 (TID 7922). 1429 bytes result sent to driver
16/11/07 17:11:06 ERROR Utils: Uncaught exception in thread stdout writer for /databricks/python/bin/python
java.lang.AssertionError: assertion failed: Block rdd_2741_1 is not locked for reading
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:294)
	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:630)
	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:434)
	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1882)
	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
16/11/07 17:11:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /databricks/python/bin/python,5,main]
java.lang.AssertionError: assertion failed: Block rdd_2741_1 is not locked for reading
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:294)
	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:630)
	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:434)
	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1882)
	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala
{code}

I think that there's some sort of internal race condition between a task finishing (TID 7921) and automatically releasing locks and between some "automatically release locks on hitting the end of an iterator" logic running in a separate thread. The log above came from a production streaming job where executors periodically died with this type of error.


> Race between end-of-task and completion iterator read lock release
> ------------------------------------------------------------------
>
>                 Key: SPARK-18406
>                 URL: https://issues.apache.org/jira/browse/SPARK-18406
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Josh Rosen
>
> The following log comes from a production streaming job where executors periodically die due to uncaught exceptions during block release:
> {code}
> 16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7921
> 16/11/07 17:11:06 INFO Executor: Running task 0.0 in stage 2390.0 (TID 7921)
> 16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7922
> 16/11/07 17:11:06 INFO Executor: Running task 1.0 in stage 2390.0 (TID 7922)
> 16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7923
> 16/11/07 17:11:06 INFO Executor: Running task 2.0 in stage 2390.0 (TID 7923)
> 16/11/07 17:11:06 INFO TorrentBroadcast: Started reading broadcast variable 2721
> 16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7924
> 16/11/07 17:11:06 INFO Executor: Running task 3.0 in stage 2390.0 (TID 7924)
> 16/11/07 17:11:06 INFO MemoryStore: Block broadcast_2721_piece0 stored as bytes in memory (estimated size 5.0 KB, free 4.9 GB)
> 16/11/07 17:11:06 INFO TorrentBroadcast: Reading broadcast variable 2721 took 3 ms
> 16/11/07 17:11:06 INFO MemoryStore: Block broadcast_2721 stored as values in memory (estimated size 9.4 KB, free 4.9 GB)
> 16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_1 locally
> 16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_3 locally
> 16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_2 locally
> 16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_4 locally
> 16/11/07 17:11:06 INFO PythonRunner: Times: total = 2, boot = -566, init = 567, finish = 1
> 16/11/07 17:11:06 INFO PythonRunner: Times: total = 7, boot = -540, init = 541, finish = 6
> 16/11/07 17:11:06 INFO Executor: Finished task 2.0 in stage 2390.0 (TID 7923). 1429 bytes result sent to driver
> 16/11/07 17:11:06 INFO PythonRunner: Times: total = 8, boot = -532, init = 533, finish = 7
> 16/11/07 17:11:06 INFO Executor: Finished task 3.0 in stage 2390.0 (TID 7924). 1429 bytes result sent to driver
> 16/11/07 17:11:06 ERROR Executor: Exception in task 0.0 in stage 2390.0 (TID 7921)
> java.lang.AssertionError: assertion failed
> 	at scala.Predef$.assert(Predef.scala:165)
> 	at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
> 	at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
> 	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362)
> 	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
> 	at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:356)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> 	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> 	at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:356)
> 	at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:646)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 16/11/07 17:11:06 INFO CoarseGrainedExecutorBackend: Got assigned task 7925
> 16/11/07 17:11:06 INFO Executor: Running task 0.1 in stage 2390.0 (TID 7925)
> 16/11/07 17:11:06 INFO BlockManager: Found block rdd_2741_1 locally
> 16/11/07 17:11:06 INFO PythonRunner: Times: total = 41, boot = -536, init = 576, finish = 1
> 16/11/07 17:11:06 INFO Executor: Finished task 1.0 in stage 2390.0 (TID 7922). 1429 bytes result sent to driver
> 16/11/07 17:11:06 ERROR Utils: Uncaught exception in thread stdout writer for /databricks/python/bin/python
> java.lang.AssertionError: assertion failed: Block rdd_2741_1 is not locked for reading
> 	at scala.Predef$.assert(Predef.scala:179)
> 	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:294)
> 	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:630)
> 	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:434)
> 	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
> 	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
> 	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> 	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
> 	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
> 	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
> 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1882)
> 	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
> 16/11/07 17:11:06 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /databricks/python/bin/python,5,main]
> java.lang.AssertionError: assertion failed: Block rdd_2741_1 is not locked for reading
> 	at scala.Predef$.assert(Predef.scala:179)
> 	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:294)
> 	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:630)
> 	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:434)
> 	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
> 	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
> 	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> 	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
> 	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
> 	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
> 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1882)
> 	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala
> {code}
> I think that there's some sort of internal race condition between a task finishing (TID 7921) and automatically releasing locks and between some "automatically release locks on hitting the end of an iterator" logic running in a separate thread. The log above came from a production streaming job where executors periodically died with this type of error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org