Posted to reviews@spark.apache.org by jiangxb1987 <gi...@git.apache.org> on 2017/05/24 21:53:52 UTC

[GitHub] spark pull request #18099: [SPARK-18406][CORE][Backport-2.1] Race between en...

GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/18099

    [SPARK-18406][CORE][Backport-2.1] Race between end-of-task and completion iterator read lock release

    This is a backport of #18076 to 2.1.
    
    ## What changes were proposed in this pull request?
    
    When a TaskContext is not propagated properly to all child threads of a task, as in the cases reported in this issue, we fail to get the TID from the TaskContext. As a result, the lock cannot be released and assertion failures occur. To resolve this, we explicitly pass the TID value to the `unlock` method.
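
A minimal sketch of the idea described above, not the code from this patch. `SimpleReadLocks`, `ExplicitTaskIdDemo`, the `NoTask` fallback ID, and the string block IDs are all hypothetical stand-ins; the only parts taken from the description are that the task ID is normally looked up from thread-local context and that `unlock` can instead be given the ID explicitly.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable

// Hypothetical, simplified per-task read-lock table (not Spark's BlockInfoManager).
// `currentTaskId` stands in for a lookup through thread-local task context.
class SimpleReadLocks(currentTaskId: () => Option[Long]) {
  private val NoTask = -1L // assumed fallback when the calling thread has no task context

  // task ID -> (block ID -> number of read locks held)
  private val readLocks = mutable.Map.empty[Long, mutable.Map[String, Int]]

  def lockForReading(blockId: String): Unit = synchronized {
    val taskId = currentTaskId().getOrElse(NoTask)
    val held = readLocks.getOrElseUpdate(taskId, mutable.Map.empty[String, Int])
    held(blockId) = held.getOrElse(blockId, 0) + 1
  }

  // An explicitly passed task ID wins; otherwise fall back to the calling thread's context.
  def unlock(blockId: String, taskAttemptId: Option[Long] = None): Unit = synchronized {
    val taskId = taskAttemptId.orElse(currentTaskId()).getOrElse(NoTask)
    val held = readLocks.getOrElse(taskId, mutable.Map.empty[String, Int])
    val count = held.getOrElse(blockId, 0)
    assert(count > 0, s"Block $blockId is not locked for reading")
    if (count == 1) { held.remove(blockId) } else { held(blockId) = count - 1 }
  }
}

object ExplicitTaskIdDemo {
  // Thread-local holder for the task ID; unset (None) on any thread that never calls set.
  private val context: ThreadLocal[Option[Long]] = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }

  // Runs `body` on a new thread and reports whether it finished without an AssertionError.
  private def succeedsOnChildThread(body: () => Unit): Boolean = {
    val ok = new AtomicBoolean(false)
    val child = new Thread(new Runnable {
      override def run(): Unit =
        try { body(); ok.set(true) } catch { case _: AssertionError => ok.set(false) }
    })
    child.start()
    child.join()
    ok.get()
  }

  // Returns (unlock without explicit ID succeeded, unlock with explicit ID succeeded).
  def run(): (Boolean, Boolean) = {
    context.set(Some(7L))                  // the "task thread" has a context
    val locks = new SimpleReadLocks(() => context.get())
    locks.lockForReading("rdd_7_0")
    locks.lockForReading("rdd_7_0")
    val capturedId = context.get()         // captured on the task thread
    val withoutId = succeedsOnChildThread(() => locks.unlock("rdd_7_0"))
    val withId = succeedsOnChildThread(() => locks.unlock("rdd_7_0", capturedId))
    (withoutId, withId)
  }

  def main(args: Array[String]): Unit = {
    val (withoutId, withId) = run()
    println(s"unlock without explicit ID succeeded: $withoutId")
    println(s"unlock with explicit ID succeeded: $withId")
  }
}
```

In this sketch the child thread has no context of its own, so the unlock that relies on the thread-local lookup trips the assertion, while the unlock that receives the ID captured on the task thread releases the lock.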
    
    ## How was this patch tested?
    
    Add new failing regression test case in `RDDSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark completion-iterator-2.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18099.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18099
    
----
commit aa59b1bceeb5e87ab5fdf04a936fe5001460a20b
Author: Xingbo Jiang <xi...@databricks.com>
Date:   2017-05-24T07:43:23Z

    [SPARK-18406][CORE] Race between end-of-task and completion iterator read lock release
    
    When a TaskContext is not propagated properly to all child threads of a task, as in the cases reported in this issue, we fail to get the TID from the TaskContext. As a result, the lock cannot be released and assertion failures occur. To resolve this, we explicitly pass the TID value to the `unlock` method.
    
    Add new failing regression test case in `RDDSuite`.
    
    Author: Xingbo Jiang <xi...@databricks.com>
    
    Closes #18076 from jiangxb1987/completion-iterator.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18099: [SPARK-18406][CORE][Backport-2.1] Race between en...

Posted by ConeyLiu <gi...@git.apache.org>.
Github user ConeyLiu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18099#discussion_r118394510
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -454,14 +454,20 @@ private[spark] class BlockManager(
           case Some(info) =>
             val level = info.level
             logDebug(s"Level for block $blockId is $level")
    +        val taskAttemptId = Option(TaskContext.get()).map(_.taskAttemptId())
    --- End diff --
    
    Hi, @jiangxb1987. This looks like the same way `BlockInfoManager` gets the `taskAttemptId`. What is the difference between getting the `taskAttemptId` here and using `BlockInfoManager.currentTaskAttemptId`?




[GitHub] spark pull request #18099: [SPARK-18406][CORE][Backport-2.1] Race between en...

Posted by ConeyLiu <gi...@git.apache.org>.
Github user ConeyLiu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18099#discussion_r118396798
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -454,14 +454,20 @@ private[spark] class BlockManager(
           case Some(info) =>
             val level = info.level
             logDebug(s"Level for block $blockId is $level")
    +        val taskAttemptId = Option(TaskContext.get()).map(_.taskAttemptId())
    --- End diff --
    
    Got it, thanks for your explanation.




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    thanks, merging to 2.1!




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    **[Test build #77311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77311/testReport)** for PR 18099 at commit [`aa59b1b`](https://github.com/apache/spark/commit/aa59b1bceeb5e87ab5fdf04a936fe5001460a20b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18099: [SPARK-18406][CORE][Backport-2.1] Race between en...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18099#discussion_r118396407
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -454,14 +454,20 @@ private[spark] class BlockManager(
           case Some(info) =>
             val level = info.level
             logDebug(s"Level for block $blockId is $level")
    +        val taskAttemptId = Option(TaskContext.get()).map(_.taskAttemptId())
    --- End diff --
    
    This reads the `taskAttemptId` on the main task thread. If a TaskContext is not propagated properly to all child threads of the task, looking up the `taskAttemptId` inside `BlockInfoManager` would fail; see the test case added in this PR.
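
To illustrate why the lookup can fail on a child thread, here is a small sketch that assumes the task context is held in an ordinary thread-local. `FakeTaskContext` and `ThreadLocalLookupDemo` are hypothetical names for illustration only and are not Spark classes.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for a task context stored per thread.
object FakeTaskContext {
  private val current: ThreadLocal[Option[Long]] = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }

  def set(taskAttemptId: Long): Unit = current.set(Some(taskAttemptId))

  // Returns the task attempt ID visible to the calling thread, if any.
  def get(): Option[Long] = current.get()
}

object ThreadLocalLookupDemo {
  // Starts a fresh thread and returns what that thread sees when it asks for the context.
  def lookupFromChildThread(): Option[Long] = {
    val seen = new AtomicReference[Option[Long]](None)
    val child = new Thread(new Runnable {
      override def run(): Unit = seen.set(FakeTaskContext.get())
    })
    child.start()
    child.join()
    seen.get()
  }

  def main(args: Array[String]): Unit = {
    FakeTaskContext.set(42L)
    println(s"task thread sees: ${FakeTaskContext.get()}")
    println(s"child thread sees: ${lookupFromChildThread()}")
  }
}
```

A plain `ThreadLocal` value is not inherited by threads started later, so the child thread sees no task attempt ID unless the value is captured on the task thread and handed over explicitly.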




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77311/
    Test PASSed.




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    Same issue in Spark 2.2.1.




[GitHub] spark pull request #18099: [SPARK-18406][CORE][Backport-2.1] Race between en...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 closed the pull request at:

    https://github.com/apache/spark/pull/18099




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by appleyuchi <gi...@git.apache.org>.
Github user appleyuchi commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    Is this fix available for Spark 2.3.1?
    thanks




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by appleyuchi <gi...@git.apache.org>.
Github user appleyuchi commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    The following occurred when I ran a lab with ALS in Spark:
    
    8/08/22 21:24:14 ERROR Utils: Uncaught exception in thread stdout writer for python
    java.lang.AssertionError: assertion failed: Block rdd_7_0 is not locked for reading
    	at scala.Predef$.assert(Predef.scala:170)
    	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
    	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:769)
    	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:540)
    	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:44)
    	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:33)
    	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:213)
    	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:407)
    	at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:215)
    	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991)
    	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:170)
    Exception in thread "stdout writer for python" java.lang.AssertionError: assertion failed: Block rdd_7_0 is not locked for reading
    	at scala.Predef$.assert(Predef.scala:170)
    	at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
    	at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:769)
    	at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:540)
    	at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:44)
    	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:33)
    	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:213)
    	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:407)
    	at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:215)
    	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991)
    	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:170)





[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    **[Test build #77311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77311/testReport)** for PR 18099 at commit [`aa59b1b`](https://github.com/apache/spark/commit/aa59b1bceeb5e87ab5fdf04a936fe5001460a20b).




[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18099
  
    cc @gatorsmile 

