Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2015/04/22 06:44:48 UTC

[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/5625

    [SPARK-7044] [SQL] Fix the deadlock in script transformation

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark transform

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5625
    
----
commit 885685a9fbc80df18b76de996e3f9e568b7d5081
Author: Cheng Hao <ha...@intel.com>
Date:   2015-04-22T03:05:23Z

    fix the deadlock in transform

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95377815
  
    cc @marmbrus @yhuai, this is actually a critical bug introduced by #4014, and BigBench is blocked by it.


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95044410
  
      [Test build #30728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30728/consoleFull) for   PR 5625 at commit [`885685a`](https://github.com/apache/spark/commit/885685a9fbc80df18b76de996e3f9e568b7d5081).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95619459
  
      [Test build #30847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30847/consoleFull) for   PR 5625 at commit [`5ec1dd2`](https://github.com/apache/spark/commit/5ec1dd2c9cb74d4c7263fb31780893b3518e6958).


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95666195
  
    Actually this doesn't merge cleanly into 1.3. Do you mind submitting a pull request for that branch? Thanks.


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5625#discussion_r28945099
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
    @@ -145,20 +147,27 @@ case class ScriptTransformation(
           val dataOutputStream = new DataOutputStream(outputStream)
           val outputProjection = new InterpretedProjection(input, child.output)
     
    -      iter
    -        .map(outputProjection)
    -        .foreach { row =>
    +      // Run the writer (which feeds the output into the pipe) in a separate thread
    +      // and keep the collector (reader) in the main thread;
    +      // otherwise it will deadlock if the data size is greater than
    +      // the pipe buffer capacity.
    +      future {
    --- End diff --
    
    Should we just create a new thread to run this? It seems wrong to use a global shared thread pool.
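The deadlock described in the diff comment can be illustrated with a minimal Scala sketch of the pattern suggested here: feed the child process's stdin from a dedicated (non-pooled) thread while the calling thread drains its stdout. If a single thread writes all input before reading, the child can fill its stdout pipe and block, stop reading stdin, and leave the parent blocked on a full stdin pipe. This is not the code that was merged into Spark; the object `PipeWithoutDeadlock`, its `transform` method, and the thread name are hypothetical, and the usage assumes a Unix `cat` on the PATH.

```scala
import java.io.{BufferedReader, BufferedWriter, IOException, InputStreamReader, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.util.concurrent.atomic.AtomicReference
import scala.collection.mutable.ListBuffer

// Hypothetical helper illustrating the "dedicated feeder thread" pattern;
// it is not the code that was merged into Spark.
object PipeWithoutDeadlock {

  /** Streams `lines` into `command`'s stdin and returns everything it prints on stdout. */
  def transform(command: Seq[String], lines: Iterator[String]): List[String] = {
    val proc = new ProcessBuilder(command: _*)
      .redirectError(ProcessBuilder.Redirect.INHERIT)
      .start()

    // The first failure seen by the writer thread is recorded here and rethrown below.
    val writerError = new AtomicReference[Throwable](null)

    // Dedicated thread that feeds the child's stdin, so the calling thread is
    // free to read stdout at the same time.
    val writer = new Thread("script-transform-writer") {
      override def run(): Unit = {
        val out = new BufferedWriter(
          new OutputStreamWriter(proc.getOutputStream, StandardCharsets.UTF_8))
        try {
          lines.foreach { line =>
            out.write(line)
            out.newLine()
          }
        } catch {
          case t: Throwable => writerError.compareAndSet(null, t)
        } finally {
          // Closing stdin sends EOF so the child knows the input is complete.
          try out.close() catch { case e: IOException => writerError.compareAndSet(null, e) }
        }
      }
    }
    writer.setDaemon(true)
    writer.start()

    // Calling thread: drain stdout while the writer is still producing input.
    val reader = new BufferedReader(
      new InputStreamReader(proc.getInputStream, StandardCharsets.UTF_8))
    val result = ListBuffer.empty[String]
    try {
      var line = reader.readLine()
      while (line != null) {
        result += line
        line = reader.readLine()
      }
    } finally {
      reader.close()
    }

    writer.join()
    proc.waitFor()
    Option(writerError.get).foreach(e => throw e)
    result.toList
  }
}
```

For example, `PipeWithoutDeadlock.transform(Seq("cat"), (0 until 200000).iterator.map(i => s"row $i"))` pushes roughly 2 MB through `cat`, far more than a pipe buffer (commonly 64 KiB on Linux), and should return all 200000 rows in order; a single-threaded write-then-read version would be expected to hang on the same input. A plain `Thread` is used rather than a shared pool, in line with the review comment above, on the assumption that long-running feeders should not occupy threads of a global pool that other work depends on.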


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95650683
  
      [Test build #30847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30847/consoleFull) for   PR 5625 at commit [`5ec1dd2`](https://github.com/apache/spark/commit/5ec1dd2c9cb74d4c7263fb31780893b3518e6958).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5625#discussion_r28970625
  
    Thank you @rxin, I think you are right. Updated!


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95650694
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30847/


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5625


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95665707
  
    Merging in master & branch-1.3.


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95033198
  
      [Test build #30728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30728/consoleFull) for   PR 5625 at commit [`885685a`](https://github.com/apache/spark/commit/885685a9fbc80df18b76de996e3f9e568b7d5081).


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95765603
  
    Thank you @rxin. I've created another PR, #5671, to backport this to 1.3; let's wait for the test results.


[GitHub] spark pull request: [SPARK-7044] [SQL] Fix the deadlock in script ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5625#issuecomment-95044427
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30728/

