You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liutang123 <gi...@git.apache.org> on 2018/04/26 08:58:02 UTC

[GitHub] spark pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec shoul...

GitHub user liutang123 opened a pull request:

    https://github.com/apache/spark/pull/21164

    [SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish

    When feed thread doesn't set its _exception variable and the progress doesn't exit completely, output Iterator will return false in hasNext function.
    
    ## What changes were proposed in this pull request?
    wait script process exiting before output iterator finish.
    
    ## How was this patch tested?
    manual test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liutang123/spark SPARK-24098

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21164
    
----
commit a77ac06180cbecc28383f3b73e05259502896613
Author: liutang123 <li...@...>
Date:   2018-04-26T08:54:44Z

    [SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    @rxin hi, Do you have time to look at this PR? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec shoul...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21164#discussion_r185693669
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala ---
    @@ -137,13 +137,12 @@ case class ScriptTransformationExec(
                 throw writerThread.exception.get
               }
     
    -          if (!proc.isAlive) {
    -            val exitCode = proc.exitValue()
    -            if (exitCode != 0) {
    -              logError(stderrBuffer.toString) // log the stderr circular buffer
    -              throw new SparkException(s"Subprocess exited with status $exitCode. " +
    -                s"Error: ${stderrBuffer.toString}", cause)
    -            }
    +          proc.waitFor()
    --- End diff --
    
    Although writerThread._exception is a volatile variable, some times writerThread.exception function may be called before writerThread._exception's assignment due to readThread and writerThread are working parallel.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    **[Test build #91616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91616/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    **[Test build #91616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91616/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    **[Test build #97563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97563/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec shoul...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21164#discussion_r185511245
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala ---
    @@ -137,13 +137,12 @@ case class ScriptTransformationExec(
                 throw writerThread.exception.get
               }
     
    -          if (!proc.isAlive) {
    -            val exitCode = proc.exitValue()
    -            if (exitCode != 0) {
    -              logError(stderrBuffer.toString) // log the stderr circular buffer
    -              throw new SparkException(s"Subprocess exited with status $exitCode. " +
    -                s"Error: ${stderrBuffer.toString}", cause)
    -            }
    +          proc.waitFor()
    --- End diff --
    
    basically you are moving the `proc.waitFor()` to after `if (writerThread.exception.isDefined) ...`, why is it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Bug Reappearance:
    1. Add Thread.sleep(1000 * 600) before assign for _exception.
    2. structure a python script witch will throw exception like follow:
    test.py
    ```import sys 
    for line in sys.stdin:   
      raise Exception('error') 
      print line
    ```
    3. use script created in step 2 in transform.
    ```spark-sql>add files /path_to/test.py;```
    ```spark-sql>select transform(id) using 'python test.py' as id from city;```
    The result is that spark will end successfully.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    **[Test build #97563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97563/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    @liutang123  cc @cloud-fan @gatorsmile 
    
    I also encountered this problem.
    
    ![image](https://user-images.githubusercontent.com/3898450/40981493-9b0b0d46-690d-11e8-8607-c14756610d59.png)
    python:
    ```python
    import sys
    for line in sys.stdin:
        print 1/0
    ```
    sql:
    ```sql
    ADD FILE test.py;
    SELECT TRANSFORM(1) USING 'python test.py'
    AS (c1)
    ```
    
    I solved it this way:  *writerThread.join()*
    ```java
    if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
        writerThread.join()
        checkFailureAndPropagate()
        return false
    }
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    @cloud-fan hi, fan, do you have time to see this pr? I think this is a critical bug.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    cc @dongjoon-hyun @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91616/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97563/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...

Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:

    https://github.com/apache/spark/pull/21164
  
    @gatorsmile Could you please give some comments when you have time? Thanks so much.
    In addition, I think this is a critical bug!!!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org