You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/27 14:27:19 UTC

[GitHub] [spark] slamke opened a new pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

slamke opened a new pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724
 
 
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   This PR try to fix a bug in `org.apache.spark.sql.hive.execution.ScriptTransformationExec`. This bug appears in our online cluster.  `ScriptTransformationExec` should throw an exception, when user uses a python script which contains parse error.  But current implementation may miss this case of failure.
   
   
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   When user uses a python script which contains a parse error, there will be no output. So  `scriptOutputReader.next(scriptOutputWritable) <= 0` matches, then we use `checkFailureAndPropagate()` to check the `proc`.  But the `proc` may still be alive and `writerThread.exception` is not defined,  `checkFailureAndPropagate` cannot check this case of failure.  In the end, the Spark SQL job runs successfully and returns no result. In fact, the SparK SQL job should fails and shows the exception properly.
   
   For example, the error python script is blow.
   ``` python
   # encoding: utf8
   import unknow_module
   import sys
   
   for line in sys.stdin:
       print line
   ``` 
   The bug can be reproduced by running the following code in our cluter.
   ```
   spark.range(100*100).toDF("index").createOrReplaceTempView("test")
   spark.sql("select TRANSFORM(index) USING 'python error_python.py' as new_index from test").collect.foreach(println)
   ```
   
   
   ### Does this PR introduce any user-facing change?
   <!--
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Existing UT

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612890710
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121185/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611966090
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613544279
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121281/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612000965
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-592401223
 
 
   Your demo can not reproduce stably. Actually, I never reproduce the issue using the demo. But I see what's the problem around here and I realize it's hard to reproduce it externally without hack inside.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407360813
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +162,25 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
+                  // There can be a lag between reader read EOF and the process termination.
+                  // If the script fails to startup, this kind of error may be missed.
+                  // So explicitly waiting for the process termination.
+                  val timeout = SparkEnv.get.conf.get("spark.sql.transformation.exit.timeout",
+                    "1000").toInt
 
 Review comment:
   fixed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613246184
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612887293
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25898/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408147540
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,32 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            if (timeout < 0) {
+              proc.waitFor()
+            } else if (timeout == 0) {
+              // Do nothing
+            } else {
+              proc.waitFor(timeout, TimeUnit.SECONDS)
+            }
 
 Review comment:
   Sure,  updated.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408816429
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
 
 Review comment:
   This corner case does not happen every time,  so that I add this loop to reproduce this problem.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r406693755
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +161,7 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
+                  proc.waitFor()
 
 Review comment:
   Yes, it is a good idea. I have added timeout conf in `SQLConf`. Please have a review.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612889298
 
 
   is https://github.com/apache/spark/pull/27724#issuecomment-607628004 a valid UT for this fix?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612886949
 
 
   **[Test build #121210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121210/testReport)** for PR 27724 at commit [`c997f50`](https://github.com/apache/spark/commit/c997f501a8a891a8b0fd5a6d3e8bf8eb1c103f08).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613246194
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25943/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607867530
 
 
   **[Test build #120726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120726/testReport)** for PR 27724 at commit [`99010e0`](https://github.com/apache/spark/commit/99010e0a020c7e72cfc0ac89e50b8a5b819a65ba).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-592000975
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614015385
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26008/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614015385
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26008/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611971257
 
 
   **[Test build #121078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121078/testReport)** for PR 27724 at commit [`aab62cb`](https://github.com/apache/spark/commit/aab62cb71bb40eaad92f1b2d52858c088163b64d).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611971284
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121078/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614014686
 
 
   **[Test build #121325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121325/testReport)** for PR 27724 at commit [`6a44091`](https://github.com/apache/spark/commit/6a440913d4ac434c6cc32e02bfcd2f93f7116dc0).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613246194
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25943/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611965569
 
 
   **[Test build #121077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121077/testReport)** for PR 27724 at commit [`9233adc`](https://github.com/apache/spark/commit/9233adc7d6f74b419f59e8498e276e99d665a60c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612791841
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r402335000
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +161,7 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
+                  proc.waitFor()
 
 Review comment:
   shall we have a timeout here? just in case the process hangs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407945243
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,32 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            if (timeout < 0) {
+              proc.waitFor()
+            } else if (timeout == 0) {
+              // Do nothing
+            } else {
+              proc.waitFor(timeout, TimeUnit.SECONDS)
+            }
 
 Review comment:
   I feel that the logical here is too complex. It really does not make sense to allow user to set 0 timeout which could introduce the potential bug again. Also negative timeout may cause app hang, which breaks our initial motivation to introduce this conf (which is to avoidi app hang).
   
   Thus, I think we should only allow timeout > 0 and add `checkValue(timeout > 0)` to the conf and simplify the code here.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612791400
 
 
   **[Test build #121185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121185/testReport)** for PR 27724 at commit [`a5628ef`](https://github.com/apache/spark/commit/a5628efb0c95c1b6414f97633cc4b0f3dab7bb52).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612886949
 
 
   **[Test build #121210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121210/testReport)** for PR 27724 at commit [`c997f50`](https://github.com/apache/spark/commit/c997f501a8a891a8b0fd5a6d3e8bf8eb1c103f08).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r386969280
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +161,7 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
 
 Review comment:
   No, I find a way to reproduce. Just run this code in laptop for some times, it will show that when `inputStream` return -1, but the `proc` is alive. Actually, `ScriptTransformationExec ` also uses `inputStream` to check the end of stream.
   
   ``` scala 
   import scala.collection.JavaConverters._
   
   object SparkTEST {
     def main(args: Array[String]): Unit = {
       var i = 0
       for (i <- 0 until (50)) {
         check_script()
       }
     }
   
     def check_script() = {
       val script = "python error_python.py"
       val cmd = List("/bin/bash", "-c", script)
       val builder = new ProcessBuilder(cmd.asJava)
       val proc = builder.start()
       val inputStream = proc.getInputStream
       var msg = ""
       if (inputStream.read() <= 0) {
         msg = s"Cannot read inputStream and proc status is: ${if (proc.isAlive) "alive" else "dead"}"
       }
       println(msg)
     }
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612118396
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407422355
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
+    .internal()
+    .doc("The timeout for executor to wait for transformation script exit when EOF.")
+    .timeConf(TimeUnit.MILLISECONDS)
 
 Review comment:
   are the users going to tune this config at the millisecond granularity? I don't think so. Maybe second is good enough here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612889688
 
 
   **[Test build #121185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121185/testReport)** for PR 27724 at commit [`a5628ef`](https://github.com/apache/spark/commit/a5628efb0c95c1b6414f97633cc4b0f3dab7bb52).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612000619
 
 
   **[Test build #121085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121085/testReport)** for PR 27724 at commit [`10e7de8`](https://github.com/apache/spark/commit/10e7de8f491dbc637108853e34a2e2cc39200954).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407524163
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +163,24 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
+                  // There can be a lag between reader read EOF and the process termination.
+                  // If the script fails to startup, this kind of error may be missed.
+                  // So explicitly waiting for the process termination.
+                  val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+                  try {
+                    if (timeout < 0) {
+                      proc.waitFor()
 
 Review comment:
   I think we can add `checkValue(timeout > 0)` to the conf and simplify the code here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612118405
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121085/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r409458999
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
+      assume(TestUtils.testCommandAvailable("/bin/bash"))
+
+      val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+      val e = intercept[SparkException] {
+        val plan =
+          new ScriptTransformationExec(
+            input = Seq(rowsDf.col("a").expr),
+            script = "some_non_existent_command",
 
 Review comment:
   > I tried some times in my Mac, this issue can not be reproduced by sleeping several seconds in the script. It is easy to reproduce this issue by this way [#27724 (comment)](https://github.com/apache/spark/pull/27724#issuecomment-607628004). So I add an UT by this code.
   
   You should notice that my UT is is without serde, if you want to add UT, I think you'd better add two type UT (with serde / without serde)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611965569
 
 
   **[Test build #121077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121077/testReport)** for PR 27724 at commit [`9233adc`](https://github.com/apache/spark/commit/9233adc7d6f74b419f59e8498e276e99d665a60c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-592001890
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r385557034
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +161,7 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
 
 Review comment:
   Does `<=0` always implies a terminated or terminating process?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614014686
 
 
   **[Test build #121325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121325/testReport)** for PR 27724 at commit [`6a44091`](https://github.com/apache/spark/commit/6a440913d4ac434c6cc32e02bfcd2f93f7116dc0).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-592001890
 
 
   Can one of the admins verify this patch?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612890703
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407268637
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,11 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
+    .doc("The timeout for executor to wait for transformation script exit when EOF in millisecond.")
+    .intConf
 
 Review comment:
   Use `.timeConf(TimeUnit.MILLISECONDS)`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614015373
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613245095
 
 
   > is [#27724 (comment)](https://github.com/apache/spark/pull/27724#issuecomment-607628004) a valid UT for this fix?
   
   Yes, I have added this UT.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407945243
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,32 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            if (timeout < 0) {
+              proc.waitFor()
+            } else if (timeout == 0) {
+              // Do nothing
+            } else {
+              proc.waitFor(timeout, TimeUnit.SECONDS)
+            }
 
 Review comment:
   I feel that the logical here is too complex. It really does not make sense to allow user to set 0 timeout which could introduce the potential bug again. Also negative timeout may cause app hang, which breaks our initial motivation to introduce this conf (which is to avoid app hang).
   
   Thus, I think we should consider timeout > 0 case only and add `checkValue(timeout > 0)` to the conf and simplify the code here.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611966631
 
 
   ok to test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r388819267
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +161,7 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
 
 Review comment:
   The process not hangs, but just in a terminating state and alive. It is really hard to reproduce in local mode, but it happened some times in our online cluster.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r409457840
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
+      assume(TestUtils.testCommandAvailable("/bin/bash"))
 
 Review comment:
   I find other UT cases do not have this condition, so I write the same code here. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407269107
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +162,25 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
+                  // There can be a lag between reader read EOF and the process termination.
+                  // If the script fails to startup, this kind of error may be missed.
+                  // So explicitly waiting for the process termination.
+                  val timeout = SparkEnv.get.conf.get("spark.sql.transformation.exit.timeout",
+                    "1000").toInt
 
 Review comment:
   We should use `SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)`. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r386969280
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +161,7 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
 
 Review comment:
   No, I find a way to reproduce. Just run this code in laptop for some times, it will show that when `inputStream` return -1, but the `proc` is alive. Actually, `ScriptTransformationExec ` also uses `inputStream` to check the end of stream.
   
   ``` scala 
   import scala.collection.JavaConverters._
   
   object SparkTEST {
     def main(args: Array[String]): Unit = {
       var i = 0
       for (i <- 0 until (50)) {
         check_script()
       }
     }
   
     def check_script() = {
       val script = "python /Users/buytedance/code/github/error_python.py"
       val cmd = List("/bin/bash", "-c", script)
       val builder = new ProcessBuilder(cmd.asJava)
       val proc = builder.start()
       val inputStream = proc.getInputStream
       var msg = ""
       if (inputStream.read() <= 0) {
         msg = s"Cannot read inputStream and proc status is: ${if (proc.isAlive) "alive" else "dead"}"
       }
       println(msg)
     }
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613544267
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611967933
 
 
   **[Test build #121078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121078/testReport)** for PR 27724 at commit [`aab62cb`](https://github.com/apache/spark/commit/aab62cb71bb40eaad92f1b2d52858c088163b64d).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611965980
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607863142
 
 
   ok to test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607954854
 
 
   **[Test build #120726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120726/testReport)** for PR 27724 at commit [`99010e0`](https://github.com/apache/spark/commit/99010e0a020c7e72cfc0ac89e50b8a5b819a65ba).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607955709
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120726/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614171578
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121325/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407283036
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,11 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
+    .doc("The timeout for executor to wait for transformation script exit when EOF in millisecond.")
+    .intConf
 
 Review comment:
   Shall we mark it as an internal conf?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612118396
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611965980
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611971284
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121078/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612117571
 
 
   **[Test build #121085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121085/testReport)** for PR 27724 at commit [`10e7de8`](https://github.com/apache/spark/commit/10e7de8f491dbc637108853e34a2e2cc39200954).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613263904
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121256/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613021919
 
 
   **[Test build #121210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121210/testReport)** for PR 27724 at commit [`c997f50`](https://github.com/apache/spark/commit/c997f501a8a891a8b0fd5a6d3e8bf8eb1c103f08).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AngersZhuuuu commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607628004
 
 
   with below test, `noserde` will meet similar problem
   ```
   test("SPARK-30973 ScriptTransformationExec should wait for the termination") {
       (0 until 10).foreach { index =>
         assume(TestUtils.testCommandAvailable("/bin/bash"))
   
         val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
   
         val e = intercept[SparkException] {
           val plan =
             new ScriptTransformationExec(
               input = Seq(rowsDf.col("a").expr),
               script = "some_non_existent_command",
               output = Seq(AttributeReference("a", StringType)()),
               child = rowsDf.queryExecution.sparkPlan,
               ioschema = noSerdeIOSchema)
           SparkPlanTest.executePlan(plan, hiveContext)
         }
         assert(e.getMessage.contains("Subprocess exited with status"))
         assert(uncaughtExceptionHandler.exception.isEmpty)
       }
     }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612000969
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25775/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408815768
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
+      assume(TestUtils.testCommandAvailable("/bin/bash"))
+
+      val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+      val e = intercept[SparkException] {
+        val plan =
+          new ScriptTransformationExec(
+            input = Seq(rowsDf.col("a").expr),
+            script = "some_non_existent_command",
 
 Review comment:
   Is it possible to sleep several seconds in the script so we can reproduce the issue stably?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613452429
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25967/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612791848
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25872/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407360748
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,11 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
+    .doc("The timeout for executor to wait for transformation script exit when EOF in millisecond.")
+    .intConf
 
 Review comment:
   Thanks for your comments, code is fixed and resubmitted.   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607863963
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25424/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407958753
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,32 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            if (timeout < 0) {
+              proc.waitFor()
+            } else if (timeout == 0) {
+              // Do nothing
+            } else {
+              proc.waitFor(timeout, TimeUnit.SECONDS)
+            }
 
 Review comment:
   +1

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613263778
 
 
   **[Test build #121256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121256/testReport)** for PR 27724 at commit [`cf2bcc2`](https://github.com/apache/spark/commit/cf2bcc2e7a123875ccace1438398e50d69b727d8).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611966090
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407465104
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
 
 Review comment:
   Yes, it is a better name.  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612000619
 
 
   **[Test build #121085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121085/testReport)** for PR 27724 at commit [`10e7de8`](https://github.com/apache/spark/commit/10e7de8f491dbc637108853e34a2e2cc39200954).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613022800
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613543957
 
 
   **[Test build #121281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121281/testReport)** for PR 27724 at commit [`b48e63a`](https://github.com/apache/spark/commit/b48e63af6372b4c44c3d368d50190ad4ec3810f0).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408560803
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,26 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            proc.waitFor(timeout, TimeUnit.SECONDS)
+          } catch {
+            case t: Throwable =>
+              log.warn(s"Transformation script process exits timeout in ${timeout} seconds", t)
 
 Review comment:
   ```
        * @param timeout the maximum time to wait
        * @param unit the time unit of the {@code timeout} argument
        * @return {@code true} if the subprocess has exited and {@code false} if
        *         the waiting time elapsed before the subprocess has exited.
        * @throws InterruptedException if the current thread is interrupted
        *         while waiting.
        * @throws NullPointerException if unit is null
   ```
   
   It seems that timeout won't throw an exception but only return false?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613245679
 
 
   **[Test build #121256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121256/testReport)** for PR 27724 at commit [`cf2bcc2`](https://github.com/apache/spark/commit/cf2bcc2e7a123875ccace1438398e50d69b727d8).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613452424
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407945243
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,32 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            if (timeout < 0) {
+              proc.waitFor()
+            } else if (timeout == 0) {
+              // Do nothing
+            } else {
+              proc.waitFor(timeout, TimeUnit.SECONDS)
+            }
 
 Review comment:
   I feel that the logical here is too complex. It really does not make sense to allow user to set 0 timeout which could introduce the potential bug again. Also negative timeout may cause app hang, which breaks our initial motivation to introduce this conf (which is to avoid app hang).
   
   Thus, I think we should consider the `timeout > 0` case only and add `checkValue(timeout > 0)` to the conf and simplify the code here.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408819108
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
 
 Review comment:
   I know.  But even with 10 times loop it's  still possible that the issue can not be reproduced, right?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607863963
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25424/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614171578
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121325/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r409457245
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
+      assume(TestUtils.testCommandAvailable("/bin/bash"))
+
+      val rowsDf = Seq("a", "b", "c").map(Tuple1.apply).toDF("a")
+      val e = intercept[SparkException] {
+        val plan =
+          new ScriptTransformationExec(
+            input = Seq(rowsDf.col("a").expr),
+            script = "some_non_existent_command",
 
 Review comment:
   I tried some times in my Mac, this issue can not be reproduced by sleeping several seconds in the script. It is easy to reproduce this issue by this way https://github.com/apache/spark/pull/27724#issuecomment-607628004. So I add an UT by this code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612887293
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25898/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611966076
 
 
   **[Test build #121077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121077/testReport)** for PR 27724 at commit [`9233adc`](https://github.com/apache/spark/commit/9233adc7d6f74b419f59e8498e276e99d665a60c).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607955698
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612887282
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407422621
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
 
 Review comment:
   let's include the unit in the name, like `....timeoutInSeconds`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613263891
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613246184
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611971278
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r388068768
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +161,7 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
 
 Review comment:
   Oh...I mean, is it possible that `scriptOutputReader.next` returns `<=0` but the process hang?
   If it's possible, we may need to destroy the process before `proc.waitFor()`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408813915
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,26 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            proc.waitFor(timeout, TimeUnit.SECONDS)
+          } catch {
+            case t: Throwable =>
+              log.warn(s"Transformation script process exits timeout in ${timeout} seconds", t)
 
 Review comment:
   Yes, it is bad mistake. Code updated.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612000965
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612791400
 
 
   **[Test build #121185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121185/testReport)** for PR 27724 at commit [`a5628ef`](https://github.com/apache/spark/commit/a5628efb0c95c1b6414f97633cc4b0f3dab7bb52).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612000969
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25775/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613451760
 
 
   **[Test build #121281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121281/testReport)** for PR 27724 at commit [`b48e63a`](https://github.com/apache/spark/commit/b48e63af6372b4c44c3d368d50190ad4ec3810f0).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614015373
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614171560
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611965983
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25768/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611967933
 
 
   **[Test build #121078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121078/testReport)** for PR 27724 at commit [`aab62cb`](https://github.com/apache/spark/commit/aab62cb71bb40eaad92f1b2d52858c088163b64d).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607867530
 
 
   **[Test build #120726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120726/testReport)** for PR 27724 at commit [`99010e0`](https://github.com/apache/spark/commit/99010e0a020c7e72cfc0ac89e50b8a5b819a65ba).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612791848
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25872/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613245950
 
 
   > is [#27724 (comment)](https://github.com/apache/spark/pull/27724#issuecomment-607628004) a valid UT for this fix?
   
   Yes, I have added this UT.   There are 2 corner cases in `ScriptTransformationExec` which may incur this bug.  I add `waitFor` logic in both cases.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607863948
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607955709
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120726/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612118405
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121085/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407422621
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
 
 Review comment:
   let's include the unit the the name, like `....timeoutInSeconds`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612887282
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607863948
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614170147
 
 
   **[Test build #121325 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121325/testReport)** for PR 27724 at commit [`6a44091`](https://github.com/apache/spark/commit/6a440913d4ac434c6cc32e02bfcd2f93f7116dc0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613544267
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407945243
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,32 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            if (timeout < 0) {
+              proc.waitFor()
+            } else if (timeout == 0) {
+              // Do nothing
+            } else {
+              proc.waitFor(timeout, TimeUnit.SECONDS)
+            }
 
 Review comment:
   I feel that the logic here is too complex. It really does not make sense to allow user to set 0 timeout which could introduce the potential bug again. Also negative timeout may cause app hang, which breaks our initial motivation to introduce this conf (which is to avoid app hang).
   
   Thus, I think we should consider the `timeout > 0` case only and add `checkValue(timeout > 0)` to the conf and simplify the code here.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611971278
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612791841
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613452429
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25967/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611965983
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25768/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612890710
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121185/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408819564
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
+      assume(TestUtils.testCommandAvailable("/bin/bash"))
 
 Review comment:
   nit: `assume(!Utils.isWindows && TestUtils.testCommandAvailable("/bin/bash"))`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613022809
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121210/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613452424
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-614171560
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613263904
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121256/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407465261
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +163,24 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
+                  // There can be a lag between reader read EOF and the process termination.
+                  // If the script fails to startup, this kind of error may be missed.
+                  // So explicitly waiting for the process termination.
+                  val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+                  try {
+                    if (timeout < 0) {
+                      proc.waitFor()
 
 Review comment:
   doc is fixed and resubmitted.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613263891
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613022800
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r408814892
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
 ##########
 @@ -228,6 +228,26 @@ class ScriptTransformationSuite extends SparkPlanTest with SQLTestUtils with Tes
         'e.cast("string")).collect())
     }
   }
+
+  test("SPARK-30973: ScriptTransformationExec should wait for the termination of script") {
+    (0 until 10).foreach { _ =>
 
 Review comment:
   The loop is not needed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407423386
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -161,6 +163,24 @@ case class ScriptTransformationExec(
 
               if (scriptOutputReader != null) {
                 if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
+                  // There can be a lag between reader read EOF and the process termination.
+                  // If the script fails to startup, this kind of error may be missed.
+                  // So explicitly waiting for the process termination.
+                  val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+                  try {
+                    if (timeout < 0) {
+                      proc.waitFor()
 
 Review comment:
   we should mention it in the config doc that, negative timeout means infinite.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611966103
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121077/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613245679
 
 
   **[Test build #121256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121256/testReport)** for PR 27724 at commit [`cf2bcc2`](https://github.com/apache/spark/commit/cf2bcc2e7a123875ccace1438398e50d69b727d8).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613245095
 
 
   > is [#27724 (comment)](https://github.com/apache/spark/pull/27724#issuecomment-607628004) a valid UT for this fix?
   
   Yes, I have added this UT.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407945243
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
 ##########
 @@ -146,12 +148,32 @@ case class ScriptTransformationExec(
           }
         }
 
+        private def waitForTransformationProcessTermination(): Unit = {
+          val timeout = SQLConf.get.getConf(SQLConf.TRANSFORMATION_EXIT_TIMEOUT)
+          try {
+            if (timeout < 0) {
+              proc.waitFor()
+            } else if (timeout == 0) {
+              // Do nothing
+            } else {
+              proc.waitFor(timeout, TimeUnit.SECONDS)
+            }
 
 Review comment:
   I feel that the logical here is too complex. It really does not make sense to allow user to set 0 timeout which could introduce the potential bug again. Also negative timeout may cause app hang, which breaks our initial motivation of avoiding app hang.
   
   Thus, I think we should only allow timeout > 0 and add `checkValue(timeout > 0)` to the conf and simplify the code here.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613544279
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121281/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-612890703
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613022809
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121210/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-607955698
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
slamke commented on a change in pull request #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#discussion_r407465318
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2264,6 +2264,12 @@ object SQLConf {
     .booleanConf
     .createWithDefault(false)
 
+  val TRANSFORMATION_EXIT_TIMEOUT = buildConf("spark.sql.transformation.exit.timeout")
+    .internal()
+    .doc("The timeout for executor to wait for transformation script exit when EOF.")
+    .timeConf(TimeUnit.MILLISECONDS)
 
 Review comment:
   ok

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-611966103
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121077/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27724: [SPARK-30973][SQL] ScriptTransformationExec should wait for the termination …
URL: https://github.com/apache/spark/pull/27724#issuecomment-613451760
 
 
   **[Test build #121281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121281/testReport)** for PR 27724 at commit [`b48e63a`](https://github.com/apache/spark/commit/b48e63af6372b4c44c3d368d50190ad4ec3810f0).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org