You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liutang123 <gi...@git.apache.org> on 2018/04/26 08:58:02 UTC
[GitHub] spark pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec shoul...
GitHub user liutang123 opened a pull request:
https://github.com/apache/spark/pull/21164
[SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish
When feed thread doesn't set its _exception variable and the progress doesn't exit completely, output Iterator will return false in hasNext function.
## What changes were proposed in this pull request?
wait script process exiting before output iterator finish.
## How was this patch tested?
manual test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liutang123/spark SPARK-24098
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21164.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21164
----
commit a77ac06180cbecc28383f3b73e05259502896613
Author: liutang123 <li...@...>
Date: 2018-04-26T08:54:44Z
[SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/21164
@rxin hi, Do you have time to look at this PR?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec shoul...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21164#discussion_r185693669
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala ---
@@ -137,13 +137,12 @@ case class ScriptTransformationExec(
throw writerThread.exception.get
}
- if (!proc.isAlive) {
- val exitCode = proc.exitValue()
- if (exitCode != 0) {
- logError(stderrBuffer.toString) // log the stderr circular buffer
- throw new SparkException(s"Subprocess exited with status $exitCode. " +
- s"Error: ${stderrBuffer.toString}", cause)
- }
+ proc.waitFor()
--- End diff --
Although writerThread._exception is a volatile variable, some times writerThread.exception function may be called before writerThread._exception's assignment due to readThread and writerThread are working parallel.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21164
**[Test build #91616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91616/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21164
**[Test build #91616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91616/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/21164
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21164
**[Test build #97563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97563/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec shoul...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21164#discussion_r185511245
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala ---
@@ -137,13 +137,12 @@ case class ScriptTransformationExec(
throw writerThread.exception.get
}
- if (!proc.isAlive) {
- val exitCode = proc.exitValue()
- if (exitCode != 0) {
- logError(stderrBuffer.toString) // log the stderr circular buffer
- throw new SparkException(s"Subprocess exited with status $exitCode. " +
- s"Error: ${stderrBuffer.toString}", cause)
- }
+ proc.waitFor()
--- End diff --
basically you are moving the `proc.waitFor()` to after `if (writerThread.exception.isDefined) ...`, why is it?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/21164
Bug Reappearance:
1. Add Thread.sleep(1000 * 600) before assign for _exception.
2. structure a python script witch will throw exception like follow:
test.py
```import sys
for line in sys.stdin:
raise Exception('error')
print line
```
3. use script created in step 2 in transform.
```spark-sql>add files /path_to/test.py;```
```spark-sql>select transform(id) using 'python test.py' as id from city;```
The result is that spark will end successfully.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21164
**[Test build #97563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97563/testReport)** for PR 21164 at commit [`a77ac06`](https://github.com/apache/spark/commit/a77ac06180cbecc28383f3b73e05259502896613).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by cxzl25 <gi...@git.apache.org>.
Github user cxzl25 commented on the issue:
https://github.com/apache/spark/pull/21164
@liutang123 cc @cloud-fan @gatorsmile
I also encountered this problem.
![image](https://user-images.githubusercontent.com/3898450/40981493-9b0b0d46-690d-11e8-8607-c14756610d59.png)
python:
```python
import sys
for line in sys.stdin:
print 1/0
```
sql:
```sql
ADD FILE test.py;
SELECT TRANSFORM(1) USING 'python test.py'
AS (c1)
```
I solved it this way: *writerThread.join()*
```java
if (scriptOutputReader.next(scriptOutputWritable) <= 0) {
writerThread.join()
checkFailureAndPropagate()
return false
}
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/21164
@cloud-fan hi, fan, do you have time to see this pr? I think this is a critical bug.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21164
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21164
cc @dongjoon-hyun @gatorsmile
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91616/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21164
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97563/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait ...
Posted by liutang123 <gi...@git.apache.org>.
Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/21164
@gatorsmile Could you please give some comments when you have time? Thanks so much.
In addition, I think this is a critical bug!!!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org