Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/07/29 01:05:04 UTC

[jira] [Resolved] (SPARK-9393) Fix several error-handling bugs in ScriptTransform operator

     [ https://issues.apache.org/jira/browse/SPARK-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-9393.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Issue resolved by pull request 7710
[https://github.com/apache/spark/pull/7710]

> Fix several error-handling bugs in ScriptTransform operator
> -----------------------------------------------------------
>
>                 Key: SPARK-9393
>                 URL: https://issues.apache.org/jira/browse/SPARK-9393
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> Spark SQL's ScriptTransform operator has several serious bugs that make debugging difficult:
> - If an exception is thrown in the writer thread, the child process is not killed. This leads to a deadlock, because the reader thread blocks while waiting for input that will never arrive.
> - TaskContext is not propagated to the writer thread, which may cause errors in upstream pipelined operators.
> - Exceptions that occur in the writer thread are not propagated to the main reader thread, so upstream errors may be ignored instead of killing the job. This can lead to silently incorrect query results.
> - The writer thread is not a daemon thread, but it should be.
> In addition, the code in this file is extremely messy:
> - Lots of fields are nullable but the nullability isn't clearly explained.
> - Many confusing variable names: for instance, there are variables named {{iter}} and {{iterator}} that are defined in the same scope.
> - Lots of code is misindented.
> - The {{*serdeClass}} variables are actually expected to be single-quoted strings, which is really confusing: I feel that this parsing / extraction should be performed in the analyzer, not in the operator itself.
> - There are no unit tests for the operator itself, only end-to-end tests.
> I have a pull request that fixes all of these issues.
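
The threading bugs in the first list can be illustrated with a small, language-neutral sketch. The Python below is not the code from the pull request and is not Spark's Scala implementation. The helper name run_script_transform and its parameters are hypothetical. It only shows the general pattern the description asks for: a daemon writer thread, killing the child process when the writer fails so the reader is unblocked, and re-raising the writer's exception in the reader thread. TaskContext propagation is specific to Spark and is not modeled here.

```python
import subprocess
import sys
import threading


def run_script_transform(rows, command):
    """Pipe rows through a child process and return its output lines.

    Hypothetical helper for illustration only; not Spark's operator.
    """
    proc = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    writer_error = []  # filled in by the writer thread if it fails

    def write_rows():
        try:
            for row in rows:
                proc.stdin.write(row + "\n")
            proc.stdin.close()  # signal end of input so the child can exit
        except BaseException as exc:
            writer_error.append(exc)
            # Kill the child so the reader sees end-of-file instead of
            # blocking forever on output that will never arrive.
            proc.kill()

    # Daemon thread: it must not keep the process alive on its own.
    writer = threading.Thread(target=write_rows, daemon=True)
    writer.start()

    # Reader side: consume the child's output until end-of-file.
    output = [line.rstrip("\n") for line in proc.stdout]
    writer.join()
    proc.wait()
    proc.stdout.close()
    try:
        proc.stdin.close()  # no-op if the writer already closed it
    except OSError:
        pass  # buffered data could not be flushed to a dead child

    # Propagate the writer's failure rather than returning partial output.
    if writer_error:
        raise RuntimeError("writer thread failed") from writer_error[0]
    return output


if __name__ == "__main__":
    # Child script that upper-cases every input line (an assumption made
    # purely for the demonstration).
    upper = [
        sys.executable,
        "-c",
        "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
    ]
    print(run_script_transform(["a", "b"], upper))
```

If the row source raises partway through, the call fails with a RuntimeError whose cause is the original exception, rather than hanging or returning a truncated result.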


