Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/10/27 00:45:27 UTC

[jira] [Commented] (SPARK-11328) Correctly propagate error message in the case of failures when writing parquet

    [ https://issues.apache.org/jira/browse/SPARK-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975361#comment-14975361 ] 

Yin Huai commented on SPARK-11328:
----------------------------------

The "file already exists" error is thrown from [this line | https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L237] when we try to create a record writer.

> Correctly propagate error message in the case of failures when writing parquet
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-11328
>                 URL: https://issues.apache.org/jira/browse/SPARK-11328
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Yin Huai
>
> When saving data to S3 (e.g. saving to Parquet), if an error occurs during query execution, the partial file generated by the failed task is uploaded to S3, and the retries of this task then throw a "file already exists" error. This is very confusing to users, because they may think the "file already exists" error is the one that caused the job to fail. They can only find the real error in the Spark UI (on the stage page).
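
The failure mode described above can be reproduced outside Spark with a small simulation. This is only a sketch under stated assumptions, not Spark's code: `create_record_writer`, `run_task_attempt`, and `run_with_retries` are hypothetical names, a local temporary file stands in for the S3 object, and remembering the first exception is just one possible way to keep the real error visible.

```python
import os
import tempfile


def create_record_writer(path):
    """Hypothetical stand-in for creating a record writer.

    Mode "x" raises FileExistsError if an earlier attempt left a file behind.
    """
    return open(path, "x")


def run_task_attempt(path, attempt):
    """Hypothetical task body: the first attempt fails mid-write with the real error."""
    writer = create_record_writer(path)
    try:
        writer.write("partial row\n")
        if attempt == 0:
            raise ValueError("real error: bad record in input")
    finally:
        # Closing the file stands in for the partial file being uploaded.
        writer.close()


def run_with_retries(path, max_attempts=3):
    """Retry the task, keeping both the first and the last failure."""
    first_error = None
    last_error = None
    for attempt in range(max_attempts):
        try:
            run_task_attempt(path, attempt)
            return None, None
        except Exception as exc:
            if first_error is None:
                first_error = exc  # the failure the user actually needs to see
            last_error = exc  # what surfaces if only the final failure is reported
    return first_error, last_error


def demo():
    """Run the simulation in a temporary directory and return (first, last) errors."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "part-00000.parquet")
        return run_with_retries(path)
```

In this sketch the last failure is the misleading "file already exists" error raised by the retries, while the first failure carries the real cause.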



