Posted to issues@spark.apache.org by "Rahul Bhatia (JIRA)" <ji...@apache.org> on 2016/08/17 13:54:20 UTC

[jira] [Commented] (SPARK-11328) Provide more informative error message when direct parquet output committer is used and there is a file already exists error.

    [ https://issues.apache.org/jira/browse/SPARK-11328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424550#comment-15424550 ] 

Rahul Bhatia commented on SPARK-11328:
--------------------------------------

I'm still finding symptoms of this error in 1.6.2.

To recreate:
1. configure Spark to use the DirectParquetOutputCommitter
2. set spark.executor.cores high enough to force an OOM error

When an executor hits the OOM and YARN kills the container, the next attempt at that task fails with "file already exists". (A configuration sketch of this setup follows below.)
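
For reference, a minimal reproduction sketch in Scala against Spark 1.6.x. The output path, core count, and dataset size are placeholders, and the fully qualified committer class name moved between 1.x packages, so check it against your build:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("direct-committer-repro")
  // Route task output straight to the final destination instead of a _temporary dir.
  // The package of DirectParquetOutputCommitter differs across 1.x releases; adjust as needed.
  .set("spark.sql.parquet.output.committer.class",
    "org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter")
  // Placeholder value: oversubscribe cores per executor to push it toward OOM.
  .set("spark.executor.cores", "8")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// If a task dies mid-write (e.g. its container is killed), the partial file is
// already on S3, so the retried attempt fails with "file already exists".
sqlContext.range(0, 1000000000L)
  .write.parquet("s3a://your-bucket/path/to/output")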

> Provide more informative error message when direct parquet output committer is used and there is a file already exists error.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11328
>                 URL: https://issues.apache.org/jira/browse/SPARK-11328
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Nong Li
>            Priority: Critical
>             Fix For: 1.5.3, 1.6.0
>
>
> When saving data to S3 (e.g., saving to Parquet), if there is an error during query execution, the partial file generated by the failed task is still uploaded to S3, and the retries of this task then throw a "file already exists" error. This is very confusing to users because they may think the "file already exists" error is what caused the job failure. They can only find the real error in the Spark UI (on the stage page).
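
A hedged workaround sketch (not from the ticket itself): leaving the committer override unset falls back to the default ParquetOutputCommitter, which stages output under a _temporary directory and only renames files on successful task commit, so a retried task does not collide with a partial file. Keeping speculation off is also advisable whenever a direct committer is in use:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("default-committer")
  // Leave spark.sql.parquet.output.committer.class unset: the default committer
  // writes under <output>/_temporary/<attempt> and moves files only on commit,
  // so a failed attempt leaves nothing at the final path.
  .set("spark.speculation", "false") // speculative duplicates plus direct output are another source of collisions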



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org