Posted to issues@spark.apache.org by "Yang Li (JIRA)" <ji...@apache.org> on 2016/11/23 01:12:59 UTC
[jira] [Commented] (SPARK-1677) Allow users to avoid Hadoop output checks if desired
[ https://issues.apache.org/jira/browse/SPARK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688502#comment-15688502 ]
Yang Li commented on SPARK-1677:
--------------------------------
Hi Spark Community,
I'm curious about the behavior of the "spark.hadoop.validateOutputSpecs" option. If I set it to 'false', will existing files in the output directory get wiped out beforehand? For example, if a Spark job is to write file Y under directory A, which already contains file X, should we expect both X and Y under directory A after the job completes, or will only Y be retained?
Thanks!
> Allow users to avoid Hadoop output checks if desired
> ----------------------------------------------------
>
> Key: SPARK-1677
> URL: https://issues.apache.org/jira/browse/SPARK-1677
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Patrick Wendell
> Assignee: Nan Zhu
> Fix For: 1.0.1, 1.1.0
>
>
> For compatibility with older versions of Spark it would be nice to have an option `spark.hadoop.validateOutputSpecs` (default true) and a description "If set to true, validates the output specification used in saveAsHadoopFile and other variants. This can be disabled to silence exceptions due to pre-existing output directories."
> This would just wrap the checking done in this PR, by first consulting the Spark conf:
> https://issues.apache.org/jira/browse/SPARK-1100
> https://github.com/apache/spark/pull/11
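For anyone reading along, setting the option as described above would look roughly like this in PySpark (a sketch, not a tested recipe; it assumes a Spark installation and a reachable output path, so treat it as a config fragment):

```python
from pyspark import SparkConf, SparkContext

# Sketch: disable the output-spec validation described in this issue
# before creating the context. Key name as given above.
conf = (SparkConf()
        .setAppName("skip-output-check-example")
        .set("spark.hadoop.validateOutputSpecs", "false"))
sc = SparkContext(conf=conf)

# Writing to a pre-existing directory no longer raises the spec check.
sc.parallelize([1, 2, 3]).saveAsTextFile("hdfs:///tmp/existing-dir")
```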
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org