Posted to issues@spark.apache.org by "Yang Li (JIRA)" <ji...@apache.org> on 2016/11/23 01:12:59 UTC
[jira] [Commented] (SPARK-1677) Allow users to avoid Hadoop output checks if desired
[ https://issues.apache.org/jira/browse/SPARK-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688502#comment-15688502 ]
Yang Li commented on SPARK-1677:
--------------------------------
Hi Spark Community,
I'm curious about the behavior of the "spark.hadoop.validateOutputSpecs" option. If I set it to 'false', will existing files in the output directory get wiped out beforehand? For example, if a Spark job is to write file Y under directory A, which already contains file X, should we expect both X and Y under directory A after the job completes, or will only Y be retained?
Thanks!
> Allow users to avoid Hadoop output checks if desired
> ----------------------------------------------------
>
> Key: SPARK-1677
> URL: https://issues.apache.org/jira/browse/SPARK-1677
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.0.0
> Reporter: Patrick Wendell
> Assignee: Nan Zhu
> Fix For: 1.0.1, 1.1.0
>
>
> For compatibility with older versions of Spark it would be nice to have an option `spark.hadoop.validateOutputSpecs` (default true) and a description "If set to true, validates the output specification used in saveAsHadoopFile and other variants. This can be disabled to silence exceptions due to pre-existing output directories."
> This would just wrap the checking done in this PR, by first consulting the Spark conf:
> https://issues.apache.org/jira/browse/SPARK-1100
> https://github.com/apache/spark/pull/11
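For anyone reading along, setting the option as described above would look roughly like this in PySpark (a sketch, not a tested recipe; it assumes a Spark installation and a reachable output path, so treat it as a config fragment):

```python
from pyspark import SparkConf, SparkContext

# Sketch: disable the output-spec validation described in this issue
# before creating the context. Key name as given above.
conf = (SparkConf()
        .setAppName("skip-output-check-example")
        .set("spark.hadoop.validateOutputSpecs", "false"))
sc = SparkContext(conf=conf)

# Writing to a pre-existing directory no longer raises the spec check.
sc.parallelize([1, 2, 3]).saveAsTextFile("hdfs:///tmp/existing-dir")
```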
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org