Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/01/03 02:07:51 UTC
[GitHub] spark pull request #18714: [SPARK-20236][SQL] dynamic partition overwrite
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18714#discussion_r159352867
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -39,8 +39,19 @@ import org.apache.spark.mapred.SparkHadoopMapRedUtil
*
* @param jobId the job's or stage's id
* @param path the job's output path, or null if committer acts as a noop
+ * @param dynamicPartitionOverwrite If true, Spark will overwrite partition directories at runtime
+ * dynamically, i.e., we first write files under a staging
+ * directory with partition path, e.g.
+ * /path/to/staging/a=1/b=1/xxx.parquet. When committing the job,
+ * we first clean up the corresponding partition directories at
+ * destination path, e.g. /path/to/destination/a=1/b=1, and move
+ * files from staging directory to the corresponding partition
+ * directories under destination path.
*/
-class HadoopMapReduceCommitProtocol(jobId: String, path: String)
+class HadoopMapReduceCommitProtocol(
+ jobId: String,
+ path: String,
+ dynamicPartitionOverwrite: Boolean = false)
--- End diff --
Indents.
---
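The Scaladoc quoted in the diff describes the commit step of dynamic partition overwrite: files are first written under a staging directory that mirrors the partition layout, and at job commit the matching partition directories at the destination are cleaned up before the staged files are moved in. A minimal sketch of that commit step is below, using `java.nio.file` in place of Hadoop's `FileSystem` API for illustration; the object and method names (`DynamicPartitionOverwriteSketch`, `commitDynamicPartitions`) are hypothetical and not part of the PR.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._

object DynamicPartitionOverwriteSketch {
  // Sketch of the commit described in the Scaladoc: for each partition
  // directory written under `stagingDir` (e.g. staging/a=1/b=1), delete the
  // corresponding directory under `destDir`, then move the staged files over.
  def commitDynamicPartitions(stagingDir: Path, destDir: Path): Unit = {
    // Collect all staged data files, e.g. staging/a=1/b=1/xxx.parquet.
    val stagedFiles = Files.walk(stagingDir).iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .toList
    // Derive the touched partition paths relative to the staging root.
    val partitionDirs = stagedFiles
      .map(f => stagingDir.relativize(f.getParent))
      .distinct

    // First clean up the corresponding partition directories at the
    // destination path, e.g. dest/a=1/b=1 ...
    partitionDirs.foreach { rel =>
      val dest = destDir.resolve(rel)
      if (Files.exists(dest)) {
        // Delete children before parents (reverse of pre-order walk).
        Files.walk(dest).iterator().asScala.toList.reverse.foreach(Files.delete)
      }
      Files.createDirectories(dest)
    }
    // ... then move files from the staging directory into place.
    stagedFiles.foreach { f =>
      val target = destDir.resolve(stagingDir.relativize(f))
      Files.move(f, target, StandardCopyOption.REPLACE_EXISTING)
    }
  }
}
```

The real implementation works against Hadoop paths and the committer's task/job lifecycle, but the ordering is the same: delete the overwritten partition directories only at commit time, so a failed job leaves the destination untouched.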