Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/01/03 02:07:51 UTC
[GitHub] spark pull request #18714: [SPARK-20236][SQL] dynamic partition overwrite
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18714#discussion_r159352867
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -39,8 +39,19 @@ import org.apache.spark.mapred.SparkHadoopMapRedUtil
*
* @param jobId the job's or stage's id
* @param path the job's output path, or null if committer acts as a noop
+ * @param dynamicPartitionOverwrite If true, Spark will overwrite partition directories at runtime
+ * dynamically, i.e., we first write files under a staging
+ * directory with partition path, e.g.
+ * /path/to/staging/a=1/b=1/xxx.parquet. When committing the job,
+ * we first clean up the corresponding partition directories at
+ * destination path, e.g. /path/to/destination/a=1/b=1, and move
+ * files from staging directory to the corresponding partition
+ * directories under destination path.
*/
-class HadoopMapReduceCommitProtocol(jobId: String, path: String)
+class HadoopMapReduceCommitProtocol(
+ jobId: String,
+ path: String,
+ dynamicPartitionOverwrite: Boolean = false)
--- End diff --
Indents.
---
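The Scaladoc quoted in the diff describes the commit step of dynamic partition overwrite: files are first written under a staging directory that mirrors the partition layout, and at job commit the matching partition directories at the destination are cleaned up before the staged files are moved in. A minimal sketch of that commit step is below, using `java.nio.file` in place of Hadoop's `FileSystem` API for illustration; the object and method names (`DynamicPartitionOverwriteSketch`, `commitDynamicPartitions`) are hypothetical and not part of the PR.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._

object DynamicPartitionOverwriteSketch {
  // Sketch of the commit described in the Scaladoc: for each partition
  // directory written under `stagingDir` (e.g. staging/a=1/b=1), delete the
  // corresponding directory under `destDir`, then move the staged files over.
  def commitDynamicPartitions(stagingDir: Path, destDir: Path): Unit = {
    // Collect all staged data files, e.g. staging/a=1/b=1/xxx.parquet.
    val stagedFiles = Files.walk(stagingDir).iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .toList
    // Derive the touched partition paths relative to the staging root.
    val partitionDirs = stagedFiles
      .map(f => stagingDir.relativize(f.getParent))
      .distinct

    // First clean up the corresponding partition directories at the
    // destination path, e.g. dest/a=1/b=1 ...
    partitionDirs.foreach { rel =>
      val dest = destDir.resolve(rel)
      if (Files.exists(dest)) {
        // Delete children before parents (reverse of pre-order walk).
        Files.walk(dest).iterator().asScala.toList.reverse.foreach(Files.delete)
      }
      Files.createDirectories(dest)
    }
    // ... then move files from the staging directory into place.
    stagedFiles.foreach { f =>
      val target = destDir.resolve(stagingDir.relativize(f))
      Files.move(f, target, StandardCopyOption.REPLACE_EXISTING)
    }
  }
}
```

The real implementation works against Hadoop paths and the committer's task/job lifecycle, but the ordering is the same: delete the overwritten partition directories only at commit time, so a failed job leaves the destination untouched.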