Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2019/10/18 07:03:00 UTC

[jira] [Created] (SPARK-29506) Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table

L. C. Hsieh created SPARK-29506:
-----------------------------------

             Summary: Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table
                 Key: SPARK-29506
                 URL: https://issues.apache.org/jira/browse/SPARK-29506
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: L. C. Hsieh
            Assignee: L. C. Hsieh


When running INSERT OVERWRITE into a Hive table, enable dynamicPartitionOverwrite when initializing FileCommitProtocol.

HadoopMapReduceCommitProtocol uses FileOutputCommitter to commit job output files.

FileOutputCommitter repeatedly calls FileSystem.listStatus on the directories inside each partition, recursively, and commits the job output leaf files.

This is inefficient when dynamically overwriting many partitions and files.

When dynamicPartitionOverwrite is enabled, HadoopMapReduceCommitProtocol writes output to a staging directory and commits the written partition directories instead of individual leaf files.
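
To illustrate why committing whole partition directories can be cheaper than committing leaf files, here is a rough sketch on the local filesystem. It is not Spark or Hadoop code: the function names (write_job_output, commit_leaf_files, commit_partition_dirs, compare) and the dt=... partition layout are hypothetical, and local renames only approximate what a distributed filesystem does. It counts directory listings and renames for each strategy.

```python
import os
import shutil
import tempfile


def write_job_output(staging, partitions, files_per_partition):
    """Create staging/<partition>/part-N files, as a job's tasks might."""
    for part in partitions:
        part_dir = os.path.join(staging, part)
        os.makedirs(part_dir)
        for i in range(files_per_partition):
            with open(os.path.join(part_dir, f"part-{i:05d}"), "w") as f:
                f.write("data")


def commit_leaf_files(staging, dest):
    """Walk the staging tree recursively and move every leaf file.

    Returns (directory listings, renames) performed.
    """
    listings = renames = 0
    for root, _dirs, files in os.walk(staging):
        listings += 1  # one listing per directory visited
        target = os.path.join(dest, os.path.relpath(root, staging))
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.replace(os.path.join(root, name), os.path.join(target, name))
            renames += 1
    return listings, renames


def commit_partition_dirs(staging, dest, partitions):
    """Replace each written partition directory with a single rename.

    Returns (directory listings, renames) performed.
    """
    renames = 0
    for part in partitions:
        target = os.path.join(dest, part)
        # Overwrite semantics: drop the old partition, if any, first.
        shutil.rmtree(target, ignore_errors=True)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.replace(os.path.join(staging, part), target)
        renames += 1
    return 0, renames


def compare(num_partitions=4, files_per_partition=25):
    """Run both strategies on identical output and return their counts."""
    partitions = [f"dt=2019-10-{d:02d}" for d in range(1, num_partitions + 1)]
    results = {}
    for name in ("leaf", "dirs"):
        with tempfile.TemporaryDirectory() as tmp:
            staging = os.path.join(tmp, "staging")
            dest = os.path.join(tmp, "table")
            os.makedirs(dest)
            write_job_output(staging, partitions, files_per_partition)
            if name == "leaf":
                results[name] = commit_leaf_files(staging, dest)
            else:
                results[name] = commit_partition_dirs(staging, dest, partitions)
    return results


if __name__ == "__main__":
    print(compare())
```

In this sketch the leaf-file strategy does work proportional to the number of files, while the directory strategy does work proportional to the number of partitions written, which is the gap the issue aims to close.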



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
