Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2019/10/18 07:03:00 UTC
[jira] [Created] (SPARK-29506) Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table
L. C. Hsieh created SPARK-29506:
-----------------------------------
Summary: Use dynamicPartitionOverwrite in FileCommitProtocol when insert into hive table
Key: SPARK-29506
URL: https://issues.apache.org/jira/browse/SPARK-29506
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh
When running INSERT OVERWRITE into a Hive table, enable dynamicPartitionOverwrite when initializing FileCommitProtocol.
HadoopMapReduceCommitProtocol uses FileOutputCommitter to commit job output files.
FileOutputCommitter recursively calls FileSystem.listStatus on the directories in each partition and commits the job output leaf files.
This is inefficient when a job dynamically overwrites many partitions and files.
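To make that cost concrete, here is a simplified Python model of a leaf-file commit on a local filesystem. It is an illustrative sketch under stated assumptions, not Hadoop's FileOutputCommitter: the function name `commit_leaf_files`, the directory layout and the counters are hypothetical, and the model always walks down to leaf files as the issue describes rather than reproducing every branch of the real committer. Each directory visited costs one listing call and each leaf file costs one rename, so the work grows with the total number of files written.

```python
import os
import shutil
import tempfile


def commit_leaf_files(task_output: str, final_output: str) -> dict:
    """Hypothetical model of a leaf-file commit on a local filesystem.

    Recursively lists task_output and renames every leaf file into the
    matching place under final_output, creating directories as needed.
    Returns how many directory listings and file renames were performed.
    """
    stats = {"list_calls": 0, "file_renames": 0}

    def merge(src_dir: str, dst_dir: str) -> None:
        stats["list_calls"] += 1  # one listing per directory visited
        for name in sorted(os.listdir(src_dir)):
            src = os.path.join(src_dir, name)
            dst = os.path.join(dst_dir, name)
            if os.path.isdir(src):
                os.makedirs(dst, exist_ok=True)
                merge(src, dst)  # recurse into partition subdirectories
            else:
                if os.path.exists(dst):
                    os.remove(dst)  # replace an existing file of the same name
                os.rename(src, dst)  # commit one leaf file
                stats["file_renames"] += 1

    os.makedirs(final_output, exist_ok=True)
    merge(task_output, final_output)
    return stats


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    task_output = os.path.join(root, "task_output")  # hypothetical layout
    for part in ["ds=2019-10-01", "ds=2019-10-02", "ds=2019-10-03"]:
        os.makedirs(os.path.join(task_output, part))
        for i in range(2):
            with open(os.path.join(task_output, part, f"part-{i}"), "w") as f:
                f.write("data")
    print(commit_leaf_files(task_output, os.path.join(root, "table")))
    # {'list_calls': 4, 'file_renames': 6}
    shutil.rmtree(root)
```

Three partitions with two files each already need four listings and six renames in this model; both numbers keep growing with every additional partition and file.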
When dynamicPartitionOverwrite is enabled, HadoopMapReduceCommitProtocol instead writes the dynamic partitions to a staging directory and commits the written partition directories rather than individual leaf files.
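A matching simplified model of the directory-level commit is sketched below. Again this is a hypothetical local-filesystem illustration, not Spark's actual code: `commit_partition_dirs`, the staging path and the counters are assumptions. For each partition the job wrote under the staging directory, any existing destination partition is deleted and the staging partition directory is moved into place with a single rename. Partitions the job did not write are left untouched, which is what overwriting partitions dynamically means.

```python
import os
import shutil
import tempfile


def commit_partition_dirs(staging_dir: str, table_dir: str, written_partitions: list) -> dict:
    """Hypothetical model of a dynamic-partition-overwrite commit.

    Each partition directory written under staging_dir replaces the matching
    partition directory of table_dir with one directory rename. Partitions
    not listed in written_partitions are not touched.
    """
    stats = {"dir_deletes": 0, "dir_renames": 0}
    for part in written_partitions:
        src = os.path.join(staging_dir, part)
        dst = os.path.join(table_dir, part)
        if os.path.exists(dst):
            shutil.rmtree(dst)  # drop old data only for partitions being overwritten
            stats["dir_deletes"] += 1
        os.makedirs(os.path.dirname(dst), exist_ok=True)  # ensure the parent exists
        os.rename(src, dst)  # one rename per partition, regardless of file count
        stats["dir_renames"] += 1
    return stats


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    table = os.path.join(root, "table")
    staging = os.path.join(root, "staging")  # hypothetical staging location
    for part in ["ds=2019-10-01", "ds=2019-10-02"]:  # existing table partitions
        os.makedirs(os.path.join(table, part))
    for part in ["ds=2019-10-02", "ds=2019-10-03"]:  # partitions written by the job
        os.makedirs(os.path.join(staging, part))
        for i in range(2):
            with open(os.path.join(staging, part, f"part-{i}"), "w") as f:
                f.write("data")
    print(commit_partition_dirs(staging, table, ["ds=2019-10-02", "ds=2019-10-03"]))
    # {'dir_deletes': 1, 'dir_renames': 2}
    print(sorted(os.listdir(table)))
    # ['ds=2019-10-01', 'ds=2019-10-02', 'ds=2019-10-03']
    shutil.rmtree(root)
```

In this model the number of renames depends only on how many partitions were written, not on how many files each partition contains.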
--
This message was sent by Atlassian Jira
(v8.3.4#803005)