Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2017/03/27 13:00:44 UTC

[jira] [Created] (SPARK-20107) Speed up FileOutputCommitter#commitJob for many output files

Yuming Wang created SPARK-20107:
-----------------------------------

             Summary: Speed up FileOutputCommitter#commitJob for many output files
                 Key: SPARK-20107
                 URL: https://issues.apache.org/jira/browse/SPARK-20107
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Yuming Wang


This can cut {{11 minutes}} from the job commit for 216869 output files.

This improvement affects Cloudera's Hadoop cdh5-2.6.0_5.4.0 and higher (see: https://github.com/cloudera/hadoop-common/commit/1c1236182304d4075276c00c4592358f428bc433 and https://github.com/cloudera/hadoop-common/commit/16b2de27321db7ce2395c08baccfdec5562017f0) and Apache Hadoop 2.7.0 and higher.
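
A minimal sketch of how such a speedup might be enabled from Spark. The description does not name the setting, so this assumes the linked commits correspond to Hadoop's MAPREDUCE-4815, which added the {{mapreduce.fileoutputcommitter.algorithm.version}} property. Spark copies any {{spark.hadoop.*}} property into the Hadoop configuration. The object name, application name, and output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example job; submit it with spark-submit so the master is set.
object CommitterV2Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("committer-v2-sketch") // hypothetical name
      // Assumption: this is the setting the issue has in mind.
      // Only honored by Hadoop versions that include MAPREDUCE-4815.
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

    // Write some output so the committer runs; the path is hypothetical.
    spark.range(1000).write.mode("overwrite").parquet("/tmp/committer-v2-sketch")

    spark.stop()
  }
}
```

With version 2, each task moves its files into the final output directory when the task commits, so {{commitJob}} has little left to do. The trade-off is that partial output may be visible if the job fails midway.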


