Posted to mapreduce-issues@hadoop.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2015/02/16 08:20:15 UTC

[jira] [Issue Comment Deleted] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-4815:
-------------------------------
    Comment: was deleted

(was: I am out of the office for vacation returning Monday, February 16th.

Please contact Brad Pittiglio for any requests relating to resourcing.

For Cloudera support requests, please contact Cloudera Support directly at support.cloudera.com.

Thank you,
Brian Schrameck
Sila Solutions Group
www.silasg.com

)

> FileOutputCommitter.commitJob can be very slow for jobs with many output files
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
>            Reporter: Jason Lowe
>            Assignee: Siqi Li
>         Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch
>
>
> If a job generates many files to commit, the commitJob method call at the end of the job can take minutes.  This is a performance regression from 1.x, where tasks committed directly to the final output directory as they completed, leaving commitJob very little to do.  That commit work was processed in parallel and overlapped the processing of outstanding tasks.  In 0.23/2.x, the commit is single-threaded and does not begin until all tasks have completed.
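The contrast described above can be sketched on a local filesystem. This is a minimal Python analogy under stated assumptions, not Hadoop's FileOutputCommitter: the helper names (commit_task_output, commit_job_serial, commit_job_parallel, make_task_dirs) and the staging-directory layout are hypothetical, and a thread pool stands in for 1.x tasks committing their own output independently. The serial version performs one rename per output file on a single thread after every task has finished, so its wall-clock cost grows with the total file count. The per-task version spreads the same renames across workers.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def commit_task_output(task_dir: Path, output_dir: Path) -> int:
    """Move every file in one task's staging directory into output_dir.

    Returns the number of files moved. os.replace is a cheap rename only
    when source and destination are on the same filesystem.
    """
    moved = 0
    for src in sorted(task_dir.iterdir()):
        if src.is_file():
            os.replace(src, output_dir / src.name)
            moved += 1
    return moved


def commit_job_serial(task_dirs: list[Path], output_dir: Path) -> int:
    # 0.23/2.x-style (as described in the issue): one thread commits every
    # task's files, and only after all tasks have finished.
    return sum(commit_task_output(d, output_dir) for d in task_dirs)


def commit_job_parallel(task_dirs: list[Path], output_dir: Path, workers: int = 8) -> int:
    # Rough stand-in for 1.x-style behaviour: each task's output is
    # committed independently, so the moves can proceed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda d: commit_task_output(d, output_dir), task_dirs))


def make_task_dirs(root: Path, n_tasks: int, files_per_task: int) -> list[Path]:
    # Hypothetical layout: root/task_<i>/part-<i>-<j>, with unique file names
    # so that merging into one output directory never collides.
    dirs = []
    for i in range(n_tasks):
        d = root / f"task_{i}"
        d.mkdir(parents=True)
        for j in range(files_per_task):
            (d / f"part-{i:05d}-{j:03d}").write_text("data\n")
        dirs.append(d)
    return dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        out = root / "output"
        out.mkdir()
        tasks = make_task_dirs(root / "staging", n_tasks=4, files_per_task=5)
        print(commit_job_parallel(tasks, out))  # prints 20
```

Both functions produce the same final directory; the only difference is whether the per-file work is serialized behind the slowest task or overlapped.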



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)