Posted to mapreduce-issues@hadoop.apache.org by "Ted Malaska (JIRA)" <ji...@apache.org> on 2015/05/18 02:40:00 UTC
[jira] [Resolved] (MAPREDUCE-6367) UniformSizeInputFormat skews left over bytes to last split
[ https://issues.apache.org/jira/browse/MAPREDUCE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Malaska resolved MAPREDUCE-6367.
------------------------------------
Resolution: Invalid
Release Note: Sorry this jira is not needed
> UniformSizeInputFormat skews left over bytes to last split
> ----------------------------------------------------------
>
> Key: MAPREDUCE-6367
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6367
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0, 2.5.2
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Priority: Minor
>
> UniformSizeInputFormat tries to give every split an equal number of bytes. But the current logic results in every split getting a little less than the ideal amount, and the leftover bytes from every split are put into the last split,
> resulting in a large skew for the last split.
> Below is the area of the code that is affected:
> https://github.com/apache/hadoop/blob/9ae7f9eb7baeb244e1b95aabc93ad8124870b9a9/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/UniformSizeInputFormat.java#L98
> The fix would be to change the following line:
> currentSplitSize += srcFileStatus.getLen();
> to
> currentSplitSize += srcFileStatus.getLen() + (currentSplitSize - nBytesPerSplit);
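For anyone who wants to check the reported behavior by hand, below is a minimal sketch of a greedy byte-based grouping loop of the kind described above. It is a simplified model, not the Hadoop source: the class SplitSketch, the method groupIntoSplits, and the sample file lengths are hypothetical, and only the names currentSplitSize and nBytesPerSplit are taken from the report. The target size is assumed to be the total byte count divided by the requested number of splits, rounded up.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of greedy split grouping; not the Hadoop code.
final class SplitSketch {

    // Returns the byte total of each split produced by the greedy loop.
    static List<Long> groupIntoSplits(long[] fileLengths, int numSplits) {
        long totalSizeBytes = 0;
        for (long len : fileLengths) {
            totalSizeBytes += len;
        }
        // Assumed target: total bytes divided by requested splits, rounded up.
        long nBytesPerSplit = (long) Math.ceil(totalSizeBytes * 1.0 / numSplits);

        List<Long> splitSizes = new ArrayList<>();
        long currentSplitSize = 0;
        for (long len : fileLengths) {
            // Close the current split before adding a file that would overflow it.
            if (currentSplitSize + len > nBytesPerSplit && currentSplitSize != 0) {
                splitSizes.add(currentSplitSize);
                currentSplitSize = 0;
            }
            currentSplitSize += len;
        }
        // Whatever remains after the last file becomes the final split.
        if (currentSplitSize != 0) {
            splitSizes.add(currentSplitSize);
        }
        return splitSizes;
    }

    public static void main(String[] args) {
        long[] fileLengths = {50, 50, 50, 50, 50, 50}; // hypothetical sizes
        System.out.println(groupIntoSplits(fileLengths, 4));
    }
}
```

Running the sketch with different file lengths and split counts shows how many splits the loop produces and how the bytes are distributed among them, which may help in judging whether the last split is actually skewed.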
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)