Posted to mapreduce-commits@hadoop.apache.org by dh...@apache.org on 2010/07/06 08:26:03 UTC
svn commit: r960808 - in /hadoop/mapreduce/trunk: CHANGES.txt
src/contrib/raid/src/java/org/apache/hadoop/raid/DistRaid.java
Author: dhruba
Date: Tue Jul 6 06:26:02 2010
New Revision: 960808
URL: http://svn.apache.org/viewvc?rev=960808&view=rev
Log:
MAPREDUCE-1838. Reduce the time needed for raiding a bunch of files
by randomly assigning files to map tasks. (Ramkumar Vadali via dhruba)
Modified:
hadoop/mapreduce/trunk/CHANGES.txt
hadoop/mapreduce/trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/DistRaid.java
Modified: hadoop/mapreduce/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/CHANGES.txt?rev=960808&r1=960807&r2=960808&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/CHANGES.txt (original)
+++ hadoop/mapreduce/trunk/CHANGES.txt Tue Jul 6 06:26:02 2010
@@ -146,6 +146,9 @@ Trunk (unreleased changes)
MAPREDUCE-1894. Fixed a bug in DistributedRaidFileSystem.readFully()
that was causing it to loop infinitely. (Ramkumar Vadali via dhruba)
+ MAPREDUCE-1838. Reduce the time needed for raiding a bunch of files
+ by randomly assigning files to map tasks. (Ramkumar Vadali via dhruba)
+
Release 0.21.0 - Unreleased
INCOMPATIBLE CHANGES
Modified: hadoop/mapreduce/trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/DistRaid.java
URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/DistRaid.java?rev=960808&r1=960807&r2=960808&view=diff
==============================================================================
--- hadoop/mapreduce/trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/DistRaid.java (original)
+++ hadoop/mapreduce/trunk/src/contrib/raid/src/java/org/apache/hadoop/raid/DistRaid.java Tue Jul 6 06:26:02 2010
@@ -324,6 +324,11 @@ public class DistRaid {
opWriter = SequenceFile.createWriter(fs, jobconf, opList, Text.class,
PolicyInfo.class, SequenceFile.CompressionType.NONE);
for (RaidPolicyPathPair p : raidPolicyPathPairList) {
+ // If a large set of files is raided for the first time, files in
+ // the same directory, which tend to have similar sizes, end up in
+ // the same map task. Shuffling the list mixes them up so each map
+ // task gets a better mix of files.
+ java.util.Collections.shuffle(p.srcPaths);
for (FileStatus st : p.srcPaths) {
opWriter.append(new Text(st.getPath().toString()), p.policy);
opCount++;
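The idea behind the one-line fix above can be sketched outside of Hadoop. The sketch below is hypothetical (it is not the DistRaid code, and `partition` is an illustrative stand-in for how the SequenceFile of paths is later split among map tasks): when paths are appended in directory order, each map task ends up with one directory's similarly sized files; shuffling the list first spreads them across tasks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleSketch {
    // Split paths into numMaps contiguous buckets in list order, a
    // simplified stand-in for splitting the job's input among map tasks.
    static List<List<String>> partition(List<String> paths, int numMaps) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numMaps; i++) buckets.add(new ArrayList<>());
        int per = (paths.size() + numMaps - 1) / numMaps; // ceiling division
        for (int i = 0; i < paths.size(); i++) {
            buckets.get(i / per).add(paths.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // 4 directories x 5 files, appended in directory order.
        List<String> paths = new ArrayList<>();
        for (int d = 0; d < 4; d++)
            for (int f = 0; f < 5; f++)
                paths.add("/data/dir" + d + "/part-" + f);

        // Without shuffling, each bucket holds exactly one directory.
        System.out.println(partition(paths, 4));

        // With shuffling (seeded here only so the demo is reproducible),
        // directories are mixed across buckets, as in the commit's
        // java.util.Collections.shuffle(p.srcPaths) call.
        Collections.shuffle(paths, new Random(42));
        System.out.println(partition(paths, 4));
    }
}
```

In the real code the shuffle is unseeded; the seed above exists only to make the demonstration deterministic.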