You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2008/04/22 01:15:22 UTC

[jira] Created: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-3293
                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Runping Qi




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644224#action_12644224 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Runping's point on aggregating bytes over racks to determine rack locality makes sense.

The problem is that the JobClient is unaware of the topology. Some ways to build the topology awareness are:
# Make the JobClient query the topology service and build its own topology awareness. The problem with this approach is that we need to ensure that the topology script that is used by the JobClient and the JobTracker are always in sync. 
# Let the client get back rack information along with the hosts when it queries the FS for block locations (fs.getFileBlockLocations). We could add a new method fs.getResolvedFileBlockLocations that returns the hosts with the rack information. The default implementation would just return the hosts, DFS would override this method and will return the rack information along with the hosts. We are guranteed correct topology information as the Namenode and JobTracker would be using the same topology information.

The second approach looks better. Thoughts?



> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644244#action_12644244 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Runping, just to make sure I understand this correctly, consider the following hypothetical scenario

Rack 1 has hosts A,B,C
Rack 2 has host D

A contributes 25, B contributes 20, C contributes 15, D contributes 40

Then the preference of hosts would be A,B,C,D as A, B and C have an intra-rack contribution of 60 and external contribution of 40, whereas D has intra rack contribution of 40 and external contribution of 60. 
Is this correct?


> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647785#action_12647785 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Yes, I did

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647913#action_12647913 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3293:
------------------------------------------------

I see.  There must be something wrong in the Hudson machine.  It seems having a lot problems recently.

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan reassigned HADOOP-3293:
-----------------------------------------

    Assignee: Jothi Padmanabhan

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648097#action_12648097 ] 

Hadoop QA commented on HADOOP-3293:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393879/hadoop-3293.patch
  against trunk revision 714107.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3594/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3594/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3594/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3594/console

This message is automatically generated.

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644025#action_12644025 ] 

Runping Qi commented on HADOOP-3293:
------------------------------------

In the above case, I'd say that the prefer hosts for the split should be in the order of A,D,B,E,C,F.
We should also aggregate the bytes over the racks of those hosts.
For example, suppose C,E,F share  the same rack while other nodes are on different rack.
Then host E (F, and even C) will offer better rack locality than other hosts.
In practice, rack locality is almost as good as node locality.
 

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-3293:
--------------------------------------

    Status: Patch Available  (was: Open)

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jothi Padmanabhan updated HADOOP-3293:
--------------------------------------

    Attachment: hadoop-3293.patch

Initial Patch for review

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644273#action_12644273 ] 

Runping Qi commented on HADOOP-3293:
------------------------------------

yeh.


> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643943#action_12643943 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

bq. If we aggregate on a per host basis, host A having contributed 120 bytes would be the ideal choice. However, if we choose Block 1 as the index to be returned, even hosts B &C would be treated as data local, which is sub optimal.

To make this  clear -- Having decided that A is a good host, we now should also have a good way to decide to pick the correct block from the list of blocks that reside in A. In this case, we should choose between Block1 and Block2. If Block 1 is chosen, it is not very optimal as hosts B & C have only 20 bytes with them.


> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643792#action_12643792 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

The reason for this is that  FileInputFormat.getBlockIndex() returns the blockindex of the starting block for the given offset. Instead, it should identify all the blocks that this particular split spans and then choose the block that contributes the maximum data for this split.  

We could use the following approach

{code}
//Calculate the number of blocks the split spans
if (numBlocks == 1)
  return startIndex;
else if (numBlocks == 2)
  return (bytesInFirstBlock > bytesInSecondBlock) ? startIndex:startIndex+1;
else 
  return startIndex + 1;
{code}

The rationale here is that if there are more than two blocks, we are guaranteed that block 2 is contributing its entire block length for this split.

Note that we cannot do the identification of the block index based on the amount of data contributed by the individual host, because of the replication factor.
For example, consider the following example (assume dfs block size = 100)
Block 1 contributes 20 bytes and its hosts are A,B,C
Block 2 contributes 100 bytes and its hosts are A, D,E
Block 3 contributes 10 bytes and its hosts are  D,E,F

If we aggregate on a per host basis, host A having contributed 120 bytes would be the ideal choice. However, if we choose Block 1 as the index to be returned, even hosts B &C would be treated as data local, which is sub optimal.  
Thoughts?

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644460#action_12644460 ] 

dhruba borthakur commented on HADOOP-3293:
------------------------------------------

> DFS would override this method and will return the rack information along with the hosts.

This is a good idea, but returning only rack location might not work in the general case when there are more than 2 levels in the network topology. Knowing the name of a rack might not tell you how close it is to another rack. But getFileBlockLocations could return the complete path of the host in the network topology. I will provide a patch for this one. See HADOOP-4567

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647771#action_12647771 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3293:
------------------------------------------------

> Initial Patch for review 

For Initial Patch, you might not want to submit it since the patch is likely needed to be updated.  Also, you might want to test it locally before submitting it.

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647783#action_12647783 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3293:
------------------------------------------------

> the Patch was locally tested. 
Have you run all unit tests by "ant test"?  It seems that the patch has some problems and it is currently stuck in Hudson.  See http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3593/

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643989#action_12643989 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Since the BlkIndex is used only to identify the hosts,
{code}
        int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining,
                                       splitSize);
          splits.add(new FileSplit(path, length-bytesRemaining, splitSize, 
                                   blkLocations[blkIndex].getHosts()));
{code}

we could also modify getBlockIndex() to return a list of hosts that contain the maximum data for that split. For example, if the split was 
Block1  80Bytes   Hosts-A,B,C
Block2 100Bytes  Hosts A,D,E
Block 3 70Bytes   Hosts D,F,B

We would identify the hosts and their contribution as
A  180
B 150
C 80
D 170
E 100
F 70

We could return A,D,B


> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3293:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Jothi!

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.20.0
>
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-3293) When an input split spans cross block boundary, the split location should be the host having most of bytes on it.

Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647781#action_12647781 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Sorry, I should have said "Patch for review"; the Patch was locally tested. 
I also did a test to demonstrate the performance improvement from the patch. I allocated a 440 node cluster, ran randomwriter with 40 maps, each map output 25G. I then killed the task trackers on the nodes that ran the maps. I then ran a modified sort (no map output, no reduces) with a minimum input split of 10G. If found that, over an average of three runs, patch was about 17 seconds faster than the trunk (175 secs as opposed to 192 secs)

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>         Attachments: hadoop-3293.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.