Posted to common-dev@hadoop.apache.org by Kumar Ravi <ku...@us.ibm.com> on 2012/03/21 21:47:47 UTC

Hadoop-8192 and rackToBlocks ordering


Hello,

 We have been independently looking at IBM JDK JUnit failures on Hadoop-1.0.1
and have run into the same failures reported in this JIRA.
I have a question based upon what I have observed below.

We started debugging the problems in the test case
org.apache.hadoop.mapred.lib.TestCombineFileInputFormat.
The test fails because the number of splits returned from
CombineFileInputFormat.getSplits() is 1 when using the IBM JDK, whereas the
expected value is 2.

So far, we have found that the difference in the number of splits comes down
to the order in which elements are added to the nodeToBlocks hashmap: under
the IBM JDK the elements end up in the reverse of the order produced by the
Sun JDK.
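
To illustrate what I mean, here is a minimal standalone example (the rack and
block names are made up for illustration, not taken from the test). HashMap
makes no guarantee about iteration order, so different JDK implementations can
visit the same entries in different sequences, and any split-grouping code
that walks the map inherits that implementation-defined order:

import java.util.HashMap;
import java.util.Map;

// Illustration only: HashMap makes no promise about iteration order, so
// different JDKs (e.g. Sun vs. IBM) may visit the same entries in a
// different sequence. The rack/block names here are invented.
public class MapOrderDemo {
    public static void main(String[] args) {
        Map<String, String> rackToBlocks = new HashMap<String, String>();
        rackToBlocks.put("/rack1", "block_1,block_2");
        rackToBlocks.put("/rack2", "block_3,block_4");
        rackToBlocks.put("/rack3", "block_5");

        // Any split-grouping logic that iterates here inherits an
        // implementation-defined order; a different order can change how
        // blocks get combined and therefore how many splits are produced.
        for (Map.Entry<String, String> e : rackToBlocks.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}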

The question I have at this point is: should the number of splits created in
a Hadoop cluster have a strict dependency on the order in which the
rackToBlocks hashmap gets populated? Is this working as designed?
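
If the answer is that the split count should not depend on iteration order,
one way to make the order deterministic, purely as a sketch on my part and
not a proposed patch for HADOOP-8192, would be to back the map with a
collection that defines its own iteration order:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only -- not the actual HADOOP-8192 change.
public class DeterministicOrderSketch {
    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order on every JDK.
        Map<String, String> rackToBlocks = new LinkedHashMap<String, String>();
        rackToBlocks.put("/rack2", "block_3");
        rackToBlocks.put("/rack1", "block_1");

        // TreeMap iterates in sorted key order on every JDK.
        Map<String, String> sortedView = new TreeMap<String, String>(rackToBlocks);

        for (Map.Entry<String, String> e : sortedView.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}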

Regards,
Kumar



Kumar Ravi
IBM Linux Technology Center

Re: Hadoop-8192 and rackToBlocks ordering

Posted by Devaraj Das <dd...@hortonworks.com>.
Hi Kumar, 
I assume you are referring to HADOOP-8192. I need to look at the code to see whether the ordering matters, but most likely it doesn't.
Thanks
Devaraj