Posted to common-user@hadoop.apache.org by Sean Hogan <se...@uchicago.edu> on 2011/08/12 21:48:00 UTC

Where is the hadoop-examples source code for the Sort example mapper/reducer?

Hi all,

I was interested in learning from how Hadoop implements their sort algorithm
in the map/reduce framework. Could someone point me to the directory of the
source code that has the mapper/reducer that the Sort example uses by
default when I invoke:

$ hadoop jar hadoop-*-examples.jar sort input output

Thanks. I've found Sort.java here :

http://svn.apache.org/viewvc/hadoop/common/trunk/mapreduce/src/examples/org/apache/hadoop/examples/

But have not been able to track down the mapper/reducer implementation.

-Sean Hogan

Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Sean Hogan <se...@gmail.com>.

Yep, got it.

Thanks.

-Sean

Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Kai Voigt <k...@123.org>.

Good job. In MapReduce you can write your own Partitioner, which is the code that determines which reducer receives which keys.

For simplicity, assume you're running 26 reducers and your keys are lowercase words. Your custom Partitioner will make sure the first reducer gets all keys starting with 'a', the second gets all keys starting with 'b', and so on.

Since the keys will be sorted within a single reducer, you can concatenate your 26 output files to get an overall sorted output.
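To make that concrete, here is a minimal plain-Java sketch of the idea. It is not Hadoop code: the class and method names (FirstLetterPartitionSketch, partitionSortConcat) are made up for illustration, and getPartition only mirrors the shape of Hadoop's Partitioner.getPartition(key, value, numPartitions) with the value dropped. It assumes every key is a non-empty word starting with a lowercase ASCII letter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java sketch of first-letter partitioning; a stand-in, not a Hadoop class. */
public class FirstLetterPartitionSketch {

    /**
     * Picks a reducer index from the first letter of the key.
     * Assumption: the key starts with a letter in 'a'..'z'.
     */
    public static int getPartition(String key, int numPartitions) {
        int letter = key.charAt(0) - 'a';      // 'a' -> 0, ..., 'z' -> 25
        // Scale the 26 letters onto the available reducers without changing their order.
        return letter * numPartitions / 26;
    }

    /** Simulates the job: route keys to reducers, sort within each, concatenate the outputs. */
    public static List<String> partitionSortConcat(List<String> keys, int numPartitions) {
        List<List<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            reducers.add(new ArrayList<>());
        }
        for (String key : keys) {
            reducers.get(getPartition(key, numPartitions)).add(key);
        }
        List<String> output = new ArrayList<>();
        for (List<String> reducer : reducers) {
            Collections.sort(reducer);         // stands in for the per-reducer sorted input
            output.addAll(reducer);            // concatenating the "part" files in reducer order
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "zebra", "banana", "avocado");
        System.out.println(partitionSortConcat(keys, 26));
    }
}
```

The property that matters is that the partition function is monotonic in the key: every key sent to reducer i sorts before every key sent to reducer i+1, so concatenating the outputs in reducer order gives a globally sorted result. A hash-based partition would not have that property.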

Making sense?

Kai

On 13.08.2011 at 17:44, Sean Hogan wrote:

> Oh, okay, got it - if there was more than one reducer then there needs to be
> a way to guarantee that the overall output from multiple reducers will still
> be sorted.
> 
> So I want to look for where the implementation of the shuffle/sort phase is
> located. Or find something on how Hadoop implements the MapReduce
> sort/shuffle phase.
> 
> Thanks!
> 
> -Sean

-- 
Kai Voigt
k@123.org

Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Sean Hogan <se...@gmail.com>.

Oh, okay, got it - if there was more than one reducer then there needs to be
a way to guarantee that the overall output from multiple reducers will still
be sorted.

So I want to look for where the implementation of the shuffle/sort phase is
located. Or find something on how Hadoop implements the MapReduce
sort/shuffle phase.

Thanks!

-Sean

Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Kai Voigt <k...@123.org>.

Hi,

The Identity Mapper and Reducer do what their names imply: they pretty much return their input as their output.

TeraSort relies on the sorting that is built into Hadoop's Sort&Shuffle phase.

So, the map() method in TeraSort looks like this:

map(offset, line) -> (line, _)

Here, offset is the key passed to map() and is the byte offset of the line in the input; the line itself is the value. map() returns the line as the key, along with some value that is not needed.

reduce() looks like this:

reduce(line, values) -> (line)

Again, the input is returned as is. The sort&shuffle layer between map() and reduce() guarantees that keys (lines) will come in sorted order. That's why the overall output will be the sorted input.
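As a rough illustration of the flow described above, here is a plain-Java simulation with a single reducer. It is only a sketch, not Hadoop's implementation: the class name IdentitySortSketch and the run method are invented for this example, a TreeMap stands in for the sort&shuffle layer, and reduce() emits the line once per occurrence (a choice made here so that duplicate lines survive).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of identity-style map/reduce around a sorting shuffle; not Hadoop code. */
public class IdentitySortSketch {

    /** map(offset, line) -> (line, _): the line becomes the key, the value is unused. */
    public static String map(long offset, String line) {
        return line;
    }

    /** reduce(line, values) -> (line): here emitted once per occurrence to keep duplicates. */
    public static List<String> reduce(String line, int occurrences) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < occurrences; i++) {
            out.add(line);
        }
        return out;
    }

    /** Single-reducer job: the TreeMap groups equal keys and hands them over in sorted order. */
    public static List<String> run(List<String> lines) {
        Map<String, Integer> shuffled = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            shuffled.merge(map(offset, line), 1, Integer::sum);
            offset += line.length() + 1;   // next line's offset, assuming ASCII and '\n' endings
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : shuffled.entrySet()) {
            output.addAll(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("delta", "alpha", "charlie", "alpha", "bravo")));
    }
}
```

Neither map() nor reduce() compares anything; all of the ordering comes from the step in between, which is the point of the explanation above.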

This is all easy when there's just one reducer. A question to make sure you've understood things so far: what's the issue with more than one reducer?

Kai

On 13.08.2011 at 17:10, Sean Hogan wrote:

> Thanks for the link, but it hasn't helped answer my original question - that
> Sort.java seems to use IdentityMapper and IdentityReducer. Perhaps it is the
> Sort.java that is used when executing the below command, but I can't figure
> out what it actually uses for the mapper and reducer. It's entirely possible
> I'm just missing something obvious.
> 
> I'm interested in seeing how the map and reduce fits into sorting with the
> following command:
> 
> $ hadoop jar hadoop-*-examples.jar sort input output
> 
> I'd appreciate it if someone could explain what mappers/reducers are used in
> that above command (link to the implementation of whatever sort they use and
> how it fits into MapReduce)
> 
> Thanks.
> 
> -Sean

-- 
Kai Voigt
k@123.org

Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Sean Hogan <se...@gmail.com>.

Thanks for the link, but it hasn't helped answer my original question - that
Sort.java seems to use IdentityMapper and IdentityReducer. Perhaps it is the
Sort.java that is used when executing the below command, but I can't figure
out what it actually uses for the mapper and reducer. It's entirely possible
I'm just missing something obvious.

I'm interested in seeing how the map and reduce fits into sorting with the
following command:

$ hadoop jar hadoop-*-examples.jar sort input output

I'd appreciate it if someone could explain what mappers/reducers are used in
that above command (link to the implementation of whatever sort they use and
how it fits into MapReduce)

Thanks.

-Sean

Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Kai Voigt <k...@123.org>.

Hi,

A quick search on Google would have told you. Here's one link:

http://code.google.com/p/hop/source/browse/trunk/src/examples/org/apache/hadoop/examples/?r=131

Kai

On 13.08.2011 at 15:27, Sean Hogan wrote:

> Hi all,
> 
> I was interested in learning from how Hadoop implements their sort algorithm
> in the map/reduce framework. Could someone point me to the directory of the
> source code that has the mapper/reducer that the Sort example uses by
> default when I invoke:
> 
> $ hadoop jar hadoop-*-examples.jar sort input output
> 
> Thanks. I've found Sort.java here :
> 
> http://svn.apache.org/viewvc/hadoop/common/trunk/mapreduce/src/examples/org/apache/hadoop/examples/
> 
> But have not been able to track down the mapper/reducer implementation.
> 
> -Sean Hogan

-- 
Kai Voigt
k@123.org

Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Sean Hogan <se...@gmail.com>.

Hi all,

I was interested in learning from how Hadoop implements their sort algorithm
in the map/reduce framework. Could someone point me to the directory of the
source code that has the mapper/reducer that the Sort example uses by
default when I invoke:

$ hadoop jar hadoop-*-examples.jar sort input output

Thanks. I've found Sort.java here :

http://svn.apache.org/viewvc/hadoop/common/trunk/mapreduce/src/examples/org/apache/hadoop/examples/

But have not been able to track down the mapper/reducer implementation.

-Sean Hogan

Re: Where is the hadoop-examples source code for the Sort example mapper/reducer?

Posted by Arun C Murthy <ac...@hortonworks.com>.

Sean,

 The sort impl is spread out over many files. I'd start with MapTask and ReduceTask and follow from there on.

 LMK if you need more info.

thanks,
Arun

On Aug 12, 2011, at 12:48 PM, Sean Hogan wrote:

> Hi all,
> 
> I was interested in learning from how Hadoop implements their sort algorithm
> in the map/reduce framework. Could someone point me to the directory of the
> source code that has the mapper/reducer that the Sort example uses by
> default when I invoke:
> 
> $ hadoop jar hadoop-*-examples.jar sort input output
> 
> Thanks. I've found Sort.java here :
> 
> http://svn.apache.org/viewvc/hadoop/common/trunk/mapreduce/src/examples/org/apache/hadoop/examples/
> 
> But have not been able to track down the mapper/reducer implementation.
> 
> -Sean Hogan