Posted to mapreduce-dev@hadoop.apache.org by Samaneh Shokuhi <sa...@gmail.com> on 2013/04/21 13:54:22 UTC

MapTask.java -sort part

Hi All,
I have modified the MapTask.java file to disable the sort phase, for some
reasons. I removed this line

    sorter.sort(MapOutputBuffer.this, kvstart, endPosition, reporter);

from the sortAndSpill method, but saw no change in the result. I expected
to get unsorted keys as the mapper output, but that was not the case.

My question is: did I take the right action to disable the sort, or does
something else need to be done?

Samaneh
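To see what the removed call is responsible for, here is a toy, stand-alone Java model of the spill step. It is not Hadoop's MapOutputBuffer code: the Rec class, the spill method and the "partition:key=value" rendering are hypothetical. The one property it borrows from the real buffer is the sort order, partition first and key second, so that each reducer's share of a spill comes out as one contiguous, key-sorted run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model (NOT Hadoop's MapOutputBuffer) of the map-side spill step.
 * Buffered records are optionally ordered by partition first and key second
 * before being written, which is what the sorter.sort(...) call is for.
 */
final class SortAndSpillModel {

    /** Hypothetical buffered record: target reducer (partition), key and value. */
    static final class Rec {
        final int partition;
        final String key;
        final int value;

        Rec(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    /** Returns the records in spill order, rendered as "partition:key=value". */
    static List<String> spill(List<Rec> buffer, boolean sort) {
        List<Rec> order = new ArrayList<>(buffer);
        if (sort) {
            // Partition first, then key: each reducer's records form one sorted run.
            order.sort(Comparator.comparingInt((Rec r) -> r.partition)
                    .thenComparing(r -> r.key));
        }
        List<String> out = new ArrayList<>();
        for (Rec r : order) {
            out.add(r.partition + ":" + r.key + "=" + r.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> buffer = Arrays.asList(
                new Rec(1, "the", 1), new Rec(0, "apple", 1), new Rec(1, "cat", 1),
                new Rec(0, "zebra", 1), new Rec(0, "bee", 1));
        System.out.println("sorted:   " + spill(buffer, true));
        System.out.println("unsorted: " + spill(buffer, false));
    }
}
```

Running main prints [0:apple=1, 0:bee=1, 0:zebra=1, 1:cat=1, 1:the=1] for the sorted spill and [1:the=1, 0:apple=1, 1:cat=1, 0:zebra=1, 0:bee=1] without the sort. In this model, skipping the sort leaves the records for different reducers interleaved, so any code that expects one run per partition would no longer see one.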

Re: MapTask.java -sort part

Posted by Samaneh Shokuhi <sa...@gmail.com>.
Currently I am doing some experiments with the WordCount example.
Considering TestMerge.java, it seems I need to change the WordCount
example and add MapOutputCopier etc., as Mariappan did in TestMerge, to
nullify the sort. Is that right?

Samaneh




On Tue, Apr 23, 2013 at 10:49 AM, Harsh J <ha...@cloudera.com> wrote:


Re: MapTask.java -sort part

Posted by Harsh J <ha...@cloudera.com>.
Yes, as Mariappan has already pointed out, the sort is pluggable, so the
plug-in can also nullify it. See [1] for an example implementation of
the MapOutputCollector interface (class MapOutputCopier) to start
with. Per Mariappan, this does not do any sorting; it only merges.

[1] - http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMerge.java

On Tue, Apr 23, 2013 at 2:00 PM, Samaneh Shokuhi
<sa...@gmail.com> wrote:



--
Harsh J
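The plug-in idea Harsh and Mariappan describe can be pictured with a small, self-contained Java sketch. Everything in it is hypothetical: the Collector interface, the two implementations and the "sort"/"passthrough" names are invented for illustration, and Hadoop's real MapOutputCollector interface has different methods. What it shows is only the design point: the calling code sees an interface, so a pass-through implementation (roughly the role MapOutputCopier plays in TestMerge, per the thread) can replace the sorting one without editing the caller.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a pluggable map-side collector. The Collector
 * interface below is NOT Hadoop's MapOutputCollector (whose methods differ);
 * it only illustrates swapping a sorting implementation for a pass-through one.
 */
final class PluggableCollectorDemo {

    /** Hypothetical plug-in contract: receive records, then hand them on. */
    interface Collector {
        void collect(String key, int value);

        List<String> flush();
    }

    /** Default-like behaviour: buffer records and sort them by key on flush. */
    static final class SortingCollector implements Collector {
        private final List<Map.Entry<String, Integer>> buf = new ArrayList<>();

        @Override
        public void collect(String key, int value) {
            buf.add(new AbstractMap.SimpleEntry<>(key, value));
        }

        @Override
        public List<String> flush() {
            buf.sort(Map.Entry.comparingByKey());
            return render(buf);
        }
    }

    /** Copier-like behaviour: keep records in arrival order, no sort. */
    static final class PassThroughCollector implements Collector {
        private final List<Map.Entry<String, Integer>> buf = new ArrayList<>();

        @Override
        public void collect(String key, int value) {
            buf.add(new AbstractMap.SimpleEntry<>(key, value));
        }

        @Override
        public List<String> flush() {
            return render(buf);
        }
    }

    static List<String> render(List<Map.Entry<String, Integer>> entries) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    /** Picks an implementation by name, the way a job setting might select a plug-in. */
    static Collector create(String name) {
        switch (name) {
            case "sort":
                return new SortingCollector();
            case "passthrough":
                return new PassThroughCollector();
            default:
                throw new IllegalArgumentException("unknown collector: " + name);
        }
    }

    /** The "framework" side: it only sees the interface, never the concrete class. */
    static List<String> run(Collector c, String... keys) {
        for (String k : keys) {
            c.collect(k, 1);
        }
        return c.flush();
    }

    public static void main(String[] args) {
        System.out.println(run(create("sort"), "the", "cat", "apple"));
        System.out.println(run(create("passthrough"), "the", "cat", "apple"));
    }
}
```

Here main prints [apple=1, cat=1, the=1] and then [the=1, cat=1, apple=1]; only the name passed to create changes between the two runs.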

Re: MapTask.java -sort part

Posted by Samaneh Shokuhi <sa...@gmail.com>.
Hi Harsh,
It is nice that branch-2 includes that. Just one thing I want to be sure
about: in the MAPREDUCE-4807 patch (branch-2), is the sort in the mapper
avoided? I need a version of Hadoop without the sorting part in the
mapper, and that is the reason I wanted to modify MapTask to disable the
mapper sort. Do you think branch-2 is the appropriate version for me?

Samaneh


On Tue, Apr 23, 2013 at 10:01 AM, Harsh J <ha...@cloudera.com> wrote:


Re: MapTask.java -sort part

Posted by Harsh J <ha...@cloudera.com>.
Note that the Reducer also does a merge sort over the acquired map
data; I am not sure whether you looked there as well.

The JIRA Mariappan pointed to (MAPREDUCE-4807) is already available in
the 2.0.3+ releases out today and in the current branch-2. It would be
simpler to reuse that than to make these modifications.

On Sun, Apr 21, 2013 at 5:24 PM, Samaneh Shokuhi
<sa...@gmail.com> wrote:



-- 
Harsh J
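Harsh's note about the reduce side can be made concrete with a toy k-way merge, again in plain Java. This is a simplified sketch, not Hadoop's Merger class; the merge helper and the use of bare string keys as segments are assumptions for illustration. A merge of this kind only combines runs, so it produces globally sorted output only when every input segment is already sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Toy k-way merge of map output segments (NOT Hadoop's Merger class).
 * A merge only yields globally sorted output if every segment is sorted.
 */
final class ReduceSideMergeDemo {

    static List<String> merge(List<List<String>> segments) {
        // Heap entries are {segment index, position}; ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> segments.get(a[0]).get(a[1]).compareTo(segments.get(b[0]).get(b[1])));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> seg = segments.get(top[0]);
            out.add(seg.get(top[1]));
            if (top[1] + 1 < seg.size()) {
                heap.add(new int[] {top[0], top[1] + 1}); // advance within that segment
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sorted segments, as a normal map side would produce them.
        System.out.println(merge(Arrays.asList(
                Arrays.asList("apple", "cat"), Arrays.asList("bee", "dog"))));
        // Unsorted segments: the merge still runs but does not re-sort them.
        System.out.println(merge(Arrays.asList(
                Arrays.asList("cat", "apple"), Arrays.asList("dog", "bee"))));
    }
}
```

main prints [apple, bee, cat, dog] for the sorted segments and [cat, apple, dog, bee] for the unsorted ones: the merge runs either way, but it does not repair unsorted input.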

Re: MapTask.java -sort part

Posted by Samaneh Shokuhi <sa...@gmail.com>.
Hi Asokan,
Actually, I am working on a local copy of branch-2 and doing some
experiments for my thesis. Since I am working offline, I am not sure
where I should get MAPREDUCE-4807 from. Could you please guide me on how
to get this patch?

Another question: how can I monitor the mapper output to be sure it is
really unsorted?

Samaneh
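For the second question, one simple check is to scan the keys in the order they were written and report the first key that is smaller than its predecessor. The Java sketch below assumes the records are already available as text, one "key<TAB>value" record per line (tab being the default separator of TextOutputFormat); the intermediate map output files themselves are in Hadoop's binary IFile format and would need to be decoded first. The SortednessCheck class and its firstInversion method are hypothetical helpers, not part of Hadoop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: finds the first record whose key is smaller than the
 * key before it, for text records of the form "key<TAB>value".
 */
final class SortednessCheck {

    /** Returns the 0-based index of the first out-of-order record, or -1 if keys never decrease. */
    static int firstInversion(List<String> lines) {
        String prev = null;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int tab = line.indexOf('\t');
            String key = tab >= 0 ? line.substring(0, tab) : line;
            // Plain String order; a job with a custom key comparator may order keys differently.
            if (prev != null && key.compareTo(prev) < 0) {
                return i;
            }
            prev = key;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Reads records from stdin, e.g. a text dump piped in from another tool.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        for (String l = in.readLine(); l != null; l = in.readLine()) {
            lines.add(l);
        }
        int i = firstInversion(lines);
        System.out.println(i < 0 ? "keys are in sorted order"
                : "first out-of-order key on line " + (i + 1));
    }
}
```

Equal adjacent keys are not counted as out of order, so repeated keys (common in WordCount map output) do not trigger a false alarm.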


On Sun, Apr 21, 2013 at 2:32 PM, Mariappan Asokan <ma...@gmail.com> wrote:


Re: MapTask.java -sort part

Posted by Mariappan Asokan <ma...@gmail.com>.
Hi Samaneh,
    Please take a look at the patch for MAPREDUCE-4807. It allows one
to plug in an external sort implementation in the map phase. There is a
test, TestMerge.java (part of the JIRA), that has an implementation of
a map-phase sorter that avoids sorting.

-- Asokan

On 04/21/2013 07:54 AM, Samaneh Shokuhi wrote: