Posted to mapreduce-user@hadoop.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2011/05/03 00:53:19 UTC

LocalJobRunner and # of reducers

Hi,

I was trying to write a test around a MapReduce job in local mode, to
exercise various partitioning issues.

Curiously, whenever I switch MapReduce into local mode, I can't seem to
configure multiple reduce tasks.

Indeed, after some investigation I found that the following fragment in
LocalJobRunner forces the number of reduce tasks to 1 whenever it is
greater than 1 or negative:

    int numReduceTasks = this.job.getNumReduceTasks();
    if ((numReduceTasks > 1) || (numReduceTasks < 0))
    {
      numReduceTasks = 1;
      this.job.setNumReduceTasks(1);
    }
    outputCommitter.setupJob(jContext);
    this.status.setSetupProgress(1.0F);

    Map mapOutputFiles = new HashMap();
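
Read in isolation, the check above leaves 0 and 1 alone and turns every other value into 1, so a request for 0 reducers keeps its value. A standalone restatement of just that rule, as a sketch; the class and method names are made up for illustration and are not part of Hadoop:

```java
public class ReducerClamp {
    /** Mirrors the quoted check: 0 and 1 pass through, any other value becomes 1. */
    public static int effectiveReducers(int requested) {
        if (requested > 1 || requested < 0) {
            return 1;
        }
        return requested;
    }

    public static void main(String[] args) {
        // Print the effective reducer count for a few requested values.
        for (int requested : new int[] {-1, 0, 1, 4}) {
            System.out.println(requested + " -> " + effectiveReducers(requested));
        }
    }
}
```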


Is this a fundamental limitation of the local MapReduce mode? What if
I need to write a unit test that checks various partitioning
functions? Is there a workaround?

Also, I don't remember these problems when writing tests based on
local MapReduce in previous versions (this is cdh3b4), although I
cannot be sure I ran into exactly the same situation before.

Thanks.
-Dmitriy

Re: LocalJobRunner and # of reducers

Posted by Tom White <to...@cloudera.com>.
See also https://issues.apache.org/jira/browse/MAPREDUCE-434 which has
a patch for this issue.

Cheers,
Tom

On Mon, May 2, 2011 at 5:13 PM, jason <ur...@gmail.com> wrote:
> I am attaching the originals so you could figure out the diffs on your own :)
>
> On 5/2/11, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>> Thanks a bunch!
>>
>> (Is there any chance you could do a diff only?)
>>
>> -d
>>
>> On Mon, May 2, 2011 at 4:47 PM, jason <ur...@gmail.com> wrote:
>>> Dmitriy,
>>>
>>> I remember having the same problem with local jobs when I tried to
>>> debug my multi-reducer use cases, so I had to create this small patch
>>> that resolves the issue.
>>> You can put these classes into the org.apache.hadoop.mapred package in
>>> your local project and make sure they precede Hadoop's jars on the
>>> classpath.
>>>
>>> My patch is based on the Cloudera 0.20.2+320 release.
>>>
>>> Hope this helps.
>>>
>>
>

Re: LocalJobRunner and # of reducers

Posted by jason <ur...@gmail.com>.
I am attaching the originals so you could figure out the diffs on your own :)

Re: LocalJobRunner and # of reducers

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Thanks a bunch!

(Is there any chance you could do a diff only?)

-d

Re: LocalJobRunner and # of reducers

Posted by jason <ur...@gmail.com>.
Dmitriy,

I remember having the same problem with local jobs when I tried to
debug my multi-reducer use cases, so I had to create this small patch
that resolves the issue.
You can put these classes into the org.apache.hadoop.mapred package in
your local project and make sure they precede Hadoop's jars on the
classpath.
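
To confirm that the copies in the local project really win over the ones in Hadoop's jars, it can help to print where a class was loaded from. A minimal sketch using only standard JDK calls; the class name WhereLoaded and the helper locationOf are made up for illustration, and the default class name assumes Hadoop is on the classpath:

```java
import java.net.URL;
import java.security.CodeSource;

public class WhereLoaded {
    /** Returns the jar or directory a class was loaded from, or null if the JVM reports none. */
    public static URL locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }

    public static void main(String[] args) {
        // Defaults to the class discussed in this thread; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapred.LocalJobRunner";
        try {
            System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the printed location is the project's own output directory rather than a Hadoop jar, the patched class is the one in use.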

My patch is based on the Cloudera 0.20.2+320 release.

Hope this helps.
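
For the unit-test part of the question, a possible alternative that needs no patched classes is to call the partition function directly with the desired number of partitions and inspect the result, without running a job at all. The sketch below is self-contained: KeyPartitioner is a hypothetical stand-in for Hadoop's Partitioner interface (which also takes the value), and HASH uses the same formula as Hadoop's HashPartitioner. A real test would call getPartition on the actual Partitioner under test instead.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionCheck {
    /** Hypothetical stand-in for Hadoop's Partitioner; the value argument is omitted. */
    public interface KeyPartitioner<K> {
        int getPartition(K key, int numPartitions);
    }

    /** Same formula as Hadoop's HashPartitioner: clear the sign bit, then take the modulus. */
    public static final KeyPartitioner<Object> HASH =
        (key, n) -> (key.hashCode() & Integer.MAX_VALUE) % n;

    /** Counts how many keys land in each partition; throws if any index is out of range. */
    public static <K> int[] histogram(KeyPartitioner<? super K> p, List<K> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (K key : keys) {
            int idx = p.getPartition(key, numPartitions);
            if (idx < 0 || idx >= numPartitions) {
                throw new IllegalStateException("partition " + idx + " out of range for key " + key);
            }
            counts[idx]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Show how five sample keys spread over three partitions.
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date", "elderberry");
        System.out.println(Arrays.toString(histogram(HASH, keys, 3)));
    }
}
```

This only checks the partition function itself. It does not exercise the shuffle or multiple reduce tasks, which still need a patched LocalJobRunner or a real cluster.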

