Posted to user@mahout.apache.org by "C.V. Krishnakumar Iyer" <cv...@me.com> on 2012/08/30 06:06:38 UTC

Number of Reducers in PFP Growth is always 1 !!!

Hi,

Quick question regarding PFPGrowth in Mahout 0.6:

I see that there are no options to set the number of reducers in the parallel counting phase of PFP Growth. It is just a simple word count, so I'd expect it to be parallelized, but for some reason it is not!

Is that intentional?

Regards,
Krishnakumar.

Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by 戴清灏 <ro...@gmail.com>.
PFP consists of three MapReduce jobs.
Which job did you modify?
And do you mean the total number of reduce tasks, or
just the reduce tasks that are actually running?

Regards,
Q



2012/8/30 C.V.Krishnakumar Iyer <cv...@me.com>

> Hi,
>
> I've already tried setting it in the code using job.setNumReduceTasks()
> and conf.set("mapred.reduce.tasks","100").
> However, it does not seem to take the number of reducers at all, even for
> the job that does parallel counting. Any advice would be appreciated.
> Regards,
> Krishnakumar.
> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>
> > I suspect you only specified the config in the Hadoop config XML file.
> >
> > --
> > Regards,
> > Q
> >
> >
> >
> > 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
> >
> >> Hi,
> >>
> >> Quick question regarding PFPGrowth in Mahout 0.6:
> >>
> >> I see that there are no options to set the number of reducers in the
> >> parallel counting phase of PFP Growth. It is just a simple word count, so
> >> I'd expect it to be parallelized, but for some reason it is not!
> >>
> >> Is that intentional?
> >>
> >> Regards,
> >> Krishnakumar.
> >>
>
>

Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by "C.V. Krishnakumar Iyer" <cv...@me.com>.
Hi,

Thanks for the reply. We verified the configurations. When we stepped through the driver locally, we saw that the conf *does* have the property mapred.reduce.tasks set to 100. However, when the job launches, the number of reducers is 1.
Do you know of any places where this property could be overridden?

Thanks,
Krishnakumar

On Aug 30, 2012, at 2:05 AM, Sean Owen wrote:

> Block size and input size should not matter for the reducer count. You do
> have to explicitly set the number of workers.
>
> It defaults to 1, and you set it with exactly those methods. Make sure you
> are setting it on the right object, and before you run. Look for other
> things that may be overriding it.
>
> I don't know this job; maybe it is forcing 1 for some reason.
> On Aug 30, 2012 9:58 AM, "Paritosh Ranjan" <pr...@xebia.com> wrote:
> 
>> If the problem is only the number of reduce tasks, then you can try
>> reducing the dfs block size. This might help trigger multiple reducers.
>> Also check the size of the mapper's output: only if it is greater than the
>> block size (or the mapper output is scattered across multiple files) will
>> multiple reducers be triggered.
>> 
>> HTH,
>> Paritosh
>> 
>> On 30-08-2012 12:08, C.V.Krishnakumar Iyer wrote:
>> 
>>> Hi,
>>> 
>>> I've already tried setting it in the code using job.setNumReduceTasks()
>>> and conf.set("mapred.reduce.tasks","100").
>>> However, it does not seem to take the number of reducers at all, even for
>>> the job that does parallel counting. Any advice would be appreciated.
>>> Regards,
>>> Krishnakumar.
>>> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>>> 
>>> I suspect you only specified the config in the Hadoop config XML file.
>>>> 
>>>> --
>>>> Regards,
>>>> Q
>>>> 
>>>> 
>>>> 
>>>> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
>>>> 
>>>> Hi,
>>>>> 
>>>>> Quick question regarding PFPGrowth in Mahout 0.6:
>>>>> 
>>>>> I see that there are no options to set the number of reducers in the
>>>>> parallel counting phase of PFP Growth. It is just a simple word count, so
>>>>> I'd expect it to be parallelized, but for some reason it is not!
>>>>> 
>>>>> Is that intentional?
>>>>> 
>>>>> Regards,
>>>>> Krishnakumar.
>>>>> 
>>>>> 
>> 
>> 
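[Editor's note] One general Hadoop mechanism that can produce exactly this symptom (not confirmed as the cause for this particular job): properties marked <final>true</final> in the cluster's *-site.xml files cannot be overridden by configuration resources loaded later, such as the submitted job's own settings. A stdlib-only toy sketch of that precedence rule follows; the class and method names are invented for illustration and are not the real org.apache.hadoop.conf.Configuration API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of Hadoop's "final" property rule: an entry loaded with
// <final>true</final> wins over values from any later-loaded resource.
class SiteConf {
    private final Map<String, String> props = new HashMap<>();
    private final Set<String> finals = new HashSet<>();

    // simulate loading a cluster *-site.xml entry marked <final>true</final>
    void loadFinal(String key, String value) {
        props.put(key, value);
        finals.add(key);
    }

    // simulate a later resource (e.g. the submitted job.xml) trying to override
    void loadFromLaterResource(String key, String value) {
        if (!finals.contains(key)) {
            props.put(key, value); // only non-final entries can be replaced
        }
    }

    String get(String key) {
        return props.get(key);
    }
}

public class FinalPropDemo {
    public static void main(String[] args) {
        SiteConf conf = new SiteConf();
        conf.loadFinal("mapred.reduce.tasks", "1");               // pinned by cluster admins
        conf.loadFromLaterResource("mapred.reduce.tasks", "100"); // silently ignored
        System.out.println(conf.get("mapred.reduce.tasks"));      // prints 1
    }
}
```

If this is the cause, the value set in the driver would show as 100 in a local debugger but come back as 1 once the cluster-side configuration is merged in, which matches the behavior described above; checking mapred-site.xml on the cluster for a final mapred.reduce.tasks entry would confirm or rule it out.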


Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by Sean Owen <sr...@gmail.com>.
Block size and input size should not matter for the reducer count. You do have to
explicitly set the number of workers.

It defaults to 1, and you set it with exactly those methods. Make sure you are
setting it on the right object, and before you run. Look for other things that
may be overriding it.

I don't know this job; maybe it is forcing 1 for some reason.
 On Aug 30, 2012 9:58 AM, "Paritosh Ranjan" <pr...@xebia.com> wrote:

> If the problem is only the number of reduce tasks, then you can try
> reducing the dfs block size. This might help trigger multiple reducers.
> Also check the size of the mapper's output: only if it is greater than the
> block size (or the mapper output is scattered across multiple files) will
> multiple reducers be triggered.
>
> HTH,
> Paritosh
>
> On 30-08-2012 12:08, C.V.Krishnakumar Iyer wrote:
>
>> Hi,
>>
>> I've already tried setting it in the code using job.setNumReduceTasks()
>> and conf.set("mapred.reduce.tasks","100").
>> However, it does not seem to take the number of reducers at all, even for
>> the job that does parallel counting. Any advice would be appreciated.
>> Regards,
>> Krishnakumar.
>> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>>
>>  I suspect you only specified the config in the Hadoop config XML file.
>>>
>>> --
>>> Regards,
>>> Q
>>>
>>>
>>>
>>> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
>>>
>>>  Hi,
>>>>
>>>> Quick question regarding PFPGrowth in Mahout 0.6:
>>>>
>>>> I see that there are no options to set the number of reducers in the
>>>> parallel counting phase of PFP Growth. It is just a simple word count, so
>>>> I'd expect it to be parallelized, but for some reason it is not!
>>>>
>>>> Is that intentional?
>>>>
>>>> Regards,
>>>> Krishnakumar.
>>>>
>>>>
>
>
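[Editor's note] The advice above about setting the value on the right object, and before you run, has a concrete basis: in Hadoop's mapreduce API, new Job(conf) takes a copy of the Configuration, so conf.set(...) calls made after the Job is constructed never reach the job; the value must be set before constructing the Job, or directly on the Job via setNumReduceTasks(...). A stdlib-only toy sketch of that copy semantics (invented classes standing in for the real Hadoop API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for Hadoop's Configuration; Job copies it at construction.
class Conf {
    private final Map<String, String> props = new HashMap<>();
    Conf() {}
    Conf(Conf other) { props.putAll(other.props); } // copy constructor
    void set(String key, String value) { props.put(key, value); }
    String get(String key, String dflt) { return props.getOrDefault(key, dflt); }
}

// Toy stand-in for Job: holds its own copy of the Conf, as new Job(conf) does.
class Job {
    private final Conf conf;
    Job(Conf c) { this.conf = new Conf(c); }
    void setNumReduceTasks(int n) { conf.set("mapred.reduce.tasks", Integer.toString(n)); }
    int getNumReduceTasks() { return Integer.parseInt(conf.get("mapred.reduce.tasks", "1")); }
}

public class ReducerConfDemo {
    public static void main(String[] args) {
        Conf conf = new Conf();
        Job job = new Job(conf);
        conf.set("mapred.reduce.tasks", "100");      // too late: job already copied conf
        System.out.println(job.getNumReduceTasks()); // prints 1 (the default)

        Conf conf2 = new Conf();
        conf2.set("mapred.reduce.tasks", "100");     // before construction: it sticks
        System.out.println(new Job(conf2).getNumReduceTasks()); // prints 100

        job.setNumReduceTasks(100);                  // or set it on the Job itself
        System.out.println(job.getNumReduceTasks()); // prints 100
    }
}
```

In real code the same pitfall means conf.set("mapred.reduce.tasks","100") only works if it happens before the driver constructs the Job; if the PFPGrowth driver builds its jobs from a different Configuration object than the one being set, the value is lost the same way.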

Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by Paritosh Ranjan <pr...@xebia.com>.
If the problem is only the number of reduce tasks, then you can try
reducing the dfs block size. This might help trigger multiple reducers.
Also check the size of the mapper's output: only if it is greater than the
block size (or the mapper output is scattered across multiple files) will
multiple reducers be triggered.

HTH,
Paritosh

On 30-08-2012 12:08, C.V.Krishnakumar Iyer wrote:
> Hi,
>
> I've already tried setting it in the code using job.setNumReduceTasks() and conf.set("mapred.reduce.tasks","100").
> However, it does not seem to take the number of reducers at all, even for the job that does parallel counting. Any advice would be appreciated.
> Regards,
> Krishnakumar.
> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>
>> I suspect you only specified the config in the Hadoop config XML file.
>>
>> --
>> Regards,
>> Q
>>
>>
>>
>> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
>>
>>> Hi,
>>>
>>> Quick question regarding PFPGrowth in Mahout 0.6:
>>>
>>> I see that there are no options to set the number of reducers in the
>>> parallel counting phase of PFP Growth. It is just a simple word count, so
>>> I'd expect it to be parallelized, but for some reason it is not!
>>>
>>> Is that intentional?
>>>
>>> Regards,
>>> Krishnakumar.
>>>



Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by "C.V.Krishnakumar Iyer" <cv...@me.com>.
Hi,

I've already tried setting it in the code using job.setNumReduceTasks() and conf.set("mapred.reduce.tasks","100"). 
However, it does not seem to take the number of reducers at all, even for the job that does parallel counting. Any advice would be appreciated.
Regards,
Krishnakumar.
On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:

> I suspect you only specified the config in the Hadoop config XML file.
> 
> --
> Regards,
> Q
> 
> 
> 
> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
> 
>> Hi,
>> 
>> Quick question regarding PFPGrowth in Mahout 0.6:
>> 
>> I see that there are no options to set the number of reducers in the
>> parallel counting phase of PFP Growth. It is just a simple word count, so
>> I'd expect it to be parallelized, but for some reason it is not!
>> 
>> Is that intentional?
>> 
>> Regards,
>> Krishnakumar.
>> 


Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by 戴清灏 <ro...@gmail.com>.
I suspect you only specified the config in the Hadoop config XML file.

--
Regards,
Q



2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>

> Hi,
>
> Quick question regarding PFPGrowth in Mahout 0.6:
>
> I see that there are no options to set the number of reducers in the
> parallel counting phase of PFP Growth. It is just a simple word count, so
> I'd expect it to be parallelized, but for some reason it is not!
>
> Is that intentional?
>
> Regards,
> Krishnakumar.
>