Posted to common-user@hadoop.apache.org by Da Zheng <zh...@gmail.com> on 2010/11/28 08:40:13 UTC

delay the execution of reducers

Hello,

I found in Hadoop that reducers start when a fraction of the mappers
are complete. However, I would like the reducers to start only when all
mappers are complete. I searched the Hadoop configuration parameters
and found mapred.reduce.slowstart.completed.maps, which seems to do
what I want. But no matter what value (0.99, 1.00, etc.) I set for
mapred.reduce.slowstart.completed.maps, the reducers always start
executing when about 10% of the mappers are complete.

Am I setting the right parameter? Is there any other parameter I can
use for this purpose?

Thanks,
Da

Re: delay the execution of reducers

Posted by Hemanth Yamijala <yh...@gmail.com>.
Hi,
> Changing the parameter for a specific job works better for me.
>
> But I was asking, in general, in which configuration file(s) I should
> change the value of the parameters.
> For parameters in hdfs-site.xml, I should change the configuration file
> on each machine. But for parameters in mapred-site.xml, it seems enough
> to change the configuration file on the machine where the job is launched.

Ideally, if you know which processes need to read a configuration
value, you can set it in the configuration files on the nodes running
those processes. For instance, if a parameter is only required by the
NameNode, you can set it in the hdfs-site.xml on the NameNode, and so
on. If in doubt, it almost always helps to set the same value in the
configuration files on all nodes.

Thanks
Hemanth

> Thanks,
> Da
>
> On 11/29/2010 01:31 PM, Arun C Murthy wrote:
>>
>> Just set it for your job.
>>
>> In your launching program do something like:
>>
>> jobConf.setFloat("mapred.reduce.slowstart.completed.maps", 0.5);
>>
>> On Nov 29, 2010, at 9:46 AM, Da Zheng wrote:
>>
>>> On 11/29/2010 05:42 AM, Chandraprakash Bhagtani wrote:
>>>>
>>>> you can see whether your property is in effect by looking at the
>>>> following URL:
>>>> http://<jobtracker-host>:50030/jobconf.jsp?jobid=<job-id>
>>>>
>>>> replace <jobtracker-host> with your jobtracker host or IP and
>>>> <job-id> with the id of the running job
>>>>
>>>> have you restarted mapreduce after changing mapred-site.xml?
>>>>
>>> It shows me the value is still 0.05. I am a little confused. Since
>>> Hadoop on each machine has its own configuration files, which
>>> configuration files should I change? For mapred-site.xml, do I only
>>> need to change the one on the master node? (I always launch my
>>> MapReduce program from the master node.) What about other
>>> configuration files such as core-site.xml and hdfs-site.xml? I guess
>>> I have to change them on all machines in the cluster.
>>>
>>> Thanks,
>>> Da
>>
>
>
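
For what it's worth, a sketch of the mapred-site.xml entry being discussed (using the 0.20-era parameter name that appears elsewhere in this thread; a value of 1.00 asks the scheduler to hold reduces until every map has finished):

```xml
<!-- Sketch of a mapred-site.xml fragment; place it on the node that
     submits the job, since the job configuration is assembled there. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.00</value>
</property>
```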

Re: delay the execution of reducers

Posted by Da Zheng <zh...@gmail.com>.
Changing the parameter for a specific job works better for me.

But I was asking, in general, in which configuration file(s) I should
change the value of the parameters.
For parameters in hdfs-site.xml, I should change the configuration file
on each machine. But for parameters in mapred-site.xml, it seems enough
to change the configuration file on the machine where the job is launched.

Thanks,
Da

On 11/29/2010 01:31 PM, Arun C Murthy wrote:
> Just set it for your job.
>
> In your launching program do something like:
>
> jobConf.setFloat("mapred.reduce.slowstart.completed.maps", 0.5);
>
> On Nov 29, 2010, at 9:46 AM, Da Zheng wrote:
>
>> On 11/29/2010 05:42 AM, Chandraprakash Bhagtani wrote:
>>> you can see whether your property is in effect by looking at the
>>> following URL:
>>> http://<jobtracker-host>:50030/jobconf.jsp?jobid=<job-id>
>>>
>>> replace <jobtracker-host> with your jobtracker host or IP and
>>> <job-id> with the id of the running job
>>>
>>> have you restarted mapreduce after changing mapred-site.xml?
>>>
>> It shows me the value is still 0.05. I am a little confused. Since
>> Hadoop on each machine has its own configuration files, which
>> configuration files should I change? For mapred-site.xml, do I only
>> need to change the one on the master node? (I always launch my
>> MapReduce program from the master node.) What about other
>> configuration files such as core-site.xml and hdfs-site.xml? I guess
>> I have to change them on all machines in the cluster.
>>
>> Thanks,
>> Da
>


Re: delay the execution of reducers

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Just set it for your job.

In your launching program do something like:

jobConf.setFloat("mapred.reduce.slowstart.completed.maps", 0.5);

On Nov 29, 2010, at 9:46 AM, Da Zheng wrote:

> On 11/29/2010 05:42 AM, Chandraprakash Bhagtani wrote:
>> you can see whether your property is in effect by looking at the
>> following URL:
>> http://<jobtracker-host>:50030/jobconf.jsp?jobid=<job-id>
>>
>> replace <jobtracker-host> with your jobtracker host or IP and
>> <job-id> with the id of the running job
>>
>> have you restarted mapreduce after changing mapred-site.xml?
>>
> It shows me the value is still 0.05. I am a little confused. Since
> Hadoop on each machine has its own configuration files, which
> configuration files should I change? For mapred-site.xml, do I only
> need to change the one on the master node? (I always launch my
> MapReduce program from the master node.) What about other
> configuration files such as core-site.xml and hdfs-site.xml? I guess
> I have to change them on all machines in the cluster.
>
> Thanks,
> Da
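
Arun's per-job call works because values set on the job's own configuration override the site-file defaults at submission time. A minimal, self-contained sketch of that layering, using java.util.Properties as a stand-in for Hadoop's Configuration (an illustration only, not the Hadoop API):

```java
import java.util.Properties;

public class ConfPrecedence {
    public static void main(String[] args) {
        // Stand-in for the cluster-wide mapred-site.xml defaults.
        Properties siteDefaults = new Properties();
        siteDefaults.setProperty("mapred.reduce.slowstart.completed.maps", "0.05");

        // Stand-in for the per-job configuration, layered on the defaults.
        Properties jobConf = new Properties(siteDefaults);

        // Without an override, the job falls back to the site default.
        System.out.println(jobConf.getProperty("mapred.reduce.slowstart.completed.maps"));

        // The per-job override, as Arun suggests, takes precedence.
        jobConf.setProperty("mapred.reduce.slowstart.completed.maps", "1.0");
        System.out.println(jobConf.getProperty("mapred.reduce.slowstart.completed.maps"));
    }
}
```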


Re: delay the execution of reducers

Posted by Da Zheng <zh...@gmail.com>.
On 11/29/2010 05:42 AM, Chandraprakash Bhagtani wrote:
> you can see whether your property is in effect by looking at the
> following URL:
> http://<jobtracker-host>:50030/jobconf.jsp?jobid=<job-id>
>
> replace <jobtracker-host> with your jobtracker host or IP and
> <job-id> with the id of the running job
>
> have you restarted mapreduce after changing mapred-site.xml?
>
It shows me the value is still 0.05. I am a little confused. Since
Hadoop on each machine has its own configuration files, which
configuration files should I change? For mapred-site.xml, do I only
need to change the one on the master node? (I always launch my
MapReduce program from the master node.) What about other configuration
files such as core-site.xml and hdfs-site.xml? I guess I have to change
them on all machines in the cluster.

Thanks,
Da

Re: delay the execution of reducers

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
You can see whether your property is in effect by looking at the
following URL:
http://<jobtracker-host>:50030/jobconf.jsp?jobid=<job-id>

Replace <jobtracker-host> with your jobtracker host or IP and <job-id>
with the id of the running job.

Have you restarted mapreduce after changing mapred-site.xml?

On Mon, Nov 29, 2010 at 6:56 AM, li ping <li...@gmail.com> wrote:

> org.apache.hadoop.mapred.JobInProgress
>
> You may find this class helpful.
>
> On Mon, Nov 29, 2010 at 4:36 AM, Da Zheng <zh...@gmail.com> wrote:
>
> > I have a problem with subscribing to the mapreduce mailing list.
> >
> > I use hadoop-0.20.2. I have added this parameter to mapred-site.xml. Is
> > there any way for me to check whether the parameter has been read and
> > activated?
> >
> > BTW, what do you mean by opening a jira?
> >
> > Thanks,
> > Da
> >
> >
> > On 11/28/2010 05:03 AM, Arun C Murthy wrote:
> >
> >> Moving to mapreduce-user@, bcc common-user@. Please use project
> >> specific lists.
> >>
> >> mapreduce.reduce.slowstart.completed.maps is the right knob. Which
> >> version of hadoop are you running? If it isn't working, please open
> >> a jira. Thanks.
> >>
> >> Arun
> >>
> >> On Nov 27, 2010, at 11:40 PM, Da Zheng wrote:
> >>
> >>> Hello,
> >>>
> >>> I found in Hadoop that reducers start when a fraction of the
> >>> mappers are complete. However, I would like the reducers to start
> >>> only when all mappers are complete. I searched the Hadoop
> >>> configuration parameters and found
> >>> mapred.reduce.slowstart.completed.maps, which seems to do what I
> >>> want. But no matter what value (0.99, 1.00, etc.) I set for
> >>> mapred.reduce.slowstart.completed.maps, the reducers always start
> >>> executing when about 10% of the mappers are complete.
> >>>
> >>> Am I setting the right parameter? Is there any other parameter I
> >>> can use for this purpose?
> >>>
> >>> Thanks,
> >>> Da
> >>>
> >>
> >>
> >
>
>
> --
> -----李平
>



-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
Nokia India Pvt. Ltd.

Re: delay the execution of reducers

Posted by li ping <li...@gmail.com>.
org.apache.hadoop.mapred.JobInProgress

You may find this class helpful.

On Mon, Nov 29, 2010 at 4:36 AM, Da Zheng <zh...@gmail.com> wrote:

> I have a problem with subscribing to the mapreduce mailing list.
>
> I use hadoop-0.20.2. I have added this parameter to mapred-site.xml. Is
> there any way for me to check whether the parameter has been read and
> activated?
>
> BTW, what do you mean by opening a jira?
>
> Thanks,
> Da
>
>
> On 11/28/2010 05:03 AM, Arun C Murthy wrote:
>
>> Moving to mapreduce-user@, bcc common-user@. Please use project
>> specific lists.
>>
>> mapreduce.reduce.slowstart.completed.maps is the right knob. Which version
>> of hadoop are you running? If it isn't working, please open a jira. Thanks.
>>
>> Arun
>>
>> On Nov 27, 2010, at 11:40 PM, Da Zheng wrote:
>>
>>> Hello,
>>>
>>> I found in Hadoop that reducers start when a fraction of the mappers
>>> are complete. However, I would like the reducers to start only when
>>> all mappers are complete. I searched the Hadoop configuration
>>> parameters and found mapred.reduce.slowstart.completed.maps, which
>>> seems to do what I want. But no matter what value (0.99, 1.00, etc.)
>>> I set for mapred.reduce.slowstart.completed.maps, the reducers always
>>> start executing when about 10% of the mappers are complete.
>>>
>>> Am I setting the right parameter? Is there any other parameter I can
>>> use for this purpose?
>>>
>>> Thanks,
>>> Da
>>>
>>
>>
>


-- 
-----李平
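
For anyone tracing the class li ping points at: the decision it makes can be approximated by a simple threshold check. The sketch below is self-contained and invented for illustration; it is not the actual JobInProgress source, and the class/method names are hypothetical:

```java
public class SlowstartGate {
    // Approximation of the slowstart gate: reduces become schedulable once
    // the number of completed maps reaches slowstart * totalMaps (rounded up).
    public static boolean reducesSchedulable(int finishedMaps, int totalMaps, float slowstart) {
        int threshold = (int) Math.ceil(slowstart * totalMaps);
        return finishedMaps >= threshold;
    }

    public static void main(String[] args) {
        // With the 0.05 default, reduces may start after 5 of 100 maps.
        System.out.println(reducesSchedulable(5, 100, 0.05f));   // true
        // With 1.0, reduces wait until the last map completes.
        System.out.println(reducesSchedulable(99, 100, 1.0f));   // false
        System.out.println(reducesSchedulable(100, 100, 1.0f));  // true
    }
}
```

This also shows why a value like 0.99 behaves like 1.0 on small jobs: with 10 maps, the rounded-up threshold for 0.99 is already 10.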

Re: delay the execution of reducers

Posted by Da Zheng <zh...@gmail.com>.
I have a problem with subscribing to the mapreduce mailing list.

I use hadoop-0.20.2. I have added this parameter to mapred-site.xml. Is
there any way for me to check whether the parameter has been read and
taken effect?

BTW, what do you mean by opening a jira?

Thanks,
Da

On 11/28/2010 05:03 AM, Arun C Murthy wrote:
> Moving to mapreduce-user@, bcc common-user@. Please use project
> specific lists.
>
> mapreduce.reduce.slowstart.completed.maps is the right knob. Which 
> version of hadoop are you running? If it isn't working, please open a 
> jira. Thanks.
>
> Arun
>
> On Nov 27, 2010, at 11:40 PM, Da Zheng wrote:
>
>> Hello,
>>
>> I found in Hadoop that reducers start when a fraction of the mappers
>> are complete. However, I would like the reducers to start only when
>> all mappers are complete. I searched the Hadoop configuration
>> parameters and found mapred.reduce.slowstart.completed.maps, which
>> seems to do what I want. But no matter what value (0.99, 1.00, etc.)
>> I set for mapred.reduce.slowstart.completed.maps, the reducers always
>> start executing when about 10% of the mappers are complete.
>>
>> Am I setting the right parameter? Is there any other parameter I can
>> use for this purpose?
>>
>> Thanks,
>> Da
>


Re: delay the execution of reducers

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Moving to mapreduce-user@, bcc common-user@. Please use project
specific lists.

mapreduce.reduce.slowstart.completed.maps is the right knob. Which  
version of hadoop are you running? If it isn't working, please open a  
jira. Thanks.

Arun

On Nov 27, 2010, at 11:40 PM, Da Zheng wrote:

> Hello,
>
> I found in Hadoop that reducers start when a fraction of the mappers
> are complete. However, I would like the reducers to start only when
> all mappers are complete. I searched the Hadoop configuration
> parameters and found mapred.reduce.slowstart.completed.maps, which
> seems to do what I want. But no matter what value (0.99, 1.00, etc.)
> I set for mapred.reduce.slowstart.completed.maps, the reducers always
> start executing when about 10% of the mappers are complete.
>
> Am I setting the right parameter? Is there any other parameter I can
> use for this purpose?
>
> Thanks,
> Da

