Posted to mapreduce-user@hadoop.apache.org by gortiz <go...@pragsis.com> on 2014/10/03 14:29:12 UTC

How to get the max number of reducers in Yarn

I have been working with MapReduce v1 (JobTracker and TaskTrackers).
For some of my jobs I want to set the number of reducers to the maximum
capacity of my cluster.

I did it with this:
int max = new JobClient(new JobConf(jConf)).getClusterStatus().getMaxReduceTasks();
Job job = new Job(jConf, this.getClass().getName());
job.setNumReduceTasks(max);

Now I want to work with YARN, and this no longer seems to work. I think
YARN manages the number of reducers at run time depending on the
resources it has available; the method getMaxReduceTasks() returns just two.
I don't know whether there is another way to set the number of reducers
to the real capacity of the cluster, or what I'm doing wrong. I guess
that if I don't call setNumReduceTasks, I'll get one reducer, since that
is the default value.
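Under YARN there is no cluster-wide maximum reducer count to query; a job simply declares how many reduce tasks it wants through the mapreduce.job.reduces property. Assuming the driver goes through ToolRunner (so generic -D options are parsed), a per-job override from the command line might look like this sketch, where the jar name, driver class, and paths are placeholders:

```shell
# Per-job reducer count via the generic -D option (placeholders throughout).
hadoop jar myjob.jar com.example.MyDriver \
    -D mapreduce.job.reduces=16 \
    /input /output
```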
CONFIDENTIALITY WARNING. This message and the information contained in or attached to it are private and confidential and intended exclusively for the addressee. Pragsis informs whoever may receive it in error that it contains privileged information and that its use, copy, reproduction or distribution is prohibited. If you are not an intended recipient of this e-mail, please notify the sender, delete it, and do not read, act upon, print, disclose, copy, retain or redistribute any portion of this e-mail.

RE: How to get the max number of reducers in Yarn

Posted by java8964 <ja...@hotmail.com>.
You should still call setNumReduceTasks in your job; there simply is no such maximum reducer count in YARN any more.
Choosing a reducer count is more art than science.
I think there is only one firm rule about it: don't set the reducer number larger than the reducer input group count.
Setting it higher leaves some reducers with no data to process, which just wastes resources. Beyond that, it only depends on how fast you want your job to finish versus how many other jobs are running in the cluster at the same time.
In my experience, in a multi-tenant cluster, setting it as close as possible to the reducer input group count is a good choice.
Your job then asks for as many reducers as possible, but no more than the input group count. Each reducer processes as few input groups as possible, so it finishes faster and other jobs get a fair chance at the reducer slots.
One problem with this setting is that your job could take all the reducers at the beginning. If your job is really big, those initial reducer tasks could take very long to finish, making your job look bad in the cluster :-)
Yong
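The rule above (never more reducers than reducer input groups) can be sketched as a tiny helper. ReducerCountPicker, its name, and its parameters are hypothetical and purely illustrative, not part of any Hadoop API:

```java
// Hypothetical helper illustrating the rule: cap the requested reducer
// count at the number of distinct reducer input groups, since any extra
// reducers would receive no data and just waste resources.
final class ReducerCountPicker {
    static int pick(int desired, long inputGroupCount) {
        // MapReduce's default is a single reducer; fall back to that
        // for nonsensical inputs.
        if (desired < 1 || inputGroupCount < 1) {
            return 1;
        }
        // Never ask for more reducers than there are input groups.
        return (int) Math.min(desired, inputGroupCount);
    }
}
```

For example, asking for 1000 reducers when only 16 distinct keys reach the reduce phase would be capped to 16.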





Re: How to get the max number of reducers in Yarn

Posted by Guillermo Ortiz <go...@pragsis.com>.
Thanks for all your answers.

So, if I don't ask for any specific number of reducers and never call setNumReduceTasks, how many reducers will I get? The default value?

If I want the maximum number of reducers available at any given time, should I just set the number to the maximum integer and let the ResourceManager give me the maximum value at that moment? My understanding is that if I ask for a million reducers but the cluster can only give me 16, it doesn't produce any error.



----- Original message -----
From: "Ramil Malikov" <vi...@gmail.com>
To: user@hadoop.apache.org
Sent: Friday, 3 October 2014 15:36:48
Subject: Re: How to get the max number of reducers in Yarn

Hi.

(For Hadoop 2.2.0)

Nope. The number of mappers depends on the number of splits (line 392 of
JobSubmitter).
The number of reducers depends on the property mapreduce.job.reduces.

So you can set it up like this:

final Configuration configuration = new Configuration();
configuration.set("mapreduce.job.reduces", "NUMBER_OF_REDUCERS");

Make sure that you don't overwrite this configuration during job setup.
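For reference, mapreduce.job.reduces defaults to 1, and calling Job.setNumReduceTasks(n) in the driver sets this same property. A cluster-wide default can also be placed in mapred-site.xml; the value 16 below is only an example:

```xml
<!-- mapred-site.xml: example cluster-wide default; a job can still
     override it with Job.setNumReduceTasks() or -D mapreduce.job.reduces=N -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>16</value>
</property>
```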



Re: How to get the max number of reducers in Yarn

Posted by Guillermo Ortiz <go...@pragsis.com>.
Thanks for all your answers.

So, if I don't ask for any specific number of reducers and I don't call setNumReduceTasks, how many reducers will I get? The default value?

If I want the maximum number of reducers available at any given time, should I just set the number to the maximum integer and let the RM give me whatever is available at that time? As I understand it, if I ask for a million reducers but the cluster can only grant me 16, it doesn't produce any error.



----- Original message -----
From: "Ramil Malikov" <vi...@gmail.com>
To: user@hadoop.apache.org
Sent: Friday, October 3, 2014 15:36:48
Subject: Re: How to get the max number of reducers in Yarn

Hi.

(For Hadoop 2.2.0)

Nope. The number of mappers depends on the number of splits (line 392 of JobSubmitter); the number of reducers depends on the property mapreduce.job.reduces.

So you can set it up like this:

final Configuration configuration = new Configuration();
configuration.setInt("mapreduce.job.reduces", numberOfReducers);

Make sure that you don't overwrite this property during job setup.
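[Editor's note: the same property can also be supplied at submit time, without recompiling, provided the job's driver goes through ToolRunner/GenericOptionsParser so that -D generic options are honored. The jar name, driver class, and paths below are hypothetical placeholders:]

```shell
# Ask for 16 reducers for this run only. Requires the driver to
# implement Tool (run via ToolRunner), so GenericOptionsParser
# applies -D key=value pairs to the job Configuration.
hadoop jar myjob.jar com.example.MyDriver \
  -D mapreduce.job.reduces=16 \
  /input /output
```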

On 10/03/2014 04:29 PM, gortiz wrote:
> I have been working with MapReduce1 (JobTracker and TaskTrackers).
> For some of my jobs I want to set the number of reducers to the maximum
> capacity of my cluster.
>
> I did it with this:
> int max = new JobClient(new
> JobConf(jConf)).getClusterStatus().getMaxReduceTasks();
> Job job = new Job(jConf, this.getClass().getName());
> job.setNumReduceTasks(max);
>
> Now I want to work with YARN and it seems that this doesn't work. I
> think that YARN manages the number of reducers at runtime depending
> on the resources it has available. The method getMaxReduceTasks
> returns me just two.
> I don't know if there's another way to set the number of reducers to
> the real capacity of the cluster, or what I'm doing wrong. I guess that
> if I don't call setNumReduceTasks, I'll get one reducer, the default
> value.


Re: How to get the max number of reducers in Yarn

Posted by Ramil Malikov <vi...@gmail.com>.
Hi.

(For Hadoop 2.2.0)

Nope. The number of mappers depends on the number of splits (line 392 of JobSubmitter); the number of reducers depends on the property mapreduce.job.reduces.

So you can set it up like this:

final Configuration configuration = new Configuration();
configuration.setInt("mapreduce.job.reduces", numberOfReducers);

Make sure that you don't overwrite this property during job setup.

On 10/03/2014 04:29 PM, gortiz wrote:
> I have been working with MapReduce1 (JobTracker and TaskTrackers).
> For some of my jobs I want to set the number of reducers to the maximum
> capacity of my cluster.
>
> I did it with this:
> int max = new JobClient(new
> JobConf(jConf)).getClusterStatus().getMaxReduceTasks();
> Job job = new Job(jConf, this.getClass().getName());
> job.setNumReduceTasks(max);
>
> Now I want to work with YARN and it seems that this doesn't work. I
> think that YARN manages the number of reducers at runtime depending
> on the resources it has available. The method getMaxReduceTasks
> returns me just two.
> I don't know if there's another way to set the number of reducers to
> the real capacity of the cluster, or what I'm doing wrong. I guess that
> if I don't call setNumReduceTasks, I'll get one reducer, the default
> value.



RE: How to get the max number of reducers in Yarn

Posted by java8964 <ja...@hotmail.com>.
In MR1, the maximum number of reducers is a static value set in mapred-site.xml; that is the value you get from the API.
In YARN there is no such static value any more, so you can set any value you like; it is up to the RM to decide at runtime how many reducer tasks can be granted to you. It is a number you can ask for, with no guarantee. In fact, it is much the same as in MR1: you could ask for the maximum reducer count, but the reducer slots might not be available at that time. What has changed is that in YARN this static maximum no longer exists.
Whether you write a Hive query, a Pig script, or plain Java MR code, the best way to decide how many reducers to ask for is to make it a runtime parameter. Whoever runs the job usually has a better idea of the right number of reducers than any static value.
Yong

> Date: Fri, 3 Oct 2014 14:29:12 +0200
> From: gortiz@pragsis.com
> To: user@hadoop.apache.org
> Subject: How to get the max number of reducers in Yarn
> 
> I have been working with MapReduce1 (JobTracker and TaskTrackers).
> For some of my jobs I want to set the number of reducers to the maximum
> capacity of my cluster.
>
> I did it with this:
> int max = new JobClient(new
> JobConf(jConf)).getClusterStatus().getMaxReduceTasks();
> Job job = new Job(jConf, this.getClass().getName());
> job.setNumReduceTasks(max);
>
> Now I want to work with YARN and it seems that this doesn't work. I
> think that YARN manages the number of reducers at runtime depending
> on the resources it has available. The method getMaxReduceTasks
> returns me just two.
> I don't know if there's another way to set the number of reducers to
> the real capacity of the cluster, or what I'm doing wrong. I guess that
> if I don't call setNumReduceTasks, I'll get one reducer, the default
> value.
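[Editor's note: Yong's advice above, treating the reducer count as a runtime parameter, pairs naturally with a small sizing heuristic. The sketch below is purely illustrative: the vcore-based sizing and the 0.95 headroom factor are assumptions for the example, not anything from this thread or a Hadoop API.]

```java
public class ReducerHeuristic {
    /**
     * Suggest a reducer count from cluster capacity. Illustrative
     * only: divides total vcores by the vcores each reduce container
     * would use, then keeps ~95% of that capacity as headroom so a
     * lost container does not force a whole extra reduce wave.
     */
    static int suggestReducers(int clusterVcores, int vcoresPerReducer) {
        int capacity = clusterVcores / vcoresPerReducer;
        return Math.max(1, (int) (capacity * 0.95));
    }

    public static void main(String[] args) {
        // e.g. 64 vcores, 2 vcores per reduce container -> prints 30
        System.out.println(suggestReducers(64, 2));
    }
}
```

The suggested value would then be passed to job.setNumReduceTasks (or -D mapreduce.job.reduces) by whoever submits the job.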



