Posted to common-user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2013/11/08 03:22:52 UTC

About the concept and usage of Uber mode

Hi Experts,

In previous discussions, I found the following description:
"mapreduce.job.ubertask.enable | (false) | 'Whether to enable the
small-jobs "ubertask" optimization, which runs "sufficiently small" jobs
sequentially within a single JVM. "Small" is defined by the following
maxmaps, maxreduces, and maxbytes settings. Users may override this value.'"
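
Setting this up on a job configuration looks roughly like the following (a minimal sketch: the property names are the ones documented in mapred-default.xml, while the class name and the threshold values are only illustrative, not the shipped defaults):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UberConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Turn on the small-jobs "ubertask" optimization (off by default).
    conf.setBoolean("mapreduce.job.ubertask.enable", true);

    // A job is only uberized if it stays within ALL of these limits.
    // The values below are illustrative, not the shipped defaults.
    conf.setInt("mapreduce.job.ubertask.maxmaps", 4);
    // As far as I know, only 0 or 1 reduce is supported for uber jobs.
    conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
    // Threshold on the job's total input bytes.
    conf.setLong("mapreduce.job.ubertask.maxbytes", 64L * 1024 * 1024);

    Job job = Job.getInstance(conf, "uber-config-sketch");
    // ... set mapper, reducer, input/output paths, then submit as usual.
  }
}

The same properties can also be set in mapred-site.xml or passed with -D on the command line (when the job uses ToolRunner).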

Based on the above description, I set "mapreduce.job.ubertask.enable" to true,
configured the other uber-related parameters, ran some experiments, and
arrived at the following understanding.
1) If I submit a bunch of small MR jobs to the Hadoop cluster (each MR job
runs in uber mode):
   - Each MR job corresponds to an application, like
application_1383815949546_0006
   - Each application has its own container, like
container_1383815949546_0010_01_000001
   - When the NodeManager launches a container, it also launches a JVM.
When the container stops, the JVM stops as well; a container has exactly
one JVM over its whole life cycle.
   - Each application, such as application_1383815949546_0006, includes some
map tasks and reduce tasks
   - In uber mode, all the map tasks and reduce tasks of
application_1383815949546_0006 are executed in one and the same container,
container_1383815949546_0010_01_000001. In other words, all map tasks and
reduce tasks run in a single JVM.
   - A container cannot be shared among different applications (jobs)

2) If I submit a bunch of big MR jobs to the Hadoop cluster (each MR job
does NOT run in uber mode):
   - Each map task and reduce task of application_1383815949546_0006 is
executed in its own container, which means application_1383815949546_0006
will have many containers. (A way to check this at runtime is sketched below.)
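
To check at runtime which of these two paths a job actually took, something like the following should work (a sketch only: the class and method names are made up, and it assumes the client-side Job#isUber() method is available in your Hadoop 2.x version; the job's ApplicationMaster log also records whether the job was uberized):

import org.apache.hadoop.mapreduce.Job;

public class UberCheckSketch {
  // Runs an already-configured job and reports whether the framework
  // actually chose uber mode (all tasks inside the AM's container) or
  // the normal one-container-per-task execution.
  static boolean runAndReport(Job job) throws Exception {
    job.waitForCompletion(true);   // submits the job and waits, printing progress
    boolean uber = job.isUber();   // assumption: available on Hadoop 2.x clients
    System.out.println(job.getJobID() + (uber
        ? " ran in uber mode: one container, one JVM, tasks executed sequentially"
        : " ran in normal mode: one container per map/reduce task"));
    return uber;
  }
}

For the small jobs in case 1) this should report uber mode; for the big jobs in case 2) it should not.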

I am not sure whether the above understanding is correct, so any
comments/corrections would be appreciated!

Re: About the concept and usage of Uber mode

Posted by Arun C Murthy <ac...@hortonworks.com>.
Yes, your understanding is correct. With 'uber' mode enabled, you'll run everything within the container of the AM itself, which leads to significantly faster runtimes for 'small' jobs (we've observed a 2x-3x speedup for some jobs).

hth,
Arun

On Nov 10, 2013, at 5:35 PM, sam liu <sa...@gmail.com> wrote:

> Any comments/corrections on my understanding of Uber?
> 
> Thanks in advance!

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: About the concept and usage of Uber mode

Posted by sam liu <sa...@gmail.com>.
Any comments/corrections on my understanding of Uber?

Thanks in advance!


