Posted to user@hadoop.apache.org by sam liu <sa...@gmail.com> on 2013/06/19 04:46:21 UTC

How does Yarn execute an MRv1 job?

Hi,

1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
together, following a typical process (map > shuffle > reduce). In Yarn,
as I understand it, an MRv1 job is executed only by the
ApplicationMaster.
- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
has a special execution process (map > shuffle > reduce) in Hadoop 1.x.
How does Yarn execute an MRv1 job? Does it still include the special MR
steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
- Do the MRv1 parameters still work on Yarn, like
mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
- What's the general process for the ApplicationMaster of Yarn to
execute a job?

2. In Hadoop 1.x, we can set the map/reduce slots by setting
'mapred.tasktracker.map.tasks.maximum' and
'mapred.tasktracker.reduce.tasks.maximum'.
- For Yarn, the above two parameters no longer work, as Yarn uses
containers instead, right?
- For Yarn, we can set the total physical memory of a NodeManager using
'yarn.nodemanager.resource.memory-mb'. But how do we set the default
physical memory size of a container?
- How do we set the maximum physical memory size of a container? With
the parameter 'mapred.child.java.opts'?

Thanks!

Re: How does Yarn execute an MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Got it, and thanks!


2013/6/20 Azuryy Yu <az...@gmail.com>

> HBase-0.94.* does support hadoop-2.x; did you look at the web site I
> provided?
>
> Hive-0.9.0 doesn't support hadoop-2.x.
>
>
>
>
> On Thu, Jun 20, 2013 at 2:59 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
>> I'd use hive-0.11.
>>
>> On Jun 19, 2013, at 11:56 PM, sam liu <sa...@gmail.com> wrote:
>>
>> Hi Azuryy,
>>
>> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive
>> 0.9.0, do not support hadoop 2.x, right?
>>
>> Thanks!
>>
>>
>> 2013/6/20 Azuryy Yu <az...@gmail.com>
>>
>>> Hi Sam,
>>> please look at: http://hbase.apache.org/book.html#d2617e499
>>>
>>> Generally, when we say YARN we mean Hadoop-2.x; you can download
>>> hadoop-2.0.4-alpha. Hive-0.10 supports hadoop-2.x very well.
>>>
>>>
>>>
>>> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>>> Thanks Arun!
>>>>
>>>> #1, Yes, I did tests and found that the MRv1 jobs run against YARN
>>>> directly, without recompiling.
>>>>
>>>> #2, do you mean that old versions of HBase/Hive cannot run against
>>>> YARN, and only certain versions of them can? If so, where can I get
>>>> the versions for YARN?
>>>>
>>>>
>>>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>>>
>>>>>
>>>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>>>
>>>>> Thanks for the detailed answers! Here are three further
>>>>> questions:
>>>>>
>>>>> - Yarn maintains backwards compatibility, and an MRv1 job can run
>>>>> on Yarn. If Yarn does not require any code change to an existing
>>>>> MRv1 job, why would we need to recompile the MRv1 job?
>>>>>
>>>>>
>>>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>>>
>>>>> - Which yarn jar files are required for the recompiling?
>>>>> - In a cluster with Hadoop 1.1.1 and other Hadoop-related
>>>>> components (HBase 0.94.3, Hive 0.9.0, Zookeeper 3.4.5, ...), if we
>>>>> want to replace Hadoop 1.1.1 with yarn, do we need to recompile
>>>>> all the other Hadoop-related components against the yarn jar
>>>>> files, without any code change?
>>>>>
>>>>>
>>>>> You will need versions of HBase, Hive, etc. which are integrated
>>>>> with hadoop-2.x, but you will not need to change any of your
>>>>> end-user applications (MR jobs, hive queries, pig scripts, etc.).
>>>>>
>>>>> Arun
>>>>>
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>>
>>>>>
>>>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>>>
>>>>>> Thanks Arun and Devaraj, good to know.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
>>>>>>
>>>>>>> Not true, the CapacityScheduler has support for both CPU & Memory
>>>>>>> now.
>>>>>>>
>>>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Devaraj,
>>>>>>>
>>>>>>> As for the resource request for a yarn container, currently
>>>>>>> only memory is considered as a resource, not cpu. Please correct
>>>>>>> me if I am wrong.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rahul
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com> wrote:
>>>>>>>
>>>>>>>> Hi Sam,
>>>>>>>>
>>>>>>>> Please find the answers to your queries below.
>>>>>>>>
>>>>>>>> >- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an
>>>>>>>> MRv1 job has a special execution process (map > shuffle >
>>>>>>>> reduce) in Hadoop 1.x. How does Yarn execute an MRv1 job? Does
>>>>>>>> it still include the special MR steps from Hadoop 1.x, like
>>>>>>>> map, sort, merge, combine and shuffle?
>>>>>>>>
>>>>>>>> In Yarn, the central concept is the application. An MR job is
>>>>>>>> one kind of application, which makes use of the MRAppMaster
>>>>>>>> (i.e., the ApplicationMaster for that application). If we want
>>>>>>>> to run different kinds of applications, we need an
>>>>>>>> ApplicationMaster for each kind of application.
>>>>>>>>
>>>>>>>> >- Do the MRv1 parameters still work on Yarn, like
>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>
>>>>>>>> These configurations still work for MR jobs on Yarn.
>>>>>>>>
>>>>>>>> >- What's the general process for the ApplicationMaster of Yarn
>>>>>>>> to execute a job?
>>>>>>>>
>>>>>>>> The MRAppMaster (ApplicationMaster for an MR job) drives the
>>>>>>>> job life cycle, which includes getting containers for the maps
>>>>>>>> & reducers, launching the containers via the NM, tracking task
>>>>>>>> status till completion, and managing failed tasks.
>>>>>>>>
>>>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>>>>> >- For Yarn, the above two parameters no longer work, as Yarn
>>>>>>>> uses containers instead, right?
>>>>>>>>
>>>>>>>> Correct, these params don't work in Yarn. In Yarn everything is
>>>>>>>> based on resources (memory, cpu). The ApplicationMaster can
>>>>>>>> request resources from the RM to complete the tasks for that
>>>>>>>> application.
>>>>>>>>
>>>>>>>> >- For Yarn, we can set the total physical memory of a
>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>> how do we set the default physical memory size of a container?
>>>>>>>>
>>>>>>>> The ApplicationMaster is responsible for getting containers
>>>>>>>> from the RM by sending resource requests. For an MR job, you
>>>>>>>> can use the "mapreduce.map.memory.mb" and
>>>>>>>> "mapreduce.reduce.memory.mb" configurations to specify the map
>>>>>>>> & reduce container memory sizes.
>>>>>>>>
>>>>>>>> >- How do we set the maximum physical memory size of a
>>>>>>>> container? With the parameter 'mapred.child.java.opts'?
>>>>>>>>
>>>>>>>> It is determined by the resources requested for that container.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Devaraj K
>>>>>>>>
>>>>>>>> From: sam liu [mailto:samliuhadoop@gmail.com]
>>>>>>>> Sent: 19 June 2013 08:16
>>>>>>>> To: user@hadoop.apache.org
>>>>>>>> Subject: How does Yarn execute an MRv1 job?
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce
>>>>>>>> tasks together, following a typical process (map > shuffle >
>>>>>>>> reduce). In Yarn, as I understand it, an MRv1 job is executed
>>>>>>>> only by the ApplicationMaster.
>>>>>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an
>>>>>>>> MRv1 job has a special execution process (map > shuffle >
>>>>>>>> reduce) in Hadoop 1.x. How does Yarn execute an MRv1 job? Does
>>>>>>>> it still include the special MR steps from Hadoop 1.x, like
>>>>>>>> map, sort, merge, combine and shuffle?
>>>>>>>> - Do the MRv1 parameters still work on Yarn, like
>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>> - What's the general process for the ApplicationMaster of Yarn
>>>>>>>> to execute a job?
>>>>>>>>
>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>>>>> - For Yarn, the above two parameters no longer work, as Yarn
>>>>>>>> uses containers instead, right?
>>>>>>>> - For Yarn, we can set the total physical memory of a
>>>>>>>> NodeManager using 'yarn.nodemanager.resource.memory-mb'. But
>>>>>>>> how do we set the default physical memory size of a container?
>>>>>>>> - How do we set the maximum physical memory size of a
>>>>>>>> container? With the parameter 'mapred.child.java.opts'?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> Arun C. Murthy
>>>>>>> Hortonworks Inc.
>>>>>>> http://hortonworks.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>  --
>>>>> Arun C. Murthy
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>
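
To make the configuration answers quoted above concrete, here is a
minimal sketch of an MRv1-style WordCount driver that runs unchanged on
YARN, with the parameters discussed in this thread set explicitly. The
class name and all values are illustrative assumptions, not
recommendations, and mapreduce.map.java.opts /
mapreduce.reduce.java.opts are shown as the Hadoop 2.x way to size the
task JVM heap in place of the older mapred.child.java.opts:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountOnYarn {

      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Per-task container sizes the MRAppMaster will request from
        // the ResourceManager, in MB (these replace fixed map/reduce
        // slots).
        conf.set("mapreduce.map.memory.mb", "1024");
        conf.set("mapreduce.reduce.memory.mb", "2048");

        // Task JVM heap; kept below the container size, since the
        // NodeManager may kill a container that exceeds its requested
        // physical memory.
        conf.set("mapreduce.map.java.opts", "-Xmx820m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx1640m");

        // MRv1-era tuning knobs that still apply to MR jobs on YARN.
        conf.set("mapreduce.task.io.sort.mb", "256");
        conf.set("mapreduce.map.sort.spill.percent", "0.80");

        Job job = Job.getInstance(conf, "word count on yarn");
        job.setJarByClass(WordCountOnYarn.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Note that nothing in the job code itself is YARN-specific, which is the
backwards compatibility discussed in the thread: the map, sort, spill,
merge, combine and shuffle steps all still happen inside the map and
reduce task containers.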

Re: How does Yarn execute an MRv1 job?

Posted by Azuryy Yu <az...@gmail.com>.
HBase-0.94.* does support hadoop-2.x; did you look at the web site I
provided?

Hive-0.9.0 doesn't support hadoop-2.x.
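
As Devaraj and Arun note earlier in the thread, a YARN
ApplicationMaster (such as the MRAppMaster) obtains its containers by
sending resource requests to the ResourceManager, and the
CapacityScheduler can now account for both memory and CPU. The
following is a simplified, hypothetical AM fragment illustrating that
interaction with the Hadoop 2.x AMRMClient API; it is only meaningful
when launched by the RM as an application's master, and it is not the
actual MRAppMaster implementation:

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class MinimalAppMaster {
      public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();

        // Register this ApplicationMaster with the ResourceManager.
        rmClient.registerApplicationMaster("", 0, "");

        // Request one container of 1024 MB and 1 virtual core. There
        // are no slots in YARN, only resource amounts.
        Resource capability = Resource.newInstance(1024, 1);
        rmClient.addContainerRequest(new ContainerRequest(
            capability, null, null, Priority.newInstance(0)));

        // Heartbeat the RM until the container is allocated. A real AM
        // would then launch the container through the NodeManager,
        // track its status, and re-request failed work.
        boolean allocated = false;
        while (!allocated) {
          AllocateResponse response = rmClient.allocate(0.0f);
          for (Container c : response.getAllocatedContainers()) {
            System.out.println("Allocated " + c.getId() + " with "
                + c.getResource());
            allocated = true;
          }
          Thread.sleep(100);
        }

        rmClient.unregisterApplicationMaster(
            FinalApplicationStatus.SUCCEEDED, "done", "");
        rmClient.stop();
      }
    }

The MRAppMaster follows the same protocol, sizing its map and reduce
container requests from mapreduce.map.memory.mb and
mapreduce.reduce.memory.mb as described in Devaraj's answer.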




On Thu, Jun 20, 2013 at 2:59 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> I'd use hive-0.11.
>
> On Jun 19, 2013, at 11:56 PM, sam liu <sa...@gmail.com> wrote:
>
> Hi Azurry,
>
> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
> does not support hadoop 2.x, right?
>
> Thanks!
>
>
> 2013/6/20 Azuryy Yu <az...@gmail.com>
>
>> Hi Sam,
>> please look at :http://hbase.apache.org/book.html#d2617e499
>>
>> generally, we said YARN is Hadoop-2.x, you can download
>> hadoop-2.0.4-alpha. and Hive-0.10 supports hadoop-2.x very well.
>>
>>
>>
>> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>>
>>> Thanks Arun!
>>>
>>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>>> directly, without recompiling
>>>
>>> #2, do you mean the old versions of HBase/Hive can not run agains YARN,
>>> and only some special versions of them can run against YARN? If yes, how
>>> can I get the versions for YARN?
>>>
>>>
>>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>>
>>>>
>>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>> Appreciating for the detailed answers! Here are three further questions:
>>>>
>>>> - Yarn maintains backwards compatibility, and MRv1 job could run on
>>>> Yarn. If yarn does not ask existing MRv1 job to do any code change, but why
>>>> we should recompile the MRv1 job?
>>>>
>>>>
>>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>>
>>>> - Which yarn jar files are required in the recompiling?
>>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>>> related components again with yarn jar files? Without any code change?
>>>>
>>>>
>>>> You will need versions of HBase, Hive etc. which are integrated with
>>>> hadoop-2.x, but not need to change any of your end-user applications (MR
>>>> jobs, hive queries, pig scripts etc.)
>>>>
>>>> Arun
>>>>
>>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>>
>>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>>
>>>>> Thanks Arun and Devraj , good to know.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>>
>>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>>
>>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>>
>>>>>> Hi Devaraj,
>>>>>>
>>>>>> As for the container request request for yarn container , currently
>>>>>> only memory is considered as resource , not cpu. Please correct.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>>
>>>>>>>  Hi Sam,****
>>>>>>>
>>>>>>>   Please find the answers for your queries. ****
>>>>>>>
>>>>>>>
>>>>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>>>>> application). If we want to run different kinds of applications we should
>>>>>>> have ApplicationMaster for each kind of application.****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>>>>
>>>>>>> These configurations still work for MR Job in Yarn.****
>>>>>>>
>>>>>>>
>>>>>>> >- What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?****
>>>>>>>
>>>>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>>>>> which includes getting the containers for maps & reducers, launch the
>>>>>>> containers using NM, tacks the tasks status till completion, manage the
>>>>>>> failed tasks.****
>>>>>>>
>>>>>>>
>>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?****
>>>>>>>
>>>>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>>>>> for resources to complete the tasks for that application.****
>>>>>>>
>>>>>>>
>>>>>>> >- For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?****
>>>>>>>
>>>>>>> ApplicationMaster is responsible for getting the containers from RM
>>>>>>> by sending the resource requests. For MR Job, you can use
>>>>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>>>>> for specifying the map & reduce container memory sizes.****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> >- How to set the maximum size of physical mem of a container? By
>>>>>>> the parameter of 'mapred.child.java.opts'?****
>>>>>>>
>>>>>>> It can be set based on the resources requested for that container.**
>>>>>>> **
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> Thanks****
>>>>>>>
>>>>>>> Devaraj K****
>>>>>>>
>>>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>>>> *Sent:* 19 June 2013 08:16
>>>>>>> *To:* user@hadoop.apache.org
>>>>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?****
>>>>>>>
>>>>>>> Thanks!****
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by Azuryy Yu <az...@gmail.com>.
HBase-0.94.* does support hadoop-2.x, do you look at the web site i
provided?

Hive-0.9.0  doesn't  support hadoop-2.x




On Thu, Jun 20, 2013 at 2:59 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> I'd use hive-0.11.
>
> On Jun 19, 2013, at 11:56 PM, sam liu <sa...@gmail.com> wrote:
>
> Hi Azurry,
>
> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
> does not support hadoop 2.x, right?
>
> Thanks!
>
>
> 2013/6/20 Azuryy Yu <az...@gmail.com>
>
>> Hi Sam,
>> please look at :http://hbase.apache.org/book.html#d2617e499
>>
>> generally, we said YARN is Hadoop-2.x, you can download
>> hadoop-2.0.4-alpha. and Hive-0.10 supports hadoop-2.x very well.
>>
>>
>>
>> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>>
>>> Thanks Arun!
>>>
>>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>>> directly, without recompiling
>>>
>>> #2, do you mean the old versions of HBase/Hive can not run agains YARN,
>>> and only some special versions of them can run against YARN? If yes, how
>>> can I get the versions for YARN?
>>>
>>>
>>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>>
>>>>
>>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>> Appreciating for the detailed answers! Here are three further questions:
>>>>
>>>> - Yarn maintains backwards compatibility, and MRv1 job could run on
>>>> Yarn. If yarn does not ask existing MRv1 job to do any code change, but why
>>>> we should recompile the MRv1 job?
>>>>
>>>>
>>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>>
>>>> - Which yarn jar files are required in the recompiling?
>>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>>> related components again with yarn jar files? Without any code change?
>>>>
>>>>
>>>> You will need versions of HBase, Hive etc. which are integrated with
>>>> hadoop-2.x, but not need to change any of your end-user applications (MR
>>>> jobs, hive queries, pig scripts etc.)
>>>>
>>>> Arun
>>>>
>>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>>
>>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>>
>>>>> Thanks Arun and Devraj , good to know.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>>
>>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>>
>>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>>
>>>>>> Hi Devaraj,
>>>>>>
>>>>>> As for the container request request for yarn container , currently
>>>>>> only memory is considered as resource , not cpu. Please correct.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>>
>>>>>>>  Hi Sam,****
>>>>>>>
>>>>>>>   Please find the answers for your queries. ****
>>>>>>>
>>>>>>>
>>>>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>>>>> application). If we want to run different kinds of applications we should
>>>>>>> have ApplicationMaster for each kind of application.****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>>>>
>>>>>>> These configurations still work for MR Job in Yarn.****
>>>>>>>
>>>>>>>
>>>>>>> >- What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?****
>>>>>>>
>>>>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>>>>> which includes getting the containers for maps & reducers, launch the
>>>>>>> containers using NM, tacks the tasks status till completion, manage the
>>>>>>> failed tasks.****
>>>>>>>
>>>>>>>
>>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?****
>>>>>>>
>>>>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>>>>> for resources to complete the tasks for that application.****
>>>>>>>
>>>>>>>
>>>>>>> >- For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?****
>>>>>>>
>>>>>>> ApplicationMaster is responsible for getting the containers from RM
>>>>>>> by sending the resource requests. For MR Job, you can use
>>>>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>>>>> for specifying the map & reduce container memory sizes.****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> >- How to set the maximum size of physical mem of a container? By
>>>>>>> the parameter of 'mapred.child.java.opts'?****
>>>>>>>
>>>>>>> It can be set based on the resources requested for that container.**
>>>>>>> **
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> Thanks****
>>>>>>>
>>>>>>> Devaraj K****
>>>>>>>
>>>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>>>> *Sent:* 19 June 2013 08:16
>>>>>>> *To:* user@hadoop.apache.org
>>>>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?****
>>>>>>>
>>>>>>> Thanks!****
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by Azuryy Yu <az...@gmail.com>.
HBase-0.94.* does support hadoop-2.x, do you look at the web site i
provided?

Hive-0.9.0  doesn't  support hadoop-2.x




On Thu, Jun 20, 2013 at 2:59 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> I'd use hive-0.11.
>
> On Jun 19, 2013, at 11:56 PM, sam liu <sa...@gmail.com> wrote:
>
> Hi Azurry,
>
> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
> does not support hadoop 2.x, right?
>
> Thanks!
>
>
> 2013/6/20 Azuryy Yu <az...@gmail.com>
>
>> Hi Sam,
>> please look at :http://hbase.apache.org/book.html#d2617e499
>>
>> generally, we said YARN is Hadoop-2.x, you can download
>> hadoop-2.0.4-alpha. and Hive-0.10 supports hadoop-2.x very well.
>>
>>
>>
>> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>>
>>> Thanks Arun!
>>>
>>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>>> directly, without recompiling
>>>
>>> #2, do you mean the old versions of HBase/Hive can not run agains YARN,
>>> and only some special versions of them can run against YARN? If yes, how
>>> can I get the versions for YARN?
>>>
>>>
>>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>>
>>>>
>>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>> Appreciating for the detailed answers! Here are three further questions:
>>>>
>>>> - Yarn maintains backwards compatibility, and MRv1 job could run on
>>>> Yarn. If yarn does not ask existing MRv1 job to do any code change, but why
>>>> we should recompile the MRv1 job?
>>>>
>>>>
>>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>>
>>>> - Which yarn jar files are required in the recompiling?
>>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>>> related components again with yarn jar files? Without any code change?
>>>>
>>>>
>>>> You will need versions of HBase, Hive etc. which are integrated with
>>>> hadoop-2.x, but not need to change any of your end-user applications (MR
>>>> jobs, hive queries, pig scripts etc.)
>>>>
>>>> Arun
>>>>
>>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>>
>>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>>
>>>>> Thanks Arun and Devraj , good to know.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>>
>>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>>
>>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>>
>>>>>> Hi Devaraj,
>>>>>>
>>>>>> As for the container request request for yarn container , currently
>>>>>> only memory is considered as resource , not cpu. Please correct.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>>
>>>>>>>  Hi Sam,****
>>>>>>>
>>>>>>>   Please find the answers for your queries. ****
>>>>>>>
>>>>>>>
>>>>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1
>>>>>>> job has special execution process(map > shuffle > reduce) in Hadoop 1.x,
>>>>>>> and how Yarn execute a MRv1 job? still include some special MR steps in
>>>>>>> Hadoop 1.x, like map, sort, merge, combine and shuffle?****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>>>>> application). If we want to run different kinds of applications we should
>>>>>>> have ApplicationMaster for each kind of application.****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>>>>
>>>>>>> These configurations still work for MR Job in Yarn.****
>>>>>>>
>>>>>>>
>>>>>>> >- What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?****
>>>>>>>
>>>>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>>>>> which includes getting the containers for maps & reducers, launch the
>>>>>>> containers using NM, tacks the tasks status till completion, manage the
>>>>>>> failed tasks.****
>>>>>>>
>>>>>>>
>>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?****
>>>>>>>
>>>>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>>>>> for resources to complete the tasks for that application.****
>>>>>>>
>>>>>>>
>>>>>>> >- For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?****
>>>>>>>
>>>>>>> ApplicationMaster is responsible for getting the containers from RM
>>>>>>> by sending the resource requests. For MR Job, you can use
>>>>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>>>>> for specifying the map & reduce container memory sizes.****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> >- How to set the maximum size of physical mem of a container? By
>>>>>>> the parameter of 'mapred.child.java.opts'?****
>>>>>>>
>>>>>>> It can be set based on the resources requested for that container.**
>>>>>>> **
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> Thanks****
>>>>>>>
>>>>>>> Devaraj K****
>>>>>>>
>>>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>>>> *Sent:* 19 June 2013 08:16
>>>>>>> *To:* user@hadoop.apache.org
>>>>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>>>>
>>>>>>> ** **
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>>> container instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?****
>>>>>>>
>>>>>>> Thanks!****
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by Azuryy Yu <az...@gmail.com>.
HBase-0.94.* does support hadoop-2.x, do you look at the web site i
provided?

Hive-0.9.0  doesn't  support hadoop-2.x




On Thu, Jun 20, 2013 at 2:59 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> I'd use hive-0.11.
>
> On Jun 19, 2013, at 11:56 PM, sam liu <sa...@gmail.com> wrote:
>
> Hi Azuryy,
>
> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
> do not support hadoop 2.x, right?
>
> Thanks!
>
>
> 2013/6/20 Azuryy Yu <az...@gmail.com>
>
>> Hi Sam,
>> please look at: http://hbase.apache.org/book.html#d2617e499
>>
>> generally, we say YARN is Hadoop-2.x, you can download
>> hadoop-2.0.4-alpha, and Hive-0.10 supports hadoop-2.x very well.
>>
>>
>>
>> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>>
>>> Thanks Arun!
>>>
>>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>>> directly, without recompiling
>>>
>> #2, do you mean the old versions of HBase/Hive cannot run against YARN,
>>> and only some special versions of them can run against YARN? If yes, how
>>> can I get the versions for YARN?
>>>
>>>
>>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>>
>>>>
>>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>>
>>>> I appreciate the detailed answers! Here are three further questions:
>>>>
>>>> - Yarn maintains backwards compatibility, and an MRv1 job could run on
>>>> Yarn. If yarn does not ask existing MRv1 jobs to make any code change,
>>>> why should we recompile the MRv1 job?
>>>>
>>>>
>>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>>
>>>> - Which yarn jar files are required in the recompiling?
>>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>>> related components again with yarn jar files? Without any code change?
>>>>
>>>>
>>>> You will need versions of HBase, Hive etc. which are integrated with
>>>> hadoop-2.x, but you will not need to change any of your end-user
>>>> applications (MR jobs, hive queries, pig scripts etc.)
>>>>
>>>> Arun
>>>>
>>>>
>>>> Thanks in advance!
>>>>
>>>>
>>>>
>>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>>
>>>>> Thanks Arun and Devaraj, good to know.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>>
>>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>>
>>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>>
>>>>>> Hi Devaraj,
>>>>>>
>>>>>> As for the resource request for a yarn container, currently
>>>>>> only memory is considered as a resource, not cpu. Please correct me
>>>>>> if I am wrong.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>>
>>>>>>>  Hi Sam,
>>>>>>>
>>>>>>>   Please find the answers for your queries.
>>>>>>>
>>>>>>>
>>>>>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1
>>>>>>> job has a special execution process (map > shuffle > reduce) in Hadoop
>>>>>>> 1.x, so how does Yarn execute an MRv1 job? Does it still include the
>>>>>>> special MR steps of Hadoop 1.x, like map, sort, merge, combine and
>>>>>>> shuffle?
>>>>>>>
>>>>>>> In Yarn, there is a concept of an application. MR Job is one kind of
>>>>>>> application, which makes use of MRAppMaster (i.e. the ApplicationMaster
>>>>>>> for the application). If we want to run different kinds of applications
>>>>>>> we should have an ApplicationMaster for each kind of application.
>>>>>>>
>>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>
>>>>>>> These configurations still work for MR Job in Yarn.
>>>>>>>
>>>>>>>
>>>>>>> >- What's the general process for the ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> MRAppMaster (the Application Master for an MR Job) drives the job life
>>>>>>> cycle, which includes getting the containers for maps & reducers,
>>>>>>> launching the containers using the NM, tracking the tasks' status till
>>>>>>> completion, and managing the failed tasks.
>>>>>>>
>>>>>>>
>>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>>> >- For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>>> containers instead, right?
>>>>>>>
>>>>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>>>>> based on the resources (memory, cpu). The Application Master can request
>>>>>>> the RM for resources to complete the tasks for that application.
>>>>>>>
>>>>>>>
>>>>>>> >- For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?
>>>>>>>
>>>>>>> ApplicationMaster is responsible for getting the containers from RM
>>>>>>> by sending the resource requests. For MR Job, you can use
>>>>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations
>>>>>>> for specifying the map & reduce container memory sizes.
>>>>>>>
>>>>>>> >- How to set the maximum size of physical mem of a container? By
>>>>>>> the parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> It can be set based on the resources requested for that container.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Devaraj K
>>>>>>>
>>>>>>> From: sam liu [mailto:samliuhadoop@gmail.com]
>>>>>>> Sent: 19 June 2013 08:16
>>>>>>> To: user@hadoop.apache.org
>>>>>>> Subject: How Yarn execute MRv1 job?
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> 1. In Hadoop 1.x, a job will be executed by map tasks and reduce tasks
>>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>>>> know, an MRv1 job will be executed only by the ApplicationMaster.
>>>>>>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>>> steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>> - What's the general process for the ApplicationMaster of Yarn to
>>>>>>> execute a job?
>>>>>>>
>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>>> containers instead, right?
>>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager
>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>>> size of physical mem of a container?
>>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Arun C. Murthy
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
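
The "default" and "maximum" container sizes sam asks about are bounded
cluster-side by the scheduler allocation limits rather than by
mapred.child.java.opts. A minimal yarn-site.xml sketch of those knobs
(values again are illustrative assumptions):

  <configuration>
    <!-- Total physical memory this NodeManager may hand out to containers. -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value>
    </property>
    <!-- The scheduler normalizes every container request up to at least
         this size, which makes it the effective smallest container. -->
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>
    </property>
    <!-- No single container may request more than this. -->
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>8192</value>
    </property>
  </configuration>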

Re: How Yarn execute MRv1 job?

Posted by Arun C Murthy <ac...@hortonworks.com>.
I'd use hive-0.11.

On Jun 19, 2013, at 11:56 PM, sam liu <sa...@gmail.com> wrote:

> Hi Azuryy,
> 
> So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0, do not support hadoop 2.x, right?
> 
> Thanks!
> 
> 
> 2013/6/20 Azuryy Yu <az...@gmail.com>
> Hi Sam, 
> please look at: http://hbase.apache.org/book.html#d2617e499
> 
> generally, we say YARN is Hadoop-2.x, you can download hadoop-2.0.4-alpha, and Hive-0.10 supports hadoop-2.x very well.
> 
> 
> 
> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
> Thanks Arun!
> 
> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN directly, without recompiling
> 
> #2, do you mean the old versions of HBase/Hive cannot run against YARN, and only some special versions of them can run against YARN? If yes, how can I get the versions for YARN?
> 
> 
> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
> 
> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
> 
>> I appreciate the detailed answers! Here are three further questions:
>> 
>> - Yarn maintains backwards compatibility, and an MRv1 job could run on Yarn. If yarn does not ask existing MRv1 jobs to make any code change, why should we recompile the MRv1 job?
> 
> You don't need to recompile MRv1 jobs to run against YARN.
> 
>> - Which yarn jar files are required in the recompiling?
>> - In a cluster with Hadoop 1.1.1 and other Hadoop related components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop related components again with yarn jar files? Without any code change?
> 
> You will need versions of HBase, Hive etc. which are integrated with hadoop-2.x, but you will not need to change any of your end-user applications (MR jobs, hive queries, pig scripts etc.)
> 
> Arun
> 
>> 
>> Thanks in advance!
>> 
>> 
>> 
>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>> Thanks Arun and Devaraj, good to know.
>> 
>> 
>> 
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>> 
>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
>> 
>>> Hi Devaraj,
>>> 
>>> As for the resource request for a yarn container, currently only memory is considered as a resource, not cpu. Please correct me if I am wrong.
>>> 
>>> Thanks,
>>> Rahul
>>> 
>>> 
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com> wrote:
>>> Hi Sam,
>>> 
>>>   Please find the answers for your queries.
>>> 
>>> 
>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>> 
>>> In Yarn, there is a concept of an application. MR Job is one kind of application, which makes use of MRAppMaster (i.e. the ApplicationMaster for the application). If we want to run different kinds of applications we should have an ApplicationMaster for each kind of application.
>>> 
>>>  
>>> 
>>> >- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>> 
>>> These configurations still work for MR Job in Yarn.
>>> 
>>> 
>>> >- What's the general process for the ApplicationMaster of Yarn to execute a job?
>>> 
>>> MRAppMaster (the Application Master for an MR Job) drives the job life cycle, which includes getting the containers for maps & reducers, launching the containers using the NM, tracking the tasks' status till completion, and managing the failed tasks.
>>> 
>>> 
>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'
>>> >- For Yarn, the above two parameters do not work any more, as yarn uses containers instead, right?
>>> 
>>> Correct, these params don’t work in yarn. In Yarn it is completely based on the resources (memory, cpu). The Application Master can request the RM for resources to complete the tasks for that application.
>>> 
>>> 
>>> >- For Yarn, we can set the whole physical mem for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of physical mem of a container?
>>> 
>>> ApplicationMaster is responsible for getting the containers from RM by sending the resource requests. For MR Job, you can use "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations for specifying the map & reduce container memory sizes.
>>> 
>>>  
>>> 
>>> >- How to set the maximum size of physical mem of a container? By the parameter of 'mapred.child.java.opts'?
>>> 
>>> It can be set based on the resources requested for that container.
>>> 
>>>  
>>> 
>>>  
>>> 
>>> Thanks
>>> 
>>> Devaraj K
>>> 
>>> From: sam liu [mailto:samliuhadoop@gmail.com] 
>>> Sent: 19 June 2013 08:16
>>> To: user@hadoop.apache.org
>>> Subject: How Yarn execute MRv1 job?
>>> 
>>>  
>>> 
>>> Hi,
>>> 
>>> 1. In Hadoop 1.x, a job will be executed by map tasks and reduce tasks together, with a typical process (map > shuffle > reduce). In Yarn, as I know, an MRv1 job will be executed only by the ApplicationMaster.
>>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>> - Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>> - What's the general process for the ApplicationMaster of Yarn to execute a job?
>>> 
>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'.
>>> - For Yarn, the above two parameters do not work any more, as yarn uses containers instead, right?
>>> - For Yarn, we can set the whole physical mem for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of physical mem of a container?
>>> - How to set the maximum size of physical mem of a container? By the parameter of 'mapred.child.java.opts'?
>>> 
>>> Thanks!
>>> 
>>> 
>> 
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 
>> 
>> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 
> 
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
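
As a sketch of the CPU & Memory point Arun makes above: in Hadoop 2.x the
CapacityScheduler only considers CPU once its resource calculator is switched
to the DominantResourceCalculator. A capacity-scheduler.xml fragment follows
(property names as in mainline Hadoop 2.x; early 2.0.x alphas may differ, and
the vcore count is an assumption for your own hardware):

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

  <!-- And in yarn-site.xml, the virtual cores this NodeManager offers. -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>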



Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Hi Azuryy,

So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
do not support hadoop 2.x, right?

Thanks!


2013/6/20 Azuryy Yu <az...@gmail.com>

> Hi Sam,
> please look at: http://hbase.apache.org/book.html#d2617e499
>
> generally, we say YARN is Hadoop-2.x, you can download
> hadoop-2.0.4-alpha, and Hive-0.10 supports hadoop-2.x very well.
>
>
>
> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>
>> Thanks Arun!
>>
>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>> directly, without recompiling
>>
>> #2, do you mean the old versions of HBase/Hive cannot run against YARN,
>> and only some special versions of them can run against YARN? If yes, how
>> can I get the versions for YARN?
>>
>>
>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>
>>>
>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>> I appreciate the detailed answers! Here are three further questions:
>>>
>>> - Yarn maintains backwards compatibility, and an MRv1 job could run on
>>> Yarn. If yarn does not ask existing MRv1 jobs to make any code change,
>>> why should we recompile the MRv1 job?
>>>
>>>
>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>
>>> - Which yarn jar files are required in the recompiling?
>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>> related components again with yarn jar files? Without any code change?
>>>
>>>
>>> You will need versions of HBase, Hive etc. which are integrated with
>>> hadoop-2.x, but you will not need to change any of your end-user
>>> applications (MR jobs, hive queries, pig scripts etc.)
>>>
>>> Arun
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>>
>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>
>>>> Thanks Arun and Devaraj, good to know.
>>>>
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>
>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>
>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>
>>>>> Hi Devaraj,
>>>>>
>>>>> As for the resource request for a yarn container, currently
>>>>> only memory is considered as a resource, not cpu. Please correct me
>>>>> if I am wrong.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>
>>>>>>  Hi Sam,
>>>>>>
>>>>>>   Please find the answers for your queries.
>>>>>>
>>>>>>
>>>>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1
>>>>>> job has a special execution process (map > shuffle > reduce) in Hadoop
>>>>>> 1.x, so how does Yarn execute an MRv1 job? Does it still include the
>>>>>> special MR steps of Hadoop 1.x, like map, sort, merge, combine and
>>>>>> shuffle?
>>>>>>
>>>>>> In Yarn, there is a concept of an application. MR Job is one kind of
>>>>>> application, which makes use of MRAppMaster (i.e. the ApplicationMaster
>>>>>> for the application). If we want to run different kinds of applications
>>>>>> we should have an ApplicationMaster for each kind of application.
>>>>>>
>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>
>>>>>> These configurations still work for MR Job in Yarn.
>>>>>>
>>>>>>
>>>>>> >- What's the general process for the ApplicationMaster of Yarn to
>>>>>> execute a job?
>>>>>>
>>>>>> MRAppMaster (the Application Master for an MR Job) drives the job life
>>>>>> cycle, which includes getting the containers for maps & reducers,
>>>>>> launching the containers using the NM, tracking the tasks' status till
>>>>>> completion, and managing the failed tasks.
>>>>>>
>>>>>>
>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> >- For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>> containers instead, right?
>>>>>>
>>>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>>>> based on the resources (memory, cpu). The Application Master can request
>>>>>> the RM for resources to complete the tasks for that application.
>>>>>>
>>>>>>
>>>>>> >- For Yarn, we can set the whole physical mem for a NodeManager
>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>> size of physical mem of a container?
>>>>>>
>>>>>> ApplicationMaster is responsible for getting the containers from RM
>>>>>> by sending the resource requests. For MR Job, you can use
>>>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations
>>>>>> for specifying the map & reduce container memory sizes.
>>>>>>
>>>>>> >- How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>
>>>>>> It can be set based on the resources requested for that container.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Devaraj K
>>>>>>
>>>>>> From: sam liu [mailto:samliuhadoop@gmail.com]
>>>>>> Sent: 19 June 2013 08:16
>>>>>> To: user@hadoop.apache.org
>>>>>> Subject: How Yarn execute MRv1 job?
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> 1. In Hadoop 1.x, a job will be executed by map tasks and reduce tasks
>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>>> know, an MRv1 job will be executed only by the ApplicationMaster.
>>>>>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>>> steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process for the ApplicationMaster of Yarn to
>>>>>> execute a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>>>>> containers instead, right?
>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>>>> physical mem of a container?
>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Arun C. Murthy
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>
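
As a usage sketch of the "no recompiling" point confirmed above: an existing
MRv1 job jar can be submitted to a YARN cluster as-is, and the container sizes
discussed earlier can be overridden per job from the command line. The jar,
class and paths below are hypothetical placeholders:

  # Submit an unmodified MRv1 jar to YARN; only client-side configs change.
  hadoop jar my-old-mrv1-job.jar com.example.MyJob \
    -D mapreduce.map.memory.mb=1536 \
    -D mapreduce.reduce.memory.mb=3072 \
    /input/path /output/path

Note the -D overrides only take effect if the job's main class goes through
ToolRunner/GenericOptionsParser, which is the common convention for MRv1
drivers.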

Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Hi Azurry,

So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
does not support hadoop 2.x, right?

Thanks!


2013/6/20 Azuryy Yu <az...@gmail.com>

> Hi Sam,
> please look at :http://hbase.apache.org/book.html#d2617e499
>
> generally, we said YARN is Hadoop-2.x, you can download
> hadoop-2.0.4-alpha. and Hive-0.10 supports hadoop-2.x very well.
>
>
>
> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>
>> Thanks Arun!
>>
>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>> directly, without recompiling
>>
>> #2, do you mean the old versions of HBase/Hive can not run agains YARN,
>> and only some special versions of them can run against YARN? If yes, how
>> can I get the versions for YARN?
>>
>>
>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>
>>>
>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>> Appreciating for the detailed answers! Here are three further questions:
>>>
>>> - Yarn maintains backwards compatibility, and MRv1 job could run on
>>> Yarn. If yarn does not ask existing MRv1 job to do any code change, but why
>>> we should recompile the MRv1 job?
>>>
>>>
>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>
>>> - Which yarn jar files are required in the recompiling?
>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>> related components again with yarn jar files? Without any code change?
>>>
>>>
>>> You will need versions of HBase, Hive etc. which are integrated with
>>> hadoop-2.x, but not need to change any of your end-user applications (MR
>>> jobs, hive queries, pig scripts etc.)
>>>
>>> Arun
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>>
>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>
>>>> Thanks Arun and Devraj , good to know.
>>>>
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>
>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>
>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>
>>>>> Hi Devaraj,
>>>>>
>>>>> As for the container request request for yarn container , currently
>>>>> only memory is considered as resource , not cpu. Please correct.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>
>>>>>>  Hi Sam,****
>>>>>>
>>>>>>   Please find the answers for your queries. ****
>>>>>>
>>>>>>
>>>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>> 1.x, like map, sort, merge, combine and shuffle?****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>>>> application). If we want to run different kinds of applications we should
>>>>>> have ApplicationMaster for each kind of application.****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>>>
>>>>>> These configurations still work for MR Job in Yarn.****
>>>>>>
>>>>>>
>>>>>> >- What's the general process for ApplicationMaster of Yarn to
>>>>>> execute a job?****
>>>>>>
>>>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>>>> which includes getting the containers for maps & reducers, launch the
>>>>>> containers using NM, tacks the tasks status till completion, manage the
>>>>>> failed tasks.****
>>>>>>
>>>>>>
>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>> container instead, right?****
>>>>>>
>>>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>>>> for resources to complete the tasks for that application.****
>>>>>>
>>>>>>
>>>>>> >- For Yarn, we can set the whole physical mem for a NodeManager
>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>> size of physical mem of a container?****
>>>>>>
>>>>>> ApplicationMaster is responsible for getting the containers from RM
>>>>>> by sending the resource requests. For MR Job, you can use
>>>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>>>> for specifying the map & reduce container memory sizes.****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> >- How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?****
>>>>>>
>>>>>> It can be set based on the resources requested for that container.***
>>>>>> *
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> Thanks****
>>>>>>
>>>>>> Devaraj K****
>>>>>>
>>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>>> *Sent:* 19 June 2013 08:16
>>>>>> *To:* user@hadoop.apache.org
>>>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>>> a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>> container instead, right?
>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>>>> physical mem of a container?
>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?****
>>>>>>
>>>>>> Thanks!****
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Arun C. Murthy
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>

Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Hi Azurry,

So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
does not support hadoop 2.x, right?

Thanks!


2013/6/20 Azuryy Yu <az...@gmail.com>

> Hi Sam,
> please look at :http://hbase.apache.org/book.html#d2617e499
>
> generally, we said YARN is Hadoop-2.x, you can download
> hadoop-2.0.4-alpha. and Hive-0.10 supports hadoop-2.x very well.
>
>
>
> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>
>> Thanks Arun!
>>
>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>> directly, without recompiling
>>
>> #2, do you mean the old versions of HBase/Hive can not run agains YARN,
>> and only some special versions of them can run against YARN? If yes, how
>> can I get the versions for YARN?
>>
>>
>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>
>>>
>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>> Appreciating for the detailed answers! Here are three further questions:
>>>
>>> - Yarn maintains backwards compatibility, and MRv1 job could run on
>>> Yarn. If yarn does not ask existing MRv1 job to do any code change, but why
>>> we should recompile the MRv1 job?
>>>
>>>
>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>
>>> - Which yarn jar files are required in the recompiling?
>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>> related components again with yarn jar files? Without any code change?
>>>
>>>
>>> You will need versions of HBase, Hive etc. which are integrated with
>>> hadoop-2.x, but not need to change any of your end-user applications (MR
>>> jobs, hive queries, pig scripts etc.)
>>>
>>> Arun
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>>
>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>
>>>> Thanks Arun and Devraj , good to know.
>>>>
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>
>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>
>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>
>>>>> Hi Devaraj,
>>>>>
>>>>> As for the container request request for yarn container , currently
>>>>> only memory is considered as resource , not cpu. Please correct.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>
>>>>>>  Hi Sam,****
>>>>>>
>>>>>>   Please find the answers for your queries. ****
>>>>>>
>>>>>>
>>>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>> 1.x, like map, sort, merge, combine and shuffle?****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>>>> application). If we want to run different kinds of applications we should
>>>>>> have ApplicationMaster for each kind of application.****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>>>
>>>>>> These configurations still work for MR Job in Yarn.****
>>>>>>
>>>>>>
>>>>>> >- What's the general process for ApplicationMaster of Yarn to
>>>>>> execute a job?****
>>>>>>
>>>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>>>> which includes getting the containers for maps & reducers, launch the
>>>>>> containers using NM, tacks the tasks status till completion, manage the
>>>>>> failed tasks.****
>>>>>>
>>>>>>
>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>> container instead, right?****
>>>>>>
>>>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>>>> for resources to complete the tasks for that application.****
>>>>>>
>>>>>>
>>>>>> >- For Yarn, we can set the whole physical mem for a NodeManager
>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how to set the default
>>>>>> size of physical mem of a container?****
>>>>>>
>>>>>> ApplicationMaster is responsible for getting the containers from RM
>>>>>> by sending the resource requests. For MR Job, you can use
>>>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>>>> for specifying the map & reduce container memory sizes.****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> >- How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?****
>>>>>>
>>>>>> It can be set based on the resources requested for that container.***
>>>>>> *
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> Thanks****
>>>>>>
>>>>>> Devaraj K****
>>>>>>
>>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>>> *Sent:* 19 June 2013 08:16
>>>>>> *To:* user@hadoop.apache.org
>>>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>>>
>>>>>> ** **
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process for ApplicationMaster of Yarn to execute
>>>>>> a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>>>> container instead, right?
>>>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>>>> physical mem of a container?
>>>>>> - How to set the maximum size of physical mem of a container? By the
>>>>>> parameter of 'mapred.child.java.opts'?****
>>>>>>
>>>>>> Thanks!****
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Arun C. Murthy
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>

Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Hi Azurry,

So, older versions of HBase and Hive, like HBase 0.94.0 and Hive 0.9.0,
does not support hadoop 2.x, right?

Thanks!


2013/6/20 Azuryy Yu <az...@gmail.com>

> Hi Sam,
> please look at :http://hbase.apache.org/book.html#d2617e499
>
> generally, we said YARN is Hadoop-2.x, you can download
> hadoop-2.0.4-alpha. and Hive-0.10 supports hadoop-2.x very well.
>
>
>
> On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:
>
>> Thanks Arun!
>>
>> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
>> directly, without recompiling
>>
>> #2, do you mean the old versions of HBase/Hive can not run agains YARN,
>> and only some special versions of them can run against YARN? If yes, how
>> can I get the versions for YARN?
>>
>>
>> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>>
>>>
>>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>>
>>> Appreciating for the detailed answers! Here are three further questions:
>>>
>>> - Yarn maintains backwards compatibility, and MRv1 job could run on
>>> Yarn. If yarn does not ask existing MRv1 job to do any code change, but why
>>> we should recompile the MRv1 job?
>>>
>>>
>>> You don't need to recompile MRv1 jobs to run against YARN.
>>>
>>> - Which yarn jar files are required in the recompiling?
>>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>>> related components again with yarn jar files? Without any code change?
>>>
>>>
>>> You will need versions of HBase, Hive etc. which are integrated with
>>> hadoop-2.x, but not need to change any of your end-user applications (MR
>>> jobs, hive queries, pig scripts etc.)
>>>
>>> Arun
>>>
>>>
>>> Thanks in advance!
>>>
>>>
>>>
>>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>>
>>>> Thanks Arun and Devraj , good to know.
>>>>
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>>
>>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>>
>>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>>
>>>>> Hi Devaraj,
>>>>>
>>>>> As for the resource request for a yarn container, currently only
>>>>> memory is considered as a resource, not cpu. Please correct me if I'm
>>>>> wrong.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> Please find the answers for your queries.
>>>>>>
>>>>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x,
>>>>>> so how does Yarn execute an MRv1 job? Does it still include the special
>>>>>> MR steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>>
>>>>>> In Yarn, the central concept is an application. An MR job is one kind
>>>>>> of application, which makes use of MRAppMaster (i.e. the
>>>>>> ApplicationMaster for that application). If we want to run different
>>>>>> kinds of applications, we need an ApplicationMaster for each kind of
>>>>>> application.
>>>>>>
>>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>
>>>>>> These configurations still work for an MR job in Yarn.
>>>>>>
>>>>>> >- What's the general process for the ApplicationMaster of Yarn to
>>>>>> execute a job?
>>>>>>
>>>>>> MRAppMaster (the ApplicationMaster for an MR job) drives the job life
>>>>>> cycle, which includes getting the containers for maps & reducers,
>>>>>> launching the containers using the NM, tracking the task status till
>>>>>> completion, and managing failed tasks.
>>>>>>
>>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>>> >- For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>>> containers instead, right?
>>>>>>
>>>>>> Correct, these params don't work in Yarn. In Yarn everything is based
>>>>>> on the resources (memory, cpu). The ApplicationMaster can request the
>>>>>> RM for resources to complete the tasks for that application.
>>>>>>
>>>>>> >- For Yarn, we can set the whole physical memory of a NodeManager
>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how do we set the
>>>>>> default size of the physical memory of a container?
>>>>>>
>>>>>> The ApplicationMaster is responsible for getting the containers from
>>>>>> the RM by sending resource requests. For an MR job, you can use the
>>>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb"
>>>>>> configurations to specify the map & reduce container memory sizes.
>>>>>>
>>>>>> >- How do we set the maximum physical memory of a container? Via the
>>>>>> parameter 'mapred.child.java.opts'?
>>>>>>
>>>>>> It can be set based on the resources requested for that container.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Devaraj K
>>>>>>
>>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>>> *Sent:* 19 June 2013 08:16
>>>>>> *To:* user@hadoop.apache.org
>>>>>> *Subject:* How Yarn execute MRv1 job?
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x,
>>>>>> so how does Yarn execute an MRv1 job? Does it still include the special
>>>>>> MR steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>> - What's the general process for the ApplicationMaster of Yarn to
>>>>>> execute a job?
>>>>>>
>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>>> containers instead, right?
>>>>>> - For Yarn, we can set the total physical memory of a NodeManager using
>>>>>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default
>>>>>> size of the physical memory of a container?
>>>>>> - How do we set the maximum physical memory of a container? Via the
>>>>>> parameter 'mapred.child.java.opts'?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Arun C. Murthy
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>
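
To make the container-memory answers quoted above concrete, here is a minimal
sketch using the Hadoop 2.x MapReduce Java API. The class name, the values,
and the heap-to-container ratio are illustrative assumptions, not settings
recommended anywhere in this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ContainerMemoryExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Container sizes the MRAppMaster will request for map and reduce tasks (MB).
    conf.set("mapreduce.map.memory.mb", "1024");
    conf.set("mapreduce.reduce.memory.mb", "2048");
    // The task JVM heap must fit inside its container, so -Xmx is set below
    // the container size to leave headroom for non-heap memory.
    conf.set("mapreduce.map.java.opts", "-Xmx800m");
    conf.set("mapreduce.reduce.java.opts", "-Xmx1600m");
    Job job = Job.getInstance(conf, "container-memory-example");
    // ... set the mapper, reducer, input and output paths as usual, then:
    // System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

As for the default and maximum container sizes asked about above: to my
knowledge, Hadoop 2.x bounds these cluster-side in yarn-site.xml via
yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb;
verify the names and defaults against your release.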

Re: How Yarn execute MRv1 job?

Posted by Azuryy Yu <az...@gmail.com>.
Hi Sam,
please look at: http://hbase.apache.org/book.html#d2617e499

Generally, when we say YARN we mean Hadoop-2.x; you can download
hadoop-2.0.4-alpha. And Hive-0.10 supports hadoop-2.x very well.



On Thu, Jun 20, 2013 at 2:11 PM, sam liu <sa...@gmail.com> wrote:

> Thanks Arun!
>
> #1, Yes, I did tests and found that the MRv1 jobs could run against YARN
> directly, without recompiling
>
> #2, do you mean the old versions of HBase/Hive cannot run against YARN,
> and only some special versions of them can run against YARN? If yes, how
> can I get those versions for YARN?
>
>
> 2013/6/20 Arun C Murthy <ac...@hortonworks.com>
>
>>
>> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>>
>> Thanks for the detailed answers! Here are three further questions:
>>
>> - Yarn maintains backwards compatibility, and an MRv1 job can run on Yarn.
>> If Yarn does not require any code change to an existing MRv1 job, why
>> should we recompile the MRv1 job?
>>
>>
>> You don't need to recompile MRv1 jobs to run against YARN.
>>
>> - Which yarn jar files are required in the recompiling?
>> - In a cluster with Hadoop 1.1.1 and other Hadoop related
>> components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to
>> replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop
>> related components again with yarn jar files? Without any code change?
>>
>>
>> You will need versions of HBase, Hive etc. which are integrated with
>> hadoop-2.x, but you do not need to change any of your end-user
>> applications (MR jobs, hive queries, pig scripts etc.)
>>
>> Arun
>>
>>
>> Thanks in advance!
>>
>>
>>
>> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>>
>>> Thanks Arun and Devraj , good to know.
>>>
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>>
>>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>>
>>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>>> rahul.rec.dgp@gmail.com> wrote:
>>>>
>>>> Hi Devaraj,
>>>>
>>>> As for the resource request for a yarn container, currently only
>>>> memory is considered as a resource, not cpu. Please correct me if I'm
>>>> wrong.
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> Please find the answers for your queries.
>>>>>
>>>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x,
>>>>> so how does Yarn execute an MRv1 job? Does it still include the special
>>>>> MR steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>>
>>>>> In Yarn, the central concept is an application. An MR job is one kind
>>>>> of application, which makes use of MRAppMaster (i.e. the
>>>>> ApplicationMaster for that application). If we want to run different
>>>>> kinds of applications, we need an ApplicationMaster for each kind of
>>>>> application.
>>>>>
>>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>
>>>>> These configurations still work for an MR job in Yarn.
>>>>>
>>>>> >- What's the general process for the ApplicationMaster of Yarn to
>>>>> execute a job?
>>>>>
>>>>> MRAppMaster (the ApplicationMaster for an MR job) drives the job life
>>>>> cycle, which includes getting the containers for maps & reducers,
>>>>> launching the containers using the NM, tracking the task status till
>>>>> completion, and managing failed tasks.
>>>>>
>>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>> >- For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>> containers instead, right?
>>>>>
>>>>> Correct, these params don't work in Yarn. In Yarn everything is based
>>>>> on the resources (memory, cpu). The ApplicationMaster can request the
>>>>> RM for resources to complete the tasks for that application.
>>>>>
>>>>> >- For Yarn, we can set the whole physical memory of a NodeManager
>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how do we set the
>>>>> default size of the physical memory of a container?
>>>>>
>>>>> The ApplicationMaster is responsible for getting the containers from
>>>>> the RM by sending resource requests. For an MR job, you can use the
>>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb"
>>>>> configurations to specify the map & reduce container memory sizes.
>>>>>
>>>>> >- How do we set the maximum physical memory of a container? Via the
>>>>> parameter 'mapred.child.java.opts'?
>>>>>
>>>>> It can be set based on the resources requested for that container.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Devaraj K
>>>>>
>>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>>> *Sent:* 19 June 2013 08:16
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* How Yarn execute MRv1 job?
>>>>>
>>>>> Hi,
>>>>>
>>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x,
>>>>> so how does Yarn execute an MRv1 job? Does it still include the special
>>>>> MR steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>> - What's the general process for the ApplicationMaster of Yarn to
>>>>> execute a job?
>>>>>
>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>> containers instead, right?
>>>>> - For Yarn, we can set the total physical memory of a NodeManager using
>>>>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default
>>>>> size of the physical memory of a container?
>>>>> - How do we set the maximum physical memory of a container? Via the
>>>>> parameter 'mapred.child.java.opts'?
>>>>>
>>>>> Thanks!
>>>>>
>>>>
>>>>
>>>>  --
>>>> Arun C. Murthy
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>>
>>>>
>>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>
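
The point quoted above, that MRv1 tuning parameters such as
mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent still apply
under YARN, can be sketched the same way. A minimal, hypothetical Java
example; the values are illustrative only, not tuning advice:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortTuningExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Size (MB) of the in-memory buffer that holds map output before sorting.
    conf.set("mapreduce.task.io.sort.mb", "256");
    // Fraction of that buffer at which map output starts spilling to disk.
    conf.set("mapreduce.map.sort.spill.percent", "0.90");
    Job job = Job.getInstance(conf, "sort-tuning-example");
    // ... configure the mapper/reducer and paths as in any MR job, then submit.
  }
}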

Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Thanks Arun!

#1, Yes, I did tests and found that the MRv1 jobs could run against YARN
directly, without recompiling

#2, do you mean the old versions of HBase/Hive cannot run against YARN, and
only some special versions of them can run against YARN? If yes, how can I
get those versions for YARN?


2013/6/20 Arun C Murthy <ac...@hortonworks.com>

>
> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>
> Thanks for the detailed answers! Here are three further questions:
>
> - Yarn maintains backwards compatibility, and an MRv1 job can run on Yarn.
> If Yarn does not require any code change to an existing MRv1 job, why
> should we recompile the MRv1 job?
>
>
> You don't need to recompile MRv1 jobs to run against YARN.
>
> - Which yarn jar files are required in the recompiling?
> - In a cluster with Hadoop 1.1.1 and other Hadoop related components(HBase
> 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to replace Hadoop
> 1.1.1 with yarn, do we need to recompile all other Hadoop related
> components again with yarn jar files? Without any code change?
>
>
> You will need versions of HBase, Hive etc. which are integrated with
> hadoop-2.x, but you do not need to change any of your end-user
> applications (MR jobs, hive queries, pig scripts etc.)
>
> Arun
>
>
> Thanks in advance!
>
>
>
> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>
>> Thanks Arun and Devraj , good to know.
>>
>>
>>
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>
>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>
>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>> rahul.rec.dgp@gmail.com> wrote:
>>>
>>> Hi Devaraj,
>>>
>>> As for the resource request for a yarn container, currently only memory
>>> is considered as a resource, not cpu. Please correct me if I'm wrong.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>
>>>> Hi Sam,
>>>>
>>>> Please find the answers for your queries.
>>>>
>>>> >- Yarn could run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x,
>>>> so how does Yarn execute an MRv1 job? Does it still include the special
>>>> MR steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>
>>>> In Yarn, the central concept is an application. An MR job is one kind
>>>> of application, which makes use of MRAppMaster (i.e. the
>>>> ApplicationMaster for that application). If we want to run different
>>>> kinds of applications, we need an ApplicationMaster for each kind of
>>>> application.
>>>>
>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>
>>>> These configurations still work for an MR job in Yarn.
>>>>
>>>> >- What's the general process for the ApplicationMaster of Yarn to
>>>> execute a job?
>>>>
>>>> MRAppMaster (the ApplicationMaster for an MR job) drives the job life
>>>> cycle, which includes getting the containers for maps & reducers,
>>>> launching the containers using the NM, tracking the task status till
>>>> completion, and managing failed tasks.
>>>>
>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>> >- For Yarn, the above two parameters do not work any more, as Yarn uses
>>>> containers instead, right?
>>>>
>>>> Correct, these params don't work in Yarn. In Yarn everything is based
>>>> on the resources (memory, cpu). The ApplicationMaster can request the
>>>> RM for resources to complete the tasks for that application.
>>>>
>>>> >- For Yarn, we can set the whole physical memory of a NodeManager
>>>> using 'yarn.nodemanager.resource.memory-mb'. But how do we set the
>>>> default size of the physical memory of a container?
>>>>
>>>> The ApplicationMaster is responsible for getting the containers from
>>>> the RM by sending resource requests. For an MR job, you can use the
>>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb"
>>>> configurations to specify the map & reduce container memory sizes.
>>>>
>>>> >- How do we set the maximum physical memory of a container? Via the
>>>> parameter 'mapred.child.java.opts'?
>>>>
>>>> It can be set based on the resources requested for that container.
>>>>
>>>> Thanks
>>>>
>>>> Devaraj K
>>>>
>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>> *Sent:* 19 June 2013 08:16
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* How Yarn execute MRv1 job?
>>>>
>>>> Hi,
>>>>
>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x,
>>>> so how does Yarn execute an MRv1 job? Does it still include the special
>>>> MR steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for the ApplicationMaster of Yarn to
>>>> execute a job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>> containers instead, right?
>>>> - For Yarn, we can set the total physical memory of a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default
>>>> size of the physical memory of a container?
>>>> - How do we set the maximum physical memory of a container? Via the
>>>> parameter 'mapred.child.java.opts'?
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
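
Sam's test above (existing MRv1 jobs running against YARN directly, without
recompiling) amounts to submitting the old jar unchanged, for example
"hadoop jar my-existing-mrv1-job.jar MyDriver /input /output", where the jar
and driver names are hypothetical placeholders. Only the cluster-side
configuration differs between Hadoop 1.x and 2.x; the client command stays
the same.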

Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Thanks Arun!

#1, Yes, I did tests and found that the MRv1 jobs could run against YARN
directly, without recompiling

#2, do you mean the old versions of HBase/Hive can not run agains YARN, and
only some special versions of them can run against YARN? If yes, how can I
get the versions for YARN?


2013/6/20 Arun C Murthy <ac...@hortonworks.com>

>
> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>
> Appreciating for the detailed answers! Here are three further questions:
>
> - Yarn maintains backwards compatibility, and MRv1 job could run on Yarn.
> If yarn does not ask existing MRv1 job to do any code change, but why we
> should recompile the MRv1 job?
>
>
> You don't need to recompile MRv1 jobs to run against YARN.
>
> - Which yarn jar files are required in the recompiling?
> - In a cluster with Hadoop 1.1.1 and other Hadoop related components(HBase
> 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to replace Hadoop
> 1.1.1 with yarn, do we need to recompile all other Hadoop related
> components again with yarn jar files? Without any code change?
>
>
> You will need versions of HBase, Hive etc. which are integrated with
> hadoop-2.x, but not need to change any of your end-user applications (MR
> jobs, hive queries, pig scripts etc.)
>
> Arun
>
>
> Thanks in advance!
>
>
>
> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>
>> Thanks Arun and Devraj , good to know.
>>
>>
>>
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>
>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>
>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>> rahul.rec.dgp@gmail.com> wrote:
>>>
>>> Hi Devaraj,
>>>
>>> As for the container request request for yarn container , currently only
>>> memory is considered as resource , not cpu. Please correct.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>
>>>>  Hi Sam,****
>>>>
>>>>   Please find the answers for your queries. ****
>>>>
>>>>
>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?****
>>>>
>>>> ** **
>>>>
>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>> application). If we want to run different kinds of applications we should
>>>> have ApplicationMaster for each kind of application.****
>>>>
>>>> ** **
>>>>
>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>
>>>> These configurations still work for MR Job in Yarn.****
>>>>
>>>>
>>>> >- What's the general process for ApplicationMaster of Yarn to execute
>>>> a job?****
>>>>
>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>> which includes getting the containers for maps & reducers, launch the
>>>> containers using NM, tacks the tasks status till completion, manage the
>>>> failed tasks.****
>>>>
>>>>
>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>> container instead, right?****
>>>>
>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>> for resources to complete the tasks for that application.****
>>>>
>>>>
>>>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?****
>>>>
>>>> ApplicationMaster is responsible for getting the containers from RM by
>>>> sending the resource requests. For MR Job, you can use
>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>> for specifying the map & reduce container memory sizes.****
>>>>
>>>> ** **
>>>>
>>>> >- How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?****
>>>>
>>>> It can be set based on the resources requested for that container.****
>>>>
>>>> ** **
>>>>
>>>> ** **
>>>>
>>>> Thanks****
>>>>
>>>> Devaraj K****
>>>>
>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>> *Sent:* 19 June 2013 08:16
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>
>>>> ** **
>>>>
>>>> Hi,
>>>>
>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>>> job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>> container instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?****
>>>>
>>>> Thanks!****
>>>>
>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Thanks Arun!

#1, Yes, I did tests and found that the MRv1 jobs could run against YARN
directly, without recompiling

#2, do you mean the old versions of HBase/Hive can not run agains YARN, and
only some special versions of them can run against YARN? If yes, how can I
get the versions for YARN?


2013/6/20 Arun C Murthy <ac...@hortonworks.com>

>
> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>
> Appreciating for the detailed answers! Here are three further questions:
>
> - Yarn maintains backwards compatibility, and MRv1 job could run on Yarn.
> If yarn does not ask existing MRv1 job to do any code change, but why we
> should recompile the MRv1 job?
>
>
> You don't need to recompile MRv1 jobs to run against YARN.
>
> - Which yarn jar files are required in the recompiling?
> - In a cluster with Hadoop 1.1.1 and other Hadoop related components(HBase
> 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to replace Hadoop
> 1.1.1 with yarn, do we need to recompile all other Hadoop related
> components again with yarn jar files? Without any code change?
>
>
> You will need versions of HBase, Hive etc. which are integrated with
> hadoop-2.x, but not need to change any of your end-user applications (MR
> jobs, hive queries, pig scripts etc.)
>
> Arun
>
>
> Thanks in advance!
>
>
>
> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>
>> Thanks Arun and Devraj , good to know.
>>
>>
>>
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>
>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>
>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>> rahul.rec.dgp@gmail.com> wrote:
>>>
>>> Hi Devaraj,
>>>
>>> As for the container request request for yarn container , currently only
>>> memory is considered as resource , not cpu. Please correct.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>
>>>>  Hi Sam,****
>>>>
>>>>   Please find the answers for your queries. ****
>>>>
>>>>
>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?****
>>>>
>>>> ** **
>>>>
>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>> application). If we want to run different kinds of applications we should
>>>> have ApplicationMaster for each kind of application.****
>>>>
>>>> ** **
>>>>
>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>
>>>> These configurations still work for MR Job in Yarn.****
>>>>
>>>>
>>>> >- What's the general process for ApplicationMaster of Yarn to execute
>>>> a job?****
>>>>
>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>> which includes getting the containers for maps & reducers, launch the
>>>> containers using NM, tacks the tasks status till completion, manage the
>>>> failed tasks.****
>>>>
>>>>
>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>> container instead, right?****
>>>>
>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>> for resources to complete the tasks for that application.****
>>>>
>>>>
>>>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?****
>>>>
>>>> ApplicationMaster is responsible for getting the containers from RM by
>>>> sending the resource requests. For MR Job, you can use
>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>> for specifying the map & reduce container memory sizes.****
>>>>
>>>> ** **
>>>>
>>>> >- How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?****
>>>>
>>>> It can be set based on the resources requested for that container.****
>>>>
>>>> ** **
>>>>
>>>> ** **
>>>>
>>>> Thanks****
>>>>
>>>> Devaraj K****
>>>>
>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>> *Sent:* 19 June 2013 08:16
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>
>>>> ** **
>>>>
>>>> Hi,
>>>>
>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>>> job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>> container instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?****
>>>>
>>>> Thanks!****
>>>>
>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
Thanks Arun!

#1, Yes, I did tests and found that the MRv1 jobs could run against YARN
directly, without recompiling

#2, do you mean the old versions of HBase/Hive can not run agains YARN, and
only some special versions of them can run against YARN? If yes, how can I
get the versions for YARN?


2013/6/20 Arun C Murthy <ac...@hortonworks.com>

>
> On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:
>
> Appreciating for the detailed answers! Here are three further questions:
>
> - Yarn maintains backwards compatibility, and MRv1 job could run on Yarn.
> If yarn does not ask existing MRv1 job to do any code change, but why we
> should recompile the MRv1 job?
>
>
> You don't need to recompile MRv1 jobs to run against YARN.
>
> - Which yarn jar files are required in the recompiling?
> - In a cluster with Hadoop 1.1.1 and other Hadoop related components(HBase
> 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to replace Hadoop
> 1.1.1 with yarn, do we need to recompile all other Hadoop related
> components again with yarn jar files? Without any code change?
>
>
> You will need versions of HBase, Hive etc. which are integrated with
> hadoop-2.x, but not need to change any of your end-user applications (MR
> jobs, hive queries, pig scripts etc.)
>
> Arun
>
>
> Thanks in advance!
>
>
>
> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
>
>> Thanks Arun and Devraj , good to know.
>>
>>
>>
>> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com>wrote:
>>
>>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>>
>>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>>> rahul.rec.dgp@gmail.com> wrote:
>>>
>>> Hi Devaraj,
>>>
>>> As for the container request request for yarn container , currently only
>>> memory is considered as resource , not cpu. Please correct.
>>>
>>> Thanks,
>>> Rahul
>>>
>>>
>>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com>wrote:
>>>
>>>>  Hi Sam,****
>>>>
>>>>   Please find the answers for your queries. ****
>>>>
>>>>
>>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?****
>>>>
>>>> ** **
>>>>
>>>> In Yarn, it is a concept of application. MR Job is one kind of
>>>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>>>> application). If we want to run different kinds of applications we should
>>>> have ApplicationMaster for each kind of application.****
>>>>
>>>> ** **
>>>>
>>>> >- Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>>>
>>>> These configurations still work for MR Job in Yarn.****
>>>>
>>>>
>>>> >- What's the general process for ApplicationMaster of Yarn to execute
>>>> a job?****
>>>>
>>>> MRAppMaster(Application Master for MR Job) does the Job life cycle
>>>> which includes getting the containers for maps & reducers, launch the
>>>> containers using NM, tacks the tasks status till completion, manage the
>>>> failed tasks.****
>>>>
>>>>
>>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>>>> container instead, right?****
>>>>
>>>> Correct, these params don’t work in yarn. In Yarn it is completely
>>>> based on the resources(memory, cpu). Application Master can request the RM
>>>> for resources to complete the tasks for that application.****
>>>>
>>>>
>>>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?****
>>>>
>>>> ApplicationMaster is responsible for getting the containers from RM by
>>>> sending the resource requests. For MR Job, you can use
>>>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>>>> for specifying the map & reduce container memory sizes.****
>>>>
>>>> ** **
>>>>
>>>> >- How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?****
>>>>
>>>> It can be set based on the resources requested for that container.****
>>>>
>>>> ** **
>>>>
>>>> ** **
>>>>
>>>> Thanks****
>>>>
>>>> Devaraj K****
>>>>
>>>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>>>> *Sent:* 19 June 2013 08:16
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* How Yarn execute MRv1 job?****
>>>>
>>>> ** **
>>>>
>>>> Hi,
>>>>
>>>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>>> 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>>> job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, above tow parameter do not work any more, as yarn uses
>>>> container instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>>> physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?****
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>>  --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Jun 19, 2013, at 6:45 PM, sam liu <sa...@gmail.com> wrote:

> I appreciate the detailed answers! Here are three further questions:
> 
> - Yarn maintains backwards compatibility, and MRv1 jobs can run on Yarn. If Yarn does not ask existing MRv1 jobs for any code change, why should we recompile the MRv1 job?

You don't need to recompile MRv1 jobs to run against YARN.
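
The switch that sends an unmodified job to YARN is the
'mapreduce.framework.name' property (normally set cluster-wide in
mapred-site.xml). A hypothetical driver, identical to its Hadoop 1.x form
except for that one property:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class RunOnYarn {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "yarn" submits through the ResourceManager; "local" and "classic"
        // (the old JobTracker) are the other accepted values in Hadoop 2.x.
        conf.set("mapreduce.framework.name", "yarn");
        Job job = Job.getInstance(conf, "existing-mrv1-job");
        // ... the same mapper/reducer wiring as under Hadoop 1.x ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }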

> - Which yarn jar files are required in the recompiling?
> - In a cluster with Hadoop 1.1.1 and other Hadoop related components(HBase 0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to replace Hadoop 1.1.1 with yarn, do we need to recompile all other Hadoop related components again with yarn jar files? Without any code change?

You will need versions of HBase, Hive etc. which are integrated with hadoop-2.x, but you will not need to change any of your end-user applications (MR jobs, hive queries, pig scripts etc.)

Arun

> 
> Thanks in advance!
> 
> 
> 
> 2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>
> Thanks Arun and Devaraj, good to know.
> 
> 
> 
> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Not true, the CapacityScheduler has support for both CPU & Memory now.
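
As an aside on configuration (an assumption about setup, not something stated
in this thread): CPU scheduling in the CapacityScheduler is switched on by
selecting the DominantResourceCalculator in capacity-scheduler.xml, after
which a container request can carry both memory and vcores:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.Resource;

    public class CpuAndMemory {
      public static void main(String[] args) {
        // capacity-scheduler.xml equivalent: compare containers on CPU as
        // well as memory (Dominant Resource Fairness).
        Configuration conf = new Configuration();
        conf.set("yarn.scheduler.capacity.resource-calculator",
            "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");

        // A request can then carry both dimensions: 2048 MB and 4 vcores.
        Resource cap = Resource.newInstance(2048, 4);
        System.out.println(cap);
      }
    }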
> 
> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
> 
>> Hi Devaraj,
>> 
>> As for the container request for a yarn container, currently only memory is considered as a resource, not cpu. Please correct me if I'm wrong.
>> 
>> Thanks,
>> Rahul
>> 
>> 
>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com> wrote:
>> Hi Sam,
>> 
>>   Please find the answers for your queries.
>> 
>> 
>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has special execution process(map > shuffle > reduce) in Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>> 
>>  
>> 
>> In Yarn, everything is an application. An MR job is one kind of application, which makes use of the MRAppMaster (i.e. the ApplicationMaster for the application). If we want to run different kinds of applications, we need an ApplicationMaster for each kind of application.
>> 
>>  
>> 
>> >- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> 
>> These configurations still work for MR jobs in Yarn.
>> 
>> 
>> >- What's the general process for ApplicationMaster of Yarn to execute a job?
>> 
>> MRAppMaster (the Application Master for MR jobs) drives the job life cycle, which includes getting the containers for maps & reducers, launching the containers using the NM, tracking task status till completion, and managing failed tasks.
>> 
>> 
>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'
>> >- For Yarn, the above two parameters do not work any more, as yarn uses containers instead, right?
>> 
>> Correct, these params don't work in Yarn. In Yarn, scheduling is completely based on the resources (memory, cpu). The Application Master can request the RM for resources to complete the tasks for that application.
>> 
>> 
>> >- For Yarn, we can set the whole physical mem for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of physical mem of a container?
>> 
>> ApplicationMaster is responsible for getting the containers from the RM by sending the resource requests. For an MR job, you can use the "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations for specifying the map & reduce container memory sizes.
>> 
>>  
>> 
>> >- How to set the maximum size of physical mem of a container? By the parameter of 'mapred.child.java.opts'?
>> 
>> It is determined by the resources requested for that container.
>> 
>>  
>> 
>>  
>> 
>> Thanks
>> 
>> Devaraj K
>> 
>> From: sam liu [mailto:samliuhadoop@gmail.com] 
>> Sent: 19 June 2013 08:16
>> To: user@hadoop.apache.org
>> Subject: How Yarn execute MRv1 job?
>> 
>>  
>> 
>> Hi,
>> 
>> 1. In Hadoop 1.x, a job will be executed by map task and reduce task together, with a typical process(map > shuffle > reduce). In Yarn, as I know, a MRv1 job will be executed only by ApplicationMaster.
>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has special execution process(map > shuffle > reduce) in Hadoop 1.x, and how Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x, like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for ApplicationMaster of Yarn to execute a job?
>> 
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, the above two parameters do not work any more, as yarn uses containers instead, right?
>> - For Yarn, we can set the whole physical mem for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of physical mem of a container?
>> - How to set the maximum size of physical mem of a container? By the parameter of 'mapred.child.java.opts'?
>> 
>> Thanks!
>> 
>> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: How Yarn execute MRv1 job?

Posted by sam liu <sa...@gmail.com>.
I appreciate the detailed answers! Here are three further questions:

- Yarn maintains backwards compatibility, and MRv1 jobs can run on Yarn.
If Yarn does not ask existing MRv1 jobs for any code change, why should
we recompile the MRv1 job?
- Which yarn jar files are required in the recompiling?
- In a cluster with Hadoop 1.1.1 and other Hadoop related components(HBase
0.94.3,  Hive 0.9.0, Zookeeper 3.4.5,...), if we want to replace Hadoop
1.1.1 with yarn, do we need to recompile all other Hadoop related
components again with yarn jar files? Without any code change?

Thanks in advance!



2013/6/19 Rahul Bhattacharjee <ra...@gmail.com>

> Thanks Arun and Devaraj, good to know.
>
>
>
> On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
>> Not true, the CapacityScheduler has support for both CPU & Memory now.
>>
>> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>> Hi Devaraj,
>>
>> As for the container request for a yarn container, currently only
>> memory is considered as a resource, not cpu. Please correct me if I'm wrong.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com> wrote:
>>
>>>  Hi Sam,
>>>
>>>   Please find the answers for your queries.
>>>
>>>
>>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job
>>> has special execution process(map > shuffle > reduce) in Hadoop 1.x, and
>>> how Yarn execute a MRv1 job? still include some special MR steps in Hadoop
>>> 1.x, like map, sort, merge, combine and shuffle?
>>>
>>> In Yarn, everything is an application. An MR job is one kind of
>>> application, which makes use of the MRAppMaster (i.e. the ApplicationMaster
>>> for the application). If we want to run different kinds of applications,
>>> we need an ApplicationMaster for each kind of application.
>>>
>>> >- Do the MRv1 parameters still work for Yarn? Like
>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>
>>> These configurations still work for MR jobs in Yarn.
>>>
>>>
>>> >- What's the general process for ApplicationMaster of Yarn to execute a
>>> job?
>>>
>>> MRAppMaster (the Application Master for MR jobs) drives the job life
>>> cycle, which includes getting the containers for maps & reducers,
>>> launching the containers using the NM, tracking task status till
>>> completion, and managing failed tasks.
>>>
>>>
>>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>> 'mapred.tasktracker.map.tasks.maximum' and
>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>> >- For Yarn, the above two parameters do not work any more, as yarn uses
>>> containers instead, right?
>>>
>>> Correct, these params don't work in Yarn. In Yarn, scheduling is
>>> completely based on the resources (memory, cpu). The Application Master
>>> can request the RM for resources to complete the tasks for that
>>> application.
>>>
>>>
>>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>> physical mem of a container?
>>>
>>> ApplicationMaster is responsible for getting the containers from the RM
>>> by sending the resource requests. For an MR job, you can use the
>>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations
>>> for specifying the map & reduce container memory sizes.
>>>
>>> >- How to set the maximum size of physical mem of a container? By the
>>> parameter of 'mapred.child.java.opts'?
>>>
>>> It is determined by the resources requested for that container.
>>>
>>> Thanks
>>>
>>> Devaraj K
>>>
>>> From: sam liu [mailto:samliuhadoop@gmail.com]
>>> Sent: 19 June 2013 08:16
>>> To: user@hadoop.apache.org
>>> Subject: How Yarn execute MRv1 job?
>>>
>>>
>>> Hi,
>>>
>>> 1. In Hadoop 1.x, a job will be executed by map task and reduce task
>>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>>> know, a MRv1 job will be executed only by ApplicationMaster.
>>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>>> like map, sort, merge, combine and shuffle?
>>> - Do the MRv1 parameters still work for Yarn? Like
>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>> job?
>>>
>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>> 'mapred.tasktracker.map.tasks.maximum' and
>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>> - For Yarn, the above two parameters do not work any more, as yarn uses
>>> containers instead, right?
>>> - For Yarn, we can set the whole physical mem for a NodeManager using
>>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>>> physical mem of a container?
>>> - How to set the maximum size of physical mem of a container? By the
>>> parameter of 'mapred.child.java.opts'?
>>>
>>> Thanks!
>>>
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>

Re: How Yarn execute MRv1 job?

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Arun and Devaraj, good to know.



On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Not true, the CapacityScheduler has support for both CPU & Memory now.
>
> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <ra...@gmail.com>
> wrote:
>
> Hi Devaraj,
>
> As for the container request for a yarn container, currently only
> memory is considered as a resource, not cpu. Please correct me if I'm wrong.
>
> Thanks,
> Rahul
>
>
> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com> wrote:
>
>>  Hi Sam,
>>
>>   Please find the answers for your queries.
>>
>>
>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>> like map, sort, merge, combine and shuffle?
>>
>> In Yarn, everything is an application. An MR job is one kind of
>> application, which makes use of the MRAppMaster (i.e. the ApplicationMaster
>> for the application). If we want to run different kinds of applications, we
>> need an ApplicationMaster for each kind of application.
>>
>> >- Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>
>> These configurations still work for MR jobs in Yarn.
>>
>>
>> >- What's the general process for ApplicationMaster of Yarn to execute a
>> job?
>>
>> MRAppMaster (the Application Master for MR jobs) drives the job life cycle,
>> which includes getting the containers for maps & reducers, launching the
>> containers using the NM, tracking task status till completion, and managing
>> failed tasks.
>>
>>
>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> >- For Yarn, the above two parameters do not work any more, as yarn uses
>> containers instead, right?
>>
>> Correct, these params don't work in Yarn. In Yarn, scheduling is completely
>> based on the resources (memory, cpu). The Application Master can request the
>> RM for resources to complete the tasks for that application.
>>
>>
>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?
>>
>> ApplicationMaster is responsible for getting the containers from the RM by
>> sending the resource requests. For an MR job, you can use the
>> "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations
>> for specifying the map & reduce container memory sizes.
>>
>> >- How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?
>>
>> It is determined by the resources requested for that container.
>>
>> Thanks
>>
>> Devaraj K
>>
>> From: sam liu [mailto:samliuhadoop@gmail.com]
>> Sent: 19 June 2013 08:16
>> To: user@hadoop.apache.org
>> Subject: How Yarn execute MRv1 job?
>>
>>
>> Hi,
>>
>> 1. In Hadoop 1.x, a job will be executed by map task and reduce task
>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>> know, a MRv1 job will be executed only by ApplicationMaster.
>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>> like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for ApplicationMaster of Yarn to execute a
>> job?
>>
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, the above two parameters do not work any more, as yarn uses
>> containers instead, right?
>> - For Yarn, we can set the whole physical mem for a NodeManager using
>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?
>> - How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?
>>
>> Thanks!
>>
>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Arun and Devraj , good to know.



On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Not true, the CapacityScheduler has support for both CPU & Memory now.
>
> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <ra...@gmail.com>
> wrote:
>
> Hi Devaraj,
>
> As for the container request request for yarn container , currently only
> memory is considered as resource , not cpu. Please correct.
>
> Thanks,
> Rahul
>
>
> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com> wrote:
>
>>  Hi Sam,****
>>
>>   Please find the answers for your queries. ****
>>
>>
>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>> like map, sort, merge, combine and shuffle?****
>>
>> ** **
>>
>> In Yarn, it is a concept of application. MR Job is one kind of
>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>> application). If we want to run different kinds of applications we should
>> have ApplicationMaster for each kind of application.****
>>
>> ** **
>>
>> >- Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>
>> These configurations still work for MR Job in Yarn.****
>>
>>
>> >- What's the general process for ApplicationMaster of Yarn to execute a
>> job?****
>>
>> MRAppMaster(Application Master for MR Job) does the Job life cycle which
>> includes getting the containers for maps & reducers, launch the containers
>> using NM, tacks the tasks status till completion, manage the failed tasks.
>> ****
>>
>>
>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>> container instead, right?****
>>
>> Correct, these params don’t work in yarn. In Yarn it is completely based
>> on the resources(memory, cpu). Application Master can request the RM for
>> resources to complete the tasks for that application.****
>>
>>
>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?****
>>
>> ApplicationMaster is responsible for getting the containers from RM by
>> sending the resource requests. For MR Job, you can use
>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>> for specifying the map & reduce container memory sizes.****
>>
>> ** **
>>
>> >- How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?****
>>
>> It can be set based on the resources requested for that container.****
>>
>> ** **
>>
>> ** **
>>
>> Thanks****
>>
>> Devaraj K****
>>
>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>> *Sent:* 19 June 2013 08:16
>> *To:* user@hadoop.apache.org
>> *Subject:* How Yarn execute MRv1 job?****
>>
>> ** **
>>
>> Hi,
>>
>> 1.In Hadoop 1.x, a job will be executed by map task and reduce task
>> together, with a typical process(map > shuffle > reduce). In Yarn, as I
>> know, a MRv1 job will be executed only by ApplicationMaster.
>> - Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>> like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for ApplicationMaster of Yarn to execute a
>> job?
>>
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, above tow parameter do not work any more, as yarn uses
>> container instead, right?
>> - For Yarn, we can set the whole physical mem for a NodeManager using
>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?
>> - How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?****
>>
>> Thanks!****
>>
>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Arun and Devraj , good to know.



On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Not true, the CapacityScheduler has support for both CPU & Memory now.
>
> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <ra...@gmail.com>
> wrote:
>
> Hi Devaraj,
>
> As for the container request request for yarn container , currently only
> memory is considered as resource , not cpu. Please correct.
>
> Thanks,
> Rahul
>
>
> On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k <de...@huawei.com> wrote:
>
>>  Hi Sam,****
>>
>>   Please find the answers for your queries. ****
>>
>>
>> >- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has
>> special execution process(map > shuffle > reduce) in Hadoop 1.x, and how
>> Yarn execute a MRv1 job? still include some special MR steps in Hadoop 1.x,
>> like map, sort, merge, combine and shuffle?****
>>
>> ** **
>>
>> In Yarn, it is a concept of application. MR Job is one kind of
>> application which makes use of MRAppMaster(i.e ApplicationMaster for the
>> application). If we want to run different kinds of applications we should
>> have ApplicationMaster for each kind of application.****
>>
>> ** **
>>
>> >- Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?****
>>
>> These configurations still work for MR Job in Yarn.****
>>
>>
>> >- What's the general process for ApplicationMaster of Yarn to execute a
>> job?****
>>
>> MRAppMaster(Application Master for MR Job) does the Job life cycle which
>> includes getting the containers for maps & reducers, launch the containers
>> using NM, tacks the tasks status till completion, manage the failed tasks.
>> ****
>>
>>
>> >2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> >- For Yarn, above tow parameter do not work any more, as yarn uses
>> container instead, right?****
>>
>> Correct, these params don’t work in yarn. In Yarn it is completely based
>> on the resources(memory, cpu). Application Master can request the RM for
>> resources to complete the tasks for that application.****
>>
>>
>> >- For Yarn, we can set the whole physical mem for a NodeManager using
>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?****
>>
>> ApplicationMaster is responsible for getting the containers from RM by
>> sending the resource requests. For MR Job, you can use
>> "mapreduce.map.memory.mb" and “mapreduce.reduce.memory.mb" configurations
>> for specifying the map & reduce container memory sizes.****
>>
>> ** **
>>
>> >- How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?****
>>
>> It can be set based on the resources requested for that container.****
>>
>> ** **
>>
>> ** **
>>
>> Thanks****
>>
>> Devaraj K****
>>
>> *From:* sam liu [mailto:samliuhadoop@gmail.com]
>> *Sent:* 19 June 2013 08:16
>> *To:* user@hadoop.apache.org
>> *Subject:* How Yarn execute MRv1 job?****
>>
>> ** **
>>
>> Hi,
>>
>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>> know, a MRv1 job will be executed only by the ApplicationMaster.
>> - Yarn could run multiple kinds of jobs (MR, MPI, ...), but a MRv1 job has
>> a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how
>> does Yarn execute a MRv1 job? Does it still include some special MR steps
>> from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>> - Do the MRv1 parameters still work for Yarn? Like
>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>> - What's the general process for the ApplicationMaster of Yarn to execute a
>> job?
>>
>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>> 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum'
>> - For Yarn, the above two parameters do not work any more, as yarn uses
>> containers instead, right?
>> - For Yarn, we can set the whole physical mem for a NodeManager using
>> 'yarn.nodemanager.resource.memory-mb'. But how to set the default size of
>> physical mem of a container?
>> - How to set the maximum size of physical mem of a container? By the
>> parameter of 'mapred.child.java.opts'?
>>
>> Thanks!
>>
>
>
>  --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: How Yarn execute MRv1 job?

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Arun and Devaraj, good to know.



On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Not true, the CapacityScheduler has support for both CPU & Memory now.
>
> On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <ra...@gmail.com>
> wrote:
>
> Hi Devaraj,
>
> As for the resource request for a yarn container, currently only
> memory is considered as a resource, not cpu. Please correct.
>
> Thanks,
> Rahul
>

Re: How Yarn execute MRv1 job?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Not true, the CapacityScheduler has support for both CPU & Memory now.
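
For illustration, a container request in Hadoop 2.x can carry both dimensions.
A minimal, hypothetical sketch against the YARN records API (switching the
CapacityScheduler to weigh CPU as well is, to my understanding, done by setting
yarn.scheduler.capacity.resource-calculator to
org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in
capacity-scheduler.xml):

    import org.apache.hadoop.yarn.api.records.Resource;

    public class CpuAndMemoryCapability {
        public static void main(String[] args) {
            // A capability carrying both memory (MB) and CPU (virtual cores);
            // the numbers are purely illustrative.
            Resource capability = Resource.newInstance(2048, 4);
            System.out.println(capability); // prints something like <memory:2048, vCores:4>
        }
    }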

On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee <ra...@gmail.com> wrote:

> Hi Devaraj,
> 
> As for the resource request for a yarn container, currently only memory is considered as a resource, not cpu. Please correct.
> 
> Thanks,
> Rahul

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: How Yarn execute MRv1 job?

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
By 'please correct', I meant: please correct me if my statement is wrong.


On Wed, Jun 19, 2013 at 11:11 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Hi Devaraj,
>
> As for the resource request for a yarn container, currently only
> memory is considered as a resource, not cpu. Please correct.
>
> Thanks,
> Rahul

Re: How Yarn execute MRv1 job?

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Hi Devaraj,

As for the resource request for a yarn container, currently only
memory is considered as a resource, not cpu. Please correct.

Thanks,
Rahul


RE: How Yarn execute MRv1 job?

Posted by Devaraj k <de...@huawei.com>.
Hi Sam,
  Please find the answers for your queries.

>- Yarn could run multiple kinds of jobs (MR, MPI, ...), but a MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how does Yarn execute a MRv1 job? Does it still include some special MR steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?

In Yarn, everything is an application. An MR job is one kind of application, which makes use of the MRAppMaster (i.e. the ApplicationMaster for the application). If we want to run different kinds of applications, we need an ApplicationMaster for each kind of application.

>- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
These configurations still work for MR jobs in Yarn.
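
For example, both can still be set on the job configuration. A minimal,
hypothetical sketch (the values are illustrative, not recommendations):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SortTuningSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Map-side sort buffer size, in MB.
            conf.setInt("mapreduce.task.io.sort.mb", 256);
            // Fraction of the sort buffer that triggers a spill to disk.
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.85f);
            Job job = Job.getInstance(conf, "sort-tuning-sketch");
            // ... set mapper/reducer classes and input/output paths as usual ...
        }
    }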

>- What's the general process for ApplicationMaster of Yarn to execute a job?
MRAppMaster (the ApplicationMaster for an MR job) drives the job life cycle, which includes getting the containers for maps & reducers, launching the containers via the NM, tracking task status till completion, and managing failed tasks.
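
As a rough illustration of the container-request part of that life cycle, a
heavily simplified, hypothetical sketch against the Hadoop 2.x AMRMClient API
(no heartbeat loop, container launch, or error handling):

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class AppMasterSketch {
        public static void main(String[] args) throws Exception {
            AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
            rmClient.init(new YarnConfiguration());
            rmClient.start();
            rmClient.registerApplicationMaster("", 0, "");
            // Ask the RM for one 1024 MB / 1 vcore container, e.g. for a map task.
            Resource capability = Resource.newInstance(1024, 1);
            rmClient.addContainerRequest(
                    new ContainerRequest(capability, null, null, Priority.newInstance(0)));
            // A real AM would now call allocate() in a heartbeat loop, launch the
            // returned containers through the NodeManagers, track task status, and
            // re-request containers for failed tasks before unregistering.
        }
    }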

>2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'
>- For Yarn, the above two parameters do not work any more, as yarn uses containers instead, right?
Correct, these params don't work in yarn. In Yarn, scheduling is completely based on resources (memory, cpu). The ApplicationMaster can request resources from the RM to complete the tasks for that application.

>- For Yarn, we can set the whole physical memory for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how do we set the default size of physical memory for a container?
The ApplicationMaster is responsible for getting the containers from the RM by sending resource requests, so there is no single cluster-wide default container size; each application asks for the sizes it needs. For an MR job, you can use the "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" configurations for specifying the map & reduce container memory sizes.
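A minimal sketch of the two levels of configuration involved; the values are illustrative only:

    <!-- yarn-site.xml (per NodeManager): total physical memory YARN may hand out -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value>
    </property>

    <!-- mapred-site.xml (per job): container size requested for each task -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value>
    </property>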

>- How do we set the maximum size of physical memory of a container? By the parameter 'mapred.child.java.opts'?
It is set by the resources requested for that container; 'mapred.child.java.opts' only controls the JVM heap inside the container, not the container's physical memory limit, which the NodeManager enforces.
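The scheduler additionally bounds what any single request may ask for. A sketch of those cluster-wide limits and of the per-task heap, again with illustrative values and assuming the Hadoop 2.x property names:

    <!-- yarn-site.xml: bounds on any single container allocation -->
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>   <!-- requests are rounded up to at least this -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>8192</value>   <!-- requests may not exceed this -->
    </property>

    <!-- mapred-site.xml: JVM heap inside the container, not the container limit -->
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx800m</value>   <!-- keep below mapreduce.map.memory.mb -->
    </property>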


Thanks
Devaraj K
From: sam liu [mailto:samliuhadoop@gmail.com]
Sent: 19 June 2013 08:16
To: user@hadoop.apache.org
Subject: How Yarn execute MRv1 job?

Hi,

1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks together, with a typical process (map > shuffle > reduce). In Yarn, as I understand, an MRv1 job is executed only by the ApplicationMaster.
- Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so how does Yarn execute an MRv1 job? Does it still include the special MR steps of Hadoop 1.x, like map, sort, merge, combine and shuffle?
- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
- What's the general process for the ApplicationMaster of Yarn to execute a job?

2. In Hadoop 1.x, we can set the map/reduce slots by setting 'mapred.tasktracker.map.tasks.maximum' and 'mapred.tasktracker.reduce.tasks.maximum'.
- For Yarn, the above two parameters do not work any more, as Yarn uses containers instead, right?
- For Yarn, we can set the whole physical memory for a NodeManager using 'yarn.nodemanager.resource.memory-mb'. But how do we set the default size of physical memory for a container?
- How do we set the maximum size of physical memory of a container? By the parameter 'mapred.child.java.opts'?
Thanks!
