Posted to hdfs-user@hadoop.apache.org by Nirmal Kumar <ni...@impetus.co.in> on 2014/01/15 14:21:36 UTC

Doubts: Deployment and Configuration of YARN cluster

All,

I am new to YARN and have certain doubts regarding the deployment and configuration of YARN on a cluster.

As per my understanding, to deploy Hadoop 2.x with YARN on a cluster we need to distribute the following files to all the slave nodes in the cluster:

*         conf/core-site.xml
*         conf/hdfs-site.xml
*         conf/yarn-site.xml
*         conf/mapred-site.xml

Also, as per my understanding the ONLY file we need to change on each slave node is:

*         conf/hdfs-site.xml
to set the node-local {dfs.datanode.data.dir} value (the directories where that DataNode stores its blocks).
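For illustration, a minimal per-node hdfs-site.xml entry might look like this (the mount paths here are hypothetical):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>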

Do we need to change any other config file on the slave nodes?
Can I set a different {yarn.nodemanager.resource.memory-mb} for each NM running on the slave nodes?
I ask because I might have a *heterogeneous environment*, i.e. different nodes with different memory and cores. NM1 might have 40GB of memory while another node has, say, 20GB.
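For example (illustrative value, leaving some memory for the OS and Hadoop daemons), the yarn-site.xml on the 40GB node could declare:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>36864</value>
</property>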

Also,
{mapreduce.map.memory.mb}   specifies the memory requested from YARN for each map task's container; the NodeManager kills the task if it exceeds this limit.
{mapreduce.map.java.opts}         specifies the options of the task JVM, including its *max. heap size* (-Xmx), which must fit inside the container. If a task exceeds the max heap size, the JVM throws an OutOfMemoryError.
{mapreduce.reduce.memory.mb}
{mapreduce.reduce.java.opts}
do the same for reduce tasks.
Are the above properties applied in general to all map/reduce tasks (from different MapReduce applications) running on the various slave nodes?
Or can I change them for a particular slave node? E.g. on SlaveNode1 I could run map tasks with 4GB and on SlaveNode2 with 8GB, and likewise for reduce tasks.
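As a sketch of how the two relate, assuming the common guidance that the task heap be somewhat smaller than its container (values illustrative):

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3276m</value>
</property>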

I need some understanding of how to *configure processing capacity* in the cluster: container size, number of containers, and number of mappers/reducers.

Thanks,
-Nirmal


RE: Doubts: Deployment and Configuration of YARN cluster

Posted by Nirmal Kumar <ni...@impetus.co.in>.
Thanks a lot for the help.

-Nirmal
From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Friday, January 17, 2014 9:09 PM
To: user@hadoop.apache.org
Subject: Re: Doubts: Deployment and Configuration of YARN cluster


On Jan 16, 2014, at 9:14 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:


Hi Arun,

Thanks a lot for the clarification.

My understanding is that in *yarn-site.xml* I first set the max. and min. container sizes GLOBALLY for all the nodes in the cluster through:
*         {yarn.scheduler.minimum-allocation-mb}
*         {yarn.scheduler.maximum-allocation-mb}

Then on each node I can set the NM memory using:
*         {yarn.nodemanager.resource.memory-mb}
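In XML that would be something like (values illustrative):

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>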



Exactly! :)


My 2nd doubt is whether we can run map/reduce tasks with different memory settings on each of the slave nodes.
That is, can we change the following properties in *mapred-site.xml* on each slave node? Depending on a machine's capacity we could then adjust the memory settings for its map/reduce tasks.
*         {mapreduce.map.memory.mb}
*         {mapreduce.map.java.opts}
*         {mapreduce.reduce.memory.mb}
*         {mapreduce.reduce.java.opts}

You can change these for every single job, so each job can have different requirements.

$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=1024 ...
$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=2048 ...
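When raising {mapreduce.map.memory.mb} per job this way, the task heap should usually be raised with it, e.g. (illustrative values):

$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=2048 -Dmapreduce.map.java.opts=-Xmx1638m ...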

hth,
Arun



Thanks,
-Nirmal

From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Thursday, January 16, 2014 7:43 PM
To: user@hadoop.apache.org
Subject: Re: Doubts: Deployment and Configuration of YARN cluster

No, you can set the resources available on each node to be different...

E.g. Node A: 10G, Node B: 12G.

Now, if the min. container size is 1G, the RM can allocate up to 10 containers on Node A and 12 containers on Node B.
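In other words, assuming memory is the binding constraint, each node can host at most

  floor(yarn.nodemanager.resource.memory-mb / yarn.scheduler.minimum-allocation-mb)

minimum-size containers.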

hth,
Arun

On Jan 15, 2014, at 11:03 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:

Hi German,

I went through the links on memory configuration settings/best practices.
They consider the cluster to be homogeneous, i.e. the same RAM size on all the nodes.

Also, in the YARN whitepaper (Section 3.2, page 6) I see:
"This resource model serves current applications well in homogeneous environments, but we expect it to evolve over time as the ecosystem matures and new requirements emerge."

Does that mean that in YARN, in order to configure processing capacity (container size, number of containers, number of mappers/reducers), the cluster has to be homogeneous?
What if I have a *heterogeneous cluster* with varying RAM, disks, and cores?

Thanks,
-Nirmal

From: Nirmal Kumar
Sent: Wednesday, January 15, 2014 8:22 PM
To: user@hadoop.apache.org
Subject: RE: Doubts: Deployment and Configuration of YARN cluster

Thanks a lot German.

Will go through the links and see if they answer my questions/doubts.

-Nirmal

From: German Florez-Larrahondo [mailto:german.fl@samsung.com]
Sent: Wednesday, January 15, 2014 7:20 PM
To: user@hadoop.apache.org
Subject: RE: Doubts: Deployment and Configuration of YARN cluster

Nirmal

-A good summary regarding memory configuration settings/best practices can be found here. Note that in YARN, the way you configure resource limits dictates the number of containers on each node and in the cluster:
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html

-A good intro to YARN configuration is this:
http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce_11.html

Regards
.g




Re: Doubts: Deployment and Configuration of YARN cluster

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Jan 16, 2014, at 9:14 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:

> Hi Arun,
>  
> Thanks a lot for the clarification.
>  
> My understanding is that in *yarn-site.xml* I first set the max. and min. container sizes GLOBALLY for all the nodes in the cluster through:
> ·         {yarn.scheduler.minimum-allocation-mb}
> ·         {yarn.scheduler.maximum-allocation-mb}
>  
> Then on each node I can set the NM memory using:
> ·         {yarn.nodemanager.resource.memory-mb}
>  
> 

Exactly! :)

> My 2nd doubt is whether we can run map/reduce tasks with different memory settings on each of the slave nodes.
> That is, can we change the following properties in *mapred-site.xml* on each slave node? Depending on a machine's capacity we could then adjust the memory settings for its map/reduce tasks.
> ·         {mapreduce.map.memory.mb}
> ·         {mapreduce.map.java.opts}
> ·         {mapreduce.reduce.memory.mb}
> ·         {mapreduce.reduce.java.opts}

You can change these for every single job, so each job can have different requirements.

$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=1024 …
$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=2048 …

hth,
Arun


--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Doubts: Deployment and Configuration of YARN cluster

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Jan 16, 2014, at 9:14 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:

> Hi Arun,
>  
> Thanks a lot for the clarification.
>  
> I understand it like in *yarn-site.xml* first I can set the max. and min. container size GLOBALLY for all the nodes in the cluster through:
> ·         {yarn.scheduler.minimum-allocation-mb}
> ·         {yarn.scheduler.maximum-allocation-mb}
>  
> Then at each of the node I can set the NM memory using:
> ·         {yarn.nodemanager.resource.memory-mb}
>  
> 

Exactly! :)

> My 2nd doubt is whether we can run Mappers\Reducers tasks with varying memory options at each of the slave nodes.
> That is, can we change the following properties in *mapred-site.xml* at each of the slave nodes? This is because depending on the machine’s power we can adjust the memory options for the M\R tasks.
> ·         {mapreduce.map.memory.mb}
> ·         {mapreduce.map.java.opts}
> ·         {mapreduce.reduce.memory.mb}
> ·         {mapreduce.reduce.java.opts}

You can change these for every single job, so each job can have different requirements.

$ bin/hadoop jar hadoop-examples.jar word count -Dmapreduce.map.memory.mb=1024 …
$ bin/hadoop jar hadoop-examples.jar word count -Dmapreduce.map.memory.mb=2048 …

hth,
Arun

>  
> Thanks,
> -Nirmal
>  
> From: Arun C Murthy [mailto:acm@hortonworks.com] 
> Sent: Thursday, January 16, 2014 7:43 PM
> To: user@hadoop.apache.org
> Subject: Re: Doubts: Deployment and Configuration of YARN cluster
>  
> No, you can set resources available in each node to be different…
>  
> For e.g. Node A: 10G, Node B: 12G.
>  
> Now, if min. container size is 1G, the RM will allocate 10 containers to Node A and 12 containers to Node B.
>  
> hth,
> Arun
>  
> On Jan 15, 2014, at 11:03 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:
> 
> 
> Hi German,
>  
> I went through the links for memory configuration settings/best-practices.
> It considers the cluster to be homogenous i.e. same RAM size in all the nodes.
>  
> Also on the Yarn whitepaper(Section 3.2 Page 6) I see:
> This resource model serves current applications well
> in homogeneous environments, but we expect it to
> evolve over time as the ecosystem matures and new requirements
> emerge.
>  
> Does that mean in YARN in order to configure processing capacity like Container Size, No. of Containers, No. of Mappers\Reducers the cluster has to be homogenous?
> How about if I have a *heterogeneous cluster* with varying RAM, disks , cores?
>  
> Thanks,
> -Nirmal
>  
> From: Nirmal Kumar 
> Sent: Wednesday, January 15, 2014 8:22 PM
> To: user@hadoop.apache.org
> Subject: RE: Doubts: Deployment and Configuration of YARN cluster
>  
> Thanks a lot German.
>  
> Will go through the links and see if that answers my questions\doubts.
>  
> -Nirmal
>  
> From: German Florez-Larrahondo [mailto:german.fl@samsung.com] 
> Sent: Wednesday, January 15, 2014 7:20 PM
> To: user@hadoop.apache.org
> Subject: RE: Doubts: Deployment and Configuration of YARN cluster
>  
> Nirmal
>  
> -A good summary regarding memory configuration settings/best-practices can be found here. Note that in YARN, the way you configure resource limits dictates number of containers in the nodes and in the cluster:
> http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
>  
> -A good intro to YARN configuration is this:
> http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce_11.html
>  
> Regards
> .g
>  
>  
>  
> From: Nirmal Kumar [mailto:nirmal.kumar@impetus.co.in] 
> Sent: Wednesday, January 15, 2014 7:22 AM
> To: user@hadoop.apache.org
> Subject: Doubts: Deployment and Configuration of YARN cluster
>  
> All,
>  
> I am new to YARN and have certain doubts regarding the deployment and configuration of YARN on a cluster.
>  
> As per my understanding to deploy Hadoop 2.x using YARN on a cluster we need to distribute the below files to all the slave nodes in the cluster:
> ·         conf/core-site.xml
> ·         conf/hdfs-site.xml
> ·         conf/yarn-site.xml
> ·         conf/mapred-site.xml
>  
> Also we need to ONLY change the following file on each slave nodes:
> ·         conf/hdfs-site.xml
> Need to mention the {dfs.datanode.name.dir} value
>  
> Do we need to change any other config file on the slave nodes?
> Can I change {yarn.nodemanager.resource.memory-mb} for each NM running on the slave nodes?
> This is since I might have a *heterogeneous environment* i.e. different nodes with different memory and cores. For NM1 I might have 40GB memory and for the other say 20GB.
>  
> Also,
> {mapreduce.map.memory.mb}   specifies the *max. virtual memory* allowed by a Hadoop task subprocess.
> {mapreduce.map.java.opts}         specify the *max. heap space* of the allocated jvm. If you exceed the max heap size, the JVM throws an OOM.
> {mapreduce.reduce.memory.mb}
> {mapreduce.reduce.java.opts}
> are the above properties applicable to all the Map\Reduce tasks(from different Map Reduce applications) in general, running on different slave nodes?
> or Can I change these for a particular slave node.? For e.g. say for a SlaveNode1 I run the map task with 4GB and for other SlaveNode2 I run the map task with 8GB. Same with the reduce task.
>  
> I need some understanding to *configure processing capacity* in the cluster like Container Size, No. of Containers, No. of Mappers\Reducers.
>  
> Thanks,
> -Nirmal

Re: Doubts: Deployment and Configuration of YARN cluster

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Jan 16, 2014, at 9:14 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:

> Hi Arun,
>  
> Thanks a lot for the clarification.
>  
> I understand that in *yarn-site.xml* I can first set the min. and max. container size GLOBALLY for all the nodes in the cluster through:
> ·         {yarn.scheduler.minimum-allocation-mb}
> ·         {yarn.scheduler.maximum-allocation-mb}
>  
> Then at each of the node I can set the NM memory using:
> ·         {yarn.nodemanager.resource.memory-mb}
>  
> 

Exactly! :)
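
For e.g., a minimal yarn-site.xml sketch (illustrative values only, not recommendations): the two scheduler bounds are read by the RM and apply cluster-wide, while the NM capacity is set per node.

  <!-- same on every node: global min/max container size -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>

  <!-- per node: memory this NM offers to containers, e.g. on your 40GB NM1 -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>40960</value>
  </property>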

> My 2nd doubt is whether we can run Mapper\Reducer tasks with varying memory options at each of the slave nodes.
> That is, can we change the following properties in *mapred-site.xml* at each of the slave nodes? This is because, depending on the machine's power, we could adjust the memory options for the M\R tasks.
> ·         {mapreduce.map.memory.mb}
> ·         {mapreduce.map.java.opts}
> ·         {mapreduce.reduce.memory.mb}
> ·         {mapreduce.reduce.java.opts}

You can change these for every single job, so each job can have different requirements.

$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=1024 …
$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=2048 …
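
Since the heap must fit inside the container, a common rule of thumb (a convention, not a hard requirement) is to pair each *.memory.mb with a *.java.opts heap of roughly 80% of it, e.g.:

$ bin/hadoop jar hadoop-examples.jar wordcount -Dmapreduce.map.memory.mb=2048 -Dmapreduce.map.java.opts=-Xmx1638m …

The same four keys can also be given cluster-wide defaults in the mapred-site.xml read at job-submission time; per-job -D options override those defaults.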

hth,
Arun

>  
> Thanks,
> -Nirmal
>  

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




RE: Doubts: Deployment and Configuration of YARN cluster

Posted by Nirmal Kumar <ni...@impetus.co.in>.
Hi Arun,

Thanks a lot for the clarification.

I understand that in *yarn-site.xml* I can first set the min. and max. container size GLOBALLY for all the nodes in the cluster through:

*         {yarn.scheduler.minimum-allocation-mb}

*         {yarn.scheduler.maximum-allocation-mb}

Then on each node I can set the NM memory using:

*         {yarn.nodemanager.resource.memory-mb}

My 2nd doubt is whether we can run Mapper\Reducer tasks with varying memory options at each of the slave nodes.
That is, can we change the following properties in *mapred-site.xml* at each of the slave nodes? This is because, depending on the machine's power, we could adjust the memory options for the M\R tasks.

*         {mapreduce.map.memory.mb}

*         {mapreduce.map.java.opts}

*         {mapreduce.reduce.memory.mb}

*         {mapreduce.reduce.java.opts}

Thanks,
-Nirmal



Re: Doubts: Deployment and Configuration of YARN cluster

Posted by Arun C Murthy <ac...@hortonworks.com>.
No, you can set resources available in each node to be different…

For e.g. Node A: 10G, Node B: 12G.

Now, if min. container size is 1G, the RM will allocate 10 containers to Node A and 12 containers to Node B.
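
For e.g., the per-node piece of yarn-site.xml for that layout would look like this (illustrative values, assuming the 1G min. container size above):

  <!-- yarn-site.xml on Node A: 10G offered to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
  </property>

  <!-- yarn-site.xml on Node B: 12G offered to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12288</value>
  </property>

  <!-- containers per node ~ resource.memory-mb / container size:
       Node A: 10240 / 1024 = 10; Node B: 12288 / 1024 = 12 -->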

hth,
Arun

On Jan 15, 2014, at 11:03 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:

> Hi German,
>  
> I went through the links for memory configuration settings/best-practices.
> It considers the cluster to be homogeneous, i.e. same RAM size in all the nodes.
>  
> Also in the YARN whitepaper (Section 3.2, Page 6) I see:
> "This resource model serves current applications well in homogeneous environments, but we expect it to evolve over time as the ecosystem matures and new requirements emerge."
>  
> Does that mean that in YARN, in order to configure processing capacity like Container Size, No. of Containers, No. of Mappers\Reducers, the cluster has to be homogeneous?
> How about if I have a *heterogeneous cluster* with varying RAM, disks, cores?
>  
> Thanks,
> -Nirmal
>  
> From: Nirmal Kumar 
> Sent: Wednesday, January 15, 2014 8:22 PM
> To: user@hadoop.apache.org
> Subject: RE: Doubts: Deployment and Configuration of YARN cluster
>  
> Thanks a lot German.
>  
> Will go through the links and see if that answers my questions\doubts.
>  
> -Nirmal
>  
> From: German Florez-Larrahondo [mailto:german.fl@samsung.com] 
> Sent: Wednesday, January 15, 2014 7:20 PM
> To: user@hadoop.apache.org
> Subject: RE: Doubts: Deployment and Configuration of YARN cluster
>  
> Nirmal
>  
> -A good summary regarding memory configuration settings/best-practices can be found here. Note that in YARN, the way you configure resource limits dictates number of containers in the nodes and in the cluster:
> http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
>  
> -A good intro to YARN configuration is this:
> http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce_11.html
>  
> Regards
> .g
>  
>  
>  
> From: Nirmal Kumar [mailto:nirmal.kumar@impetus.co.in] 
> Sent: Wednesday, January 15, 2014 7:22 AM
> To: user@hadoop.apache.org
> Subject: Doubts: Deployment and Configuration of YARN cluster
>  
> All,
>  
> I am new to YARN and have certain doubts regarding the deployment and configuration of YARN on a cluster.
>  
> As per my understanding to deploy Hadoop 2.x using YARN on a cluster we need to distribute the below files to all the slave nodes in the cluster:
> ·         conf/core-site.xml
> ·         conf/hdfs-site.xml
> ·         conf/yarn-site.xml
> ·         conf/mapred-site.xml
>  
> Also we need to ONLY change the following file on each slave nodes:
> ·         conf/hdfs-site.xml
> Need to mention the {dfs.datanode.name.dir} value
>  
> Do we need to change any other config file on the slave nodes?
> Can I change {yarn.nodemanager.resource.memory-mb} for each NM running on the slave nodes?
> This is since I might have a *heterogeneous environment* i.e. different nodes with different memory and cores. For NM1 I might have 40GB memory and for the other say 20GB.
>  
> Also,
> {mapreduce.map.memory.mb}   specifies the *max. virtual memory* allowed by a Hadoop task subprocess.
> {mapreduce.map.java.opts}         specify the *max. heap space* of the allocated jvm. If you exceed the max heap size, the JVM throws an OOM.
> {mapreduce.reduce.memory.mb}
> {mapreduce.reduce.java.opts}
> are the above properties applicable to all the Map\Reduce tasks(from different Map Reduce applications) in general, running on different slave nodes?
> or Can I change these for a particular slave node.? For e.g. say for a SlaveNode1 I run the map task with 4GB and for other SlaveNode2 I run the map task with 8GB. Same with the reduce task.
>  
> I need some understanding to *configure processing capacity* in the cluster like Container Size, No. of Containers, No. of Mappers\Reducers.
>  
> Thanks,
> -Nirmal
>  
> 
> 
> 
> 
> 
> 
> NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
>  
> 
> 
> 
> 
> 
> 
> NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
> 
> 
> 
> 
> 
> 
> 
> NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: Doubts: Deployment and Configuration of YARN cluster

Posted by Arun C Murthy <ac...@hortonworks.com>.
No, you can set resources available in each node to be different…

For e.g. Node A: 10G, Node B: 12G.

Now, if min. container size is 1G, the RM will allocate 10 containers to Node A and 12 containers to Node B.

hth,
Arun

On Jan 15, 2014, at 11:03 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:

> Hi German,
>  
> I went through the links for memory configuration settings/best-practices.
> They assume the cluster to be homogeneous, i.e. the same RAM size on all the nodes.
>  
> Also, in the YARN whitepaper (Section 3.2, Page 6) I see:
> "This resource model serves current applications well in homogeneous environments, but we expect it to evolve over time as the ecosystem matures and new requirements emerge."
>  
> Does that mean that in YARN, in order to configure processing capacity (Container Size, No. of Containers, No. of Mappers/Reducers), the cluster has to be homogeneous?
> How about if I have a *heterogeneous cluster* with varying RAM, disks, and cores?
>  
> Thanks,
> -Nirmal

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
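
To make the node-level vs. cluster-level split above concrete, here is a minimal sketch of the properties involved (the values are illustrative placeholders, not recommendations). The scheduler bounds are cluster-wide settings, while the NodeManager capacity is set in yarn-site.xml on each individual node:

  <!-- yarn-site.xml, same on all nodes: global container size bounds -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>

  <!-- yarn-site.xml on Node A only: memory this NodeManager offers to YARN -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
  </property>

  <!-- yarn-site.xml on Node B only -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12288</value>
  </property>

With a 1024 MB minimum allocation, Node A (10240 MB) can host up to 10 minimum-size containers and Node B (12288 MB) up to 12, which is exactly the 10G/12G example above.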




RE: Doubts: Deployment and Configuration of YARN cluster

Posted by Nirmal Kumar <ni...@impetus.co.in>.
Hi German,

I went through the links for memory configuration settings/best-practices.
They assume the cluster to be homogeneous, i.e. the same RAM size on all the nodes.

Also, in the YARN whitepaper (Section 3.2, Page 6) I see:
"This resource model serves current applications well in homogeneous environments, but we expect it to evolve over time as the ecosystem matures and new requirements emerge."

Does that mean that in YARN, in order to configure processing capacity (Container Size, No. of Containers, No. of Mappers/Reducers), the cluster has to be homogeneous?
How about if I have a *heterogeneous cluster* with varying RAM, disks, and cores?

Thanks,
-Nirmal


RE: Doubts: Deployment and Configuration of YARN cluster

Posted by Nirmal Kumar <ni...@impetus.co.in>.
Thanks a lot German.

I will go through the links and see if they answer my questions/doubts.

-Nirmal


RE: Doubts: Deployment and Configuration of YARN cluster

Posted by German Florez-Larrahondo <ge...@samsung.com>.
Nirmal,

-A good summary regarding memory configuration settings/best-practices can be found here. Note that in YARN, the way you configure resource limits dictates the number of containers on each node and in the cluster:
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html

-A good intro to YARN configuration is this:
http://www.thecloudavenue.com/2012/01/getting-started-with-nextgen-mapreduce_11.html

Regards
.g
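
As a rough worked example of that point (the numbers are illustrative, borrowing the 40GB node mentioned earlier in the thread): with yarn.nodemanager.resource.memory-mb = 40960 on a node and yarn.scheduler.minimum-allocation-mb = 2048, the RM can schedule at most 40960 / 2048 = 20 minimum-size containers on that node, and a job whose tasks each request 4096 MB would get at most 40960 / 4096 = 10 concurrent tasks there.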

 

 

 

From: Nirmal Kumar [mailto:nirmal.kumar@impetus.co.in] 
Sent: Wednesday, January 15, 2014 7:22 AM
To: user@hadoop.apache.org
Subject: Doubts: Deployment and Configuration of YARN cluster

 

All,

 

I am new to YARN and have certain doubts regarding the deployment and
configuration of YARN on a cluster.

 

As per my understanding to deploy Hadoop 2.x using YARN on a cluster we need
to distribute the below files to all the slave nodes in the cluster:

.         conf/core-site.xml

.         conf/hdfs-site.xml

.         conf/yarn-site.xml

.         conf/mapred-site.xml

 

Also we need to ONLY change the following file on each slave nodes:

.         conf/hdfs-site.xml

Need to mention the {dfs.datanode.name.dir} value

 

Do we need to change any other config file on the slave nodes?

Can I change {yarn.nodemanager.resource.memory-mb} for each NM running on
the slave nodes? 

This is since I might have a *heterogeneous environment* i.e. different
nodes with different memory and cores. For NM1 I might have 40GB memory and
for the other say 20GB.

 

Also,

{mapreduce.map.memory.mb}   specifies the *max. virtual memory* allowed by a
Hadoop task subprocess.

{mapreduce.map.java.opts}         specify the *max. heap space* of the
allocated jvm. If you exceed the max heap size, the JVM throws an OOM.

{mapreduce.reduce.memory.mb}

{mapreduce.reduce.java.opts}

are the above properties applicable to all the Map\Reduce tasks(from
different Map Reduce applications) in general, running on different slave
nodes?

or Can I change these for a particular slave node.? For e.g. say for a
SlaveNode1 I run the map task with 4GB and for other SlaveNode2 I run the
map task with 8GB. Same with the reduce task.

 

I need some understanding to *configure processing capacity* in the cluster
like Container Size, No. of Containers, No. of Mappers\Reducers. 

 

Thanks,

-Nirmal

 

________________________________







NOTE: This message may contain information that is confidential,
proprietary, privileged or otherwise protected by law. The message is
intended solely for the named addressee. If received in error, please
destroy and notify the sender. Any use of this email is prohibited when
received in error. Impetus does not represent, warrant and/or guarantee,
that the integrity of this communication has been maintained nor that the
communication is free of errors, virus, interception or interference.


Re: Doubts: Deployment and Configuration of YARN cluster

Posted by sudhakara st <su...@gmail.com>.
Hello Nirmal,

    No config-file changes are specific to the slave nodes. You also do not
need to change {dfs.datanode.data.dir} (note: the property is
dfs.datanode.data.dir, not dfs.datanode.name.dir) as long as every slave uses
the same mount points. If the mounts differ, edit that property only on the
affected slave nodes. Running heterogeneous hardware among the slave nodes is
not recommended; it clearly hurt MR jobs in Hadoop 1, and I am not yet sure
how the ResourceManager handles it in Hadoop 2.
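
For illustration, a per-node override could look like the sketch below; the
mount paths are made-up placeholders, and only the slaves whose disks differ
would carry this version of hdfs-site.xml:

  <!-- hdfs-site.xml on a slave with different mounts; paths are hypothetical -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data1/hdfs/dn,/data2/hdfs/dn</value> <!-- comma-separated local dirs -->
  </property>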
      Using different values of {mapreduce.map.memory.mb} and
{mapreduce.reduce.memory.mb} across jobs in the same cluster tends to create
long-tail problems, inefficient resource usage, and starvation. Changing
{mapreduce.map.java.opts} and {mapreduce.reduce.java.opts} has less impact,
but the chance of task failure rises when a job is I/O intensive and is given
too little memory, while over-allocating means memory is reserved but unused,
and therefore unavailable to tasks that need it.
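
On the per-node question: the four mapreduce.* settings above are read from
the job's configuration at submission time, so they apply per job, not per
slave node. Assuming the driver class goes through ToolRunner /
GenericOptionsParser (an assumption about the job, not a given), they can be
overridden per job at submit time, for example:

  # illustrative submit-time override; jar name, class and paths are placeholders
  hadoop jar my-job.jar MyDriver \
    -Dmapreduce.map.memory.mb=4096 \
    -Dmapreduce.map.java.opts=-Xmx3276m \
    /input /output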


On Wed, Jan 15, 2014 at 6:51 PM, Nirmal Kumar <ni...@impetus.co.in> wrote:

>  All,
>
>
>
> I am new to YARN and have certain doubts regarding the deployment and
> configuration of YARN on a cluster.
>
>
>
> As per my understanding to deploy Hadoop 2.x using YARN on a cluster we
> need to distribute the below files to all the slave nodes in the cluster:
>
> ·         conf/core-site.xml
>
> ·         conf/hdfs-site.xml
>
> ·         conf/yarn-site.xml
>
> ·         conf/mapred-site.xml
>
>
>
> Also we need to ONLY change the following file on each slave nodes:
>
> ·         conf/hdfs-site.xml
>
> Need to mention the {dfs.datanode.name.dir} value
>
>
>
> Do we need to change any other config file on the slave nodes?
>
> Can I change {yarn.nodemanager.resource.memory-mb} for each NM running on
> the slave nodes?
>
> This is since I might have a **heterogeneous environment** i.e. different
> nodes with different memory and cores. For NM1 I might have 40GB memory and
> for the other say 20GB.
>
>
>
> Also,
>
> {mapreduce.map.memory.mb}   specifies the **max. virtual memory** allowed
> by a Hadoop task subprocess.
>
> {mapreduce.map.java.opts}         specify the **max. heap space** of the
> allocated jvm. If you exceed the max heap size, the JVM throws an OOM.
>
> {mapreduce.reduce.memory.mb}
>
> {mapreduce.reduce.java.opts}
>
> are the above properties applicable to all the Map\Reduce tasks(from
> different Map Reduce applications) in general, running on different slave
> nodes?
>
> or Can I change these for a particular slave node.? For e.g. say for a
> SlaveNode1 I run the map task with 4GB and for other SlaveNode2 I run the
> map task with 8GB. Same with the reduce task.
>
>
>
> I need some understanding to **configure processing capacity** in the
> cluster like *Container Size, No. of Containers, No. of Mappers\Reducers*.
>
>
>
>
> Thanks,
>
> -Nirmal
>
> ------------------------------
>
>
>
>
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>



-- 

Regards,
...Sudhakara.st
