Posted to user@hadoop.apache.org by "Vandecreme, Antoine" <an...@nist.gov> on 2013/09/19 16:19:12 UTC

How to make hadoop use all nodes?

Hi all,

I am working with Hadoop 2.0.5 (I plan to migrate to 2.1.0 soon).
When I start a job, I notice that some nodes are not used or are only partially used.

For example, if my nodes can hold 2 containers, I notice that some nodes are running none or just 1, while others are running 2.
All my nodes are configured the same way.

Is this expected behavior (maybe to leave room in case other jobs are started)?
Is there a configuration to change this behavior?

Thanks,
Antoine

Re: How to make hadoop use all nodes?

Posted by Antoine Vandecreme <an...@nist.gov>.
Hi Omkar,

>(which has 40 container slots.) >> for the total cluster?
Yes, though it was just a hypothetical value.
Below are my real configurations.

>1) yarn-site.xml -> what is the resource memory configured per node?
12288 MB

>2) yarn-site.xml -> what is the minimum resource allocation for the cluster?
1024 MB min
12288 MB max
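
For reference, I believe that corresponds to entries along these lines in yarn-site.xml (a sketch using the standard Hadoop 2.x property names with my values filled in, not a copy of my exact file):

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>12288</value> <!-- memory each NodeManager offers for containers -->
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value> <!-- container requests are rounded up to a multiple of this -->
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>12288</value> <!-- largest single container the scheduler will grant -->
  </property>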

I also have these memory configurations in mapred-site.xml:
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>5000</value> <!-- container size requested for each map task -->
  </property>

  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx4g -Djava.awt.headless=true</value> <!-- JVM heap inside that container -->
  </property>

  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>5000</value> <!-- container size requested for each reduce task -->
  </property>

  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx4g -Djava.awt.headless=true</value>
  </property>
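
If I understand the scheduler's normalization correctly (requests are rounded up to the next multiple of the 1024 MB minimum allocation, so a 5000 MB request becomes 5120 MB), each 12288 MB node should fit floor(12288 / 5120) = 2 of these containers, which is where my 2-containers-per-node expectation below comes from.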

>3) yarn-resource-manager log (while starting the resource manager, "export YARN_ROOT_LOGGER=DEBUG,RFA"); I am looking for debug logs.
The resulting log is really verbose. Are you searching for something in particular?
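
(If it helps, I can filter it; I assume the interesting lines are the scheduler's container-assignment and reservation messages, something like "assignedContainer", but tell me if there is a better pattern to look for.)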

>4) On the RM UI, how much total cluster memory is reported (and how many total nodes)? (On the RM UI, click on Cluster.)
I have 58 active nodes, and the total memory reported is 696 GB, which is 58 x 12 GB as expected.
I have 93 containers running instead of the 116 I would expect (my job has 2046 maps, so it could use all 116 container slots).

Here is a copy-paste of what I have in the scheduler tab:


Queue State: RUNNING
Used Capacity: 99.4%
Absolute Capacity: 100.0%
Absolute Max Capacity: 100.0%
Used Resources:
Num Active Applications: 1
Num Pending Applications: 0
Num Containers: 139
Max Applications: 10000
Max Applications Per User: 10000
Max Active Applications: 70
Max Active Applications Per User: 70
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Active users: xxx <Memory: 708608 (100.00%), vCores: 139 (100.00%), Active Apps: 1, Pending Apps: 0>


I don't know where the 139 containers value is coming from.
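
One breakdown that exactly matches the reported numbers, assuming the default 1536 MB MapReduce ApplicationMaster request gets normalized up to 2048 MB: 92 maps x 5120 MB + 1 AM x 2048 MB = 473088 MB for my 93 running containers, and 46 more x 5120 MB = 235520 MB brings the total to exactly 708608 MB and 139 containers. If "Num Containers" also counts reserved containers (a guess on my part), those 46 would be reservations parked on nodes that cannot currently fit another 5120 MB container, which might also explain the idle slots.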

>5) Which scheduler are you using? Capacity/Fair/FIFO?
I did not set yarn.resourcemanager.scheduler.class, so I am on the default, which is the CapacityScheduler.

>6) Have you configured any user limits / queue capacity? (Please add details.)
No.
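
Looking at the scheduler tab above, those values appear to be the stock defaults; if I were to spell them out in capacity-scheduler.xml, I believe they would look roughly like this for the single default queue:

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value> <!-- the lone queue gets the whole cluster -->
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value> <!-- one user may consume up to 1 x the queue capacity -->
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.minimum-user-limit-percent</name>
    <value>100</value> <!-- no per-user carve-up of the queue -->
  </property>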

>7) Are all the requests you are making at the same priority, or at different priorities? (Ideally it should not matter, but I want to know.)
I don't set any priority.

Thanks for your help.

Antoine Vandecreme

Re: How to make hadoop use all nodes?

Posted by Omkar Joshi <oj...@hortonworks.com>.
Hi,

A few more questions:

(which has 40 container slots.) >> for the total cluster? Please give the details below.

For the cluster:
1) yarn-site.xml -> what is the resource memory configured per node?
2) yarn-site.xml -> what is the minimum resource allocation for the cluster?
3) yarn-resource-manager log (while starting the resource manager, "export YARN_ROOT_LOGGER=DEBUG,RFA"); I am looking for debug logs.
4) On the RM UI, how much total cluster memory is reported (and how many total nodes)? (On the RM UI, click on Cluster.)
5) Which scheduler are you using? Capacity/Fair/FIFO?
6) Have you configured any user limits / queue capacity? (Please add details.)
7) Are all the requests you are making at the same priority, or at different priorities? (Ideally it should not matter, but I want to know.)

Please let us know all the above details. Thanks.


Thanks,
Omkar Joshi
Hortonworks Inc. <http://www.hortonworks.com>

Re: How to make hadoop use all nodes?

Posted by Antoine Vandecreme <an...@nist.gov>.
Hello Omkar,

Thanks for your reply.

Yes, all 4 points are correct.
However, my application is requesting, let's say, 100 containers on a cluster which has 40 container slots.
So I expected to see all container slots used, but that is not the case.

Just in case it matters, it is the only application running on the cluster.

Thanks,
Antoine Vandecreme

Re: How to make hadoop use all nodes?

Posted by Omkar Joshi <oj...@hortonworks.com>.
Hi,

Let me clarify a few things.
1) You are making container requests that do not explicitly ask for certain nodes (no white-listing).
2) All nodes are identical in terms of resources (memory/cores), and every container requires the same amount of resources.
3) All nodes have the capacity to run, say, 2 containers.
4) You have 20 nodes.

Now if an application is running and requests 20 containers, you cannot assume you will get them all on different nodes (uniformly distributed). It depends more on which node heartbeated to the ResourceManager at what time and how much memory it had available, as well as how many applications are in the queue, how much they are requesting, and at what priorities. If a node has, say, sufficient memory to run 2 containers, then 2 will get allocated there (the allocation logic is quite complex; I am assuming a very simple "*" request). So you may see a few nodes running 2, a few running 1, and a few with 0 containers.
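
To make that concrete with the numbers above: if 10 of the 20 nodes happen to heartbeat first while each has room for 2 containers, the scheduler can satisfy the whole 20-container request on those 10 nodes (10 x 2 = 20) before the other 10 nodes ever report in, leaving them empty.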

I hope it clarifies your doubt.

Thanks,
Omkar Joshi
Hortonworks Inc. <http://www.hortonworks.com>