Posted to hdfs-user@hadoop.apache.org by Tomasz Guziałek <to...@gmail.com> on 2014/07/08 13:01:58 UTC

The number of simultaneous map tasks is unexpected.

Hello all,

I am running a 4-node CDH5 cluster on Amazon EC2. The instances used are
m1.large, so I have 4 cores (2 cores x 2 units) per node. My HBase table has
8 regions, so I expected at least 8 (if not 16) mapper tasks to run
simultaneously. However, only 7 are running and 1 is waiting for an empty
slot. Why did this surprising number come up? I have checked that the regions
are equally distributed across the region servers (2 per node).

My properties in the job:
Configuration mapReduceConfiguration = HBaseConfiguration.create();
mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");

My properties in the CDH:
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 4

Am I missing some property? Please share your experience.

Best regards
Tomasz

Re: The number of simultaneous map tasks is unexpected.

Posted by Adam Kawa <ka...@gmail.com>.
>
> yarn.nodemanager.resource.memory-mb = 2370 MiB,
> yarn.nodemanager.resource.cpu-vcores = 2,
>

So, you cannot run more than 8 containers on your setup (according to your
settings, each container consumes 1GB and 1 vcore).

> Considering that I have 8 cores in my cluster and not 16 as I thought at
> the beginning, starting more than 7 map tasks (and AM) is not supposed to
> give me performance gains as all the cores have been used already. Am I
> right?

Right. But you can increase yarn.nodemanager.resource.cpu-vcores to 3 and
decrease the container sizes (and heap) of map and reduce tasks, and you
will be able to run more containers on the NodeManagers (assuming that you
will not overload the machine).
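
For example (a minimal sketch with illustrative values, not settings confirmed
in this thread): yarn.nodemanager.resource.cpu-vcores is a NodeManager-side
setting (yarn-site.xml / Cloudera Manager), while the per-task container sizes
and heaps can be requested from the job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch only: request smaller map/reduce containers (example values).
Configuration mapReduceConfiguration = HBaseConfiguration.create();
mapReduceConfiguration.set("mapreduce.map.memory.mb", "768");        // map container size
mapReduceConfiguration.set("mapreduce.map.java.opts", "-Xmx576m");   // map heap, below the container size
mapReduceConfiguration.set("mapreduce.reduce.memory.mb", "768");     // reduce container size
mapReduceConfiguration.set("mapreduce.reduce.java.opts", "-Xmx576m");
// Note: the scheduler rounds each request up to yarn.scheduler.minimum-allocation-mb,
// so that cluster setting may also need lowering before small containers take effect.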

Re: The number of simultaneous map tasks is unexpected.

Posted by Tomasz Guziałek <to...@guzialek.info>.
Hi Adam,

yarn.nodemanager.resource.memory-mb = 2370 MiB,
yarn.nodemanager.resource.cpu-vcores = 2,
yarn.resourcemanager.scheduler.class =
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler,
and the "Use CGroups for Resource Management" option
(yarn.nodemanager.linux-container-executor.resources-handler.class) is NOT checked.

Considering that I have 8 cores in my cluster and not 16 as I thought at
the beginning, starting more than 7 map tasks (and AM) is not supposed to
give me performance gains as all the cores have been used already. Am I
right?

True, for hundreds or thousands of nodes a single coordination node might
be a bottleneck. My deployments are not expected to exceed 32 or 64 nodes.

Pozdrawiam / Regards / Med venlig hilsen
Tomasz Guziałek


2014-07-09 16:01 GMT+02:00 Adam Kawa <ka...@gmail.com>:

> Hi Tomek,
>
> You have 9.26GB in 4 nodes what is 2.315GB on average. What is your value
> of yarn.nodemanager.resource.memory-mb?
>
> You consume 1GB of RAM per container (8 containers running = 8GB of memory
> used). My idea is that, after running 8 containers (1 AM + 7 map tasks),
> you have only 315MB of available memory on each NodeManager. Therefore,
> when you request 1GB to get a container for #8 map task, there is no
> NodeManager than can give you a whole 1GB (despite having more than 1GB of
> aggregated memory on the cluster).
>
> To verify this, please check the value of
> yarn.nodemanager.resource.memory-mb.
>
> Thanks,
> Adam
>
> PS1.
> Just our of curiosity. What are your values of
> *yarn.nodemanager.resource.cpu-vcores* (is not it 2?)
> *yarn.resourcemanager.scheduler.class* (I assume that Fair Scheduler, but
> just to confirm. Could you have any non-default settings in your
> scheduler's configuration that limit the number of resources per user?)
> *yarn.nodemanager.linux-container-executor.resources-handler.class*
> ?
>
> PS2.
> "I am comparing M/R implementation with a custom one, where one node is
> dedicated for coordination and I utilize 4 slaves fully for computation."
>
> Note that this might not work on a larger scale, because "one node is
> dedicated for coordination" might become the bottleneck. This is one of a
> couple of reasons why YARN and original MapReduce at Google have decided to
> run coordination processes on slave nodes.
>
>
>
>
> 2014-07-09 9:47 GMT+02:00 Tomasz Guziałek <to...@guzialek.info>:
>
> Thank you for your assistance, Adam.
>>
>> Containers running | Memory used | Memory total | Memory reserved
>>                          8 |             8 GB |        9.26 GB
>> |                     0 B
>>
>> Seems like you are right: the ApplicationMaster is occupying one slot as
>> I have 8 containers running, but 7 map tasks.
>>
>> Again, I revised my information about m1.large instance on EC2. There are
>> only 2 cores available per node giving 4 computing units (ECU units
>> introduced by Amazon). So 8 slots at a time is expected. However,
>> scheduling AM on a slave node ruins my experiment. I am comparing M/R
>> implementation with a custom one, where one node is dedicated for
>> coordination and I utilize 4 slaves fully for computation. This one core
>> for AM is extending the execution time by a factor of 2. Does any one have
>> an idea how to have 8 map tasks running?
>>
>> Pozdrawiam / Regards / Med venlig hilsen
>> Tomasz Guziałek
>>
>>
>> 2014-07-09 0:56 GMT+02:00 Adam Kawa <ka...@gmail.com>:
>>
>> If you run an application (e.g. MapReduce job) on YARN cluster, first the
>>> Application Master will be is started on some slave node to coordinate the
>>> execution of all tasks within the job. The ApplicationMaster and tasks that
>>> belong to its application run in the containers controlled by the
>>> NodeManagers.
>>>
>>> Maybe, you simply run 8 containers on your YARN cluster and 1 container
>>> is consumed by MapReduce AppMaster and 7 containers are consumed by map
>>> tasks. But it seems not to be a root cause of you problem, because
>>> according to your settings you should be able to run 16 containers
>>> maximally.
>>>
>>> Another idea might be that your are bottlenecked by the amount of memory
>>> on the cluster (each container consumes memory) and despite having vcore(s)
>>> available, you can not launch new tasks. When you go to the ResourceManager
>>> Web UI, do you see that you utilize whole cluster memory?
>>>
>>>
>>>
>>> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <to...@guzialek.info>:
>>>
>>> I was not precise when describing my cluster. I have 4 slave nodes and a
>>>> separate master node. The master has ResourceManager role (along with
>>>> JobHistory role) and the rest have NodeManager roles. If this really is an
>>>> ApplicationMaster, is it possible to schedule it on the master node? This
>>>> single waiting map task is doubling my execution time.
>>>>
>>>> Pozdrawiam / Regards / Med venlig hilsen
>>>> Tomasz Guziałek
>>>>
>>>>
>>>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <ka...@gmail.com>:
>>>>
>>>> Is not your MapReduce AppMaster occupying one slot?
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> > On 8 jul 2014, at 13:01, Tomasz Guziałek <to...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hello all,
>>>>> >
>>>>> > I am running a 4-nodes CDH5 cluster on Amazon EC2 . The instances
>>>>> used are m1.large, so I have 4 cores (2 core x 2 unit) per node. My HBase
>>>>> table has 8 regions, so I expected at least 8 (if not 16) mapper tasks to
>>>>> run simultaneously. However, only 7 are running and 1 is waiting for an
>>>>> empty slot. Why this surprising number came up? I have checked that the
>>>>> regions are equally distributed on the region servers (2 per node).
>>>>> >
>>>>> > My properties in the job:
>>>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>>>> >
>>>>> mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");
>>>>> >
>>>>> > My properties in the CDH:
>>>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>>>> >
>>>>> > Do I miss some property? Please share your experience.
>>>>> >
>>>>> > Best regards
>>>>> > Tomasz
>>>>>
>>>>
>>>>
>>>
>>
>

Re: The number of simultaneous map tasks is unexpected.

Posted by Adam Kawa <ka...@gmail.com>.
Hi Tomek,

You have 9.26 GB across 4 nodes, which is 2.315 GB per node on average. What is
your value of yarn.nodemanager.resource.memory-mb?

You consume 1 GB of RAM per container (8 containers running = 8 GB of memory
used). My guess is that, after running 8 containers (1 AM + 7 map tasks), you
have only about 315 MB of memory available on each NodeManager. Therefore, when
you request 1 GB for the container of the 8th map task, there is no NodeManager
that can give you a whole 1 GB (despite there being more than 1 GB of aggregate
memory left on the cluster).

To verify this, please check the value of
yarn.nodemanager.resource.memory-mb.
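
A rough sketch of the arithmetic (assuming 2370 MiB per NodeManager and the
usual 1024 MiB container size; example code, not from this thread):

// Sketch: how many 1024 MiB containers fit on a 2370 MiB NodeManager, and how
// much memory is left over afterwards.
public class ContainerHeadroom {
    public static void main(String[] args) {
        int nmMemoryMb = 2370;   // yarn.nodemanager.resource.memory-mb
        int containerMb = 1024;  // size of each requested container (AM and maps)
        int nodeManagers = 4;

        int containersPerNm = nmMemoryMb / containerMb;                  // 2
        int totalContainers = containersPerNm * nodeManagers;            // 8 (1 AM + 7 maps)
        int leftoverPerNm = nmMemoryMb - containersPerNm * containerMb;  // 322 MiB

        System.out.println("Containers per NodeManager: " + containersPerNm);
        System.out.println("Cluster-wide containers:    " + totalContainers);
        System.out.println("Leftover per NodeManager:   " + leftoverPerNm
                + " MiB, too little for another " + containerMb + " MiB container");
    }
}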

Thanks,
Adam

PS1.
Just out of curiosity, what are your values of:
*yarn.nodemanager.resource.cpu-vcores* (isn't it 2?),
*yarn.resourcemanager.scheduler.class* (I assume the Fair Scheduler, but just
to confirm; do you have any non-default settings in your scheduler's
configuration that limit the resources available per user?), and
*yarn.nodemanager.linux-container-executor.resources-handler.class*?

PS2.
"I am comparing M/R implementation with a custom one, where one node is
dedicated for coordination and I utilize 4 slaves fully for computation."

Note that this might not work at a larger scale, because the "one node
dedicated for coordination" might become the bottleneck. This is one of a
couple of reasons why YARN and the original MapReduce at Google decided to run
coordination processes on the slave nodes.




2014-07-09 9:47 GMT+02:00 Tomasz Guziałek <to...@guzialek.info>:

> Thank you for your assistance, Adam.
>
> Containers running | Memory used | Memory total | Memory reserved
>                          8 |             8 GB |        9.26 GB
> |                     0 B
>
> Seems like you are right: the ApplicationMaster is occupying one slot as I
> have 8 containers running, but 7 map tasks.
>
> Again, I revised my information about m1.large instance on EC2. There are
> only 2 cores available per node giving 4 computing units (ECU units
> introduced by Amazon). So 8 slots at a time is expected. However,
> scheduling AM on a slave node ruins my experiment. I am comparing M/R
> implementation with a custom one, where one node is dedicated for
> coordination and I utilize 4 slaves fully for computation. This one core
> for AM is extending the execution time by a factor of 2. Does any one have
> an idea how to have 8 map tasks running?
>
> Pozdrawiam / Regards / Med venlig hilsen
> Tomasz Guziałek
>
>
> 2014-07-09 0:56 GMT+02:00 Adam Kawa <ka...@gmail.com>:
>
> If you run an application (e.g. MapReduce job) on YARN cluster, first the
>> Application Master will be is started on some slave node to coordinate the
>> execution of all tasks within the job. The ApplicationMaster and tasks that
>> belong to its application run in the containers controlled by the
>> NodeManagers.
>>
>> Maybe, you simply run 8 containers on your YARN cluster and 1 container
>> is consumed by MapReduce AppMaster and 7 containers are consumed by map
>> tasks. But it seems not to be a root cause of you problem, because
>> according to your settings you should be able to run 16 containers
>> maximally.
>>
>> Another idea might be that your are bottlenecked by the amount of memory
>> on the cluster (each container consumes memory) and despite having vcore(s)
>> available, you can not launch new tasks. When you go to the ResourceManager
>> Web UI, do you see that you utilize whole cluster memory?
>>
>>
>>
>> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <to...@guzialek.info>:
>>
>> I was not precise when describing my cluster. I have 4 slave nodes and a
>>> separate master node. The master has ResourceManager role (along with
>>> JobHistory role) and the rest have NodeManager roles. If this really is an
>>> ApplicationMaster, is it possible to schedule it on the master node? This
>>> single waiting map task is doubling my execution time.
>>>
>>> Pozdrawiam / Regards / Med venlig hilsen
>>> Tomasz Guziałek
>>>
>>>
>>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <ka...@gmail.com>:
>>>
>>> Is not your MapReduce AppMaster occupying one slot?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> > On 8 jul 2014, at 13:01, Tomasz Guziałek <to...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hello all,
>>>> >
>>>> > I am running a 4-nodes CDH5 cluster on Amazon EC2 . The instances
>>>> used are m1.large, so I have 4 cores (2 core x 2 unit) per node. My HBase
>>>> table has 8 regions, so I expected at least 8 (if not 16) mapper tasks to
>>>> run simultaneously. However, only 7 are running and 1 is waiting for an
>>>> empty slot. Why this surprising number came up? I have checked that the
>>>> regions are equally distributed on the region servers (2 per node).
>>>> >
>>>> > My properties in the job:
>>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum",
>>>> "16");
>>>> >
>>>> > My properties in the CDH:
>>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>>> >
>>>> > Do I miss some property? Please share your experience.
>>>> >
>>>> > Best regards
>>>> > Tomasz
>>>>
>>>
>>>
>>
>

Re: The number of simultaneous map tasks is unexpected.

Posted by Tomasz Guziałek <to...@guzialek.info>.
Thank you for your assistance, Adam.

Containers running | Memory used | Memory total | Memory reserved
                 8 |        8 GB |      9.26 GB |             0 B

It seems you are right: the ApplicationMaster is occupying one slot, as I have
8 containers running but only 7 map tasks.

Also, I double-checked my information about the m1.large instance type on EC2:
there are only 2 cores available per node, giving 4 compute units (ECUs, as
defined by Amazon). So 8 slots at a time is expected. However, scheduling the
AM on a slave node ruins my experiment. I am comparing the MapReduce
implementation with a custom one, where one node is dedicated to coordination
and I utilize the 4 slaves fully for computation. This one core taken by the AM
is extending the execution time by a factor of 2. Does anyone have an idea how
to get 8 map tasks running?

Pozdrawiam / Regards / Med venlig hilsen
Tomasz Guziałek
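
One possible direction, consistent with the container-size advice elsewhere in
the thread (a sketch with assumed values, not settings from this cluster), is
to shrink the MapReduce ApplicationMaster container so it leaves more room for
map tasks on its node:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch only: request a smaller AM container (example values).
Configuration mapReduceConfiguration = HBaseConfiguration.create();
mapReduceConfiguration.set("yarn.app.mapreduce.am.resource.mb", "512");       // AM container size (default 1536)
mapReduceConfiguration.set("yarn.app.mapreduce.am.command-opts", "-Xmx384m"); // AM heap, below the container size
// Whether an extra map container then fits still depends on
// yarn.scheduler.minimum-allocation-mb and the NodeManager memory/vcore limits.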


2014-07-09 0:56 GMT+02:00 Adam Kawa <ka...@gmail.com>:

> If you run an application (e.g. MapReduce job) on YARN cluster, first the
> Application Master will be is started on some slave node to coordinate the
> execution of all tasks within the job. The ApplicationMaster and tasks that
> belong to its application run in the containers controlled by the
> NodeManagers.
>
> Maybe, you simply run 8 containers on your YARN cluster and 1 container is
> consumed by MapReduce AppMaster and 7 containers are consumed by map tasks.
> But it seems not to be a root cause of you problem, because according to
> your settings you should be able to run 16 containers maximally.
>
> Another idea might be that your are bottlenecked by the amount of memory
> on the cluster (each container consumes memory) and despite having vcore(s)
> available, you can not launch new tasks. When you go to the ResourceManager
> Web UI, do you see that you utilize whole cluster memory?
>
>
>
> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <to...@guzialek.info>:
>
> I was not precise when describing my cluster. I have 4 slave nodes and a
>> separate master node. The master has ResourceManager role (along with
>> JobHistory role) and the rest have NodeManager roles. If this really is an
>> ApplicationMaster, is it possible to schedule it on the master node? This
>> single waiting map task is doubling my execution time.
>>
>> Pozdrawiam / Regards / Med venlig hilsen
>> Tomasz Guziałek
>>
>>
>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <ka...@gmail.com>:
>>
>> Is not your MapReduce AppMaster occupying one slot?
>>>
>>> Sent from my iPhone
>>>
>>> > On 8 jul 2014, at 13:01, Tomasz Guziałek <to...@gmail.com>
>>> wrote:
>>> >
>>> > Hello all,
>>> >
>>> > I am running a 4-nodes CDH5 cluster on Amazon EC2 . The instances used
>>> are m1.large, so I have 4 cores (2 core x 2 unit) per node. My HBase table
>>> has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run
>>> simultaneously. However, only 7 are running and 1 is waiting for an empty
>>> slot. Why this surprising number came up? I have checked that the regions
>>> are equally distributed on the region servers (2 per node).
>>> >
>>> > My properties in the job:
>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum",
>>> "16");
>>> >
>>> > My properties in the CDH:
>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>> >
>>> > Do I miss some property? Please share your experience.
>>> >
>>> > Best regards
>>> > Tomasz
>>>
>>
>>
>

Re: The number of simultaneous map tasks is unexpected.

Posted by Tomasz Guziałek <to...@guzialek.info>.
Thank you for your assistance, Adam.

Containers running | Memory used | Memory total | Memory reserved
                         8 |             8 GB |        9.26 GB
|                     0 B

Seems like you are right: the ApplicationMaster is occupying one slot as I
have 8 containers running, but 7 map tasks.

Again, I revised my information about m1.large instance on EC2. There are
only 2 cores available per node giving 4 computing units (ECU units
introduced by Amazon). So 8 slots at a time is expected. However,
scheduling AM on a slave node ruins my experiment. I am comparing M/R
implementation with a custom one, where one node is dedicated for
coordination and I utilize 4 slaves fully for computation. This one core
for AM is extending the execution time by a factor of 2. Does any one have
an idea how to have 8 map tasks running?

Pozdrawiam / Regards / Med venlig hilsen
Tomasz Guziałek


2014-07-09 0:56 GMT+02:00 Adam Kawa <ka...@gmail.com>:

> If you run an application (e.g. MapReduce job) on YARN cluster, first the
> Application Master will be is started on some slave node to coordinate the
> execution of all tasks within the job. The ApplicationMaster and tasks that
> belong to its application run in the containers controlled by the
> NodeManagers.
>
> Maybe, you simply run 8 containers on your YARN cluster and 1 container is
> consumed by MapReduce AppMaster and 7 containers are consumed by map tasks.
> But it seems not to be a root cause of you problem, because according to
> your settings you should be able to run 16 containers maximally.
>
> Another idea might be that your are bottlenecked by the amount of memory
> on the cluster (each container consumes memory) and despite having vcore(s)
> available, you can not launch new tasks. When you go to the ResourceManager
> Web UI, do you see that you utilize whole cluster memory?
>
>
>
> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <to...@guzialek.info>:
>
> I was not precise when describing my cluster. I have 4 slave nodes and a
>> separate master node. The master has ResourceManager role (along with
>> JobHistory role) and the rest have NodeManager roles. If this really is an
>> ApplicationMaster, is it possible to schedule it on the master node? This
>> single waiting map task is doubling my execution time.
>>
>> Pozdrawiam / Regards / Med venlig hilsen
>> Tomasz Guziałek
>>
>>
>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <ka...@gmail.com>:
>>
>> Is not your MapReduce AppMaster occupying one slot?
>>>
>>> Sent from my iPhone
>>>
>>> > On 8 jul 2014, at 13:01, Tomasz Guziałek <to...@gmail.com>
>>> wrote:
>>> >
>>> > Hello all,
>>> >
>>> > I am running a 4-nodes CDH5 cluster on Amazon EC2 . The instances used
>>> are m1.large, so I have 4 cores (2 core x 2 unit) per node. My HBase table
>>> has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run
>>> simultaneously. However, only 7 are running and 1 is waiting for an empty
>>> slot. Why this surprising number came up? I have checked that the regions
>>> are equally distributed on the region servers (2 per node).
>>> >
>>> > My properties in the job:
>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum",
>>> "16");
>>> >
>>> > My properties in the CDH:
>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>> >
>>> > Do I miss some property? Please share your experience.
>>> >
>>> > Best regards
>>> > Tomasz
>>>
>>
>>
>

Re: The number of simultaneous map tasks is unexpected.

Posted by Adam Kawa <ka...@gmail.com>.
When you run an application (e.g. a MapReduce job) on a YARN cluster, the
ApplicationMaster is started first, on some slave node, to coordinate the
execution of all tasks within the job. The ApplicationMaster and the tasks
that belong to its application all run in containers controlled by the
NodeManagers.

Maybe you simply run 8 containers on your YARN cluster, with 1 container
consumed by the MapReduce ApplicationMaster and 7 consumed by map tasks.
But that does not seem to be the root cause of your problem, because
according to your settings you should be able to run up to 16 containers.

Another idea is that you are bottlenecked by the amount of memory on the
cluster (each container consumes memory), so despite having vcores
available you cannot launch new tasks. When you go to the ResourceManager
Web UI, do you see that the whole cluster memory is utilized?
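
If clicking through the Web UI is tedious, a small YarnClient sketch along
these lines (plain Hadoop 2 client API; the class is only an illustration)
prints the per-node memory and vcore usage, so you can see which resource
runs out first:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ClusterHeadroom {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();
        // One report per live NodeManager: used vs. total memory (MB) and vcores.
        for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId()
                    + " memory " + node.getUsed().getMemory()
                    + "/" + node.getCapability().getMemory() + " MB,"
                    + " vcores " + node.getUsed().getVirtualCores()
                    + "/" + node.getCapability().getVirtualCores());
        }
        yarnClient.stop();
    }
}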



2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <to...@guzialek.info>:

> I was not precise when describing my cluster. I have 4 slave nodes and a
> separate master node. The master has ResourceManager role (along with
> JobHistory role) and the rest have NodeManager roles. If this really is an
> ApplicationMaster, is it possible to schedule it on the master node? This
> single waiting map task is doubling my execution time.
>
> Pozdrawiam / Regards / Med venlig hilsen
> Tomasz Guziałek
>
>
> 2014-07-08 18:42 GMT+02:00 Adam Kawa <ka...@gmail.com>:
>
> Is not your MapReduce AppMaster occupying one slot?
>>
>> Sent from my iPhone
>>
>> > On 8 jul 2014, at 13:01, Tomasz Guziałek <to...@gmail.com>
>> wrote:
>> >
>> > Hello all,
>> >
>> > I am running a 4-nodes CDH5 cluster on Amazon EC2 . The instances used
>> are m1.large, so I have 4 cores (2 core x 2 unit) per node. My HBase table
>> has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run
>> simultaneously. However, only 7 are running and 1 is waiting for an empty
>> slot. Why this surprising number came up? I have checked that the regions
>> are equally distributed on the region servers (2 per node).
>> >
>> > My properties in the job:
>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum",
>> "16");
>> >
>> > My properties in the CDH:
>> > yarn.scheduler.minimum-allocation-vcores = 1
>> > yarn.scheduler.maximum-allocation-vcores = 4
>> >
>> > Do I miss some property? Please share your experience.
>> >
>> > Best regards
>> > Tomasz
>>
>
>

Re: The number of simultaneous map tasks is unexpected.

Posted by Tomasz Guziałek <to...@guzialek.info>.
I was not precise when describing my cluster: I have 4 slave nodes and a
separate master node. The master holds the ResourceManager role (along
with the JobHistory role) and the slaves hold the NodeManager role. If
this really is the ApplicationMaster, is it possible to schedule it on the
master node? The single waiting map task is doubling my execution time.

Pozdrawiam / Regards / Med venlig hilsen
Tomasz Guziałek


2014-07-08 18:42 GMT+02:00 Adam Kawa <ka...@gmail.com>:

> Is not your MapReduce AppMaster occupying one slot?
>
> Sent from my iPhone
>
> > On 8 jul 2014, at 13:01, Tomasz Guziałek <to...@gmail.com>
> wrote:
> >
> > Hello all,
> >
> > I am running a 4-nodes CDH5 cluster on Amazon EC2 . The instances used
> are m1.large, so I have 4 cores (2 core x 2 unit) per node. My HBase table
> has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run
> simultaneously. However, only 7 are running and 1 is waiting for an empty
> slot. Why this surprising number came up? I have checked that the regions
> are equally distributed on the region servers (2 per node).
> >
> > My properties in the job:
> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum",
> "16");
> >
> > My properties in the CDH:
> > yarn.scheduler.minimum-allocation-vcores = 1
> > yarn.scheduler.maximum-allocation-vcores = 4
> >
> > Do I miss some property? Please share your experience.
> >
> > Best regards
> > Tomasz
>

Re: The number of simultaneous map tasks is unexpected.

Posted by Adam Kawa <ka...@gmail.com>.
Is not your MapReduce AppMaster occupying one slot?

Sent from my iPhone

> On 8 jul 2014, at 13:01, Tomasz Guziałek <to...@gmail.com> wrote:
> 
> Hello all,
> 
> I am running a 4-nodes CDH5 cluster on Amazon EC2 . The instances used are m1.large, so I have 4 cores (2 core x 2 unit) per node. My HBase table has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run simultaneously. However, only 7 are running and 1 is waiting for an empty slot. Why this surprising number came up? I have checked that the regions are equally distributed on the region servers (2 per node).
> 
> My properties in the job:
> Configuration mapReduceConfiguration = HBaseConfiguration.create();
> mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
> mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");
> 
> My properties in the CDH:
> yarn.scheduler.minimum-allocation-vcores = 1
> yarn.scheduler.maximum-allocation-vcores = 4
> 
> Do I miss some property? Please share your experience.
> 
> Best regards
> Tomasz
