Posted to common-user@hadoop.apache.org by Omid Alipourfard <al...@usc.edu> on 2015/08/25 03:53:47 UTC

Questions with regards to Yarn/Hadoop

Hi,

I am running the TeraSort benchmark (10 GB, 25 reducers, 50 mappers) that
ships with Hadoop 2.7.1, and I am seeing unexpected behavior from YARN
that I hope someone can shed some light on:

I have a cluster of three machines, each with 2 cores and 3.75 GB of RAM.
When I run the TeraSort job, one of the machines idles, i.e., it shows no
substantial disk or CPU usage.  All three machines are capable of
executing jobs, and one of the machines is both a name node and a data
node.

On the other hand, running the same job on a cluster of three machines with
2 cores and 8 GB of RAM (per machine) utilizes all the machines.

Both setups use the same Hadoop configuration files; in both of them,
mapper tasks get 1 GB and reducer tasks get 2 GB of memory.

I am guessing YARN is not utilizing the machines correctly -- maybe because
of the available amount of RAM -- but I am not sure how to verify this.

Any thoughts on what the problem might be, or how to verify it, are
appreciated.
Thanks,
Omid

P.S. I can also post any of the logs or configuration files.

RE: Questions with regards to Yarn/Hadoop

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi Omid,
As you guessed, the AM was configured to take a larger memory resource, so the RM could not allocate any further containers on that machine.
An easy fix is to reduce "yarn.app.mapreduce.am.resource.mb", e.g. to 512 MB, since the total number of maps and reducers is small; you can experiment with different memory values here.
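For example, as a sketch of the mapred-site.xml entries (the 512 MB value is just a starting point to experiment with, and the matching -Xmx heap below it is an assumption, chosen to stay under the container size):

```xml
<!-- mapred-site.xml: shrink the MR ApplicationMaster container so task
     containers can still fit on the same small node. Values are a sketch. -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <!-- keep the AM's JVM heap below the container size -->
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx384m</value>
</property>
```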

Counters can also be fetched on the client side once the application has finished. Or how about the history server web UI, is it able to show the MR job information (probably not)?
Are you able to physically see the files in that location?

Counters are very useful for fine-tuning your job; you could also check the logs for more information on them.

To improve TeraSort performance, you can further tune the following configurations based on the counter values:
mapreduce.task.io.sort.factor=100
mapreduce.task.io.sort.mb=102400
mapreduce.map.output.compress=true
mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
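In mapred-site.xml form, those would look like the following (values taken verbatim from the list above; whether SnappyCodec works depends on your native library build):

```xml
<!-- mapred-site.xml: TeraSort tuning knobs from the list above -->
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>100</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>102400</value>
</property>
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```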

Hope it helps!

+ Naga
________________________________
From: Omid Alipourfard [ecynics@gmail.com]
Sent: Wednesday, August 26, 2015 10:07
To: user@hadoop.apache.org
Subject: Re: Questions with regards to Yarn/Hadoop

Hi Naga,

Thanks much.  This explains the behavior I am seeing.  The ApplicationMaster is using 1.5 GB of memory on the idle machine, which leaves no space for other containers to run there.

To answer your other questions:

* Memory configuration of AM container

yarn.app.mapreduce.am.command-opts: Xmx1024m
yarn.app.mapreduce.am.resource.mb: 1536

* Containers running on the idling machine: is it always the same machine or a different one every time? If it is always the same, are there any other processes running on that machine?

It changes across runs, but I tear down and set up the Hadoop cluster every time.  Only Hadoop processes and a collector are running on the machines.  The collector has a memory footprint of around 10 MB and uses less than 0.2 percent of the CPU time.

* Job counters for both runs would also provide useful information; please share them.

Unfortunately, I had no luck setting up the job history daemon.  After issuing mapred job -history [JobHistoryId], I get an error that the history file does not exist -- a FileIOException on a path of the form /user/{hadoop.user}/job.id.

I tried setting these variables, but they don't seem to have any effect.

yarn.log-aggregation-enable
mapreduce.jobtracker.jobhistory.location
mapreduce.jobtracker.jobhistory.completed.location

I also ran the history server through: mr-jobhistory-daemon.sh start historyserver

Thanks,
Omid

On Mon, Aug 24, 2015 at 11:09 PM, Naganarasimha G R (Naga) <ga...@huawei.com> wrote:
Hi Omid,
It seems the machine that was running slow might also be hosting the AM container, possibly with 2 GB assigned to it.
Can you share the following details:
* Memory configuration of the AM container
* Containers running on the idling machine: is it always the same machine or a different one every time? If it is always the same, are there any other processes running on that machine?
* Job counters for both runs would also provide useful information; please share them.

Regards,
+ Naga


________________________________
From: Omid Alipourfard [alipourf@usc.edu]
Sent: Tuesday, August 25, 2015 07:23
To: user@hadoop.apache.org
Subject: Questions with regards to Yarn/Hadoop

Hi,

I am running a Terasort benchmark (10 GB, 25 reducers, 50 mappers) that comes with Hadoop 2.7.1.  I am experiencing an unexpected behavior with Yarn, which I am hoping someone can shed some light on:

I have a cluster of three machines with 2 cores and 3.75 GB of RAM (per machine), when I run the Terasort job, one of the machines is idling, i.e., it is not using any substantial Disk or CPU.  All three machines are capable of executing jobs, and one of the machines is both a name node and a data node.

On the other hand, running the same job on a cluster of three machines with 2 cores and 8 GB of RAM (per machine) utilizes all the machines.

Both setups are using the same Hadoop configuration files, in both of them mapper tasks have 1 GB and reducer tasks have 2 GB of memory.

I am guessing Yarn is not utilizing the machines correctly -- maybe because of the available amount of RAM, but I am not sure how to verify this.

Any thoughts on what the problem might be or how to verify it is appreciated,
Thanks,
Omid

P.S. I can also post any of the logs or configuration files.


Re: Questions with regards to Yarn/Hadoop

Posted by Omid Alipourfard <ec...@gmail.com>.
Hi Naga,

Thanks much.  This explains the behavior I am seeing.  The
ApplicationMaster is using 1.5 GB of memory on the idle machine, which
leaves no space for other containers to run there.
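A quick back-of-the-envelope check makes this concrete. The container sizes below are the ones discussed in this thread; the per-node capacities are assumptions, since yarn.nodemanager.resource.memory-mb is not quoted anywhere here:

```python
# Sketch: can a task container fit next to the AM on one NodeManager?
# Container sizes are from this thread; node capacities are assumed.
AM_MB = 1536       # yarn.app.mapreduce.am.resource.mb
MAP_MB = 1024      # per-mapper container
REDUCE_MB = 2048   # per-reducer container

def fits(node_mb, used_mb, container_mb):
    """True if a container of container_mb fits in the node's free memory."""
    return node_mb - used_mb >= container_mb

small_node = 3072  # assumed NodeManager memory on a 3.75 GB machine
large_node = 8192  # assumed NodeManager memory on an 8 GB machine

print(fits(small_node, AM_MB, REDUCE_MB))  # False: 3072 - 1536 = 1536 < 2048
print(fits(large_node, AM_MB, REDUCE_MB))  # True: 8192 - 1536 = 6656 >= 2048
```

With these assumed capacities, a reducer cannot be co-located with the AM on the small node; actual placement also depends on yarn.scheduler.minimum-allocation-mb rounding.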

To answer your other questions:

* Memory configuration of AM container


yarn.app.mapreduce.am.command-opts: Xmx1024m
yarn.app.mapreduce.am.resource.mb: 1536

* Containers running on the idling machine: is it always the same machine
> or a different one every time? If it is always the same, are there any
> other processes running on that machine?


It changes across runs, but I tear down and set up the Hadoop cluster
every time.  Only Hadoop processes and a collector are running on the
machines.  The collector has a memory footprint of around 10 MB and uses
less than 0.2 percent of the CPU time.

* Job counters for both runs would also provide useful information;
> please share them.


Unfortunately, I had no luck setting up the job history daemon.  After
issuing mapred job -history [JobHistoryId], I get an error that the
history file does not exist -- a FileIOException on a path of the form:
/user/{hadoop.user}/job.id.

I tried setting these variables, but they don't seem to have any effect.

yarn.log-aggregation-enable
mapreduce.jobtracker.jobhistory.location
mapreduce.jobtracker.jobhistory.completed.location

I also ran the history server through: mr-jobhistory-daemon.sh start
historyserver
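For reference, the MR2 history server is usually wired up with properties like the following in mapred-site.xml (a sketch using the default ports; the done-dir paths are examples). Note that mapreduce.jobtracker.jobhistory.* are legacy MR1 properties and do not control where MR2 job history files land:

```xml
<!-- mapred-site.xml: MR2 JobHistory Server wiring (sketch, default ports) -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>0.0.0.0:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>0.0.0.0:19888</value>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/mr-history/tmp</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/mr-history/done</value>
</property>
```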

Thanks,
Omid

On Mon, Aug 24, 2015 at 11:09 PM, Naganarasimha G R (Naga) <
garlanaganarasimha@huawei.com> wrote:

> Hi Omid,
> It seems the machine that was running slow might also be hosting the AM
> container, possibly with 2 GB assigned to it.
> Can you share the following details:
> * Memory configuration of the AM container
> * Containers running on the idling machine: is it always the same machine
> or a different one every time? If it is always the same, are there any
> other processes running on that machine?
> * Job counters for both runs would also provide useful information;
> please share them.
>
> Regards,
> + Naga
>
>
> ------------------------------
> *From:* Omid Alipourfard [alipourf@usc.edu]
> *Sent:* Tuesday, August 25, 2015 07:23
> *To:* user@hadoop.apache.org
> *Subject:* Questions with regards to Yarn/Hadoop
>
> Hi,
>
> I am running a Terasort benchmark (10 GB, 25 reducers, 50 mappers) that
> comes with Hadoop 2.7.1.  I am experiencing an unexpected behavior with
> Yarn, which I am hoping someone can shed some light on:
>
> I have a cluster of three machines with 2 cores and 3.75 GB of RAM (per
> machine), when I run the Terasort job, one of the machines is idling, i.e.,
> it is not using any substantial Disk or CPU.  All three machines are
> capable of executing jobs, and one of the machines is both a name node and
> a data node.
>
> On the other hand, running the same job on a cluster of three machines
> with 2 cores and 8 GB of RAM (per machine) utilizes all the machines.
>
> Both setups are using the same Hadoop configuration files, in both of them
> mapper tasks have 1 GB and reducer tasks have 2 GB of memory.
>
> I am guessing Yarn is not utilizing the machines correctly -- maybe
> because of the available amount of RAM, but I am not sure how to verify
> this.
>
> Any thoughts on what the problem might be or how to verify it is
> appreciated,
> Thanks,
> Omid
>
> P.S. I can also post any of the logs or configuration files.
>

Hi Omid,
It seems the machine that was idling may also be hosting the AM container, with possibly 2 GB assigned to it.
Can you share the following details:
* Memory configuration of the AM container
* The containers running on the idling machine: is it always the same machine, or a different one every time? If it is always the same, are any other processes running on that machine?
* Job counters for both runs would also provide useful information; please share them.

Regards,
+ Naga


________________________________
From: Omid Alipourfard [alipourf@usc.edu]
Sent: Tuesday, August 25, 2015 07:23
To: user@hadoop.apache.org
Subject: Questions with regards to Yarn/Hadoop

Hi,

I am running a Terasort benchmark (10 GB, 25 reducers, 50 mappers) that comes with Hadoop 2.7.1.  I am experiencing an unexpected behavior with Yarn, which I am hoping someone can shed some light on:

I have a cluster of three machines with 2 cores and 3.75 GB of RAM (per machine), when I run the Terasort job, one of the machines is idling, i.e., it is not using any substantial Disk or CPU.  All three machines are capable of executing jobs, and one of the machines is both a name node and a data node.

On the other hand, running the same job on a cluster of three machines with 2 cores and 8 GB of RAM (per machine) utilizes all the machines.

Both setups are using the same Hadoop configuration files, in both of them mapper tasks have 1 GB and reducer tasks have 2 GB of memory.

I am guessing Yarn is not utilizing the machines correctly -- maybe because of the available amount of RAM, but I am not sure how to verify this.

Any thoughts on what the problem might be or how to verify it is appreciated,
Thanks,
Omid

P.S. I can also post any of the logs or configuration files.
