Posted to common-user@hadoop.apache.org by anil gupta <an...@gmail.com> on 2012/06/13 21:16:05 UTC

Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

Hi All,

I am using CDH4 to run an HBase cluster on CentOS 6.0. I have 5
nodes in my cluster (2 admin nodes and 3 DNs).
My resourcemanager is up and running and shows that all three DNs are
running the nodemanager. HDFS is also working fine and shows 3 DNs.

But when I fire the pi example job, it starts to run in local mode.
Here is the console output:
sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1000000000
Number of Maps  = 10
Samples per Map = 1000000000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/06/13 12:03:27 WARN conf.Configuration: session.id is deprecated.
Instead, use dfs.metrics.session-id
12/06/13 12:03:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
12/06/13 12:03:27 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
12/06/13 12:03:27 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
12/06/13 12:03:28 INFO mapred.FileInputFormat: Total input paths to
process : 10
12/06/13 12:03:29 INFO mapred.JobClient: Running job: job_local_0001
12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter set in
config null
12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
12/06/13 12:03:29 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
12/06/13 12:03:29 INFO util.ProcessTree: setsid exited with exit code
0
12/06/13 12:03:29 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3d46e381
12/06/13 12:03:29 WARN mapreduce.Counters: Counter name
MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group
name and  BYTES_READ as counter name instead
12/06/13 12:03:29 INFO mapred.MapTask: numReduceTasks: 1
12/06/13 12:03:29 INFO mapred.MapTask: io.sort.mb = 100
12/06/13 12:03:30 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/13 12:03:30 INFO mapred.MapTask: record buffer = 262144/327680
12/06/13 12:03:30 INFO mapred.JobClient:  map 0% reduce 0%
12/06/13 12:03:35 INFO mapred.LocalJobRunner: Generated 95735000
samples.
12/06/13 12:03:36 INFO mapred.JobClient:  map 100% reduce 0%
12/06/13 12:03:38 INFO mapred.LocalJobRunner: Generated 151872000
samples.

Here is the content of yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/disk/yarn/local</value>
  </property>

  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/disk/yarn/logs</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>

  <property>
    <description>Classpath for typical applications.</description>
     <name>yarn.application.classpath</name>
     <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $YARN_HOME/*,$YARN_HOME/lib/*
     </value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ihub-an-g1:8025</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ihub-an-g1:8040</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ihub-an-g1:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ihub-an-g1:8141</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ihub-an-g1:8088</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/disk/mapred/jobhistory/intermediate/done</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/disk/mapred/jobhistory/done</value>
  </property>
</configuration>

Can anyone tell me what the problem is here? I appreciate your help.
Thanks,
Anil Gupta

Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

Posted by anil gupta <an...@gmail.com>.
Forgot to mention:
Hadoop version: Hadoop 2.0.0-cdh4.0.0


-- 
Thanks & Regards,
Anil Gupta

Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

Posted by Harsh J <ha...@cloudera.com>.
Hey,

Moving this to the CDH users list (cdh-user@cloudera.org) as it's CDH4
packaging/deployment-specific. (You can subscribe to it via
https://groups.google.com/a/cloudera.org/group/cdh-user). BCC'd
common-user and cc'd you and Marcos too.

MR jobs read mapred-site.xml to determine which 'cluster'/'framework'
they should use. Hence, you will need to follow these deployment
instructions when using MR2 with YARN as your choice of MR:
https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster#DeployingMapReducev2%28YARN%29onaCluster-Step1

(Or, if you use CM4, just visit the YARN service in it and ask it to
deploy configs automatically to all hosts, or to get you a client config
bundle for self-deployment. Applying those would do this automatically
and remove the pain :))

Can you try this and let us know if it works, Anil?

On Thu, Jun 14, 2012 at 2:56 AM, anil gupta <an...@gmail.com> wrote:
> Hi Marcos,
>
> Sorry, I forgot to mention that the job history server is installed and running,
> and AFAIK the resourcemanager is responsible for running MR jobs. The historyserver
> is only used to get info about MR jobs.
>
> Thanks,
> Anil
>
> On Wed, Jun 13, 2012 at 2:04 PM, Marcos Ortiz <ml...@uci.cu> wrote:
>
>> According to the CDH 4 official documentation, you should install a
>> JobHistory server for your MRv2 (YARN) cluster:
>> https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster
>>
>> How to configure the HistoryServer:
>> https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster#DeployingMapReducev2%28YARN%29onaCluster-Step3



-- 
Harsh J

Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

Posted by Marcos Ortiz <ml...@uci.cu>.
Can you share with us, via pastebin, all the conf files that you are using
for YARN?



-- 
Marcos Luis Ortíz Valmaseda
  Data Engineer & Sr. System Administrator at UCI
  http://marcosluis2186.posterous.com
  http://www.linkedin.com/in/marcosluis2186
  Twitter: @marcosluis2186



10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

Posted by anil gupta <an...@gmail.com>.
Hi Marcos,

Sorry, I forgot to mention that the job history server is installed and running,
and AFAIK the resourcemanager is responsible for running MR jobs. The historyserver
is only used to get info about MR jobs.

Thanks,
Anil

On Wed, Jun 13, 2012 at 2:04 PM, Marcos Ortiz <ml...@uci.cu> wrote:

> According to the CDH 4 official documentation, you should install a
> JobHistory server for your MRv2 (YARN)
> cluster.
> https://ccp.cloudera.com/**display/CDH4DOC/Deploying+**
> MapReduce+v2+%28YARN%29+on+a+**Cluster<https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster>
>
> How to configure the HistoryServer
> https://ccp.cloudera.com/**display/CDH4DOC/Deploying+**
> MapReduce+v2+%28YARN%29+on+a+**Cluster#DeployingMapReducev2%**
> 28YARN%29onaCluster-Step3<https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster#DeployingMapReducev2%28YARN%29onaCluster-Step3>
>
>
>
>
> On 06/13/2012 03:16 PM, anil gupta wrote:
>
>> Hi All
>>
>> I am using cdh4 for running a HBase cluster on CentOs6.0. I have 5
>> nodes in my cluster(2 Admin Node and 3 DN).
>> My resourcemanager is up and running and showing that all three DN are
>> running the nodemanager. HDFS is also working fine and showing 3 DN's.
>>
>> But when i fire the pi example job. It starts to run in Local mode.
>> Here is the console output:
>> sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/**hadoop-mapreduce-
>> examples.jar pi 10 1000000000
>> Number of Maps  = 10
>> Samples per Map = 1000000000
>> Wrote input for Map #0
>> Wrote input for Map #1
>> Wrote input for Map #2
>> Wrote input for Map #3
>> Wrote input for Map #4
>> Wrote input for Map #5
>> Wrote input for Map #6
>> Wrote input for Map #7
>> Wrote input for Map #8
>> Wrote input for Map #9
>> Starting Job
>> 12/06/13 12:03:27 WARN conf.Configuration: session.id is deprecated.
>> Instead, use dfs.metrics.session-id
>> 12/06/13 12:03:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>> processName=JobTracker, sessionId=
>> 12/06/13 12:03:27 INFO util.NativeCodeLoader: Loaded the native-hadoop
>> library
>> 12/06/13 12:03:27 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the
>> same.
>> 12/06/13 12:03:28 INFO mapred.FileInputFormat: Total input paths to
>> process : 10
>> 12/06/13 12:03:29 INFO mapred.JobClient: Running job: job_local_0001
>> 12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter set in
>> config null
>> 12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter is
>> org.apache.hadoop.mapred.**FileOutputCommitter
>> 12/06/13 12:03:29 WARN mapreduce.Counters: Group
>> org.apache.hadoop.mapred.Task$**Counter is deprecated. Use
>> org.apache.hadoop.mapreduce.**TaskCounter instead
>> 12/06/13 12:03:29 INFO util.ProcessTree: setsid exited with exit code
>> 0
>> 12/06/13 12:03:29 INFO mapred.Task:  Using ResourceCalculatorPlugin :
>> org.apache.hadoop.util.**LinuxResourceCalculatorPlugin@**3d46e381
>> 12/06/13 12:03:29 WARN mapreduce.Counters: Counter name
>> MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group
>> name and  BYTES_READ as counter name instead
>> 12/06/13 12:03:29 INFO mapred.MapTask: numReduceTasks: 1
>> 12/06/13 12:03:29 INFO mapred.MapTask: io.sort.mb = 100
>> 12/06/13 12:03:30 INFO mapred.MapTask: data buffer = 79691776/99614720
>> 12/06/13 12:03:30 INFO mapred.MapTask: record buffer = 262144/327680
>> 12/06/13 12:03:30 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/06/13 12:03:35 INFO mapred.LocalJobRunner: Generated 95735000
>> samples.
>> 12/06/13 12:03:36 INFO mapred.JobClient:  map 100% reduce 0%
>> 12/06/13 12:03:38 INFO mapred.LocalJobRunner: Generated 151872000
>> samples.
>>
>> Here is the content of yarn-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>>   </property>
>>
>>   <property>
>>     <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>>     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>>   </property>
>>
>>   <property>
>>     <name>yarn.log-aggregation-enable</name>
>>     <value>true</value>
>>   </property>
>>
>>   <property>
>>     <description>List of directories to store localized files in.</description>
>>     <name>yarn.nodemanager.local-dirs</name>
>>     <value>/disk/yarn/local</value>
>>   </property>
>>
>>   <property>
>>     <description>Where to store container logs.</description>
>>     <name>yarn.nodemanager.log-dirs</name>
>>     <value>/disk/yarn/logs</value>
>>   </property>
>>
>>   <property>
>>     <description>Where to aggregate logs to.</description>
>>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>     <value>/var/log/hadoop-yarn/apps</value>
>>   </property>
>>
>>   <property>
>>     <description>Classpath for typical applications.</description>
>>      <name>yarn.application.classpath</name>
>>      <value>
>>         $HADOOP_CONF_DIR,
>>         $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
>>         $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
>>         $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
>>         $YARN_HOME/*,$YARN_HOME/lib/*
>>      </value>
>>   </property>
>> <property>
>>         <name>yarn.resourcemanager.resource-tracker.address</name>
>>         <value>ihub-an-g1:8025</value>
>> </property>
>> <property>
>>         <name>yarn.resourcemanager.address</name>
>>         <value>ihub-an-g1:8040</value>
>> </property>
>> <property>
>>         <name>yarn.resourcemanager.scheduler.address</name>
>>         <value>ihub-an-g1:8030</value>
>> </property>
>> <property>
>>         <name>yarn.resourcemanager.admin.address</name>
>>         <value>ihub-an-g1:8141</value>
>> </property>
>> <property>
>>         <name>yarn.resourcemanager.webapp.address</name>
>>         <value>ihub-an-g1:8088</value>
>> </property>
>> <property>
>>         <name>mapreduce.jobhistory.intermediate-done-dir</name>
>>         <value>/disk/mapred/jobhistory/intermediate/done</value>
>> </property>
>> <property>
>>         <name>mapreduce.jobhistory.done-dir</name>
>>         <value>/disk/mapred/jobhistory/done</value>
>> </property>
>> </configuration>
>>
>> Can anyone tell me what is the problem over here? Appreciate your
>> help.
>> Thanks,
>> Anil Gupta
>>
>>
>>
>>
> --
> Marcos Luis Ortíz Valmaseda
>  Data Engineer && Sr. System Administrator at UCI
>  http://marcosluis2186.posterous.com
>  http://www.linkedin.com/in/marcosluis2186
>  Twitter: @marcosluis2186
>
>
> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci
>



-- 
Thanks & Regards,
Anil Gupta

Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

Posted by Marcos Ortiz <ml...@uci.cu>.
According to the CDH 4 official documentation, you should install a
JobHistory server for your MRv2 (YARN) cluster:
https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster

How to configure the HistoryServer
https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster#DeployingMapReducev2%28YARN%29onaCluster-Step3
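
As a sketch of what that page asks for (host and ports here are illustrative: 10020 and 19888 are the stock JobHistory defaults, and I am assuming the history server runs on the ResourceManager node, ihub-an-g1), mapred-site.xml would look roughly like this. Note mapreduce.framework.name in particular: if it is left unset, job clients fall back to the LocalJobRunner, which matches the job_local_0001 id in your log.

```xml
<!-- mapred-site.xml (sketch; hostname and ports are assumptions, not from the thread) -->
<property>
  <!-- Without this set to "yarn", the client submits via the local runner -->
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <!-- RPC address of the JobHistory server (10020 is the default port) -->
  <name>mapreduce.jobhistory.address</name>
  <value>ihub-an-g1:10020</value>
</property>
<property>
  <!-- Web UI of the JobHistory server (19888 is the default port) -->
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>ihub-an-g1:19888</value>
</property>
```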



On 06/13/2012 03:16 PM, anil gupta wrote:
> Hi All
>
> I am using cdh4 for running a HBase cluster on CentOs6.0. I have 5
> nodes in my cluster(2 Admin Node and 3 DN).
> My resourcemanager is up and running and showing that all three DN are
> running the nodemanager. HDFS is also working fine and showing 3 DN's.
>
> But when i fire the pi example job. It starts to run in Local mode.
> Here is the console output:
> sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-
> examples.jar pi 10 1000000000
> Number of Maps  = 10
> Samples per Map = 1000000000
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Wrote input for Map #4
> Wrote input for Map #5
> Wrote input for Map #6
> Wrote input for Map #7
> Wrote input for Map #8
> Wrote input for Map #9
> Starting Job
> 12/06/13 12:03:27 WARN conf.Configuration: session.id is deprecated.
> Instead, use dfs.metrics.session-id
> 12/06/13 12:03:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 12/06/13 12:03:27 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 12/06/13 12:03:27 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the
> same.
> 12/06/13 12:03:28 INFO mapred.FileInputFormat: Total input paths to
> process : 10
> 12/06/13 12:03:29 INFO mapred.JobClient: Running job: job_local_0001
> 12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter set in
> config null
> 12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter is
> org.apache.hadoop.mapred.FileOutputCommitter
> 12/06/13 12:03:29 WARN mapreduce.Counters: Group
> org.apache.hadoop.mapred.Task$Counter is deprecated. Use
> org.apache.hadoop.mapreduce.TaskCounter instead
> 12/06/13 12:03:29 INFO util.ProcessTree: setsid exited with exit code
> 0
> 12/06/13 12:03:29 INFO mapred.Task:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3d46e381
> 12/06/13 12:03:29 WARN mapreduce.Counters: Counter name
> MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group
> name and  BYTES_READ as counter name instead
> 12/06/13 12:03:29 INFO mapred.MapTask: numReduceTasks: 1
> 12/06/13 12:03:29 INFO mapred.MapTask: io.sort.mb = 100
> 12/06/13 12:03:30 INFO mapred.MapTask: data buffer = 79691776/99614720
> 12/06/13 12:03:30 INFO mapred.MapTask: record buffer = 262144/327680
> 12/06/13 12:03:30 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/13 12:03:35 INFO mapred.LocalJobRunner: Generated 95735000
> samples.
> 12/06/13 12:03:36 INFO mapred.JobClient:  map 100% reduce 0%
> 12/06/13 12:03:38 INFO mapred.LocalJobRunner: Generated 151872000
> samples.
>
> Here is the content of yarn-site.xml:
>
> <configuration>
>    <property>
>      <name>yarn.nodemanager.aux-services</name>
>      <value>mapreduce.shuffle</value>
>    </property>
>
>    <property>
>      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>    </property>
>
>    <property>
>      <name>yarn.log-aggregation-enable</name>
>      <value>true</value>
>    </property>
>
>    <property>
>      <description>List of directories to store localized files in.</description>
>      <name>yarn.nodemanager.local-dirs</name>
>      <value>/disk/yarn/local</value>
>    </property>
>
>    <property>
>      <description>Where to store container logs.</description>
>      <name>yarn.nodemanager.log-dirs</name>
>      <value>/disk/yarn/logs</value>
>    </property>
>
>    <property>
>      <description>Where to aggregate logs to.</description>
>      <name>yarn.nodemanager.remote-app-log-dir</name>
>      <value>/var/log/hadoop-yarn/apps</value>
>    </property>
>
>    <property>
>      <description>Classpath for typical applications.</description>
>       <name>yarn.application.classpath</name>
>       <value>
>          $HADOOP_CONF_DIR,
>          $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
>          $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
>          $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
>          $YARN_HOME/*,$YARN_HOME/lib/*
>       </value>
>    </property>
> <property>
>          <name>yarn.resourcemanager.resource-tracker.address</name>
>          <value>ihub-an-g1:8025</value>
> </property>
> <property>
>          <name>yarn.resourcemanager.address</name>
>          <value>ihub-an-g1:8040</value>
> </property>
> <property>
>          <name>yarn.resourcemanager.scheduler.address</name>
>          <value>ihub-an-g1:8030</value>
> </property>
> <property>
>          <name>yarn.resourcemanager.admin.address</name>
>          <value>ihub-an-g1:8141</value>
> </property>
> <property>
>          <name>yarn.resourcemanager.webapp.address</name>
>          <value>ihub-an-g1:8088</value>
> </property>
> <property>
>          <name>mapreduce.jobhistory.intermediate-done-dir</name>
>          <value>/disk/mapred/jobhistory/intermediate/done</value>
> </property>
> <property>
>          <name>mapreduce.jobhistory.done-dir</name>
>          <value>/disk/mapred/jobhistory/done</value>
> </property>
> </configuration>
>
> Can anyone tell me what is the problem over here? Appreciate your
> help.
> Thanks,
> Anil Gupta
>
>
>

-- 
Marcos Luis Ortíz Valmaseda
  Data Engineer && Sr. System Administrator at UCI
  http://marcosluis2186.posterous.com
  http://www.linkedin.com/in/marcosluis2186
  Twitter: @marcosluis2186


10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci