Posted to common-user@hadoop.apache.org by da...@internode.on.net on 2008/09/11 09:20:47 UTC

hadoop hanging (probably misconfiguration) assistance

Hi All,
I have been trying to move from a pseudo-distributed Hadoop cluster, which worked perfectly well, to a real Hadoop cluster. I was able to run
the wordcount example on my pseudo-distributed cluster, but my real cluster hangs at this point:

# bin/hadoop jar hadoop*jar wordcount /myinput /myoutput
08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
08/09/10 17:10:31 INFO mapred.JobClient: Running job: job_200809101706_0001
08/09/10 17:10:32 INFO mapred.JobClient:  map 0% reduce 0%

The machines are doing nothing, i.e. all processes are at 0.0%.

I have changed the configuration a couple of times to see where the issue lies. Currently I have 2 machines in the cluster: the namenode and
the jobtracker on one machine, with the datanode on a separate machine.

I have switched from hostnames to IP addresses, with negligible improvement.
The only errors in the log files concern flushing for log4j, so I did not consider them relevant.

If anyone has seen this or has any ideas where I might find the source of my issues, I would be grateful.

Regards
Damien

# cat hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>mapred.task.timeout</name>
    <value>6000</value>
    <description>The number of milliseconds before a task will be
    terminated if it neither reads an input, writes an output, nor
    updates its status string.
    </description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.7.3.164:54130/</value>
  </property>

   <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

   <property>
    <name>hadoop.logfile.size</name>
    <value>1000000</value>
  </property>

   <property>
    <name>hadoop.logfile.count</name>
    <value>2</value>
  </property>

   <property>
    <name>io.sort.mb</name>
    <value>25</value>
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>8388608</value>
  </property>

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>5</value>
  </property>
 
  <property>
    <name>mapred.job.tracker</name>
    <value>10.7.3.164:54131</value>
  </property>

   <property>
    <name>mapred.job.tracker.handler.count</name>
    <value>3</value>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

   <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx128m</value>
  </property>

  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>

  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>

  <property>
    <name>mapred.submit.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>tasktracker.http.threads</name>
    <value>4</value>
  </property>

</configuration>
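
For anyone reading the configuration above, it can help to turn the raw numbers into familiar units. The following is only a minimal sketch: it embeds a trimmed copy of two properties from the file rather than reading the real hadoop-site.xml, and the helper name load_properties is made up for illustration. As far as I recall, the stock default for mapred.task.timeout is 600000 ms (ten minutes), so the 6000 here may be worth a second look, although it would not by itself explain a job sitting at 0%.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the hadoop-site.xml from the post; only two properties are kept.
SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.task.timeout</name>
    <value>6000</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>8388608</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


props = load_properties(SITE_XML)
timeout_seconds = int(props["mapred.task.timeout"]) / 1000      # milliseconds -> seconds
block_size_mib = int(props["dfs.block.size"]) / (1024 * 1024)   # bytes -> MiB

print(f"mapred.task.timeout = {timeout_seconds:g} s")
print(f"dfs.block.size      = {block_size_mib:g} MiB")
```

With the values from the post this reports a task timeout of 6 seconds and a block size of 8 MiB.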

Re: hadoop hanging (probably misconfiguration) assistance

Posted by Amar Kamat <am...@yahoo-inc.com>.

Shengkai Zhu wrote:
> The logs will probably tell you what happened.
>
> On Thu, Sep 11, 2008 at 3:20 PM, <da...@internode.on.net> wrote:
>
>> Hi All,
>> I have been trying to move from a pseudo-distributed Hadoop cluster, which
>> worked perfectly well, to a real Hadoop cluster. I was able to run
>> the wordcount example on my pseudo-distributed cluster, but my real
>> cluster hangs at this point:
>>
>> # bin/hadoop jar hadoop*jar wordcount /myinput /myoutput
>> 08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/09/10 17:10:31 INFO mapred.JobClient: Running job: job_200809101706_0001
>> 08/09/10 17:10:32 INFO mapred.JobClient:  map 0% reduce 0%
>>
One possibility is that the TaskTrackers are not running, are missing, or
have crashed. In that case the job waits at 0% forever because there is
nothing to run its tasks.
As Shengkai mentioned, please check the JobTracker and TaskTracker logs.
TaskTrackers are slaves in Hadoop, and hadoop-home-dir/conf/slaves is
the place to declare a node as a slave.
Amar
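
To make the conf/slaves suggestion concrete, here is a small sketch that checks whether a given host is declared in that file. The start scripts (bin/start-dfs.sh, bin/start-mapred.sh) launch the DataNode and TaskTracker daemons on the hosts listed there, and a pseudo-distributed setup typically lists only localhost. Everything else below is an assumption for illustration: the thread never shows the actual slaves file or the second machine's address, so "datanode-host" is a placeholder, the function names are made up, and treating '#' as a comment marker is my assumption rather than something confirmed for this Hadoop version.

```python
def parse_slaves(text):
    """Return the host entries of a conf/slaves file, one per non-blank line."""
    hosts = []
    for line in text.splitlines():
        # Assumption: anything after '#' is a comment and is ignored.
        entry = line.split("#", 1)[0].strip()
        if entry:
            hosts.append(entry)
    return hosts


def missing_slaves(slaves_text, expected_hosts):
    """List the expected hosts that are not declared in the slaves file."""
    declared = set(parse_slaves(slaves_text))
    return [host for host in expected_hosts if host not in declared]


# Hypothetical contents: a slaves file left over from a pseudo-distributed setup.
example_slaves = "localhost\n"
# Placeholder name for the second machine; the real address is not in the thread.
expected = ["datanode-host"]

print(missing_slaves(example_slaves, expected))
```

If the second machine shows up in the output, no TaskTracker would be started on it by the start scripts, which would match a job that never leaves 0%.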

Re: hadoop hanging (probably misconfiguration) assistance

Posted by Shengkai Zhu <ge...@gmail.com>.

The logs will probably tell you what happened.
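
As a rough illustration of what checking the logs can look like, the sketch below counts log lines by level and collects anything at WARN or above. It is fed only the four client-side lines quoted in the original post. The first timestamp pattern matches those lines; the second pattern (ISO-style date with milliseconds) is my assumption about how the daemon log files under logs/ are formatted, and the function name summarize is made up for this example.

```python
import re
from collections import Counter

LINE_RE = re.compile(
    r"^(?:\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}"        # console style, as quoted in the thread
    r"|\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"    # assumed daemon-log style
    r" (?P<level>[A-Z]+) (?P<logger>\S+): (?P<message>.*)$"
)


def summarize(lines, wanted=("WARN", "ERROR", "FATAL")):
    """Count lines per log level and collect the ones at a wanted level."""
    counts = Counter()
    problems = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # stack traces and other continuation lines are skipped
        counts[match["level"]] += 1
        if match["level"] in wanted:
            problems.append(line)
    return counts, problems


# The client output exactly as posted; all four lines are INFO.
client_output = [
    "08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2",
    "08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2",
    "08/09/10 17:10:31 INFO mapred.JobClient: Running job: job_200809101706_0001",
    "08/09/10 17:10:32 INFO mapred.JobClient:  map 0% reduce 0%",
]

counts, problems = summarize(client_output)
print(dict(counts), problems)
```

The client output contains nothing above INFO, so the useful information is more likely in the JobTracker and TaskTracker log files on each machine than on the submitting console.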


-- 
Jash Zhu (朱盛凯)
Software School, Fudan University