Posted to common-user@hadoop.apache.org by da...@internode.on.net on 2008/09/11 09:20:47 UTC
hadoop hanging (probably misconfiguration) assistance
Hi All,
I have been trying to move from a pseudo-distributed Hadoop cluster, which worked perfectly well, to a real Hadoop cluster. I was able to run
the wordcount example on my pseudo-distributed cluster, but my real cluster hangs at this point:
# bin/hadoop jar hadoop*jar wordcount /myinput /myoutput
08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
08/09/10 17:10:31 INFO mapred.JobClient: Running job: job_200809101706_0001
08/09/10 17:10:32 INFO mapred.JobClient: map 0% reduce 0%
The machines are doing nothing, i.e. all processes are at 0.0%.
I have changed the configuration a couple of times to see where the issue lies. Currently I have 2 machines in the cluster: the namenode and
the jobtracker on one machine, with the datanode on a separate machine.
I have moved from hostnames to IP addresses, with negligible improvement.
The only errors in the log files are about flushing for log4j, so I did not consider them relevant.
If anyone has seen this, or has any ideas where I might find the source of my issues, I would be grateful.
Regards
Damien
# cat hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.task.timeout</name>
<value>6000</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://10.7.3.164:54130/</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.logfile.size</name>
<value>1000000</value>
</property>
<property>
<name>hadoop.logfile.count</name>
<value>2</value>
</property>
<property>
<name>io.sort.mb</name>
<value>25</value>
</property>
<property>
<name>dfs.block.size</name>
<value>8388608</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>5</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>10.7.3.164:54131</value>
</property>
<property>
<name>mapred.job.tracker.handler.count</name>
<value>3</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx128m</value>
</property>
<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.submit.replication</name>
<value>1</value>
</property>
<property>
<name>tasktracker.http.threads</name>
<value>4</value>
</property>
</configuration>
Re: hadoop hanging (probably misconfiguration) assistance
Posted by Amar Kamat <am...@yahoo-inc.com>.
Shengkai Zhu wrote:
> The logs will probably tell you what happened.
>
> On Thu, Sep 11, 2008 at 3:20 PM, <da...@internode.on.net> wrote:
>
>
>> Hi All,
>> I have been trying to move from pseudo distributed hadoop cluster which
>> worked perfectly well, to a real hadoop cluster. I was able to execute
>> the wordcount example on my pseudo cluster but my real cluster hangs at
>> this point:
>>
>> # bin/hadoop jar hadoop*jar wordcount /myinput /myoutput
>> 08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/09/10 17:10:31 INFO mapred.JobClient: Running job: job_200809101706_0001
>> 08/09/10 17:10:32 INFO mapred.JobClient: map 0% reduce 0%
>>
>>
One possibility is that the task-trackers are not up, are missing, or
have crashed. In that case the job waits at 0% forever, as there is no
task-tracker to run the tasks.
As Shengkai mentioned, please check the job-tracker/task-tracker logs.
Task-trackers are slaves in Hadoop, and hadoop-home-dir/conf/slaves is
the place to declare a node as a slave.
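The slaves check can be sketched roughly as follows. This is only an illustration under stated assumptions: `list_slaves` is a helper name made up here, the address 10.7.3.165 is a hypothetical slave host that does not come from this thread, and the temporary file stands in for hadoop-home-dir/conf/slaves.

```shell
#!/bin/sh
# list_slaves (hypothetical helper): print the hosts declared in a
# slaves file, one per line, skipping blank lines and '#' comments.
list_slaves() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Stand-in for hadoop-home-dir/conf/slaves.
# 10.7.3.165 is a made-up address used only for illustration.
SLAVES_FILE=$(mktemp)
printf '# hosts that should run a datanode and a task-tracker\n10.7.3.165\n\n' > "$SLAVES_FILE"

# Every host printed here is expected to run a task-tracker.
list_slaves "$SLAVES_FILE"
```

On each host the file lists, running the JDK's `jps` tool should show a TaskTracker process. If it does not, the task-tracker log under the Hadoop logs directory on that host (typically named hadoop-<user>-tasktracker-<hostname>.log) is the first place to look.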
Amar
Re: hadoop hanging (probably misconfiguration) assistance
Posted by Shengkai Zhu <ge...@gmail.com>.
The logs will probably tell you what happened.
--
朱盛凯
Jash Zhu
复旦大学软件学院
Software School, Fudan University