Posted to common-user@hadoop.apache.org by Richard Crowley <ri...@opendns.com> on 2008/04/30 20:48:33 UTC
One-node cluster with DFS on Debian
I have been unable to run the grep program from the examples JAR on a
one-node cluster running on localhost. I can run the same command
successfully with a blank conf/hadoop-site.xml (meaning no DFS), so
Hadoop itself seems to work.
Here is the conf/hadoop-site.xml I'm using (from
http://wiki.apache.org/hadoop/GettingStartedWithHadoop):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
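(Editor's side note: the dfs.replication value of 8 comes from the wiki's
multi-node example. On a one-node cluster there is only a single datanode
to hold each block, so a common override for this kind of setup is:

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

This is not the cause of the hang reported below, but it avoids every
block being permanently under-replicated.)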
My machine is running Debian Etch, and I have tried using both Sun Java
1.5 and 1.6 (1.6 from etch-backports). For 1.5, my conf/hadoop-env.sh
file contained the line "". For 1.6, this line was
"export JAVA_HOME=/usr/lib/jvm/java-6-sun".
To start up and initialize the DFS, I've run these commands:
hadoop/bin/hadoop namenode -format
hadoop/bin/start-all.sh
hadoop/bin/hadoop dfs -mkdir input
for i in $(ls input/); do
    echo "input/$i"
    hadoop/bin/hadoop dfs -put input/$i input/
done
hadoop/bin/hadoop jar hadoop/hadoop-0.16.3-examples.jar grep input output 'dfs[a-z.]+'
The last command generates some output before hanging indefinitely. After
it hangs, load on the machine is 0 and no Java processes ever appear to
do any work. I have left it in this state for over 25 minutes with no
change. Output of the above commands:
08/04/30 18:33:23 INFO mapred.FileInputFormat: Total input paths to process : 2
08/04/30 18:33:24 INFO mapred.JobClient: Running job: job_200804301832_0001
08/04/30 18:33:25 INFO mapred.JobClient: map 0% reduce 0%
08/04/30 18:33:30 INFO mapred.JobClient: map 66% reduce 0%
08/04/30 18:33:32 INFO mapred.JobClient: map 100% reduce 0%
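(Editor's note: a quick way to confirm the DFS side is healthy when
reproducing a hang like this; both commands are present in 0.16, though
the output format may vary by version:

hadoop/bin/hadoop dfs -ls input      # did the input files actually land in DFS?
hadoop/bin/hadoop dfsadmin -report   # is the lone datanode up and reporting capacity?

A job stuck at "reduce > copy" with the maps complete, as above, usually
points at the reduce side failing to reach the tasktracker, not at DFS
itself.)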
I've perused the logs/ directory and only see two things out of place.
In logs/hadoop-rcrowley-secondarynamenode-dev16.sfo.log, there are stack
traces of ConnectExceptions that look like this:
2008-04-30 18:49:16,328 ERROR org.apache.hadoop.dfs.NameNode.Secondary: Exception in doCheckpoint:
2008-04-30 18:49:16,328 ERROR org.apache.hadoop.dfs.NameNode.Secondary: java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.Socket.connect(Socket.java:519)
    at java.net.Socket.connect(Socket.java:469)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:388)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:500)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.http.HttpClient.New(HttpClient.java:318)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:792)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:733)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:658)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:981)
    at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:149)
    at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:188)
    at org.apache.hadoop.dfs.SecondaryNameNode.getFSImage(SecondaryNameNode.java:244)
    at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:309)
    at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:222)
    at java.lang.Thread.run(Thread.java:619)
I verified that I can in fact `ssh localhost` without a password.
The other strange thing in the logs is in
logs/hadoop-rcrowley-tasktracker-dev16.sfo.log, where the following line
is repeated thousands of times at the end of the file:
2008-04-30 18:36:40,317 INFO org.apache.hadoop.mapred.TaskTracker: task_200804301832_0001_r_000000_0 0.0% reduce > copy >
Can anyone please help me get on my feet here?
Thanks,
Richard
richard@opendns.com
Re: One-node cluster with DFS on Debian
Posted by Steve Loughran <st...@apache.org>.
Richard Crowley wrote:
> Problem fixed. My machine's /etc/hostname file came without a
> fully-qualified domain name. Why does Hadoop (or perhaps just
> java.net.InetAddress) rely on reverse DNS lookups?
>
> Richard
Java networking is a mess. There are some implicit assumptions (a
well-managed network, static IPs, static proxy settings) that don't
apply everywhere... it takes testing on home boxes to bring these
problems up, like mine:
"regression: SF daemon no longer works on my home machine due to
networking changes"
http://jira.smartfrog.org/jira/browse/SFOS-697
Looking at our code that fixes this, we handled the failure by falling
back to something else:
try {
    // this can still do a network reverse DNS lookup, and hence fail
    hostInetAddress = InetAddress.getLocalHost();
} catch (UnknownHostException e) {
    // no, nothing there either
    hostInetAddress = InetAddress.getByName(null);
}
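(Editor's note: the same probe, wrapped as a self-contained program for
checking what resolution a Hadoop daemon will see on a given box; the
class name is purely illustrative:

// HostCheck.java: print what InetAddress resolution sees on this machine.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr;
        try {
            // may do a network reverse DNS lookup under the hood, and hence fail
            addr = InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            // fall back to the loopback address; getByName(null) returns it
            addr = InetAddress.getByName(null);
        }
        System.out.println(addr.getHostName() + " / " + addr.getHostAddress());
    }
}

If this prints an unresolvable hostname, the daemons will trip over the
same lookup.)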
Re: One-node cluster with DFS on Debian
Posted by Richard Crowley <ri...@opendns.com>.
Problem fixed. My machine's /etc/hostname file came without a
fully-qualified domain name. Why does Hadoop (or perhaps just
java.net.InetAddress) rely on reverse DNS lookups?
Richard
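(Editor's note: for reference, the usual way to check for this on
Debian; the dev16.sfo names below are placeholders for the machine's
real name:

hostname          # short name, read from /etc/hostname
hostname --fqdn   # what a reverse lookup yields; this is what was missing here
cat /etc/hosts    # the name should map to a resolvable address, e.g.:
                  #   127.0.0.1   localhost
                  #   127.0.1.1   dev16.sfo.example.com   dev16
)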
Richard Crowley wrote:
> I have been unable to run the grep program from the examples JAR on a
> one-node cluster running on localhost. I can run the same command
> successfully with a blank conf/hadoop-site.xml (meaning no DFS), so
> Hadoop itself seems to work.
> [...]
> Can anyone please help me get on my feet here?
>
> Thanks,
>
> Richard
> richard@opendns.com