Posted to common-user@hadoop.apache.org by Richard Crowley <ri...@opendns.com> on 2008/04/30 20:48:33 UTC

One-node cluster with DFS on Debian

I have not been able to run the grep program from the examples JAR on a 
one-node cluster running on localhost.  I can successfully run the same 
command with a blank conf/hadoop-site.xml (meaning no DFS), so Hadoop 
itself seems to work.

Here is the conf/hadoop-site.xml I'm using (from 
http://wiki.apache.org/hadoop/GettingStartedWithHadoop):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/tmp/hadoop-${user.name}</value>
	</property>
	<property>
		<name>fs.default.name</name>
		<value>localhost:54310</value>
	</property>
	<property>
		<name>mapred.job.tracker</name>
		<value>localhost:54311</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>8</value>
	</property>
	<property>
		<name>mapred.child.java.opts</name>
		<value>-Xmx512m</value>
	</property>
</configuration>

My machine is running Debian Etch and I have tried using both Sun Java 
1.5 and 1.6.  (1.6 from etch-backports.)  For 1.5, my conf/hadoop-env.sh 
file contained the line "".  For 1.6, this line was "export 
JAVA_HOME=/usr/lib/jvm/java-6-sun".

To start up and initialize the DFS, I've run these commands:

hadoop/bin/hadoop namenode -format
hadoop/bin/start-all.sh
hadoop/bin/hadoop dfs -mkdir input
for i in $(ls input/); do
	echo "input/$i"
	hadoop/bin/hadoop dfs -put input/$i input/
done
hadoop/bin/hadoop jar hadoop/hadoop-0.16.3-examples.jar grep input output 'dfs[a-z.]+'

The last command produces some output and then hangs indefinitely.  After 
it hangs, load on the machine is 0 and no Java processes ever appear to 
do any work.  I have left it in this state for over 25 minutes with no 
change.  Output of the above commands:

08/04/30 18:33:23 INFO mapred.FileInputFormat: Total input paths to process : 2
08/04/30 18:33:24 INFO mapred.JobClient: Running job: job_200804301832_0001
08/04/30 18:33:25 INFO mapred.JobClient:  map 0% reduce 0%
08/04/30 18:33:30 INFO mapred.JobClient:  map 66% reduce 0%
08/04/30 18:33:32 INFO mapred.JobClient:  map 100% reduce 0%

I've perused the logs/ directory and only see two things out of place. 
In logs/hadoop-rcrowley-secondarynamenode-dev16.sfo.log, there are stack 
traces of ConnectExceptions that look like this:

2008-04-30 18:49:16,328 ERROR org.apache.hadoop.dfs.NameNode.Secondary: Exception in doCheckpoint:
2008-04-30 18:49:16,328 ERROR org.apache.hadoop.dfs.NameNode.Secondary: java.net.ConnectException: Connection timed out
         at java.net.PlainSocketImpl.socketConnect(Native Method)
         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
         at java.net.Socket.connect(Socket.java:519)
         at java.net.Socket.connect(Socket.java:469)
         at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
         at sun.net.www.http.HttpClient.openServer(HttpClient.java:388)
         at sun.net.www.http.HttpClient.openServer(HttpClient.java:500)
         at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
         at sun.net.www.http.HttpClient.New(HttpClient.java:306)
         at sun.net.www.http.HttpClient.New(HttpClient.java:318)
         at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:792)
         at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:733)
         at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:658)
         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:981)
         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:149)
         at org.apache.hadoop.dfs.TransferFsImage.getFileClient(TransferFsImage.java:188)
         at org.apache.hadoop.dfs.SecondaryNameNode.getFSImage(SecondaryNameNode.java:244)
         at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:309)
         at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:222)
         at java.lang.Thread.run(Thread.java:619)

I verified that I can in fact `ssh localhost` without a password.

The other strange thing in the logs is in 
logs/hadoop-rcrowley-tasktracker-dev16.sfo.log, where the following line 
is repeated thousands of times at the end of the file:

2008-04-30 18:36:40,317 INFO org.apache.hadoop.mapred.TaskTracker: task_200804301832_0001_r_000000_0 0.0% reduce > copy >

Can anyone please help me get on my feet here?

Thanks,

Richard
richard@opendns.com

Re: One-node cluster with DFS on Debian

Posted by Steve Loughran <st...@apache.org>.
Richard Crowley wrote:
> Problem fixed.  My machine's /etc/hostname file came without a 
> fully-qualified domain name.  Why does Hadoop (or perhaps just 
> java.net.InetAddress) rely on reverse DNS lookups?
> 
> Richard


Java networking is a mess. There are some implicit assumptions ("well-managed 
network, static IPs, static proxy settings") that don't apply 
everywhere... it takes testing on home boxes to bring these problems up, 
like mine:

"regression: SF daemon no longer works on my home machine due to networking 
changes"
http://jira.smartfrog.org/jira/browse/SFOS-697

If I look at our code to fix this, we handled the failure by falling back to 
something else:

             try {
                 // This can still do a network reverse DNS lookup, and hence fail.
                 hostInetAddress = InetAddress.getLocalHost();
             } catch (UnknownHostException e) {
                 // No, nothing there either.
                 hostInetAddress = InetAddress.getByName(null);
             }
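
Spelled out as a self-contained class, that fallback looks roughly like the 
following (a minimal sketch, not the actual SmartFrog code; the class and 
method names are made up). InetAddress.getByName(null) returns an address of 
the loopback interface, so the worst case is 127.0.0.1 rather than an 
exception:

import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostLookup {

    /** Resolve the local host, falling back to loopback when reverse DNS fails. */
    public static InetAddress localHostOrLoopback() {
        try {
            // This can still do a network reverse DNS lookup, and hence fail.
            return InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            try {
                // A null host name resolves to an address of the loopback interface.
                return InetAddress.getByName(null);
            } catch (UnknownHostException neverHappens) {
                throw new IllegalStateException("loopback lookup failed", neverHappens);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(localHostOrLoopback());
    }
}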

Re: One-node cluster with DFS on Debian

Posted by Richard Crowley <ri...@opendns.com>.
Problem fixed.  My machine's /etc/hostname file came without a 
fully-qualified domain name.  Why does Hadoop (or perhaps just 
java.net.InetAddress) rely on reverse DNS lookups?

Richard
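
As for the "why": Hadoop's daemons, like most Java network code, ultimately 
ask java.net for the machine's own name, and InetAddress's canonical-name 
lookup is backed by a reverse DNS query. A minimal standalone way to see what 
the JVM resolves on a given box (the class name here is made up; only standard 
java.net calls are used):

import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalHostCheck {
    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("host name : " + local.getHostName());
            System.out.println("address   : " + local.getHostAddress());
            // getCanonicalHostName() is the call that performs the reverse DNS lookup.
            System.out.println("canonical : " + local.getCanonicalHostName());
        } catch (UnknownHostException e) {
            // A bare /etc/hostname with no matching /etc/hosts entry can land here.
            System.err.println("local host does not resolve: " + e.getMessage());
        }
    }
}

If the canonical name comes back as a bare IP, or the lookup fails or times 
out, the daemons end up advertising an address that other daemons cannot 
connect back to, which is consistent with the checkpoint timeouts and the 
stalled "reduce > copy" phase shown above.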


