Posted to common-user@hadoop.apache.org by Jon Lederman <jo...@gmail.com> on 2010/12/31 18:00:48 UTC

HDFS FS Commands Hanging System

Hi All,

I have been working on running Hadoop on a new microprocessor architecture in pseudo-distributed mode.  I have been successful in getting SSH configured.  I am also able to start a namenode, secondary namenode, tasktracker, jobtracker and datanode as evidenced by the response I get from jps.

However, when I attempt to interact with the file system in any way, such as the simple command hadoop fs -ls, the system hangs.  So it appears to me that some communication is not occurring properly.  Does anyone have any suggestions on what I should look into to fix this problem?

Thanks in advance.

-Jon

Re: HDFS FS Commands Hanging System

Posted by Harsh J <qw...@gmail.com>.
If you're using Java version "1.6.0_18", avoid it and switch to a more
recent release.
For information on why, check http://wiki.apache.org/hadoop/HadoopJavaVersions

Although I don't think it is the real reason behind the issue here, it
is good to avoid that particular release before digging deeper.




-- 
Harsh J
www.harshj.com
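The check Harsh suggests can be automated when provisioning nodes; a small sketch (the helper is illustrative, not part of Hadoop) that flags the JDK release the wiki page warns about:

```python
def is_discouraged_jdk(version_string):
    """Flag Java releases known to be problematic for Hadoop.

    Only 1.6.0_18 -- the release called out on the Hadoop wiki -- is
    flagged here; extend the set as needed.
    """
    discouraged = {"1.6.0_18"}
    # `java -version` prints the version quoted, so strip quotes too.
    return version_string.strip().strip('"') in discouraged

print(is_discouraged_jdk('"1.6.0_18"'))  # True
print(is_discouraged_jdk("1.6.0_21"))    # False
```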

Re: HDFS FS Commands Hanging System

Posted by Hari Sreekumar <hs...@clickable.com>.
Could this be a java/OS issue? Which java and OS are you using?

Hari

On Sunday, January 2, 2011, Jon Lederman <jo...@gmail.com> wrote:
> Hi Esteban,
>
> Thanks.  Can you tell me how I can check whether my node can resolve the host name?  I don't know precisely how to do that.
>
> When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
> I get:
>
> # HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
> 11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config()
>         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
>         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)
>
> 11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login: root,root
> 11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login: root,root
> 11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is60000ms.
> 11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000
> 11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root sending #0
> 11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root: starting, having connections 1
>
> Then the system hangs and does not return.
>
> My core-site.xml file is as follows:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>      <property>
>          <name>fs.default.name</name>
>          <value>hdfs://localhost:9000</value>
>      </property>
> </configuration>
>
>
> My hdfs-site.xml file is as follows:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>      <property>
>          <name>dfs.replication</name>
>          <value>1</value>
>      </property>
> </configuration>
>
>
> My mapred-site.xml file is as follows:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>      <property>
>          <name>mapred.job.tracker</name>
>          <value>localhost:9001</value>
>      </property>
> </configuration>
>
> My masters and slaves files both indicate: localhost
>
> Thanks for your help.  I really appreciate this.
>
> -Jon
> On Jan 2, 2011, at 8:47 AM, Esteban Gutierrez Moguel wrote:
>
>> Hello Jon,
>>
>> Could you please verify that your node can resolve the host name?
>>
>> It would be helpful too if you can attach your configuration files and the
>> output of:
>>
>> HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
>>
>> as Todd suggested.
>>
>> Cheers,
>> esteban
>> On Jan 1, 2011 2:01 PM, "Jon Lederman" <jo...@gmail.com> wrote:
>>> Hi,
>>>
>>> Still no luck in getting FS commands to work. I did take a look at the logs. They all look pretty clean, with the following exceptions: the DataNode appears to start up fine, but the NameNode reports that the Network Topology has 0 racks and 0 datanodes. Is this normal? Is it possible the namenode cannot talk to the datanode? Any thoughts on what might be wrong?
>>>
>>> Thanks in advance and happy new year.
>>>
>>> -Jon
>>> 2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
>>> /************************************************************
>>> STARTUP_MSG: Starting DataNode
>>> STARTUP_MSG: host = localhost/127.0.0.1
>>> STARTUP_MSG: args = []
>>> STARTUP_MSG: version = 0.20.2
>>> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
>>> ************************************************************/
>>> sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
>>> 2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>>> /************************************************************
>>> STARTUP_MSG: Starting NameNode
>>> STARTUP_MSG: host = localhost/127.0.0.1
>>> STARTUP_MSG: args = []
>>> STARTUP_MSG: version = 0.20.2
>>> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
>>> ************************************************************/
>>> 2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
>>> 2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
>>> 2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>> 2011-01-01 19:45:28,492 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2011-01-01 19:45:29,758 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
>>> 2011-01-01 19:45:29,763 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>> 2011-01-01 19:45:29,770 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
>>> 2011-01-01 19:45:29,965 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>> 2011-01-01 19:45:29,994 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>> 2011-01-01 19:45:30,603 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
>>> 2011-01-01 19:45:30,696 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
>>> 2011-01-01 19:45:30,701 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
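Jon asks above how to check whether his node can resolve the host name. Outside Hadoop this can be done with `getent hosts localhost` on most Linux systems, or, as a sketch, from Python's socket module:

```python
import socket

# If this call hangs or raises, the hang in "hadoop fs -ls" is likely a
# name-resolution problem (/etc/hosts, nsswitch) rather than Hadoop itself.
host = "localhost"  # the host used in fs.default.name in this thread
addr = socket.gethostbyname(host)
print(host, "resolves to", addr)
```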

Re: HDFS FS Commands Hanging System

Posted by Esteban Gutierrez Moguel <es...@gmail.com>.
Hi Jon,

could you please restart your daemons? According to your first log files the
NameNode binds to 127.0.0.1:8020, which is odd since your configuration
file and the output of netstat show port 9000.

cheers,
esteban.
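One detail behind Esteban's observation: 8020 is the port the NameNode falls back to when the URI in fs.default.name carries no explicit port, so a NameNode on 8020 despite a `hdfs://localhost:9000` config suggests the daemons read a different config file than the one quoted. A sketch that extracts the port from core-site.xml contents (the XML is the one quoted in this thread; the helper itself is illustrative):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""

def namenode_port(core_site_xml, default=8020):
    """Extract the NameNode RPC port from core-site.xml contents.

    Falls back to 8020, the port used when the hdfs:// URI has no
    explicit port."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.default.name":
            return urlparse(prop.findtext("value")).port or default
    return default

print(namenode_port(CORE_SITE))  # 9000
```

Running this against the file the daemons actually load (e.g. under $HADOOP_CONF_DIR) and comparing with the port in the NameNode log would confirm or rule out a stale-config explanation.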




Re: HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Hi Esteban,

Thanks for your response.

I don't have the fuser executable installed on the environment I am running on.

However, I do find the following:

# jps
923 JobTracker
870 SecondaryNameNode
1188 Jps
794 DataNode
996 TaskTracker
727 NameNode
# netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 (null):sunrpc           (null):*                LISTEN      
tcp        0      0 (null):ssh              (null):*                LISTEN      
tcp        2      0 localhost.localdomain:9000 :::*                    LISTEN      
tcp        0      0 localhost.localdomain:9001 :::*                    LISTEN      
tcp        0      0 ::%989480:50060         :::*                    LISTEN      
tcp        0      0 ::%989704:50030         :::*                    LISTEN      
tcp        0      0 ::%989480:50070         :::*                    LISTEN      
tcp        0      0 ::%989480:telnet        :::*                    LISTEN      
udp        0      0 (null):sunrpc           (null):*                            
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node Path
unix  2      [ ACC ]     STREAM     LISTENING       1281 @MONITOR_617_1
# 

So, all of the daemons are running.  Please note the following out of my log files:

When I look at the log files, the NameNode on startup indicates:
Network topology has 0 racks and 0 datanodes
Also, my DataNode startup log is suspiciously short, indicating only:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
There is no attempt from the DataNode to communicate or otherwise establish communication with the NameNode.  It appears to me that the NameNode and DataNode aren't communicating, which may be the source of my problem.  However, I don't know why this would be or how I can debug it, since I am not sure of the internal operation of Hadoop.

Any thoughts on all of this?  Thanks in advance.

-Jon
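The netstat output above shows listeners on 9000 and 9001; a direct connection attempt confirms the same thing from the client side. A minimal sketch (ports taken from the configs in this thread; it tests TCP connectability only, not whether the RPC server actually answers, and Jon's case shows a socket can accept yet still hang):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if host:port accepts a TCP connection within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            return s.connect_ex((host, port)) == 0
        except OSError:
            return False

# Ports from the thread's configs: 9000 (fs.default.name), 9001 (JobTracker).
for p in (9000, 9001):
    print(p, port_open("localhost", p))
```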


On Jan 2, 2011, at 2:05 PM, Esteban Gutierrez Moguel wrote:

> Hi Jon,
> 
> I was able to reproduce your error by shutting down HDFS and setting up nc
> to listen for connections on the same port (9000).
> 
> Could you please verify that port 9000 is being used by the right
> process (the NameNode)?
> 
> PIDs for "fuser -n tcp 9000" and "jps | grep NameNode" should be the same.
> 
> esteban.
> 
> 
> On Sun, Jan 2, 2011 at 10:56, Jon Lederman <jo...@gmail.com> wrote:
> 
>> [...]
>>>> 2011-01-01 19:45:30,708 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
>>>> 2011-01-01 19:45:30,767 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
>>>> 2011-01-01 19:45:30,924 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1701 msecs
>>>> 2011-01-01 19:45:30,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
>>>> 2011-01-01 19:45:30,948 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
>>>> 2011-01-01 19:45:30,958 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
>>>> 2011-01-01 19:45:30,963 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
>>>> 2011-01-01 19:45:30,966 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
>>>> 2011-01-01 19:45:30,971 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
>>>> 2011-01-01 19:45:30,973 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
>>>> 2011-01-01 19:45:33,929 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
>>>> 2011-01-01 19:45:35,020 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
>>>> 2011-01-01 19:45:35,036 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
>>>> 2011-01-01 19:45:35,038 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
>>>> 2011-01-01 19:45:35,041 INFO org.mortbay.log: jetty-6.1.14
>>>> sc-ssh-svr1 logs $
>>>>
>>>> On Dec 31, 2010, at 4:28 PM, li ping wrote:
>>>>
>>>>> I suggest you look through the logs to see if there is any error.
>>>>> The second point I need to make is which node you run the command
>>>>> "hadoop fs -ls" on. If you run the command on Node A, the
>>>>> configuration item "fs.default.name" should point to the HDFS.
>>>>>
>>>>> On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman <jo...@gmail.com> wrote:
>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> Thanks for your response. It doesn't seem to be an issue with safemode.
>>>>>>
>>>>>> Even when I try the command dfsadmin -safemode get, the system hangs. I am
>>>>>> unable to execute any FS shell commands other than hadoop fs -help.
>>>>>>
>>>>>> I am wondering whether this is an issue with communication between the
>>>>>> daemons? What should I be looking at there? Or could it be something else?
>>>>>>
>>>>>> When I do jps, I do see all the daemons listed.
>>>>>>
>>>>>> Any other thoughts?
>>>>>>
>>>>>> Thanks again and happy new year.
>>>>>>
>>>>>> -Jon
>>>>>> On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
>>>>>>
>>>>>>> Try checking your dfs status:
>>>>>>>
>>>>>>> hadoop dfsadmin -safemode get
>>>>>>>
>>>>>>> Probably says "ON"
>>>>>>>
>>>>>>> hadoop dfsadmin -safemode leave
>>>>>>>
>>>>>>> Somebody else can probably say how to make this happen every reboot....
>>>>>>>
>>>>>>> Michael D. Black
>>>>>>> Senior Scientist
>>>>>>> Advanced Analytics Directorate
>>>>>>> Northrop Grumman Information Systems
>>>>>>>
>>>>>>> ________________________________
>>>>>>>
>>>>>>> From: Jon Lederman [mailto:jon2718@gmail.com]
>>>>>>> Sent: Fri 12/31/2010 11:00 AM
>>>>>>> To: common-user@hadoop.apache.org
>>>>>>> Subject: EXTERNAL:HDFS FS Commands Hanging System
>>>>>>>
>>>>>>> [...]
>>>>>
>>>>> --
>>>>> -----李平
>>>> 
>> 
>> 
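Jon mentions above that the fuser binary is not available on his platform. On Linux, Esteban's `fuser -n tcp 9000` check can be approximated by walking /proc directly; a best-effort sketch (illustrative, Linux-only, and it can only see processes whose /proc entries are readable):

```python
import glob
import os

def pids_listening_on(port):
    """Best-effort fuser replacement: map a listening TCP port to PIDs
    by walking /proc (no fuser binary needed)."""
    # /proc/net/tcp lists sockets in hex; state 0A means LISTEN.
    inodes = set()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)  # skip the header line
                for line in f:
                    parts = line.split()
                    local_port = int(parts[1].split(":")[1], 16)
                    if parts[3] == "0A" and local_port == port:
                        inodes.add(parts[9])  # socket inode number
        except FileNotFoundError:
            pass
    # Find which processes hold a file descriptor for those socket inodes.
    pids = set()
    for fd in glob.glob("/proc/[0-9]*/fd/*"):
        try:
            target = os.readlink(fd)
        except OSError:
            continue
        if target.startswith("socket:[") and target[8:-1] in inodes:
            pids.add(int(fd.split("/")[2]))
    return pids

print(sorted(pids_listening_on(9000)))  # the NameNode PID, or [] if nothing listens
```

Comparing pids_listening_on(9000) against the NameNode PID reported by jps gives the same check Esteban describes.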


Re: HDFS FS Commands Hanging System

Posted by Esteban Gutierrez Moguel <es...@gmail.com>.
Hi Jon,

I was able to reproduce your error by shutting down HDFS and setting up nc
to listen for connections on the same port (9000).

Could you please verify that port 9000 is being used by the right
process (the NameNode)?

PIDs for "fuser -n tcp 9000" and "jps | grep NameNode" should be the same.

esteban.


On Sun, Jan 2, 2011 at 10:56, Jon Lederman <jo...@gmail.com> wrote:

> Hi Esteban,
>
> Thanks.  Can you tell me how I can check whether my node can resolve the
> host name?  I don't know precisely how to do that.
>
> When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
> I get:
>
> # HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
> 11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config()
>        at
> org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
>        at
> org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)
>
> 11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login:
> root,root
> 11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login:
> root,root
> 11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is60000ms.
> 11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000
> 11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to
> localhost/127.0.0.1:9000 from root sending #0
> 11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to
> localhost/127.0.0.1:9000 from root: starting, having connections 1
>
> Then the system hangs and does not return.
>
> My core-site.xml file is as follows:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>     <property>
>         <name>fs.default.name</name>
>         <value>hdfs://localhost:9000</value>
>     </property>
> </configuration>
>
>
> My hdfs-site.xml file is as follows:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>     <property>
>         <name>dfs.replication</name>
>         <value>1</value>
>     </property>
> </configuration>
>
>
> My mapred-site.xml file is as follows:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>     <property>
>         <name>mapred.job.tracker</name>
>         <value>localhost:9001</value>
>     </property>
> </configuration>
>
> My masters and slaves files both indicate: localhost
>
> Thanks for your help.  I really appreciate this.
>
> -Jon
> On Jan 2, 2011, at 8:47 AM, Esteban Gutierrez Moguel wrote:
>
> > Hello Jon,
> >
> > Could you please verify that your node can resolve the host name?
> >
> > It would be helpful too if you can attach your configuration files and
> the
> > output of:
> >
> > HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
> >
> > as Todd suggested.
> >
> > Cheers,
> > esteban
> > On Jan 1, 2011 2:01 PM, "Jon Lederman" <jo...@gmail.com> wrote:
> >> Hi,
> >>
> >> Still no luck in getting FS commands to work. I did take a look at the
> > logs. They all look pretty clean with the following exceptions: The
> DataNode
> > appears to start up fine. However, the NameNode reports that the Network
> > Topology has 0 racks and 0 datanodes. Is this normal? Is it possible the
> > namenode cannot talk to the datanode? Any thoughts on what might be
> wrong?
> >>
> >> Thanks in advance and happy new year.
> >>
> >> -Jon
> >> 2011-01-01 19:45:27,197 INFO
> > org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> >> /************************************************************
> >> STARTUP_MSG: Starting DataNode
> >> STARTUP_MSG: host = localhost/127.0.0.1
> >> STARTUP_MSG: args = []
> >> STARTUP_MSG: version = 0.20.2
> >> STARTUP_MSG: build =
> > https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> > 911707; compiled
> >> by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> >> ************************************************************/
> >> sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
> >> 2011-01-01 19:45:23,988 INFO
> > org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> >> /************************************************************
> >> STARTUP_MSG: Starting NameNode
> >> STARTUP_MSG: host = localhost/127.0.0.1
> >> STARTUP_MSG: args = []
> >> STARTUP_MSG: version = 0.20.2
> >> STARTUP_MSG: build =
> > https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> > 911707; compiled
> >> by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> >> ************************************************************/
> >> 2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> > Initializing RPC Metrics with hostName=
> >> NameNode, port=8020
> >> 2011-01-01 19:45:28,355 INFO
> > org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
> > localhost.locald
> >> omain/127.0.0.1:8020
> >> 2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > Initializing JVM Metrics with processNa
> >> me=NameNode, sessionId=null
> >> 2011-01-01 19:45:28,492 INFO
> > org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
> Initializing
> > Name
> >> NodeMeterics using context
> > object:org.apache.hadoop.metrics.spi.NullContext
> >> 2011-01-01 19:45:29,758 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
> >> 2011-01-01 19:45:29,763 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> supergroup=supergroup
> >> 2011-01-01 19:45:29,770 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> > isPermissionEnabled=true
> >> 2011-01-01 19:45:29,965 INFO
> > org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> > Initializing
> >> FSNamesystemMetrics using context
> > object:org.apache.hadoop.metrics.spi.NullContext
> >> 2011-01-01 19:45:29,994 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> > FSNamesystemStatu
> >> sMBean
> >> 2011-01-01 19:45:30,603 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Number of files = 1
> >> 2011-01-01 19:45:30,696 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Number of files under construction
> >> = 0
> >> 2011-01-01 19:45:30,701 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Image file of size 94 loaded in 0 s
> >> econds.
> >> 2011-01-01 19:45:30,708 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Edits file /tmp/hadoop-root/dfs/nam
> >> e/current/edits of size 4 edits # 0 loaded in 0 seconds.
> >> 2011-01-01 19:45:30,767 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Image file of size 94 saved in 0 se
> >> conds.
> >> 2011-01-01 19:45:30,924 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
> > FSImage in
> >> 1701 msecs
> >> 2011-01-01 19:45:30,945 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of
> blocks
> > = 0
> >> 2011-01-01 19:45:30,948 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
> > blocks = 0
> >> 2011-01-01 19:45:30,958 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> > under-replicated b
> >> locks = 0
> >> 2011-01-01 19:45:30,963 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> > over-replicated b
> >> locks = 0
> >> 2011-01-01 19:45:30,966 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> > Leaving safe mode after 1 secs.
> >> 2011-01-01 19:45:30,971 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> > Network topology has 0 racks and 0 dat
> >> anodes
> >> 2011-01-01 19:45:30,973 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> > UnderReplicatedBlocks has 0 blocks
> >> 2011-01-01 19:45:33,929 INFO org.mortbay.log: Logging to
> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) vi
> >> a org.mortbay.log.Slf4jLog
> >> 2011-01-01 19:45:35,020 INFO org.apache.hadoop.http.HttpServer: Port
> > returned by webServer.getConnectors()[0].
> >> getLocalPort() before open() is -1. Opening the listener on 50070
> >> 2011-01-01 19:45:35,036 INFO org.apache.hadoop.http.HttpServer:
> > listener.getLocalPort() returned 50070 webServ
> >> er.getConnectors()[0].getLocalPort() returned 50070
> >> 2011-01-01 19:45:35,038 INFO org.apache.hadoop.http.HttpServer: Jetty
> > bound to port 50070
> >> 2011-01-01 19:45:35,041 INFO org.mortbay.log: jetty-6.1.14
> >> sc-ssh-svr1 logs $
> >>
> >> On Dec 31, 2010, at 4:28 PM, li ping wrote:
> >>
> >>> I suggest you look through the logs to see if there is any error.
> >>> Second, note which node you run the command "hadoop fs -ls" on. If you
> >>> run it on Node A, the configuration item "fs.default.name" should
> >>> point to the HDFS.
> >>>
> >>> On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman <jo...@gmail.com>
> wrote:
> >>>
> >>>> Hi Michael,
> >>>>
> >>>> Thanks for your response. It doesn't seem to be an issue with
> safemode.
> >>>>
> >>>> Even when I try the command dfsadmin -safemode get, the system hangs.
> I
> > am
> >>>> unable to execute any FS shell commands other than hadoop fs -help.
> >>>>
> >>>> I am wondering whether this is an issue with communication between the
> >>>> daemons? What should I be looking at there? Or could it be something
> > else?
> >>>>
> >>>> When I do jps, I do see all the daemons listed.
> >>>>
> >>>> Any other thoughts.
> >>>>
> >>>> Thanks again and happy new year.
> >>>>
> >>>> -Jon
> >>>> On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
> >>>>
> >>>>> Try checking your dfs status
> >>>>>
> >>>>> hadoop dfsadmin -safemode get
> >>>>>
> >>>>> Probably says "ON"
> >>>>>
> >>>>> hadoop dfsadmin -safemode leave
> >>>>>
> >>>>> Somebody else can probably say how to make this happen every
> reboot....
> >>>>>
> >>>>> Michael D. Black
> >>>>> Senior Scientist
> >>>>> Advanced Analytics Directorate
> >>>>> Northrop Grumman Information Systems
> >>>>>
> >>>>>
> >>>>> ________________________________
> >>>>>
> >>>>> From: Jon Lederman [mailto:jon2718@gmail.com]
> >>>>> Sent: Fri 12/31/2010 11:00 AM
> >>>>> To: common-user@hadoop.apache.org
> >>>>> Subject: EXTERNAL:HDFS FS Commands Hanging System
> >>>>>
> >>>>>
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> I have been working on running Hadoop on a new microprocessor
> >>>> architecture in pseudo-distributed mode. I have been successful in
> > getting
> >>>> SSH configured. I am also able to start a namenode, secondary
> namenode,
> >>>> tasktracker, jobtracker and datanode as evidenced by the response I
> get
> > from
> >>>> jps.
> >>>>>
> >>>>> However, when I attempt to interact with the file system in any way
> > such
> >>>> as the simple command hadoop fs -ls, the system hangs. So it appears
> to
> > me
> >>>> that some communication is not occurring properly. Does anyone have
> any
> >>>> suggestions what I look into in order to fix this problem?
> >>>>>
> >>>>> Thanks in advance.
> >>>>>
> >>>>> -Jon
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> -----李平
> >>
>
>

Re: HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Hi Esteban,

Thanks.  Can you tell me how I can check whether my node can resolve the host name?  I don't know precisely how to do that.

When I run HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
I get:

# HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
11/01/02 16:52:14 DEBUG conf.Configuration: java.io.IOException: config()
	at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
	at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1880)

11/01/02 16:52:15 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG security.UserGroupInformation: Unix Login: root,root
11/01/02 16:52:17 DEBUG ipc.Client: The ping interval is60000ms.
11/01/02 16:52:18 DEBUG ipc.Client: Connecting to localhost/127.0.0.1:9000
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root sending #0
11/01/02 16:52:18 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9000 from root: starting, having connections 1

Then the system hangs and does not return.  

My core-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>


My hdfs-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>


My mapred-site.xml file is as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

My masters and slaves files both indicate: localhost

Thanks for your help.  I really appreciate this.

-Jon
On Jan 2, 2011, at 8:47 AM, Esteban Gutierrez Moguel wrote:

> Hello Jon,
> 
> Could you please verify that your node can resolve the host name?
> 
> It would be helpful too if you can attach your configuration files and the
> output of:
> 
> HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
> 
> as Todd suggested.
> 
> Cheers,
> esteban


Re: HDFS FS Commands Hanging System

Posted by Esteban Gutierrez Moguel <es...@gmail.com>.
Hello Jon,

Could you please verify that your node can resolve the host name?

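A quick way to check that from the shell (a sketch, assuming a typical Linux environment; none of these commands are Hadoop-specific):

```shell
# Does "localhost" resolve to the loopback address the daemons bind to?
getent hosts localhost

# Does the machine's own hostname resolve at all?
getent hosts "$(hostname)"

# If either command stalls or prints nothing, name resolution (DNS or
# /etc/hosts) is suspect before Hadoop itself is.
```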
It would be helpful too if you can attach your configuration files and the
output of:

HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /

as Todd suggested.

Cheers,
esteban

Re: Entropy Pool and HDFS FS Commands Hanging System

Posted by Konstantin Boudnik <co...@apache.org>.
Another possibility to fix it is to install rng-tools which will allow
you to increase the amount of entropy in your system.
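For example, roughly like this (a sketch; the package and daemon names are typical for Debian-style systems and may differ on an embedded distro):

```shell
# How much entropy does the kernel pool currently hold? Values near zero
# mean reads from /dev/random will block.
cat /proc/sys/kernel/random/entropy_avail

# Install and run the rngd daemon to keep the pool topped up
# (commented out here; run as root, names vary by distro):
#   apt-get install rng-tools
#   rngd -r /dev/urandom   # feeds the pool from urandom; fine for testing,
#                          # but weakens the usual /dev/random guarantees
```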
--
  Take care,
Konstantin (Cos) Boudnik



On Mon, Jan 3, 2011 at 16:48, Jon Lederman <jo...@gmail.com> wrote:
> Thanks.  Will try that.  One final question, based on the jstack output I sent, is it obvious that the system is blocked due to the behavior of /dev/random?  That is, can you enlighten me to the output I sent that explicitly or implicitly indicates the blocking?  I am trying to understand whether this is in fact the problem or whether there could be some other issue.
>
> If I just let the FS command run (i.e., hadoop fs -ls), is there any guarantee it will eventually return in some relatively finite period of time such as hours, or could it potentially take days, weeks, years or eternity?
>
> Thanks in advance.
>
> -Jon
> On Jan 3, 2011, at 4:41 PM, Ted Dunning wrote:
>
>> try
>>
>>   dd if=/dev/random bs=1 count=100 of=/dev/null
>>
>> This will likely hang for a long time.
>>
>> There is no way that I know of to change the behavior of /dev/random except
>> by changing the file itself to point to a different minor device.  That
>> would be very bad form.
>>
>> One thing you may be able to do is to pour lots of entropy into the system via
>> /dev/urandom.  I was not able to demonstrate this, though, when I just tried
>> that.  It would be nice if there were a config variable to set that would
>> change this behavior, but right now, a code change is required (AFAIK).
>>
>> Another thing to do is replace the use of SecureRandom with a version that
>> uses /dev/urandom.  That is the point of the code that I linked to.  It
>> provides a plugin replacement that will not block.
>>
>> On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman <jo...@gmail.com> wrote:
>>
>>>
>>> Could you give me a bit more information on how I can overcome this issue.
>>> I am running Hadoop on an embedded processor and networking is turned off
>>> to the embedded processor. Is there a quick way to check whether this is in
>>> fact blocking on my system?  And, are there some variables or configuration
>>> options I can set to avoid any potential blocking behavior?
>>>
>>>
>
>

Re: Entropy Pool and HDFS FS Commands Hanging System

Posted by Ted Dunning <td...@maprtech.com>.
On Mon, Jan 3, 2011 at 4:48 PM, Jon Lederman <jo...@gmail.com> wrote:

> Thanks.  Will try that.  One final question, based on the jstack output I
> sent, is it obvious that the system is blocked due to the behavior of
> /dev/random?



I tried to send you a highlighted markup of your jstack output.

The key thing to look for is some thread reading bytes that nests from
SecureRandom.


> If I just let the FS command run (i.e., hadoop fs -ls), is there any
> guarantee it will eventually return in some relatively finite period of time
> such as hours, or could it potentially take days, weeks, years or eternity?
>
>
It depends on how quiet your machine is.  If it has stuff happening, then it
will unwedge eventually.

Re: Entropy Pool and HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Thanks.  Will try that.  One final question, based on the jstack output I sent: is it obvious that the system is blocked due to the behavior of /dev/random?  That is, can you point me to the part of the output that explicitly or implicitly indicates the blocking?  I am trying to understand whether this is in fact the problem or whether there could be some other issue.

If I just let the FS command run (i.e., hadoop fs -ls), is there any guarantee it will eventually return in some relatively finite period of time such as hours, or could it potentially take days, weeks, years or eternity?

Thanks in advance.

-Jon
On Jan 3, 2011, at 4:41 PM, Ted Dunning wrote:

> try
> 
>   dd if=/dev/random bs=1 count=100 of=/dev/null
> 
> This will likely hang for a long time.
> 
> There is no way that I know of to change the behavior of /dev/random except
> by changing the file itself to point to a different minor device.  That
> would be very bad form.
> 
> One thing you may be able to do is to pour lots of entropy into the system via
> /dev/urandom.  I was not able to demonstrate this, though, when I just tried
> that.  It would be nice if there were a config variable to set that would
> change this behavior, but right now, a code change is required (AFAIK).
> 
> Another thing to do is replace the use of SecureRandom with a version that
> uses /dev/urandom.  That is the point of the code that I linked to.  It
> provides a plugin replacement that will not block.
> 
> On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman <jo...@gmail.com> wrote:
> 
>> 
>> Could you give me a bit more information on how I can overcome this issue.
>> I am running Hadoop on an embedded processor and networking is turned off
>> to the embedded processor. Is there a quick way to check whether this is in
>> fact blocking on my system?  And, are there some variables or configuration
>> options I can set to avoid any potential blocking behavior?
>> 
>> 


Re: Entropy Pool and HDFS FS Commands Hanging System

Posted by Ted Dunning <td...@maprtech.com>.
try

   dd if=/dev/random bs=1 count=100 of=/dev/null

This will likely hang for a long time.

There is no way that I know of to change the behavior of /dev/random except
by changing the file itself to point to a different minor device.  That
would be very bad form.

One thing you may be able to do is to pour lots of entropy into the system via
/dev/urandom.  I was not able to demonstrate this, though, when I just tried
that.  It would be nice if there were a config variable to set that would
change this behavior, but right now, a code change is required (AFAIK).

Another thing to do is replace the use of SecureRandom with a version that
uses /dev/urandom.  That is the point of the code that I linked to.  It
provides a plugin replacement that will not block.
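There is also a configuration-level workaround that some people use (an assumption on my part, not something verified in this thread): point the JVM's SecureRandom seed source at /dev/urandom via the java.security.egd property, e.g. in conf/hadoop-env.sh:

```shell
# Sketch: make the Sun JVM seed SecureRandom from /dev/urandom instead of
# /dev/random. The extra "/./" is deliberate -- a plain file:/dev/urandom
# URL is special-cased (and effectively ignored) by some JDK builds.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/./urandom"
```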

On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman <jo...@gmail.com> wrote:

>
> Could you give me a bit more information on how I can overcome this issue.
>  I am running Hadoop on an embedded processor and networking is turned off
> to the embedded processor. Is there a quick way to check whether this is in
> fact blocking on my system?  And, are there some variables or configuration
> options I can set to avoid any potential blocking behavior?
>
>

Re: Entropy Pool and HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Hi Ted,

Could you give me a bit more information on how I can overcome this issue.  I am running Hadoop on an embedded processor and networking is turned off to the embedded processor. Is there a quick way to check whether this is in fact blocking on my system?  And, are there some variables or configuration options I can set to avoid any potential blocking behavior?

Thanks.

-Jon
On Jan 3, 2011, at 3:48 PM, Ted Dunning wrote:

> Yes.  It is stuck as suggested.  See the bolded lines.
> 
> You can help avoid this by dumping additional entropy into the machine via
> network traffic.  According to the man page for /dev/random you can cheat by
> writing goo into /dev/urandom, but I have been unable to verify that by
> experiment.
> 
> Is it really necessary to use /dev/random here?  Again from the man page,
> there is a strong feeling in the community that only very long lived, high
> value keys really need to read from /dev/random.  Session keys from
> /dev/urandom are fine.
> 
> I wrote an adaptation of the secure seed generator that doesn't block for
> Mahout.  It is trivial, but might be useful to copy:
> http://svn.apache.org/repos/asf/mahout/trunk/math/src/main/java/org/apache/mahout/common/DevURandomSeedGenerator.java
> 
> 
> 
> On Mon, Jan 3, 2011 at 3:13 PM, Jon Lederman <jo...@gmail.com> wrote:
> 
>> I have attached the jstack <pid of namenode> output.  Does it appear to be
>> stuck in SecureRandom as you noted as a possibility?  I am not sure whether
>> this is indicated in the following output:
>> 
>> ...
>> 
> "main" prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
>>  java.lang.Thread.State: RUNNABLE
>> *        at java.io.FileInputStream.readBytes(Native Method)
>> *        at java.io.FileInputStream.read(FileInputStream.java:236)
>>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>       at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>       - locked <0x70e59ae8> (a java.io.BufferedInputStream)
>>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>>       at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>       - locked <0x70e59970> (a java.io.BufferedInputStream)
>>       at
>> sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
>>       at
>> sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
>>       at
>> sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
>> *        at
>> sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
>> *        at
>> sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
>> 
>> 


Re: Entropy Pool and HDFS FS Commands Hanging System

Posted by Ted Dunning <td...@maprtech.com>.
Yes.  It is stuck as suggested.  See the bolded lines.

You can help avoid this by dumping additional entropy into the machine via
network traffic.  According to the man page for /dev/random you can cheat by
writing goo into /dev/urandom, but I have been unable to verify that by
experiment.

Is it really necessary to use /dev/random here?  Again from the man page,
there is a strong feeling in the community that only very long lived, high
value keys really need to read from /dev/random.  Session keys from
/dev/urandom are fine.

I wrote an adaptation of the secure seed generator that doesn't block for
Mahout.  It is trivial, but might be useful to copy:
http://svn.apache.org/repos/asf/mahout/trunk/math/src/main/java/org/apache/mahout/common/DevURandomSeedGenerator.java
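A couple of practical checks along those lines, as a sketch: the `entropy_avail` path is standard Linux, `HADOOP_OPTS` (usually set in hadoop-env.sh) is the conventional place to hang an extra JVM flag, and the odd `/dev/./urandom` spelling is the well-known workaround for the JDK treating a plain `file:/dev/urandom` as the default blocking source.

```shell
# How much entropy the kernel pool holds right now; values near zero mean
# reads from /dev/random (and SecureRandom's default seeding) will block.
cat /proc/sys/kernel/random/entropy_avail 2>/dev/null || echo "no /proc on this system"

# Seed the JVM's SecureRandom from the non-blocking device instead.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/./urandom"
echo "$HADOOP_OPTS"
```

The same property can be set permanently via `securerandom.source` in the JRE's java.security file if editing HADOOP_OPTS is inconvenient.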



On Mon, Jan 3, 2011 at 3:13 PM, Jon Lederman <jo...@gmail.com> wrote:

> I have attached the jstack <pid of namenode> output.  Does it appear to be
> stuck in SecureRandom as you noted as a possibility?  I am not sure whether
> this is indicated in the following output:
>
> ...
>
"main" prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
>   java.lang.Thread.State: RUNNABLE
> *        at java.io.FileInputStream.readBytes(Native Method)
> *        at java.io.FileInputStream.read(FileInputStream.java:236)
>        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>        - locked <0x70e59ae8> (a java.io.BufferedInputStream)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>        - locked <0x70e59970> (a java.io.BufferedInputStream)
>        at
> sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
>        at
> sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
>        at
> sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
> *        at
> sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
> *        at
> sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
>
>

Re: Entropy Pool and HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Todd,

I have attached the jstack <pid of namenode> output.  Does it appear to be stuck in SecureRandom as you noted as a possibility?  I am not sure whether this is indicated in the following output:

sh-4.1# jps
4038 JobTracker
4160 Jps
3917 DataNode
4121 TaskTracker
3844 NameNode
3992 SecondaryNameNode

sh-4.1# jstack 3844
2011-01-03 15:07:01
Full thread dump OpenJDK Zero VM (14.0-b16 interpreted mode):
 
"Attach Listener" daemon prio=10 tid=0x0021a870 nid=0x106e waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE
 
"3299256@qtp0-1" prio=10 tid=0x6ff2cee8 nid=0x1039 in Object.wait() [0x6f2fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x7dcb46a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
        - locked <0x7dcb46a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
 
"15020576@qtp0-0" prio=10 tid=0x6ff2ddd8 nid=0x1038 in Object.wait() [0x6f47e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x7dcb4718> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
        - locked <0x7dcb4718> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
 
"org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor@955cd5" daemon prio=10 tid=0x6ff036f8 nid=0xffe waiting on condition [0x6f68e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
        at java.lang.Thread.run(Thread.java:636)
 
"org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor@25c828" daemon prio=10 tid=0x6ff02230 nid=0xff9 waiting on condition [0x6f80e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2327)
        at java.lang.Thread.run(Thread.java:636)
 
"org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@22ab57" daemon prio=10 tid=0x6ff00e00 nid=0xff8 waiting on condition [0x6f98e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:350)
        at java.lang.Thread.run(Thread.java:636)
 
"org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor@b1074a" daemon prio=10 tid=0x6ff009b0 nid=0xff7 waiting on condition [0x6fb0e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:2309)
        at java.lang.Thread.run(Thread.java:636)
 
"org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor@165f738" daemon prio=10 tid=0x001f66e8 nid=0xff6 waiting on condition [0x6fc9e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor.run(PendingReplicationBlocks.java:186)
        at java.lang.Thread.run(Thread.java:636)
 
"Low Memory Detector" daemon prio=10 tid=0x000c09a8 nid=0xf50 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE
 
"Signal Dispatcher" daemon prio=10 tid=0x000bf1b8 nid=0xf4f runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE
 
"Finalizer" daemon prio=10 tid=0x000af298 nid=0xf48 in Object.wait() [0x7063e000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x7daf8b40> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
        - locked <0x7daf8b40> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
 
"Reference Handler" daemon prio=10 tid=0x000aaa08 nid=0xf47 in Object.wait() [0x707be000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x7daf8bc8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:502)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x7daf8bc8> (a java.lang.ref.Reference$Lock)
 
"main" prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:236)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x70e59ae8> (a java.io.BufferedInputStream)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x70e59970> (a java.io.BufferedInputStream)
        at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
        at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
        at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
        at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
        at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
        - locked <0x70e592c8> (a sun.security.provider.SecureRandom)
        at java.security.SecureRandom.nextBytes(SecureRandom.java:450)
        - locked <0x70e59870> (a java.security.SecureRandom)
        at java.security.SecureRandom.next(SecureRandom.java:472)
        at java.util.Random.nextLong(Random.java:299)
        at org.mortbay.jetty.servlet.HashSessionIdManager.doStart(HashSessionIdManager.java:139)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        - locked <0x70e4d418> (a java.lang.Object)
        at org.mortbay.jetty.servlet.AbstractSessionManager.doStart(AbstractSessionManager.java:168)
        at org.mortbay.jetty.servlet.HashSessionManager.doStart(HashSessionManager.java:67)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        - locked <0x7dcb3a70> (a java.lang.Object)
        at org.mortbay.jetty.servlet.SessionHandler.doStart(SessionHandler.java:115)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        - locked <0x7dcb34c0> (a java.lang.Object)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.handler.ContextHandler.startContext(ContextHandler.java:537)
        at org.mortbay.jetty.servlet.Context.startContext(Context.java:136)
        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1234)
        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:460)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        - locked <0x7dcb3490> (a java.lang.Object)
        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        - locked <0x7dcb1038> (a java.lang.Object)
        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
        at org.mortbay.jetty.Server.doStart(Server.java:222)
        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
        - locked <0x7dc9c0f8> (a java.lang.Object)
        at org.apache.hadoop.http.HttpServer.start(HttpServer.java:461)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:246)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:202)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
 
"VM Thread" prio=10 tid=0x000a7ce8 nid=0xf45 runnable
 
"VM Periodic Task Thread" prio=10 tid=0x000c25a8 nid=0xf51 waiting on condition
 
JNI global references: 69
 
sh-4.1#
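For anyone hitting this later: the telltale frames can be pulled out of a dump without reading the whole thing. A minimal sketch, using a two-line stand-in for the real `jstack 3844` output:

```shell
# Stand-in for a captured dump (in practice: jstack <namenode-pid> > /tmp/nn-stack.txt).
cat > /tmp/nn-stack.txt <<'EOF'
        at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
        at java.lang.Thread.run(Thread.java:636)
EOF

# A dump stuck on the entropy pool matches these frames; zero matches means
# the hang is somewhere else.
grep -cE 'SeedGenerator|SecureRandom' /tmp/nn-stack.txt   # prints 1
```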
 
On Jan 2, 2011, at 6:39 PM, Todd Lipcon wrote:

> Hi Jon,
> 
> My guess is that your system's entropy pool runs dry. You can verify by
> grabbing a jstack <pid of namenode> output, and seeing if you're stuck in
> SecureRandom.
> 
> See this old thread:
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg02170.html
> 
> -Todd
> 
> On Sun, Jan 2, 2011 at 8:37 AM, Jon Lederman <jo...@gmail.com> wrote:
> 
>> Hi,
>> 
>> I followed the example precisely.  It seems to me that the NameNode and
>> DataNode are not communicating.  I noticed that the log file for my DataNode
>> appears suspiciously short.  I believe it should try to connect to the
>> NameNode and report such progress.  The log for the DataNode simply shows:
>> 
>> /************************************************************
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG:   host = localhost/127.0.0.1
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.2
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
>> 911707; compiled by 'chrisdo' on F
>> ri Feb 19 08:07:34 UTC 2010
>> ************************************************************/
>> 
>> Also, the log file for the NameNode indicates 0 racks and 0 DataNodes as
>> indicated in bold:
>> 
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG:   host = localhost/127.0.0.1
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.2
>> STARTUP_MSG:   build =
>> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
>> 911707; compiled by 'chrisdo' on F
>> ri Feb 19 08:07:34 UTC 2010
>> ************************************************************/
>> 2011-01-02 16:30:34,070 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
>> Initializing RPC Metrics with hostName=NameNode, port=900
>> 0
>> 2011-01-02 16:30:35,093 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
>> localhost.localdomain/127.0.0.1:90
>> 00
>> 2011-01-02 16:30:35,171 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=NameNode, sessi
>> onId=null
>> 2011-01-02 16:30:35,196 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
>> NameNodeMeterics using
>> context object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-01-02 16:30:37,022 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
>> 2011-01-02 16:30:37,029 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>> 2011-01-02 16:30:37,032 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isPermissionEnabled=true
>> 2011-01-02 16:30:37,216 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>> Initializing FSNamesystemMetric
>> s using context object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-01-02 16:30:37,242 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStatusMBean
>> 2011-01-02 16:30:37,799 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Number of files = 1
>> 2011-01-02 16:30:37,882 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Number of files under construction = 0
>> 2011-01-02 16:30:37,885 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Image file of size 94 loaded in 0 seconds.
>> 2011-01-02 16:30:37,891 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Edits file /tmp/hadoop-root/dfs/name/current/edits of
>> size 4 edits # 0 loaded in 0 seconds.
>> 2011-01-02 16:30:37,956 INFO org.apache.hadoop.hdfs.server.common.Storage:
>> Image file of size 94 saved in 0 seconds.
>> 2011-01-02 16:30:38,104 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
>> FSImage in 1726 msecs
>> 2011-01-02 16:30:38,130 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks
>> = 0
>> 2011-01-02 16:30:38,133 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
>> blocks = 0
>> 2011-01-02 16:30:38,136 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>> under-replicated blocks = 0
>> 2011-01-02 16:30:38,139 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>> over-replicated blocks = 0
>> 2011-01-02 16:30:38,144 INFO org.apache.hadoop.hdfs.StateChange: STATE*
>> Leaving safe mode after 1 secs.
>> 2011-01-02 16:30:38,154 INFO org.apache.hadoop.hdfs.StateChange: STATE*
>> Network topology has 0 racks and 0 datanodes
>> 2011-01-02 16:30:38,159 INFO org.apache.hadoop.hdfs.StateChange: STATE*
>> UnderReplicatedBlocks has 0 blocks
>> 2011-01-02 16:30:41,009 INFO org.mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.
>> Slf4jLog
>> 2011-01-02 16:30:42,045 INFO org.apache.hadoop.http.HttpServer: Port
>> returned by webServer.getConnectors()[0].getLocalPort() bef
>> ore open() is -1. Opening the listener on 50070
>> 2011-01-02 16:30:42,060 INFO org.apache.hadoop.http.HttpServer:
>> listener.getLocalPort() returned 50070 webServer.getConnectors()
>> [0].getLocalPort() returned 50070
>> 2011-01-02 16:30:42,062 INFO org.apache.hadoop.http.HttpServer: Jetty bound
>> to port 50070
>> 2011-01-02 16:30:42,064 INFO org.mortbay.log: jetty-6.1.14
>> 
>> What should I check to see whether there is communication?  Why should the
>> network topology as reported by the Namenode indicate 0 racks and 0
>> Datanodes?
>> 
>> Also, I am curious what should be in the masters and slaves files when
>> running in pseudo-distributed mode.
>> 
>> It seems I need to have both files contain: localhost.  Otherwise, the
>> DataNode and/or NameNode do not start.
>> 
>> Any help would be greatly appreciated.
>> 
>> Thanks.
>> 
>> -Jon
>> 
>> On Jan 2, 2011, at 3:46 AM, Black, Michael (IS) wrote:
>> 
>>> Did you set your config and format the namenode as per these
>> instructions?
>>> 
>>> http://hadoop.apache.org/common/docs/current/single_node_setup.html
>>> 
>>> 
>>> Michael D. Black
>>> Senior Scientist
>>> Advanced Analytics Directorate
>>> Northrop Grumman Information Systems
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: HDFS FS Commands Hanging System

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Jon,

My guess is that your system's entropy pool runs dry. You can verify by
grabbing a jstack <pid of namenode> output, and seeing if you're stuck in
SecureRandom.

See this old thread:
http://www.mail-archive.com/common-user@hadoop.apache.org/msg02170.html

-Todd

On Sun, Jan 2, 2011 at 8:37 AM, Jon Lederman <jo...@gmail.com> wrote:

> Hi,
>
> I followed the example precisely.  It seems to me that the NameNode and
> DataNode are not communicating.  I noticed that the log file for my DataNode
> appears suspiciously short.  I believe it should try to connect to the
> NameNode and report such progress.  The log for the DataNode simply shows:
>
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = localhost/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> 911707; compiled by 'chrisdo' on F
> ri Feb 19 08:07:34 UTC 2010
> ************************************************************/
>
> Also, the log file for the NameNode indicates 0 racks and 0 DataNodes as
> indicated in bold:
>
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = localhost/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> 911707; compiled by 'chrisdo' on F
> ri Feb 19 08:07:34 UTC 2010
> ************************************************************/
> 2011-01-02 16:30:34,070 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=900
> 0
> 2011-01-02 16:30:35,093 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
> localhost.localdomain/127.0.0.1:90
> 00
> 2011-01-02 16:30:35,171 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessi
> onId=null
> 2011-01-02 16:30:35,196 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
> NameNodeMeterics using
>  context object:org.apache.hadoop.metrics.spi.NullContext
> 2011-01-02 16:30:37,022 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
> 2011-01-02 16:30:37,029 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2011-01-02 16:30:37,032 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=true
> 2011-01-02 16:30:37,216 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetric
> s using context object:org.apache.hadoop.metrics.spi.NullContext
> 2011-01-02 16:30:37,242 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2011-01-02 16:30:37,799 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 1
> 2011-01-02 16:30:37,882 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 0
> 2011-01-02 16:30:37,885 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 94 loaded in 0 seconds.
> 2011-01-02 16:30:37,891 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /tmp/hadoop-root/dfs/name/current/edits of
>  size 4 edits # 0 loaded in 0 seconds.
> 2011-01-02 16:30:37,956 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 94 saved in 0 seconds.
> 2011-01-02 16:30:38,104 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
> FSImage in 1726 msecs
> 2011-01-02 16:30:38,130 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks
> = 0
> 2011-01-02 16:30:38,133 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
> blocks = 0
> 2011-01-02 16:30:38,136 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> under-replicated blocks = 0
> 2011-01-02 16:30:38,139 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>  over-replicated blocks = 0
> 2011-01-02 16:30:38,144 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> Leaving safe mode after 1 secs.
> 2011-01-02 16:30:38,154 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> Network topology has 0 racks and 0 datanodes
> 2011-01-02 16:30:38,159 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> UnderReplicatedBlocks has 0 blocks
> 2011-01-02 16:30:41,009 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.
> Slf4jLog
> 2011-01-02 16:30:42,045 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() bef
> ore open() is -1. Opening the listener on 50070
> 2011-01-02 16:30:42,060 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50070 webServer.getConnectors()
> [0].getLocalPort() returned 50070
> 2011-01-02 16:30:42,062 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 50070
> 2011-01-02 16:30:42,064 INFO org.mortbay.log: jetty-6.1.14
>
> What should I check to see whether there is communication?  Why should the
> network topology as reported by the Namenode indicate 0 racks and 0
> Datanodes?
>
> Also, I am curious what should be in the masters and slaves files when
> running in pseudo-distributed mode.
>
> It seems I need to have both files contain: localhost.  Otherwise, the
> DataNode and/or NameNode do not start.
>
> Any help would be greatly appreciated.
>
> Thanks.
>
> -Jon
>
> On Jan 2, 2011, at 3:46 AM, Black, Michael (IS) wrote:
>
> > Did you set your config and format the namenode as per these
> instructions?
> >
> > http://hadoop.apache.org/common/docs/current/single_node_setup.html
> >
> >
> > Michael D. Black
> > Senior Scientist
> > Advanced Analytics Directorate
> > Northrop Grumman Information Systems
> >
> >
> >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Hi,

I followed the example precisely.  It seems to me that the NameNode and DataNode are not communicating.  I noticed that the log file for my DataNode appears suspiciously short.  I believe it should try to connect to the NameNode and report such progress.  The log for the DataNode simply shows:

/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/

Also, the log file for the NameNode indicates 0 racks and 0 DataNodes as indicated in bold:

/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-01-02 16:30:34,070 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2011-01-02 16:30:35,093 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:9000
2011-01-02 16:30:35,171 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-02 16:30:35,196 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-02 16:30:37,022 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-02 16:30:37,029 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-02 16:30:37,032 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-02 16:30:37,216 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-02 16:30:37,242 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-01-02 16:30:37,799 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-01-02 16:30:37,882 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-02 16:30:37,885 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2011-01-02 16:30:37,891 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-01-02 16:30:37,956 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2011-01-02 16:30:38,104 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1726 msecs
2011-01-02 16:30:38,130 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-01-02 16:30:38,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-01-02 16:30:38,136 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-01-02 16:30:38,139 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of  over-replicated blocks = 0
2011-01-02 16:30:38,144 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
2011-01-02 16:30:38,154 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-01-02 16:30:38,159 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-01-02 16:30:41,009 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-01-02 16:30:42,045 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2011-01-02 16:30:42,060 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
2011-01-02 16:30:42,062 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2011-01-02 16:30:42,064 INFO org.mortbay.log: jetty-6.1.14

What should I check to see whether there is communication?  Why should the network topology as reported by the Namenode indicate 0 racks and 0 Datanodes?
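One direct check is to ask the NameNode itself for its view of the cluster. A sketch (`hadoop dfsadmin -report` is a standard 0.20.x command; the guard around it is only so the snippet degrades gracefully off-cluster):

```shell
# Ask the NameNode how many DataNodes have registered; a healthy
# pseudo-distributed setup reports one live datanode.
if command -v hadoop >/dev/null 2>&1; then
  hadoop dfsadmin -report
else
  echo "hadoop not on PATH; run this on the cluster node"
fi
```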

Also, I am curious what should be in the masters and slaves files when running in pseudo-distributed mode.

It seems I need to have both files contain: localhost.  Otherwise, the DataNode and/or NameNode do not start.
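For what it's worth, that matches the stock pseudo-distributed layout: both files carry the single local host (sketch of the two files under a 0.20.x conf/ directory):

```
# conf/masters  -- where start-dfs.sh launches the secondary namenode
localhost

# conf/slaves   -- where start-dfs.sh / start-mapred.sh launch datanode and tasktracker
localhost
```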

Any help would be greatly appreciated.

Thanks.

-Jon

On Jan 2, 2011, at 3:46 AM, Black, Michael (IS) wrote:

> Did you set your config and format the namenode as per these instructions?
> 
> http://hadoop.apache.org/common/docs/current/single_node_setup.html
> 
> 
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Northrop Grumman Information Systems
> 
> 
> 


Re: HDFS FS Commands Hanging System

Posted by "Black, Michael (IS)" <Mi...@ngc.com>.
Did you set your config and format the namenode as per these instructions?
 
http://hadoop.apache.org/common/docs/current/single_node_setup.html
 
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
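For reference, the pseudo-distributed settings in that guide reduce to two small files (values as in the 0.20-era document; the 9000 port matches the NameNode logs elsewhere in this thread):

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```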
 

 

Re: HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Hi,

Still no luck in getting FS commands to work.  I did take a look at the logs.  They all look pretty clean with the following exceptions: The DataNode appears to start up fine.  However, the NameNode reports that the Network Topology has 0 racks and 0 datanodes.  Is this normal?  Is it possible the namenode cannot talk to the datanode?  Any thoughts on what might be wrong?

Thanks in advance and happy new year.

-Jon
2011-01-01 19:45:27,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
sc-ssh-svr1 logs $ more hadoop-root-namenode-localhost.log
2011-01-01 19:45:23,988 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-01-01 19:45:27,059 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
2011-01-01 19:45:28,355 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:8020
2011-01-01 19:45:28,448 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-01-01 19:45:28,492 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,758 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
2011-01-01 19:45:29,763 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-01 19:45:29,770 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-01 19:45:29,965 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-01-01 19:45:29,994 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-01-01 19:45:30,603 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-01-01 19:45:30,696 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-01 19:45:30,701 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
2011-01-01 19:45:30,708 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2011-01-01 19:45:30,767 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
2011-01-01 19:45:30,924 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1701 msecs
2011-01-01 19:45:30,945 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2011-01-01 19:45:30,948 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2011-01-01 19:45:30,958 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2011-01-01 19:45:30,963 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of  over-replicated blocks = 0
2011-01-01 19:45:30,966 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
2011-01-01 19:45:30,971 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2011-01-01 19:45:30,973 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2011-01-01 19:45:33,929 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-01-01 19:45:35,020 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].
getLocalPort() before open() is -1. Opening the listener on 50070
2011-01-01 19:45:35,036 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServ
er.getConnectors()[0].getLocalPort() returned 50070
2011-01-01 19:45:35,038 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
2011-01-01 19:45:35,041 INFO org.mortbay.log: jetty-6.1.14
sc-ssh-svr1 logs $ 

On Dec 31, 2010, at 4:28 PM, li ping wrote:

> I suggest you look through the logs to see if there is any error.
> The second point is which node you run the
> command "hadoop fs -ls" on. If you run the command on Node A, the
> configuration item "fs.default.name" there should point to the HDFS NameNode.
> 
> On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman <jo...@gmail.com> wrote:
> 
>> Hi Michael,
>> 
>> Thanks for your response.  It doesn't seem to be an issue with safemode.
>> 
>> Even when I try the command dfsadmin -safemode get, the system hangs.  I am
>> unable to execute any FS shell commands other than hadoop fs -help.
>> 
>> I am wondering whether this is an issue with communication between the
>> daemons?  What should I be looking at there?  Or could it be something else?
>> 
>> When I do jps, I do see all the daemons listed.
>> 
>> Any other thoughts?
>> 
>> Thanks again and happy new year.
>> 
>> -Jon
>> On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
>> 
>>> Try checking your dfs status
>>> 
>>> hadoop dfsadmin -safemode get
>>> 
>>> Probably says "ON"
>>> 
>>> hadoop dfsadmin -safemode leave
>>> 
>>> Somebody else can probably say how to make this happen every reboot....
>>> 
>>> Michael D. Black
>>> Senior Scientist
>>> Advanced Analytics Directorate
>>> Northrop Grumman Information Systems
>>> 
>>> 
>>> ________________________________
>>> 
>>> From: Jon Lederman [mailto:jon2718@gmail.com]
>>> Sent: Fri 12/31/2010 11:00 AM
>>> To: common-user@hadoop.apache.org
>>> Subject: EXTERNAL:HDFS FS Commands Hanging System
>>> 
>>> 
>>> 
>>> Hi All,
>>> 
>>> I have been working on running Hadoop on a new microprocessor
>> architecture in pseudo-distributed mode.  I have been successful in getting
>> SSH configured.  I am also able to start a namenode, secondary namenode,
>> tasktracker, jobtracker and datanode as evidenced by the response I get from
>> jps.
>>> 
>>> However, when I attempt to interact with the file system in any way such
>> as the simple command hadoop fs -ls, the system hangs.  So it appears to me
>> that some communication is not occurring properly.  Does anyone have any
>> suggestions what I look into in order to fix this problem?
>>> 
>>> Thanks in advance.
>>> 
>>> -Jon
>>> 
>> 
>> 
> 
> 
> -- 
> -----李平


Re: HDFS FS Commands Hanging System

Posted by li ping <li...@gmail.com>.
I suggest you look through the logs to see if there is any error.
The second point is which node you run the
command "hadoop fs -ls" on. If you run the command on Node A, the
configuration item "fs.default.name" there should point to the HDFS NameNode.
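For illustration, a minimal core-site.xml for a pseudo-distributed 0.20 setup might look like the sketch below (the hostname and port are assumptions taken from the default setup and the NameNode log in this thread; substitute your own values):

```xml
<?xml version="1.0"?>
<!-- core-site.xml: the fs shell reads fs.default.name to locate the NameNode.
     localhost:8020 matches the "Namenode up at: ...127.0.0.1:8020" log line;
     adjust the host and port if your NameNode binds elsewhere. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```

If the node running the command has a different or missing value here, the client will try to reach the wrong address and can appear to hang while it retries.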

On Sat, Jan 1, 2011 at 3:20 AM, Jon Lederman <jo...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for your response.  It doesn't seem to be an issue with safemode.
>
> Even when I try the command dfsadmin -safemode get, the system hangs.  I am
> unable to execute any FS shell commands other than hadoop fs -help.
>
> I am wondering whether this is an issue with communication between the
> daemons?  What should I be looking at there?  Or could it be something else?
>
> When I do jps, I do see all the daemons listed.
>
> Any other thoughts?
>
> Thanks again and happy new year.
>
> -Jon
> On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
>
> > Try checking your dfs status
> >
> > hadoop dfsadmin -safemode get
> >
> > Probably says "ON"
> >
> > hadoop dfsadmin -safemode leave
> >
> > Somebody else can probably say how to make this happen every reboot....
> >
> > Michael D. Black
> > Senior Scientist
> > Advanced Analytics Directorate
> > Northrop Grumman Information Systems
> >
> >
> > ________________________________
> >
> > From: Jon Lederman [mailto:jon2718@gmail.com]
> > Sent: Fri 12/31/2010 11:00 AM
> > To: common-user@hadoop.apache.org
> > Subject: EXTERNAL:HDFS FS Commands Hanging System
> >
> >
> >
> > Hi All,
> >
> > I have been working on running Hadoop on a new microprocessor
> architecture in pseudo-distributed mode.  I have been successful in getting
> SSH configured.  I am also able to start a namenode, secondary namenode,
> tasktracker, jobtracker and datanode as evidenced by the response I get from
> jps.
> >
> > However, when I attempt to interact with the file system in any way such
> as the simple command hadoop fs -ls, the system hangs.  So it appears to me
> that some communication is not occurring properly.  Does anyone have any
> suggestions what I look into in order to fix this problem?
> >
> > Thanks in advance.
> >
> > -Jon
> >
>
>


-- 
-----李平

Re: HDFS FS Commands Hanging System

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Jon,

Try:
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /
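If the DEBUG output shows the client stuck retrying its connection, a bare TCP socket test can confirm whether the NameNode RPC port is reachable at all. This is only a sketch: localhost:8020 is assumed from the "Namenode up at: ...127.0.0.1:8020" line in the logs above, so substitute your own host and port.

```python
import socket

def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The address an fs shell command would dial in this pseudo-distributed setup.
print(port_reachable("localhost", 8020))
```

If this returns False while jps shows the NameNode running, the daemon is likely bound to a different interface or port than the one fs.default.name points at.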

-Todd

On Fri, Dec 31, 2010 at 11:20 AM, Jon Lederman <jo...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for your response.  It doesn't seem to be an issue with safemode.
>
> Even when I try the command dfsadmin -safemode get, the system hangs.  I am
> unable to execute any FS shell commands other than hadoop fs -help.
>
> I am wondering whether this is an issue with communication between the
> daemons?  What should I be looking at there?  Or could it be something else?
>
> When I do jps, I do see all the daemons listed.
>
> Any other thoughts?
>
> Thanks again and happy new year.
>
> -Jon
> On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:
>
> > Try checking your dfs status
> >
> > hadoop dfsadmin -safemode get
> >
> > Probably says "ON"
> >
> > hadoop dfsadmin -safemode leave
> >
> > Somebody else can probably say how to make this happen every reboot....
> >
> > Michael D. Black
> > Senior Scientist
> > Advanced Analytics Directorate
> > Northrop Grumman Information Systems
> >
> >
> > ________________________________
> >
> > From: Jon Lederman [mailto:jon2718@gmail.com]
> > Sent: Fri 12/31/2010 11:00 AM
> > To: common-user@hadoop.apache.org
> > Subject: EXTERNAL:HDFS FS Commands Hanging System
> >
> >
> >
> > Hi All,
> >
> > I have been working on running Hadoop on a new microprocessor
> architecture in pseudo-distributed mode.  I have been successful in getting
> SSH configured.  I am also able to start a namenode, secondary namenode,
> tasktracker, jobtracker and datanode as evidenced by the response I get from
> jps.
> >
> > However, when I attempt to interact with the file system in any way such
> as the simple command hadoop fs -ls, the system hangs.  So it appears to me
> that some communication is not occurring properly.  Does anyone have any
> suggestions what I look into in order to fix this problem?
> >
> > Thanks in advance.
> >
> > -Jon
> >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: HDFS FS Commands Hanging System

Posted by Jon Lederman <jo...@gmail.com>.
Hi Michael,

Thanks for your response.  It doesn't seem to be an issue with safemode.

Even when I try the command dfsadmin -safemode get, the system hangs.  I am unable to execute any FS shell commands other than hadoop fs -help.

I am wondering whether this is an issue with communication between the daemons?  What should I be looking at there?  Or could it be something else?

When I do jps, I do see all the daemons listed.

Any other thoughts?

Thanks again and happy new year.

-Jon
On Dec 31, 2010, at 9:09 AM, Black, Michael (IS) wrote:

> Try checking your dfs status
> 
> hadoop dfsadmin -safemode get
> 
> Probably says "ON"
> 
> hadoop dfsadmin -safemode leave
> 
> Somebody else can probably say how to make this happen every reboot....
> 
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Northrop Grumman Information Systems
> 
> 
> ________________________________
> 
> From: Jon Lederman [mailto:jon2718@gmail.com]
> Sent: Fri 12/31/2010 11:00 AM
> To: common-user@hadoop.apache.org
> Subject: EXTERNAL:HDFS FS Commands Hanging System
> 
> 
> 
> Hi All,
> 
> I have been working on running Hadoop on a new microprocessor architecture in pseudo-distributed mode.  I have been successful in getting SSH configured.  I am also able to start a namenode, secondary namenode, tasktracker, jobtracker and datanode as evidenced by the response I get from jps.
> 
> However, when I attempt to interact with the file system in any way such as the simple command hadoop fs -ls, the system hangs.  So it appears to me that some communication is not occurring properly.  Does anyone have any suggestions what I look into in order to fix this problem?
> 
> Thanks in advance.
> 
> -Jon 
> 


RE:HDFS FS Commands Hanging System

Posted by "Black, Michael (IS)" <Mi...@ngc.com>.
Try checking your dfs status
 
hadoop dfsadmin -safemode get
 
Probably says "ON"
 
hadoop dfsadmin -safemode leave
 
Somebody else can probably say how to make this happen every reboot....
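One way to script the reboot case (a sketch only, assuming the hadoop binary is on PATH and start-dfs.sh has already run) is to use dfsadmin's blocking option rather than forcing "leave":

```shell
#!/bin/sh
# Boot-time helper: "-safemode wait" blocks until the NameNode leaves
# safe mode on its own, so anything placed after this line only runs
# once HDFS is writable.
hadoop dfsadmin -safemode wait
echo "HDFS is out of safe mode; safe to run jobs now."
```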
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
 

________________________________

From: Jon Lederman [mailto:jon2718@gmail.com]
Sent: Fri 12/31/2010 11:00 AM
To: common-user@hadoop.apache.org
Subject: EXTERNAL:HDFS FS Commands Hanging System



Hi All,

I have been working on running Hadoop on a new microprocessor architecture in pseudo-distributed mode.  I have been successful in getting SSH configured.  I am also able to start a namenode, secondary namenode, tasktracker, jobtracker and datanode as evidenced by the response I get from jps.

However, when I attempt to interact with the file system in any way such as the simple command hadoop fs -ls, the system hangs.  So it appears to me that some communication is not occurring properly.  Does anyone have any suggestions what I look into in order to fix this problem?

Thanks in advance.

-Jon