Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/06/16 17:57:58 UTC

[Hadoop Wiki] Update of "TroubleShooting" by SteveLoughran

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "TroubleShooting" page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/TroubleShooting?action=diff&rev1=7&rev2=8

Comment:
add CouldOnlyBeReplicatedTo

  === Exception when initializing the filesystem ===
  
  
+ {{{
- {{{ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
+ ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
      at java.io.DataInputStream.readFully(DataInputStream.java:178)
      at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
      at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
@@ -22, +23 @@

      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:176)
      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:162)
      at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:846)
-     at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)}}}
+     at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)
+ }}}
  
- This is sometimes encountered if there is a corruption of the {{{edits}}} file
+ This is sometimes encountered if there is a corruption of the {{{ edits }}} file
  in the transaction log. Try using a hex editor or equivalent to open
  up 'edits' and remove the last record. In most cases that record is
  incomplete, which is why your NameNode will not start. Once you update
- your edits, start the NameNode and run {{{hadoop fsck /}}} to see if you
+ your edits, start the NameNode and run {{{ hadoop fsck / }}} to see if you
  have any corrupt files and fix/get rid of them.
  
- Take a back up of {{{dfs.name.dir}}} before updating and playing around
+ Take a backup of {{{ dfs.name.dir }}} before updating and playing around
  with it.
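
The hex-editor step above boils down to truncating the file just after the last complete record. A minimal sketch in Python, assuming you have already found the byte offset of the last good record by inspecting the file yourself (the function name and the {{{ edits.bak }}} backup path are hypothetical, not part of Hadoop):

```python
import shutil

def truncate_edits(edits_path, backup_path, last_good_offset):
    """Copy the edits file aside, then drop everything after the last
    complete record. last_good_offset must be determined by hand,
    e.g. with a hex editor; it is NOT computed here."""
    shutil.copyfile(edits_path, backup_path)  # untouched copy, per the backup advice above
    with open(edits_path, "r+b") as f:
        f.truncate(last_good_offset)          # discard the trailing partial record
```

If the NameNode still fails to start, or fsck reports worse corruption afterwards, restore the original file from the backup copy.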
  
  == Client cannot talk to filesystem ==
@@ -51, +53 @@

  === Error message: Could not obtain block ===
  
  Your logs contain something like
- {{{INFO hdfs.DFSClient: Could not obtain block blk_-4157273618194597760_1160
+ {{{ INFO hdfs.DFSClient: Could not obtain block blk_-4157273618194597760_1160
-  from any node:  java.io.IOException: No live nodes contain current block}}}
+  from any node:  java.io.IOException: No live nodes contain current block }}}
  
  There are no live datanodes containing a copy of the block of the file you are looking for. Bring up any nodes that are down, or skip that block.
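
To check which datanodes the NameNode considers live or dead, and to restart one that is down, the stock commands can be used (paths are relative to the Hadoop installation directory and may vary between releases):

{{{
# list live and dead datanodes as the NameNode sees them
bin/hadoop dfsadmin -report

# on a machine whose datanode is down, bring it back up
bin/hadoop-daemon.sh start datanode
}}}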
  
  == Reduce hangs ==
  
  This can be a DNS issue. Two problems which have been encountered in practice are:
-  * Machines with multiple NICs. In this case, set {{{dfs.datanode.dns.interface}}} (in {{{hdfs-site.xml}}}) and {{{mapred.datanode.dns.interface}}} (in {{{mapred-site.xml}}}) to the name of the network interface used by Hadoop (something like eth0 under Linux),
+  * Machines with multiple NICs. In this case, set {{{ dfs.datanode.dns.interface }}} (in {{{ hdfs-site.xml }}}) and {{{ mapred.datanode.dns.interface }}} (in {{{ mapred-site.xml }}}) to the name of the network interface used by Hadoop (something like {{{ eth0 }}} under Linux).
-  * Badly formatted or incorrect hosts files ({{{/etc/hosts}}} under Linux) can wreak havoc. Any DNS problem will hobble Hadoop, so ensure that names can be resolved correctly.
+  * Badly formatted or incorrect hosts files ({{{ /etc/hosts }}} under Linux) can wreak havoc. Any DNS problem will hobble Hadoop, so ensure that names can be resolved correctly.
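
For the multi-NIC case, the relevant {{{ hdfs-site.xml }}} fragment looks like the following; the interface name {{{ eth0 }}} is only an example, so substitute whichever NIC carries your Hadoop traffic. An analogous {{{ mapred.datanode.dns.interface }}} entry goes in {{{ mapred-site.xml }}}.

{{{
<!-- hdfs-site.xml -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth0</value>
</property>
}}}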
  
+ == Error message saying a file "Could only be replicated to 0 nodes instead of 1" ==
+ 
+ (or any similar number)
+ 
+ See [[CouldOnlyBeReplicatedTo]]
+