Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/07/07 20:41:27 UTC

[Nutch Wiki] Update of "NutchHadoopTutorial" by StephenHalsey

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by StephenHalsey:
http://wiki.apache.org/nutch/NutchHadoopTutorial

The comment on the change is:
An additional comment to the page about a possible problem when adding datanodes

------------------------------------------------------------------------------
  scp -r /nutch/search/* nutch@computer:/nutch/search
  }}}
  
- Do this for every computer you want to use as a slave node.  Then edit the slaves file, adding each slave node name to the file, one per line.  You will also want to edit the hadoop-site.xml file and change the values for the map and reduce task numbers, making this a multiple of the number of machines you have.  For our system, which has 6 data nodes, I put in 32 as the number of tasks.  The replication property can also be changed at this time; a good starting value is something like 2 or 3.  Once this is done you should be able to start up all of the nodes.
+ Do this for every computer you want to use as a slave node.  Then edit the slaves file, adding each slave node name to the file, one per line.  You will also want to edit the hadoop-site.xml file and change the values for the map and reduce task numbers, making this a multiple of the number of machines you have.  For our system, which has 6 data nodes, I put in 32 as the number of tasks.  The replication property can also be changed at this time; a good starting value is something like 2 or 3.  *(See the note at the bottom about possibly having to clear the filesystem of new datanodes.)  Once this is done you should be able to start up all of the nodes.
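  
  (For reference, here is a minimal hadoop-site.xml sketch of the three properties discussed above.  The values shown are the 6-node example from this paragraph, and the property names are the ones used by Hadoop 0.x; verify them against the hadoop-default.xml that ships with your version:)
  
  {{{
  <property>
    <name>mapred.map.tasks</name>
    <value>32</value>
    <description>Roughly a multiple of the number of slave machines.</description>
  </property>
  
  <property>
    <name>mapred.reduce.tasks</name>
    <value>32</value>
    <description>Also scaled to the number of slave machines.</description>
  </property>
  
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Copies kept of each block; 2 or 3 is a good starting value.</description>
  </property>
  }}}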
  
  To start all of the nodes we use the exact same command as before:
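  
  (The command itself falls outside this diff excerpt; per the note at the bottom of this page, the cluster is started from the name node with bin/start-all.sh.  A minimal sketch, assuming the tutorial's /nutch/search install directory:)
  
  {{{
  cd /nutch/search
  bin/start-all.sh
  }}}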
  
@@ -570, +570 @@

  
  http://www.netlikon.de/docs/javadoc-hadoop-0.1/overview-summary.html
  
+ 
+ * - I, StephenHalsey, have used this tutorial and found it very useful, but when I tried to add additional datanodes I got error messages in the logs of those datanodes saying "2006-07-07 18:58:18,345 INFO org.apache.hadoop.dfs.DataNode: Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.UnregisteredDatanodeException: Data node linux89-1:50010 is attempting to report storage ID DS-1437847760. Expecting DS-1437847760.".  I think this was because the hadoop/filesystem/data/storage file on the new datanodes was the same as on the node they had been copied from, so they all reported the same storage ID.  To get around this I shut everything down using bin/stop-all.sh on the name node, deleted everything in the /filesystem directory on the new datanodes so they were clean, and ran bin/start-all.sh on the name node; the filesystem on the new datanodes was then created with new hadoop/filesystem/data/storage files and new directories, and everything worked fine from then on, as sketched below.  This is probably not a problem if you follow the above process without starting any datanodes, because they will all be empty, but it was for me because I had put some data onto the dfs of the single-datanode system before copying everything onto the new datanodes.  Well done on the tutorial, by the way; very helpful.  Steve.
+
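+ A minimal shell sketch of that recovery, assuming the tutorial's /nutch/search install directory and that /filesystem above means the DFS data directory configured in hadoop-site.xml (adjust the path to your own setup):
+ 
+ {{{
+ # On the name node: stop the whole cluster first.
+ cd /nutch/search
+ bin/stop-all.sh
+ 
+ # On each NEW datanode: clear its DFS data directory so that a
+ # fresh storage ID is generated on the next start.  The path is
+ # an assumption; use whatever your hadoop-site.xml points at.
+ rm -rf /filesystem/*
+ 
+ # Back on the name node: bring everything up again.
+ bin/start-all.sh
+ }}}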