Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/12 11:44:49 UTC

[Nutch Wiki] Update of "NutchHadoopTutorial" by AlexMc

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchHadoopTutorial" page has been changed by AlexMc.
http://wiki.apache.org/nutch/NutchHadoopTutorial?action=diff&rev1=17&rev2=18

--------------------------------------------------

- = How to Setup Nutch and Hadoop =
+ = How to Setup Nutch (V1.0) and Hadoop =
  --------------------------------------------------------------------------------
  After searching the web and mailing lists, it seems that there is very little information on how to set up Nutch using the Hadoop distributed file system (HDFS, formerly NDFS) and MapReduce. The purpose of this tutorial is to provide a step-by-step method to get Nutch running with the Hadoop file system on multiple machines, including being able to both index (crawl) and search across multiple machines.
  
@@ -15, +15 @@

  Three, this tutorial uses Whitebox Enterprise Linux 3 Respin 2 (WHEL). For those of you who don't know Whitebox, it is a RedHat Enterprise Linux clone. You should be able to follow along on any Linux system, but the systems I use are Whitebox.
  
  Four, this tutorial uses Nutch 0.8 Dev Revision 385702, and may not be compatible with future releases of either Nutch or Hadoop.
+ (AlexMc is trying to update this article to make it consistent with Nutch version 1.0.)
  
  Five, for this tutorial we set up Nutch across 6 different computers. If you are using a different number of machines you should still be fine, but you should have at least two different machines to prove the distributed capabilities of both HDFS and MapReduce.