Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/12 11:42:49 UTC

[Nutch Wiki] Update of "NutchHadoopTutorial0.8" by AlexMc

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchHadoopTutorial0.8" page has been changed by AlexMc.
http://wiki.apache.org/nutch/NutchHadoopTutorial0.8?action=diff&rev1=18&rev2=19

--------------------------------------------------

  ## page was copied from NutchHadoopTutorial
- = How to Setup Nutch and Hadoop =
+ = How to Setup Nutch V0.8 and Hadoop =
  --------------------------------------------------------------------------------
+ Note: this is a slightly older version of the article. The latest version can be found at [NutchHadoopTutorial]
+ 
  After searching the web and mailing lists, I found very little information on how to set up Nutch using the Hadoop distributed file system (HDFS, formerly NDFS) and MapReduce. The purpose of this tutorial is to provide a step-by-step method for getting Nutch running with the Hadoop file system on multiple machines, including the ability to both index (crawl) and search across multiple machines.
  
  This document does not go into the Nutch or Hadoop architecture. It only explains how to get the systems up and running. At the end of the tutorial, though, I will point you to relevant resources if you want to know more about the architecture of Nutch and Hadoop.