Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/10/08 01:55:08 UTC

[Nutch Wiki] Update of "FAQ" by Gal Nitzan

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  
  ==== What is MapReduce? ====
  
- Well the information is scarce but you can get an idea by reading the following documents:
+ [http://weblogs.java.net/blog/tomwhite/archive/2005/09/mapreduce.html#more "Excerpt from TomWhite's blog: MapReduce"]
+ MapReduce is the brainchild of Google and is documented in detail by Jeffrey Dean and Sanjay Ghemawat in their paper [http://labs.google.com/papers/mapreduce.html "MapReduce: Simplified Data Processing on Large Clusters"]. In essence, it allows massive data sets to be processed in a distributed fashion by breaking the processing into many small computations of two types: a map operation that transforms the input into intermediate key/value pairs, and a reduce operation that merges the intermediate values sharing a key into the final output. This processing model is well suited to the operations that a search engine indexer like Nutch or Google needs to perform, such as computing inlinks for URLs or building inverted indexes, and it will [http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/mapred.pdf "transform Nutch"] into a scalable, distributed search engine.
  
- [http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf "MapReduce presentation by Doug Cutting"]
- 
- [http://labs.google.com/papers/mapreduce.html "Google lab document"]
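  
  As a rough illustration of the map and reduce steps described above, the inlink computation can be sketched in a few lines of Python. This is a single-process toy, not Nutch's or Google's implementation: the names map_outlinks, reduce_inlinks and run_mapreduce and the example.* URLs are all made up for the example, and a real framework would spread the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_outlinks(source_url, outlinks):
    """Map step (hypothetical helper): emit one (target, source) pair per outlink."""
    for target in outlinks:
        yield target, source_url


def reduce_inlinks(target_url, sources):
    """Reduce step (hypothetical helper): merge all sources that link to one target."""
    return target_url, sorted(set(sources))


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    # Each key is reduced independently, which is what lets a real
    # framework hand different keys to different machines.
    return dict(reducer(k, vs) for k, vs in sorted(grouped.items()))


# Input: each page together with the URLs it links to (made-up data).
pages = [
    ("http://a.example/", ["http://b.example/", "http://c.example/"]),
    ("http://b.example/", ["http://c.example/"]),
    ("http://c.example/", ["http://a.example/"]),
]

inlinks = run_mapreduce(pages, map_outlinks, reduce_inlinks)
for url, sources in inlinks.items():
    print(url, "<-", sources)
```

  The map step only looks at one page at a time and the reduce step only looks at one target URL at a time, so neither needs to see the whole data set; that independence is what makes the model easy to distribute.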
  
  ==== How to start working with MapReduce? ====