Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2006/09/29 22:21:44 UTC

[Lucene-hadoop Wiki] Update of "FrontPage" by OwenOMalley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/FrontPage

------------------------------------------------------------------------------
  [http://lucene.apache.org/hadoop/ Hadoop] is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named [:HadoopMapReduce: Map/Reduce], where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
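  The split-apply-merge idea behind Map/Reduce can be sketched in plain Java. This is an illustration of the paradigm only, not Hadoop's actual API; the class and method names here are hypothetical:

```java
import java.util.*;

public class MapReduceSketch {
    // "Map" phase: each input fragment independently emits (word, 1) pairs.
    // Because fragments share no state, any node can run -- or re-run -- one.
    static List<Map.Entry<String, Integer>> mapFragment(String fragment) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : fragment.split("\\s+")) {
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // "Reduce" phase: after grouping by key, sum the counts for each word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Two "fragments of work"; in a real cluster each would run on its own node.
        List<String> fragments = Arrays.asList("the quick brown fox", "the lazy dog");
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String f : fragments) emitted.addAll(mapFragment(f));
        Map<String, Integer> counts = reduce(emitted);
        System.out.println(counts.get("the")); // prints 2
    }
}
```

  In the real framework the map outputs are shuffled across the network so that all pairs with the same key reach the same reducer, and a failed fragment is simply re-executed on another node.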
  
  The intent is to scale Hadoop up to handling thousands of computers. The current high-water marks that have been reported are:
-  * Nodes in a single file system cluster (!DataNodes): 620
+  * Nodes in a single file system cluster (!DataNodes): 902
-  * Nodes in a single map/reduce cluster (!TaskTrackers): 500
+  * Nodes in a single map/reduce cluster (!TaskTrackers): 902
  
  Hadoop was originally built as infrastructure for the [http://lucene.apache.org/nutch/ Nutch] project, which crawls the web and builds a search engine index for the crawled pages. Both Hadoop and Nutch are part of the [http://lucene.apache.org/java/docs/index.html Lucene] [http://www.apache.org/ Apache] project.