Posted to user@nutch.apache.org by Polsnet <po...@163.com> on 2009/07/03 06:03:30 UTC

Nutch 1.0 on the limits of the data

What is the largest amount of data Nutch 1.0 can support? (In terms of file
size or number of records.)
-- 
View this message in context: http://www.nabble.com/Nutch-1.0-on-the-limits-of-the-data-tp24317298p24317298.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Nutch 1.0 on the limits of the data

Posted by Otis Gospodnetic <og...@yahoo.com>.
Depends on hardware, of course!

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Polsnet <po...@163.com>
> To: nutch-user@lucene.apache.org
> Sent: Friday, July 3, 2009 12:03:30 AM
> Subject: Nutch 1.0 on the limits of the data
> 
> 
> What is the largest amount of data Nutch 1.0 can support? (In terms of file
> size or number of records.)


Re: Nutch 1.0 on the limits of the data

Posted by Dennis Kubes <ku...@apache.org>.
The simple answer is billions, perhaps tens to hundreds of billions of
records, since Nutch leverages Hadoop. Yahoo is currently using Hadoop to
create its web index. But as Otis pointed out, Hadoop does parallel
processing, so capacity is entirely dependent on the amount of hardware.
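The "depends on hardware" point can be made concrete with back-of-envelope
arithmetic: fetch throughput scales roughly linearly with the number of
nodes and fetcher threads. The numbers below are purely illustrative
assumptions, not Nutch benchmarks:

```python
# Back-of-envelope crawl capacity estimate. All inputs are hypothetical
# example values; real throughput depends on bandwidth, politeness
# settings, and the sites being crawled.

SECONDS_PER_DAY = 86_400

def pages_per_day(nodes, threads_per_node, pages_per_thread_per_sec):
    """Estimate pages fetched per day across a Hadoop cluster,
    assuming throughput scales linearly with total fetcher threads."""
    return nodes * threads_per_node * pages_per_thread_per_sec * SECONDS_PER_DAY

# e.g. 10 nodes x 10 fetcher threads x 1 page/thread/sec
print(pages_per_day(10, 10, 1))   # -> 8640000, i.e. ~8.6M pages/day
```

Under these assumed rates, reaching billions of records is a matter of
adding nodes and crawl time, which is the linear-scaling property Dennis
describes.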

Dennis

Polsnet wrote:
> What is the largest amount of data Nutch 1.0 can support? (In terms of file
> size or number of records.)