Posted to user@nutch.apache.org by Polsnet <po...@163.com> on 2009/07/03 06:03:30 UTC
Nutch 1.0 on the limits of the data
What is the largest amount of data Nutch 1.0 can support? (In file size or number of
records)
--
View this message in context: http://www.nabble.com/Nutch-1.0-on-the-limits-of-the-data-tp24317298p24317298.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Nutch 1.0 on the limits of the data
Posted by Otis Gospodnetic <og...@yahoo.com>.
Depends on hardware, of course!
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
> From: Polsnet <po...@163.com>
> To: nutch-user@lucene.apache.org
> Sent: Friday, July 3, 2009 12:03:30 AM
> Subject: Nutch 1.0 on the limits of the data
>
>
> What is the largest amount of data Nutch 1.0 can support? (In file size or number of
> records)
Re: Nutch 1.0 on the limits of the data
Posted by Dennis Kubes <ku...@apache.org>.
The simple answer is billions, perhaps tens to hundreds of billions of
records, since it leverages Hadoop. Yahoo is currently using Hadoop to
build its web index. But as Otis pointed out, Hadoop does parallel
processing, so capacity is entirely dependent on the amount of hardware.
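To make "depends on hardware" concrete, here is a hypothetical back-of-the-envelope sketch. All the numbers (pages per crawl, average page size, HDFS replication factor, disk per node) are assumptions for illustration only, not Nutch or Hadoop limits:

```python
# Rough, hypothetical capacity estimate for a large crawl.
# Every constant below is an assumption, not a documented Nutch/Hadoop figure.
records = 10_000_000_000        # assume a 10-billion-page crawl target
avg_page_bytes = 20 * 1024      # assume ~20 KiB of fetched content per page
replication = 3                 # a typical HDFS replication factor
node_capacity_tb = 4            # assumed usable disk per Hadoop node, in TB

# Total replicated storage in TB (1 TB = 1024**4 bytes here).
total_tb = records * avg_page_bytes * replication / 1024**4

# Ceiling division: number of nodes needed to hold that much data.
nodes_needed = -(-total_tb // node_capacity_tb)

print(f"~{total_tb:.0f} TB total, ~{nodes_needed:.0f} nodes")
```

Under these assumptions the crawl needs a cluster of well over a hundred nodes for storage alone; change any constant and the answer moves accordingly, which is exactly the point Otis and Dennis are making.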
Dennis
Polsnet wrote:
> What is the largest amount of data Nutch 1.0 can support? (In file size or number of
> records)