Posted to user@nutch.apache.org by ni...@gmail.com on 2005/10/25 02:29:00 UTC

Nutch Limits (maximum number of pages)

Hi,

Does anybody know the maximum number of pages that has ever been
fetched and indexed with Nutch? I know Yahoo Research fetched 100M
pages about 3 years ago, but they stopped after that. Is there any really
large-scale (Google- or Yahoo-sized) WebDB out there that has been fetched
by Nutch?

Thanks, Nima

Re: Nutch Limits (maximum number of pages)

Posted by Stefan Groschupf <sg...@media-style.com>.
With MapReduce there will be only hardware limits.
Crawling ~500 million pages with Nutch 0.7 is a pain, since the db update may
take more than one week.

Stefan

On 25.10.2005 at 02:29, <ni...@gmail.com> wrote:

> Hi,
>
> Does anybody know what the maximum number of pages that have ever been
> fetched and indexed with nutch is? I know Yahoo Research did fetch 100M
> pages about 3 years ago, but they stopped after that. Is there any real
> large scale (like, google and yahoo) Webdb out there that has been fetched
> by nutch?
>
> Thanks, Nima
>