Posted to user@nutch.apache.org by ni...@gmail.com on 2005/10/25 02:29:00 UTC
Nutch Limits (maximum number of pages)
Hi,
Does anybody know the maximum number of pages that has ever been
fetched and indexed with Nutch? I know Yahoo Research fetched 100M
pages about 3 years ago, but they stopped after that. Is there any really
large-scale WebDB out there (on the scale of Google or Yahoo) that has been
fetched by Nutch?
Thanks, Nima
Re: Nutch Limits (maximum number of pages)
Posted by Stefan Groschupf <sg...@media-style.com>.
With MapReduce, the only limits will be hardware limits.
Crawling ~500 million pages with Nutch 0.7 is a pain, since the db update
may take more than one week.
Stefan
On 25.10.2005 at 02:29, <ni...@gmail.com> wrote:
> Hi,
>
> Does anybody know what the maximum number of pages that have ever been
> fetched and indexed with nutch is? I know Yahoo Research did fetch 100M
> pages about 3 years ago, but they stopped after that. Is there any real
> large scale (like, google and yahoo) Webdb out there that has been fetched
> by nutch?
>
> Thanks, Nima
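
For a rough sense of scale, the figures in Stefan's reply can be turned into an average throughput. This is only a back-of-envelope sketch: the 500 million pages and the one-week duration come from the message, while the function names are made up for illustration and the week is treated as an exact figure rather than a lower bound.

```python
# Back-of-envelope estimate. The page count and the one-week duration are
# taken from the reply above; the function names are illustrative only.

SECONDS_PER_DAY = 24 * 60 * 60


def records_per_second(total_records: int, days: float) -> float:
    """Average rate needed to process total_records within the given days."""
    return total_records / (days * SECONDS_PER_DAY)


def days_needed(total_records: int, rate_per_second: float) -> float:
    """Days required to process total_records at a constant average rate."""
    return total_records / (rate_per_second * SECONDS_PER_DAY)


pages = 500_000_000
rate = records_per_second(pages, 7)
print(f"{rate:.0f} pages/s")  # prints "827 pages/s"
```

Since the update is said to take more than one week, the implied average rate would be somewhat below this figure.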