Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2008/09/17 15:30:12 UTC
how much space required?
Hi,
I'm running an intranet crawl and have reached the 6th depth, which apparently has 2.2 million links to fetch. I started off with 100 GB, but that was barely enough for the fetch, not to mention the updatedb step, so I'm trying to find a reliable method for determining how much space the crawl requires.
Any ideas?
Ed.
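A rough back-of-envelope estimate can be sketched for this kind of question. The per-page size and the overhead multiplier below are illustrative assumptions, not measurements from this crawl; only the 2.2 million link count comes from the post.

```python
# Back-of-envelope disk estimate for a fetch round.
# ASSUMPTIONS (not measured): average fetched page size and the
# multiplier covering crawldb, linkdb, and segment copies kept
# around during fetch + updatedb.

pages = 2_200_000        # links queued at depth 6 (from the post)
avg_page_kb = 50         # assumed average fetched page size in KB
overhead = 3             # assumed multiplier for db/segment copies

raw_gb = pages * avg_page_kb / (1024 * 1024)
total_gb = raw_gb * overhead
print(f"raw fetch ~ {raw_gb:.0f} GB, with overhead ~ {total_gb:.0f} GB")
```

With these assumed numbers the raw fetch alone lands near the 100 GB that proved barely enough, which is why the overhead multiplier matters: the updatedb step rewrites the crawldb rather than updating it in place, so peak usage is well above the raw fetched data.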
RE: how much space required?
Posted by Edward Quick <ed...@hotmail.com>.
>
> I wonder if, crawling to that depth with that many links, you may have no
> choice but to set up a Hadoop cluster rather than trying to run it on a
> single machine.
Thanks Kevin, I wondered what Hadoop was for!
Re: how much space required?
Posted by Kevin MacDonald <ke...@hautesecure.com>.
I wonder if, crawling to that depth with that many links, you may have no choice but to set up a Hadoop cluster rather than trying to run it on a single machine.