Posted to user@nutch.apache.org by Abhijit Bera <ab...@geodesiconline.com> on 2008/05/31 13:26:17 UTC

Does the fetch phase take a very long time?

Hi

I got Nutch working on my cluster after making the necessary changes to
my crawlfilter file, and it seems to be running correctly.

Two days ago I started a crawl on the Nutch cluster, covering a list of
20 websites to a depth of 8.

As of now I think it's fetching at depth 3, and the segment files
generated so far are almost 70 MB in size. I opened the files and could
see valid URLs.

The first two fetch phases also took a very long time to complete. This
third fetch phase is taking even longer than the first two. 

Is this normal or is something going terribly wrong?

-- 
Abhijit Bera

Associate Software Engineer - Web Enterprise Division

Geodesic Information Systems Ltd.


I use Ubuntu Linux.
