Posted to user@nutch.apache.org by Vijay <vi...@gmail.com> on 2009/10/02 02:10:22 UTC

Fetcher problems with stable version of nutch-1.0 ?

Hi all,

    I am trying to use Nutch to crawl and index a list of about 50K URLs
with depth=1.  I am running the crawl with the command:
nutch-1.0/bin/nutch crawl urls/ -depth 1 -topN 100000
with appropriate changes to the configuration files.

  I find that the fetching always terminates prematurely and the logs show
an error that looks like:

----------------------------------------------------------------------------------------------------------------
activeThreads=200, spinWaiting=200, fetchQueues.totalSize=1
Aborting with 200 hung threads.
Fetcher: done
----------------------------------------------------------------------------------------------------------------

   I have not seen this particular error message when using nutch-0.9. Is it
advisable to revert to using nutch-0.9? Or do we have some kind of patch to
fix this error?



Thanks,
Vijay

Re: Fetcher problems with stable version of nutch-1.0 ?

Posted by Julien Nioche <li...@gmail.com>.
Hi,

This is likely to be related to
https://issues.apache.org/jira/browse/NUTCH-719 (see the post from S Dennis
for the solution). The totalSize counter was getting out of sync with the
actual content of the fetch queues, causing the Fetcher to wait idly before
timing out and aborting. Nutch 0.9 uses a different fetcher implementation,
which is why it did not show this problem.
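
To illustrate the failure mode described above, here is a minimal sketch in Java. It is not the Nutch Fetcher code: the class CountedQueue, its drop() method, and the run() helper are hypothetical names made up for this example, and the timeout value is arbitrary. It only shows how a hand-maintained size counter that drifts from the real queue contents leaves a consumer waiting on a queue that is already empty until an inactivity timeout fires.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class QueueCounterDrift {

    /** Hypothetical stand-in for a fetch queue whose size is tracked in a separate counter. */
    static class CountedQueue {
        private final Queue<String> items = new ArrayDeque<>();
        private int totalSize = 0;      // maintained by hand, separately from the queue
        private final boolean buggy;    // when true, drop() forgets to update the counter

        CountedQueue(boolean buggy) { this.buggy = buggy; }

        void add(String url) {
            items.add(url);
            totalSize++;
        }

        /** Normal path: hand out one item and decrement the counter. */
        String poll() {
            String url = items.poll();
            if (url != null) {
                totalSize--;
            }
            return url;
        }

        /** Discards one item. The buggy variant leaves the counter untouched. */
        void drop() {
            if (items.poll() != null && !buggy) {
                totalSize--;
            }
        }

        int totalSize() { return totalSize; }
    }

    /** Consumes the queue until the counter reaches zero or nothing happens for timeoutMs. */
    static String run(boolean buggy, long timeoutMs) throws InterruptedException {
        CountedQueue q = new CountedQueue(buggy);
        q.add("http://example.com/a");
        q.add("http://example.com/b");
        q.drop();   // one URL is discarded before it is fetched

        long lastProgress = System.currentTimeMillis();
        while (q.totalSize() > 0) {     // the loop trusts the counter, not the queue
            String url = q.poll();
            if (url != null) {
                lastProgress = System.currentTimeMillis();
            } else if (System.currentTimeMillis() - lastProgress > timeoutMs) {
                return "aborted";       // counter still positive, queue actually empty
            } else {
                Thread.sleep(10);       // idle wait, analogous to spin-waiting threads
            }
        }
        return "done";
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("correct counter:  " + run(false, 200));
        System.out.println("drifting counter: " + run(true, 200));
    }
}
```

With the drifting counter, the loop ends with the counter stuck at 1 and nothing left to fetch, which is the same shape as the fetchQueues.totalSize=1 line in the log above. Keeping the counter and the queue updated in one place is the general remedy; the actual fix for Nutch is the one discussed in NUTCH-719.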

What makes you think that it stops prematurely? Aren't you getting all the
expected URLs?

HTH

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com
