Posted to user@nutch.apache.org by adu <du...@hzduozhun.com> on 2014/08/08 05:03:51 UTC
How to reduce the unfetched urls?
Hi all,
I used 10000 URLs as seeds and crawled with depth 1, but only 2000 URLs
were fetched.
I have checked the URL filter, and I can't find any log entries about HTTP
connection failures. Are there any settings in nutch-default.xml that I
should look at? Thanks in advance for your help.
Re: How to reduce the unfetched urls?
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,
Are the unfetched URLs marked as such (status db_unfetched)?
You can check this using
$NUTCH_HOME/bin/nutch readdb
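For reference, a minimal sketch of that check, assuming a Nutch 1.x layout with the CrawlDb at crawl/crawldb. The /opt/nutch default, the CrawlDb path, and the example URL are assumptions and do not come from this thread; Nutch 2.x uses a different argument form for readdb. The -stats option prints URL counts per status (including db_unfetched), and -url prints the record of a single URL:

```shell
# Assumed locations; adjust to the local install and crawl directory.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"

if [ -x "$NUTCH_HOME/bin/nutch" ]; then
    # Overall counts per status (db_unfetched, db_fetched, ...)
    "$NUTCH_HOME/bin/nutch" readdb "$CRAWLDB" -stats
    # Status of one specific URL (hypothetical example URL)
    "$NUTCH_HOME/bin/nutch" readdb "$CRAWLDB" -url "http://example.com/"
else
    echo "bin/nutch not found under $NUTCH_HOME" >&2
fi
```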
If yes, and since exactly 2000 URLs were fetched,
the problem is more likely with
-topN <size_of_fetch_list>
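A sketch of where that limit applies when a segment is generated by hand, again assuming Nutch 1.x. The paths and the value 10000 are assumptions chosen to match the seed count in the question. -topN caps how many URLs are selected into the fetch list for one generate run, so a value below the number of seeds leaves the remaining URLs as db_unfetched:

```shell
# Assumed locations and limit; adjust to the local setup.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"
TOPN="${TOPN:-10000}"

if [ -x "$NUTCH_HOME/bin/nutch" ]; then
    # Select at most $TOPN URLs from the CrawlDb into a new segment's fetch list
    "$NUTCH_HOME/bin/nutch" generate "$CRAWLDB" "$SEGMENTS" -topN "$TOPN"
else
    echo "bin/nutch not found under $NUTCH_HOME" >&2
fi
```

If the crawl is started through a wrapper command or script, the -topN value may be set there rather than on the generate call.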
Which version of Nutch is used?
Best,
Sebastian
Re: How to reduce the unfetched urls?
Posted by al...@aim.com.
What is the status of one of the unfetched URLs in the db?