Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2016/01/16 15:23:52 UTC

Re: user Digest 16 Jan 2016 13:19:55 -0000 Issue 2520

Hi Manish,

On Sat, Jan 16, 2016 at 5:19 AM, <us...@nutch.apache.org> wrote:

> I was checking the Nutch logs and observed that there are more fetching
> logs than parsed logs.
> I understand parsing does not happen for URLs whose fetch failed, but the
> difference is so high. Any idea?


How did you run the crawl? For example, did you enable filtering at each
stage, or at more than one stage? A dump of your crawldb will most likely
give you insight into what is going on.
Also, see the ProtocolStatus tool and the work Mike Jove has been doing
recently, e.g. ProtocolStatusStatistics. It will also give you an insight
into what is going on:
https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/util/ProtocolStatusStatistics.java
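
To make the crawldb suggestion concrete: after dumping the crawldb to text
(e.g. with bin/nutch readdb crawl/crawldb -dump dumpdir, where both paths
are placeholders for your own), something like the rough Python sketch below
can tally how many URLs sit in each status. It assumes each record in the
dump carries a line of the form "Status: 2 (db_fetched)"; check that against
your own dump first, since the exact layout may differ between Nutch
versions. The function name count_statuses is made up for this example.

```python
import re
import sys
from collections import Counter

# Assumed dump line format: "Status: 2 (db_fetched)".
# Verify against your own crawldb dump before relying on it.
STATUS_RE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def count_statuses(lines):
    """Count how often each status name appears in an iterable of dump lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_statuses.py <path to a part file inside dumpdir>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for status, n in count_statuses(fh).most_common():
            print(f"{n:8d}  {status}")
```

A large share of entries in a status other than db_fetched would go some way
towards explaining the gap between fetch and parse log lines. If you only
need the totals, bin/nutch readdb crawl/crawldb -stats prints a similar
per-status summary directly.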
Ta