Posted to user@nutch.apache.org by Cam Bazz <ca...@gmail.com> on 2011/07/08 00:04:36 UTC
readdb -stats
Hello,
When I run the readdb -stats command on my crawldb, I get:
status 1 (db_unfetched): 199820
status 2 (db_fetched): 257384
status 3 (db_gone): 557
status 4 (db_redir_temp): 40265
status 5 (db_redir_perm): 6152
CrawlDb statistics: done
I understand the db_unfetched and db_fetched, but what are the other stats?
Best regards,
-C.B.
Re: readdb -stats
Posted by Markus Jelsma <ma...@openindex.io>.
http://svn.apache.org/viewvc/nutch/tags/release-1.3/src/java/org/apache/nutch/protocol/ProtocolStatus.java?view=markup
> Hello,
>
> When I run the readdb -stats command on my crawldb, I get:
>
> status 1 (db_unfetched): 199820
> status 2 (db_fetched): 257384
> status 3 (db_gone): 557
> status 4 (db_redir_temp): 40265
> status 5 (db_redir_perm): 6152
> CrawlDb statistics: done
>
> I understand the db_unfetched and db_fetched, but what are the other stats?
>
> Best regards,
> -C.B.
Re: readdb -stats
Posted by lewis john mcgibbney <le...@gmail.com>.
Hi C.B.,
OK, this needs another entry on the wiki page for readdb. In case you have
not already found it, the page is here [1]. Thanks for pointing this out.
In a nutshell, I think this page [2] gives the best description of the
'unknowns' in your output below. I realise this may look like I am passing
the buck, but reading that page, and in particular this section [3], should
clear up most of the grey areas.
[1] http://wiki.apache.org/nutch/bin/nutch_readdb
[2] http://en.wikipedia.org/wiki/HTTP_404
[3] http://en.wikipedia.org/wiki/HTTP_404#Overview e.g. 410 Gone
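As a rough illustration of how the counts in the output relate to HTTP
outcomes, here is a minimal Python sketch that parses the readdb -stats lines
quoted in this thread and prints each status with its share of the total. The
one-line meanings and the HTTP codes in STATUS_MEANING are my own paraphrase
and an assumption, not text taken from the Nutch source; parse_stats and
SAMPLE are hypothetical names used only for this example.

```python
import re

# Assumed, paraphrased meanings of the CrawlDb status names.
# The HTTP codes are typical examples, not an exhaustive or authoritative map.
STATUS_MEANING = {
    "db_unfetched": "URL is known but has not been fetched yet",
    "db_fetched": "page was fetched successfully",
    "db_gone": "page no longer exists (e.g. HTTP 404 or 410)",
    "db_redir_temp": "page temporarily redirects elsewhere (e.g. HTTP 302)",
    "db_redir_perm": "page permanently redirects elsewhere (e.g. HTTP 301)",
}

# Matches lines such as: "status 3 (db_gone): 557"
STATUS_LINE = re.compile(r"status\s+(\d+)\s+\((\w+)\):\s+(\d+)")

# The status lines exactly as posted in this thread.
SAMPLE = """\
status 1 (db_unfetched): 199820
status 2 (db_fetched): 257384
status 3 (db_gone): 557
status 4 (db_redir_temp): 40265
status 5 (db_redir_perm): 6152
CrawlDb statistics: done
"""


def parse_stats(text):
    """Return {status_name: count} for every status line found in text."""
    counts = {}
    for line in text.splitlines():
        match = STATUS_LINE.search(line)
        if match:
            counts[match.group(2)] = int(match.group(3))
    return counts


if __name__ == "__main__":
    counts = parse_stats(SAMPLE)
    total = sum(counts.values())
    for name, count in counts.items():
        share = 100.0 * count / total
        meaning = STATUS_MEANING.get(name, "meaning not listed here")
        print(f"{name:14} {count:7d} {share:5.1f}%  {meaning}")
```

To use it on a real crawldb, the captured output of the readdb -stats command
would be passed to parse_stats in place of SAMPLE.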
If you would like to update the wiki, please do so; if not, I will get it
sorted shortly.
Thanks
On Thu, Jul 7, 2011 at 11:04 PM, Cam Bazz <ca...@gmail.com> wrote:
> Hello,
>
> When I run the readdb -stats command on my crawldb, I get:
>
> status 1 (db_unfetched): 199820
> status 2 (db_fetched): 257384
> status 3 (db_gone): 557
> status 4 (db_redir_temp): 40265
> status 5 (db_redir_perm): 6152
> CrawlDb statistics: done
>
> I understand the db_unfetched and db_fetched, but what are the other stats?
>
> Best regards,
> -C.B.
>
--
*Lewis*