Posted to user@nutch.apache.org by Cam Bazz <ca...@gmail.com> on 2011/07/08 00:04:36 UTC

readdb -stats

Hello,

When I run the readdb -stats command on my crawldb, I get:

status 1 (db_unfetched):        199820
status 2 (db_fetched):  257384
status 3 (db_gone):     557
status 4 (db_redir_temp):       40265
status 5 (db_redir_perm):       6152
CrawlDb statistics: done

I understand the db_unfetched and db_fetched, but what are the other stats?

Best regards,
-C.B.

Re: readdb -stats

Posted by Markus Jelsma <ma...@openindex.io>.
http://svn.apache.org/viewvc/nutch/tags/release-1.3/src/java/org/apache/nutch/protocol/ProtocolStatus.java?view=markup


Re: readdb -stats

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi C.B.,

OK, this calls for another entry on the wiki page. In case you have not
already found it, the page is here [1]. Thanks for pointing this out.

In a nutshell, I think this page [2] gives the best description of your
'unknowns' below. I realise this may seem like passing the buck, but reading
that page, with particular attention to this section [3], should clear up
most of the grey areas.

[1] http://wiki.apache.org/nutch/bin/nutch_readdb
[2] http://en.wikipedia.org/wiki/HTTP_404
[3] http://en.wikipedia.org/wiki/HTTP_404#Overview e.g. 410 Gone
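
As a rough illustration, here is a minimal Python sketch that parses the
status lines from your readdb -stats output and pairs each one with a short
description. The descriptions are my own reading of the status names and of
the HTTP codes covered in [2], not text taken from the Nutch source, and
parse_stats, STATUS_LINE, DESCRIPTIONS and SAMPLE are hypothetical names used
only for this example.

```python
import re

# Rough descriptions: an assumption based on the status names and common
# HTTP codes, not text taken from the Nutch source.
DESCRIPTIONS = {
    "db_unfetched": "known URL, not fetched yet",
    "db_fetched": "fetched successfully",
    "db_gone": "page no longer available (e.g. HTTP 404 or 410)",
    "db_redir_temp": "temporary redirect (e.g. HTTP 302 or 307)",
    "db_redir_perm": "permanent redirect (e.g. HTTP 301)",
}

# Matches lines such as "status 3 (db_gone):     557".
STATUS_LINE = re.compile(r"status\s+(\d+)\s+\((\w+)\):\s+(\d+)")


def parse_stats(text):
    """Return a list of (code, name, count) tuples, one per status line."""
    rows = []
    for line in text.splitlines():
        match = STATUS_LINE.search(line)
        if match:
            code, name, count = match.groups()
            rows.append((int(code), name, int(count)))
    return rows


# The output posted at the start of this thread.
SAMPLE = """\
status 1 (db_unfetched):        199820
status 2 (db_fetched):  257384
status 3 (db_gone):     557
status 4 (db_redir_temp):       40265
status 5 (db_redir_perm):       6152
CrawlDb statistics: done
"""

if __name__ == "__main__":
    for code, name, count in parse_stats(SAMPLE):
        description = DESCRIPTIONS.get(name, "no description")
        print(f"{code} {name:<14} {count:>8}  {description}")
```

Lines that do not look like status lines, such as the closing "CrawlDb
statistics: done", are simply skipped.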

If you would like to update the wiki, please do so; if not, I will get it
sorted shortly.

Thanks

On Thu, Jul 7, 2011 at 11:04 PM, Cam Bazz <ca...@gmail.com> wrote:

> Hello,
>
> When I run the readdb -stats command on my crawldb, I get:
>
> status 1 (db_unfetched):        199820
> status 2 (db_fetched):  257384
> status 3 (db_gone):     557
> status 4 (db_redir_temp):       40265
> status 5 (db_redir_perm):       6152
> CrawlDb statistics: done
>
> I understand the db_unfetched and db_fetched, but what are the other stats?
>
> Best regards,
> -C.B.
>



-- 
*Lewis*