Posted to user@nutch.apache.org by al...@aim.com on 2011/10/28 06:54:33 UTC

crawldb stats do not match

Hello,

I have merged two crawldbs using bin/nutch mergedb crawldb crawldb1 crawldb2

I noticed that the stats numbers for crawldb1 and crawldb2 do not add up to the numbers for the merged crawldb. For example, db_unfetched1 + db_unfetched2 is not equal to db_unfetched.

Any comments on this issue?

Thanks.
Alex.

Re: crawldb stats do not match

Posted by Mathijs Homminga <ma...@kalooga.com>.
Hi Alex,

My first thought: are there duplicate entries (URLs) in crawldb1 and crawldb2?

Mathijs

On Oct 28, 2011, at 6:54 , alxsss@aim.com wrote:

> Hello,
> 
> I have merged two crawldbs using bin/nutch mergedb crawldb crawldb1 crawldb2
> 
> I noticed that the stats numbers for crawldb1 and crawldb2 do not add up to the numbers for the merged crawldb. For example, db_unfetched1 + db_unfetched2 is not equal to db_unfetched.
> 
> Any comments on this issue?
> 
> Thanks.
> Alex.
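
The duplicate-URL explanation suggested in the reply can be illustrated with a small Python sketch. It is only a toy model, not Nutch code: the dictionaries, the example URLs, the helper names merge_crawldbs and status_counts, and the rule "keep the entry with the newer fetch time" are all assumptions made for illustration. The point is that a merge keeps one entry per URL, so per-status counts of the inputs only add up when the two crawldbs share no URLs.

```python
from collections import Counter


def merge_crawldbs(*dbs):
    """Merge URL -> (status, fetch_time) maps, keeping one entry per URL.

    Assumption for this sketch: when a URL appears more than once,
    the entry with the newer fetch_time wins (ties keep the first seen).
    """
    merged = {}
    for db in dbs:
        for url, (status, fetch_time) in db.items():
            if url not in merged or fetch_time > merged[url][1]:
                merged[url] = (status, fetch_time)
    return merged


def status_counts(db):
    """Count entries per status, similar in spirit to a stats report."""
    return Counter(status for status, _ in db.values())


# Hypothetical data: URLs b and c appear in both crawldbs.
crawldb1 = {
    "http://example.com/a": ("db_unfetched", 1),
    "http://example.com/b": ("db_unfetched", 1),
    "http://example.com/c": ("db_fetched", 2),
}
crawldb2 = {
    "http://example.com/b": ("db_unfetched", 1),
    "http://example.com/c": ("db_unfetched", 1),
    "http://example.com/d": ("db_unfetched", 1),
}

merged = merge_crawldbs(crawldb1, crawldb2)
summed = status_counts(crawldb1) + status_counts(crawldb2)

print("sum of inputs:", dict(summed))
print("merged       :", dict(status_counts(merged)))
```

In this toy data the inputs hold six entries in total but the merged map holds only four, because two URLs overlap. The unfetched count also drops further than the overlap alone would suggest, since one shared URL is unfetched in one input and fetched in the other.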