Posted to user@nutch.apache.org by al...@aim.com on 2011/10/28 06:54:33 UTC
crawldb stats do not match
Hello,
I have merged two crawldbs using bin/nutch mergedb crawldb crawldb1 crawldb2
I noticed that the stats for crawldb1 and crawldb2 do not add up to the stats for the merged crawldb. For example, db_unfetched of crawldb1 plus db_unfetched of crawldb2 is not equal to db_unfetched of crawldb.
Any comments on this issue?
Thanks.
Alex.
Re: crawldb stats do not match
Posted by Mathijs Homminga <ma...@kalooga.com>.
Hi Alex,
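My first thought: are there duplicate entries (URLs) that appear in both crawldb1 and crawldb2? A crawldb holds one entry per URL, so a URL present in both inputs is counted only once after the merge.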
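
To illustrate the idea, here is a minimal Python sketch, not Nutch code. It models each crawldb as a plain dict from URL to status, and it assumes the merge keeps a single entry per URL (here, arbitrarily, the entry from the later input wins; the real merge may pick differently). The URLs, the statuses assigned to them, and the helper names stats and merge are made up for the example.

```python
from collections import Counter


def stats(crawldb):
    """Count entries per status, as a much simplified stats report."""
    return Counter(crawldb.values())


def merge(*crawldbs):
    """Merge crawldbs keyed by URL; for a duplicate URL the later input wins.

    The "later input wins" rule is an assumption made for this sketch only.
    """
    merged = {}
    for db in crawldbs:
        merged.update(db)
    return merged


# Hypothetical contents: two URLs appear in both crawldbs.
crawldb1 = {
    "http://a.example/": "db_unfetched",
    "http://b.example/": "db_unfetched",
    "http://c.example/": "db_fetched",
}
crawldb2 = {
    "http://b.example/": "db_unfetched",
    "http://c.example/": "db_unfetched",
    "http://d.example/": "db_unfetched",
}

merged = merge(crawldb1, crawldb2)
summed = stats(crawldb1) + stats(crawldb2)
overlap = crawldb1.keys() & crawldb2.keys()

print("sum of per-db stats:", dict(summed))
print("stats of merged db: ", dict(stats(merged)))
print("URLs in both inputs:", sorted(overlap))
```

With any overlap, the merged total equals the sum of the two totals minus the number of shared URLs, and a per-status count can shift further when the two inputs disagree on the status of a shared URL.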
Mathijs
On Oct 28, 2011, at 6:54 , alxsss@aim.com wrote:
> Hello,
>
> I have merged two crawldbs using bin/nutch mergedb crawldb crawldb1 crawldb2
>
> I noticed that the stats for crawldb1 and crawldb2 do not add up to the stats for the merged crawldb. For example, db_unfetched of crawldb1 plus db_unfetched of crawldb2 is not equal to db_unfetched of crawldb.
>
> Any comments on this issue?
>
> Thanks.
> Alex.