Posted to user@nutch.apache.org by Danicela nutch <Da...@mail.com> on 2012/02/06 16:44:25 UTC
Too few parsed pages
Hi,
When I run readseg -list on a segment, I see 60,000 'FETCHED' pages but only 10,000 'PARSED' pages. A month ago my segments had around 40,000 'PARSED' pages, and that number has dropped a little every day since. Counting the pages processed in the segment logs gives roughly the same numbers, but I find nothing unusual in the parse output that would explain why so few pages remain in the end.
What could explain why so few pages are parsed?
Thanks.
Re: Too few parsed pages
Posted by Markus Jelsma <ma...@openindex.io>.
These are most likely db_not_modified records: pages reported as not modified since the previous fetch are not parsed.
On Monday 06 February 2012 16:44:25 Danicela nutch wrote:
> Hi,
>
> When I make a readseg -list on a segment, I have 60.000 'FETCHED' pages,
> but only 10.000 'PARSED' pages. One month ago, I had something like 40.000
> 'PARSED' pages in my segments, and this number reduced a little every day.
> If I look in the logs of the segments, I can find approximately these
> numbers if I count the number of treated pages. But I find nothing strange
> in the parse that could explain the fact I have so few pages in the end.
>
> What can explain the fact I have so few pages which are parsed ?
>
> Thanks.
--
Markus Jelsma - CTO - Openindex
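
To put the numbers from the question in perspective, here is a minimal Python sketch that computes how many fetched pages went unparsed and what fraction was parsed. The function name parse_gap is hypothetical and not part of Nutch; the counts are the ones quoted in the thread, typed in by hand. One possible check, assuming the explanation above holds, is to compare the resulting gap against the not-modified count reported by bin/nutch readdb <crawldb> -stats.

```python
def parse_gap(fetched: int, parsed: int) -> tuple[int, float]:
    """Return (pages fetched but not parsed, fraction of fetched pages parsed).

    Hypothetical helper for illustration only; it is not part of Nutch.
    """
    if fetched <= 0:
        raise ValueError("fetched must be a positive count")
    unparsed = fetched - parsed
    parsed_fraction = parsed / fetched
    return unparsed, parsed_fraction


# Counts taken from the question in this thread (readseg -list on one segment).
unparsed, fraction = parse_gap(fetched=60_000, parsed=10_000)
print(f"{unparsed} pages fetched but not parsed ({fraction:.1%} parsed)")
```

If the not-modified count is close to the unparsed count, the gap is probably explained by unchanged pages rather than by parser failures.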