Posted to user@nutch.apache.org by Danicela nutch <Da...@mail.com> on 2012/02/06 16:44:25 UTC

Too few parsed pages

Hi,

 When I run readseg -list on a segment, I see 60,000 'FETCHED' pages but only 10,000 'PARSED' pages. A month ago my segments had around 40,000 'PARSED' pages, and that number has dropped a little every day. The segment logs show roughly the same numbers when I count the pages processed, but I see nothing unusual in the parse step that would explain why so few pages remain at the end.

 What could explain why so few pages are parsed?

 Thanks.

Re: Too few parsed pages

Posted by Markus Jelsma <ma...@openindex.io>.
These are likely db_not_modified records; they are fetched but not parsed.
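
To put a number on the gap between fetched and parsed pages per segment, here is a minimal Python sketch. It assumes the counts have been copied by hand into a simple tab-separated table with NAME, FETCHED and PARSED columns; that layout, the function name parse_ratio and the segment name "segment-a" are all made up for illustration and are not necessarily what readseg -list prints. The sample counts are the ones quoted in the question.

```python
import csv
import io

# Hypothetical sample: counts from the question, in an assumed
# tab-separated layout (not necessarily the real readseg -list output).
SAMPLE = "NAME\tFETCHED\tPARSED\nsegment-a\t60000\t10000\n"


def parse_ratio(listing_text):
    """Map each segment name to (fetched, parsed, parsed/fetched)."""
    reader = csv.reader(io.StringIO(listing_text), delimiter="\t")
    header = [column.strip() for column in next(reader)]
    # Locate columns by header name so extra columns do not matter.
    name_i = header.index("NAME")
    fetched_i = header.index("FETCHED")
    parsed_i = header.index("PARSED")
    needed = max(name_i, fetched_i, parsed_i)

    result = {}
    for row in reader:
        if len(row) <= needed:
            continue  # skip blank or truncated lines
        fetched = int(row[fetched_i])
        parsed = int(row[parsed_i])
        # Guard against division by zero for empty segments.
        ratio = parsed / fetched if fetched else 0.0
        result[row[name_i].strip()] = (fetched, parsed, ratio)
    return result


if __name__ == "__main__":
    for name, (fetched, parsed, ratio) in parse_ratio(SAMPLE).items():
        print(f"{name}: {fetched - parsed} of {fetched} fetched pages "
              f"were not parsed (parsed share {ratio:.1%})")
```

A low parsed share only shows the size of the gap; it does not by itself confirm that not-modified records are the cause. The crawldb status counts (for example from readdb -stats) are the place to check that.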

-- 
Markus Jelsma - CTO - Openindex