Posted to user@nutch.apache.org by Danicela nutch <Da...@mail.com> on 2012/02/06 17:03:52 UTC

Re: Re: Too few parsed pages

I don't understand. What should I do?

----- Original Message -----
From: Markus Jelsma
Sent: 06.02.12 16:45
To: user@nutch.apache.org
Subject: Re: Too few parsed pages

Likely db_not_modified records; they are not parsed.

On Monday 06 February 2012 16:44:25 Danicela nutch wrote:
> Hi,
>
> When I run readseg -list on a segment, I have 60,000 'FETCHED' pages,
> but only 10,000 'PARSED' pages. One month ago, I had something like 40,000
> 'PARSED' pages in my segments, and this number has decreased a little every
> day. If I look in the segment logs and count the number of processed pages,
> I find approximately these numbers. But I find nothing strange in the parse
> that could explain why I have so few pages in the end.
>
> What could explain why so few pages are parsed?
>
> Thanks.

-- 
Markus Jelsma - CTO - Openindex

Re: Too few parsed pages

Posted by Markus Jelsma <ma...@openindex.io>.
Nothing; this is fine. If a page has not been modified, there is no need to
parse it again, because it was already parsed in an older segment.
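To make the arithmetic concrete, here is a minimal toy model, not Nutch code. It assumes, as the reply above suggests, that unchanged pages (for example, ones answered with an HTTP 304 Not Modified) still count as fetched but are skipped by the parser. The names FetchOutcome, should_parse and segment_counts are hypothetical and do not correspond to Nutch classes or APIs.

```python
from enum import Enum


class FetchOutcome(Enum):
    """Hypothetical fetch outcomes (NOT Nutch's real status names)."""
    SUCCESS = "success"            # new or changed content was downloaded
    NOT_MODIFIED = "not_modified"  # server reported the page unchanged


def should_parse(outcome: FetchOutcome) -> bool:
    """Only freshly downloaded content is parsed; an unchanged page keeps
    the parse output stored in the older segment where it was first fetched."""
    return outcome is FetchOutcome.SUCCESS


def segment_counts(outcomes):
    """Return (fetched, parsed) counts for one simulated segment.

    Every outcome counts as fetched (the URL was visited), but only
    outcomes accepted by should_parse() count as parsed.
    """
    fetched = len(outcomes)
    parsed = sum(1 for o in outcomes if should_parse(o))
    return fetched, parsed


if __name__ == "__main__":
    # Assumed split matching the numbers in the question:
    # 50,000 unchanged pages plus 10,000 new or changed pages.
    outcomes = ([FetchOutcome.NOT_MODIFIED] * 50_000
                + [FetchOutcome.SUCCESS] * 10_000)
    print(segment_counts(outcomes))  # (60000, 10000)
```

Under this assumption, a shrinking PARSED count over successive re-crawls simply means that a growing share of the refetched pages were unchanged, not that parsing is failing.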

On Monday 06 February 2012 17:03:52 Danicela nutch wrote:
> I don't understand. What should I do?

-- 
Markus Jelsma - CTO - Openindex