Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/08/01 14:59:25 UTC

topN with maxNumSegments?

Hi,

Creating fetch lists with topN works nicely. It usually fetches just less than 
topN. However, when I use maxNumSegments N in conjunction with topN, it looks 
like the first generated segment does not respect the topN setting.

Instead of 300,000 URLs to be fetched, the first of the generated segments 
with maxNumSegments fetches far more.

Does anyone know what's going on? Did I misunderstand maxNumSegments? Is there a 
bug here?

Thanks

Re: topN with maxNumSegments?

Posted by Markus Jelsma <ma...@openindex.io>.
For reference:
https://issues.apache.org/jira/browse/NUTCH-1074

On Monday 01 August 2011 16:00:13 Julien Nioche wrote:
> Hi Markus,
> 
> Looks like a bug. TopN should be followed regardless of the number of
> segments. Could you please open a JIRA?
> 
> Thanks
> 
> Julien

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: topN with maxNumSegments?

Posted by Julien Nioche <li...@gmail.com>.
Hi Markus,

Looks like a bug. TopN should be followed regardless of the number of
segments. Could you please open a JIRA?

Thanks

Julien
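
To make the expected behaviour concrete, here is a minimal sketch of what "topN should be followed regardless of the number of segments" could mean, assuming topN is read as a per-segment cap (which is how the question reads). This is a toy model only, not Nutch's Generator code; the function name generate_segments and its parameters are hypothetical.

```python
from typing import List, Tuple


def generate_segments(
    scored_urls: List[Tuple[str, float]],
    top_n: int,
    max_num_segments: int,
) -> List[List[str]]:
    """Toy model (not Nutch code): split the best-scoring URLs into at most
    max_num_segments segments, each holding at most top_n URLs."""
    if top_n <= 0 or max_num_segments <= 0:
        return []
    # Highest score first, mimicking a generator that prioritises by score.
    ranked = [url for url, _ in sorted(scored_urls, key=lambda p: p[1], reverse=True)]
    segments: List[List[str]] = []
    for start in range(0, len(ranked), top_n):
        if len(segments) == max_num_segments:
            break  # leftover URLs would wait for a later generate run
        segments.append(ranked[start:start + top_n])
    return segments


if __name__ == "__main__":
    urls = [(f"http://example.com/{i}", float(i)) for i in range(1000)]
    segments = generate_segments(urls, top_n=300, max_num_segments=2)
    print([len(s) for s in segments])  # prints [300, 300]
```

Under this reading, no segment ever exceeds top_n, whatever value max_num_segments takes. The behaviour reported above, where the first segment alone is far larger than topN, would violate that invariant.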

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com