Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/08/01 14:59:25 UTC
topN with maxNumSegments?
Hi,
Creating fetch lists with topN works nicely; it usually fetches slightly fewer
than topN URLs. However, when I use maxNumSegments N in conjunction with topN, it
looks like the first generated segment does not respect the topN setting.
Instead of 300,000 URLs, the first of the segments generated with
maxNumSegments fetches many, many more.
Does anyone know what's going on? Did I misunderstand maxNumSegments? Is there a
bug here?
Thanks
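The expectation in this report is that every generated segment holds at most topN URLs, however many segments maxNumSegments asks for. The Java sketch here illustrates that expectation only; it is not Nutch's Generator code. The class SegmentSplitSketch, the ScoredUrl record and split() are hypothetical names, and the sketch ignores per-host limits, reducer partitioning and the CrawlDb entirely. It ranks candidates by score and fills up to maxNumSegments fetch lists, capping each one at topN. It needs Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentSplitSketch {

    // Hypothetical candidate URL with a score; not a Nutch class.
    record ScoredUrl(String url, float score) {}

    // Hypothetical helper: fills at most maxNumSegments lists, highest scores first,
    // with every list capped at topN entries.
    static List<List<ScoredUrl>> split(List<ScoredUrl> candidates, int topN, int maxNumSegments) {
        if (topN <= 0 || maxNumSegments <= 0) {
            throw new IllegalArgumentException("topN and maxNumSegments must be positive");
        }
        List<ScoredUrl> sorted = new ArrayList<>(candidates);
        // Sort by descending score so the best URLs land in the first segment.
        sorted.sort((a, b) -> Float.compare(b.score(), a.score()));

        List<List<ScoredUrl>> segments = new ArrayList<>();
        List<ScoredUrl> current = new ArrayList<>();
        for (ScoredUrl u : sorted) {
            if (current.size() == topN) {
                // Current segment is full: close it and start the next one.
                segments.add(current);
                if (segments.size() == maxNumSegments) {
                    return segments; // all allowed segments are full; the rest waits
                }
                current = new ArrayList<>();
            }
            current.add(u);
        }
        if (!current.isEmpty()) {
            segments.add(current);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<ScoredUrl> candidates = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            // Hypothetical URLs with scores 0..6.
            candidates.add(new ScoredUrl("http://example.com/page" + i, i));
        }
        List<List<ScoredUrl>> segments = split(candidates, 2, 3);
        for (int s = 0; s < segments.size(); s++) {
            System.out.println("segment " + s + ": " + segments.get(s).size() + " URLs");
        }
    }
}
```

With 7 candidates, topN = 2 and maxNumSegments = 3, this yields three segments of two URLs each, and the lowest-scoring URL is left for a later generate run. No segment ever exceeds topN, which is the behaviour the report expected from the first segment.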
Re: topN with maxNumSegments?
Posted by Markus Jelsma <ma...@openindex.io>.
For reference:
https://issues.apache.org/jira/browse/NUTCH-1074
On Monday 01 August 2011 16:00:13 Julien Nioche wrote:
> Hi Markus,
>
> Looks like a bug. topN should be respected regardless of the number of
> segments. Could you please open a JIRA issue?
>
> Thanks
>
> Julien
>
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: topN with maxNumSegments?
Posted by Julien Nioche <li...@gmail.com>.
Hi Markus,
Looks like a bug. topN should be respected regardless of the number of
segments. Could you please open a JIRA issue?
Thanks
Julien
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com