Posted to java-user@lucene.apache.org by John Wang <jo...@gmail.com> on 2009/01/07 07:25:17 UTC

TermScorer default buffer size

Hi:

   The default buffer size (for doc ids, scores, etc.) in TermScorer is 32.

    We have a large index in which some terms have very dense doc sets. By
increasing the buffer size we see very dramatic performance improvements.

    With our index (which may not be typical), here are some numbers relating
buffer size to performance for our query (a large OR query):

    Buffer size   Improvement
    2042          22.0 %
    4084          39.1 %
    8172          51.1 %

    I understand this may not be suitable for every application, so do you
think it makes sense to make this buffer size configurable?
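
As a rough illustration of why the buffer size matters for dense terms, here is a minimal Java sketch, assuming the main per-refill cost is one bulk read from the postings. `BufferedDocIterator` and `ConfigurableBufferDemo` are hypothetical names, not Lucene classes, and the int[] array only stands in for the underlying TermDocs. Taking the size as a constructor argument makes it configurable, and a larger size means proportionally fewer refills.

```java
/**
 * Hypothetical stand-in for a TermScorer-style buffered doc iterator.
 * The int[] postings play the role of the underlying TermDocs.
 */
final class BufferedDocIterator {
    static final int DEFAULT_BUFFER_SIZE = 32; // the current TermScorer default

    private final int[] postings; // sorted doc ids of one term (hypothetical source)
    private final int[] buffer;   // doc ids copied from the source in bulk
    private int sourcePos = 0;    // next unread position in postings
    private int bufferLen = 0;    // number of valid entries in buffer
    private int bufferPos = 0;    // next buffer entry to return
    private int refills = 0;      // number of bulk reads performed

    BufferedDocIterator(int[] postings, int bufferSize) {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be > 0");
        this.postings = postings;
        this.buffer = new int[bufferSize];
    }

    BufferedDocIterator(int[] postings) {
        this(postings, DEFAULT_BUFFER_SIZE);
    }

    /** Returns the next doc id, or -1 once the postings are exhausted. */
    int next() {
        if (bufferPos == bufferLen) {
            int n = Math.min(buffer.length, postings.length - sourcePos);
            if (n == 0) return -1;
            // Bulk-read the next chunk; a larger buffer means fewer of these.
            System.arraycopy(postings, sourcePos, buffer, 0, n);
            sourcePos += n;
            bufferLen = n;
            bufferPos = 0;
            refills++;
        }
        return buffer[bufferPos++];
    }

    int refills() { return refills; }
}

class ConfigurableBufferDemo {
    public static void main(String[] args) {
        int[] dense = new int[100_000];
        for (int i = 0; i < dense.length; i++) dense[i] = i; // a very dense term
        for (int size : new int[] {32, 2042, 4084, 8172}) {
            BufferedDocIterator it = new BufferedDocIterator(dense, size);
            while (it.next() != -1) { }
            System.out.println("buffer " + size + ": " + it.refills() + " refills");
        }
    }
}
```

For 100,000 consecutive doc ids, the default size of 32 needs 3,125 refills, while 8172 needs 13. Whether refill count is what dominates the measured speedup is an assumption of this sketch.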

Thanks

-John

Re: TermScorer default buffer size

Posted by Paul Elschot <pa...@xs4all.nl>.
On Friday 09 January 2009 05:29:15 John Wang wrote:
> Makes sense.
> I didn't think 32 was the empirically determined magic number ;)

That number does have a history, but I don't know the details.
 
> Are you planning to do a patch for this?

No, but could you open an issue and mention the performance
improvements?

Regards,
Paul Elschot



Re: TermScorer default buffer size

Posted by John Wang <jo...@gmail.com>.
Makes sense.
I didn't think 32 was the empirically determined magic number ;)

Are you planning to do a patch for this?

-John

On Thu, Jan 8, 2009 at 1:27 AM, Paul Elschot <pa...@xs4all.nl> wrote:

> It may be possible to change the TermScorer buffer size dynamically.
> For OR queries TermScorer.next() is used, and for AND queries
> TermScorer.skipTo() is used.
> That means that when the buffer runs out during TermScorer.next(),
> it could be enlarged, for example by doubling (or quadrupling) the size
> to a configurable maximum of 8K or even 16K, see above. When
> TermScorer.skipTo() runs out of the buffer it could leave the buffer
> size unchanged.
>
> This involves some memory allocation during search.
> That is unusual, but it could be worthwhile given the
> performance improvement.

Re: TermScorer default buffer size

Posted by Paul Elschot <pa...@xs4all.nl>.
John, 

Continuing, see below.

On Wednesday 07 January 2009 14:24:15 Paul Elschot wrote:
> Ideally the TermScorer buffer size could be set to a size depending on
> the query structure, but there is no facility for this yet.
> For OR queries larger buffers help, but not for AND queries.
> See also LUCENE-430 on reducing buffer sizes for the underlying
> TermDocs for very sparse doc sets.

It may be possible to change the TermScorer buffer size dynamically.
For OR queries TermScorer.next() is used, and for AND queries
TermScorer.skipTo() is used.
That means that when the buffer runs out during TermScorer.next(),
it could be enlarged, for example by doubling (or quadrupling) the size
to a configurable maximum of 8K or even 16K, see above. When
TermScorer.skipTo() runs out of the buffer it could leave the buffer
size unchanged.
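
A minimal Java sketch of this growth policy, assuming a plain int[] of sorted doc ids in place of the underlying TermDocs and a single doc-id buffer (a real scorer would also buffer frequencies). `GrowingDocIterator` and its methods are hypothetical and only mirror the next()/skipTo() split described above; they are not Lucene's TermScorer.

```java
/**
 * Hypothetical sketch: next() doubles the buffer (up to a configurable cap)
 * whenever it runs out, while skipTo() refills at the current size.
 * The int[] postings stand in for the underlying TermDocs.
 */
final class GrowingDocIterator {
    private final int[] postings;     // sorted doc ids of one term
    private final int maxBufferSize;  // configurable cap, e.g. 8K or 16K
    private int[] buffer;
    private int sourcePos = 0;        // next unread position in postings
    private int bufferLen = 0;        // valid entries in buffer
    private int bufferPos = 0;        // next buffer entry to return

    GrowingDocIterator(int[] postings, int initialSize, int maxBufferSize) {
        if (initialSize <= 0 || maxBufferSize < initialSize)
            throw new IllegalArgumentException("need 0 < initialSize <= maxBufferSize");
        this.postings = postings;
        this.buffer = new int[initialSize];
        this.maxBufferSize = maxBufferSize;
    }

    int bufferSize() { return buffer.length; }

    /** OR-style sequential access: enlarge the buffer each time it runs out. */
    int next() {
        if (bufferPos == bufferLen) {
            if (sourcePos == postings.length) return -1;
            if (bufferLen > 0 && buffer.length < maxBufferSize) {
                // The old contents are fully consumed, so a fresh array is enough.
                buffer = new int[Math.min(buffer.length * 2, maxBufferSize)];
            }
            refill();
        }
        return buffer[bufferPos++];
    }

    /** AND-style access: first doc >= target, or -1; the size is left unchanged. */
    int skipTo(int target) {
        while (bufferPos < bufferLen) {
            int doc = buffer[bufferPos++];
            if (doc >= target) return doc;
        }
        // Buffer ran out: skip ahead in the source, then refill at the same size.
        while (sourcePos < postings.length && postings[sourcePos] < target) sourcePos++;
        if (sourcePos == postings.length) return -1;
        refill();
        return buffer[bufferPos++];
    }

    private void refill() {
        int n = Math.min(buffer.length, postings.length - sourcePos);
        System.arraycopy(postings, sourcePos, buffer, 0, n);
        sourcePos += n;
        bufferLen = n;
        bufferPos = 0;
    }
}
```

In this sketch a scorer driven only by skipTo() keeps its initial size, so AND queries pay nothing extra, while a scorer driven by next() over a dense term quickly reaches the cap.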

This involves some memory allocation during search.
That is unusual, but it could be worthwhile given the
performance improvement.
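
To put a bound on that allocation, a short Java calculation, assuming growth starts at the current default of 32 and that each growth step allocates one new array of the new size for each buffered array. `GrowthCost` is a hypothetical helper.

```java
/** Hypothetical helper: cost of geometric buffer growth up to a cap. */
final class GrowthCost {
    /** Returns {number of reallocations, total ints allocated by them}. */
    static long[] cost(int initial, int factor, int max) {
        if (initial <= 0 || factor < 2) throw new IllegalArgumentException("need initial > 0, factor >= 2");
        long steps = 0;
        long allocated = 0;
        long size = initial;
        while (size < max) {
            size = Math.min(size * factor, max); // next buffer size, capped
            steps++;
            allocated += size;                   // one new array of that size
        }
        return new long[] {steps, allocated};
    }

    public static void main(String[] args) {
        long[] doubling = cost(32, 2, 8192);
        long[] quadrupling = cost(32, 4, 8192);
        System.out.println("doubling:    " + doubling[0] + " reallocations, " + doubling[1] + " ints");
        System.out.println("quadrupling: " + quadrupling[0] + " reallocations, " + quadrupling[1] + " ints");
    }
}
```

Doubling from 32 to an 8K cap takes 8 reallocations (16,320 ints per buffered array); quadrupling takes 4 (10,880 ints). Under these assumptions the cost is bounded per scorer and is only paid when a term has enough docs to exhaust the smaller buffers.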

Regards,
Paul Elschot

Re: TermScorer default buffer size

Posted by Paul Elschot <pa...@xs4all.nl>.
On Wednesday 07 January 2009 07:25:17 John Wang wrote:
>     I understand this may not be suitable for every application, so do you
> think it makes sense to make this buffer size configurable?
> 

Ideally the TermScorer buffer size could be set to a size depending on
the query structure, but there is no facility for this yet.
For OR queries larger buffers help, but not for AND queries.
See also LUCENE-430 on reducing buffer sizes for the underlying
TermDocs for very sparse doc sets.
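
One way to read "a size depending on the query structure", sketched in Java under stated assumptions: a query with any required clause is treated as AND-like (its term scorers are mostly advanced with skipTo()), and a query with only optional clauses as OR-like (advanced with next()). `ClauseKind` and `BufferSizePolicy` are hypothetical; `ClauseKind` is only loosely modelled on Lucene's `BooleanClause.Occur`, and 8192 is simply the 8K maximum suggested in this thread, not a tuned value.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical clause kinds, loosely modelled on Lucene's BooleanClause.Occur. */
enum ClauseKind { MUST, SHOULD, MUST_NOT }

/** Hypothetical policy: choose a term scorer buffer size from the query structure. */
final class BufferSizePolicy {
    static final int AND_BUFFER_SIZE = 32;  // keep the current default for skipTo()-driven scorers
    static final int OR_BUFFER_SIZE = 8192; // 8K, for next()-driven scorers

    static int bufferSizeFor(List<ClauseKind> clauses) {
        // Any required clause makes the query conjunctive, so its term scorers
        // are advanced mostly with skipTo(), where a big buffer does not help.
        boolean orLike = !clauses.contains(ClauseKind.MUST);
        return orLike ? OR_BUFFER_SIZE : AND_BUFFER_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(bufferSizeFor(Arrays.asList(ClauseKind.SHOULD, ClauseKind.SHOULD)));
        System.out.println(bufferSizeFor(Arrays.asList(ClauseKind.MUST, ClauseKind.SHOULD)));
    }
}
```

A per-query decision like this is cruder than growing the buffer on demand, but it needs no allocation during search.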

Regards,
Paul Elschot