Posted to java-user@lucene.apache.org by emmanuel Gosse <em...@gmail.com> on 2013/01/19 22:57:51 UTC

FieldCacheTermsFilter performance

Hi,

I would like to report a performance problem with FieldCacheTermsFilter
between Lucene versions 3.0.3 and 4.0.0.

I ran tests with the same application on 3.0.3 (my production
version) and 4.0.0, and I found a "big" difference in response time.

I run a "real life" injection of 400 000 queries and measure the average
response time.
I regularly run this type of test to validate that we have no performance
regression.

So I ran further tests to find out where this difference comes from:
deactivating faceting, changing the Directory used, and so on.

And for one test, I deactivated the filters (I use only
FieldCacheTermsFilter) and obtained the same average response time on
both versions.

To give some data:
20 million documents
3 indexes under a MultiReader
no indexing, only searching (indexing is not implemented in this app)
400 000 queries with JMeter

Test:

3.0.3 or 4.0.0
Queries without filters: 60 ms (average response time)

Queries with filters:
3.0.3: 150 ms
4.0.0: 400 ms

The only code differences in my application are those required to plug
into each Lucene version.

The fields used for filtering are not stored and, in the 4.0.0 version,
are StringField.
I checked that the FieldCache caches do not change during the test.

I have no more ideas to pursue. Maybe I have not understood which type of
field I should use.

Emmanuel

-----------
Emmanuel Gosse
Fnac.Com <http://www.fnac.com>

Re: FieldCacheTermsFilter performance

Posted by emmanuel Gosse <em...@gmail.com>.
Hi,

We have about 120 filters; half of them are selective, but some filters
are "boolean".


It's easy to find where the difference comes from.

binarySearchLookup in DocTermsIndexImpl versus StringIndex:

In StringIndex, it is just a comparison between Strings:
int cmp = lookup[mid].compareTo(key);

In DocTermsIndexImpl, the BytesRef has to be retrieved before it can be
compared:

public BytesRef lookup(int ord, BytesRef ret) {
      return bytes.fill(ret, termOrdToBytesOffset.get(ord));
}
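
To make the contrast concrete, here is a minimal sketch in plain Java. It
is a toy model, not Lucene code: the class LookupSketch, the Packed holder
and the pack helper are hypothetical names made up for illustration. It
only shows that the packed variant has to locate the term's bytes through
an offset table on every probe of the binary search, while the String
variant compares objects directly. The terms are assumed to be ASCII so
that String order and byte order agree.

```java
import java.nio.charset.StandardCharsets;

// Toy model only: these are NOT Lucene's StringIndex or DocTermsIndexImpl.
class LookupSketch {

    /** Hypothetical packed term store: all terms concatenated in one byte[]. */
    static final class Packed {
        final byte[] bytes;
        final int[] offsets; // offsets[ord]..offsets[ord + 1] delimits term 'ord'
        Packed(byte[] bytes, int[] offsets) { this.bytes = bytes; this.offsets = offsets; }
    }

    /** Packs already-sorted terms into a single byte[] plus an offset table. */
    static Packed pack(String[] sortedTerms) {
        int[] offsets = new int[sortedTerms.length + 1];
        byte[][] encoded = new byte[sortedTerms.length][];
        for (int i = 0; i < sortedTerms.length; i++) {
            encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        byte[] bytes = new byte[offsets[sortedTerms.length]];
        for (int i = 0; i < encoded.length; i++) {
            System.arraycopy(encoded[i], 0, bytes, offsets[i], encoded[i].length);
        }
        return new Packed(bytes, offsets);
    }

    /** 3.x-like: one String per term, each probe is a direct compareTo. */
    static int searchStrings(String[] lookup, String key) {
        int low = 0, high = lookup.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = lookup[mid].compareTo(key);
            if (cmp < 0) low = mid + 1;
            else if (cmp > 0) high = mid - 1;
            else return mid;
        }
        return -(low + 1); // not found: encodes the insertion point
    }

    /** 4.0-like: each probe first resolves the term's byte range, then compares. */
    static int searchPacked(Packed p, byte[] key) {
        int low = 0, high = p.offsets.length - 2;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = compare(p.bytes, p.offsets[mid], p.offsets[mid + 1], key);
            if (cmp < 0) low = mid + 1;
            else if (cmp > 0) high = mid - 1;
            else return mid;
        }
        return -(low + 1);
    }

    /** Unsigned byte-wise comparison of bytes[start, end) against key. */
    static int compare(byte[] bytes, int start, int end, byte[] key) {
        int len = Math.min(end - start, key.length);
        for (int i = 0; i < len; i++) {
            int a = bytes[start + i] & 0xff, b = key[i] & 0xff;
            if (a != b) return a - b;
        }
        return (end - start) - key.length;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry"};
        Packed packed = pack(terms);
        System.out.println(searchStrings(terms, "banana"));
        System.out.println(searchPacked(packed, "banana".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Both searches return the same ordinal; the difference is only in the work
done per probe, which is multiplied by the number of filter terms and the
number of queries.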


Emmanuel


2013/1/20 Uwe Schindler <uw...@thetaphi.de>

> Hi,
>
> In Lucene 4.0 I would recommend using TermsFilter (from the queries
> module) rather than FieldCacheTermsFilter, because the term dictionary is
> much faster, and in this case it is better to use the posting lists
> instead of scanning all documents (which FCTermsFilter does). How many
> filter terms do you have? Is the filter selective? To improve further,
> also use CachingWrapperFilter (this will cache filter results, which is
> useful if you have a set of filters/terms that are used quite often).
> The problem with FCTermsFilter is that it scans all documents from
> beginning to end and looks each one up in the terms cache. In Lucene 4.0
> the structure of the FieldCache changed to be more memory efficient (which
> does not hurt the primary use case of sorting), but scanning all documents
> and resolving all terms is not always the best option (this also relies
> heavily on your index structure; FCTermsFilter may still be faster under
> some circumstances).
>
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Emmanuel Gosse
06 65 26 96 71

RE: FieldCacheTermsFilter performance

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

In Lucene 4.0 I would recommend using TermsFilter (from the queries module) rather than FieldCacheTermsFilter, because the term dictionary is much faster, and in this case it is better to use the posting lists instead of scanning all documents (which FCTermsFilter does). How many filter terms do you have? Is the filter selective? To improve further, also use CachingWrapperFilter (this will cache filter results, which is useful if you have a set of filters/terms that are used quite often).
The problem with FCTermsFilter is that it scans all documents from beginning to end and looks each one up in the terms cache. In Lucene 4.0 the structure of the FieldCache changed to be more memory efficient (which does not hurt the primary use case of sorting), but scanning all documents and resolving all terms is not always the best option (this also relies heavily on your index structure; FCTermsFilter may still be faster under some circumstances).

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
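
The two strategies described in this reply can be sketched in plain Java.
This is a toy model under stated assumptions, not Lucene's implementation:
FilterSketch, docToOrd, postings and the cache map are hypothetical names,
java.util.BitSet stands in for a filter result, and each document is
assumed to carry exactly one term. The scan variant touches every document
regardless of how selective the filter is; the postings variant only walks
the document lists of the requested terms; the cached variant reuses a
previously computed result for the same term list.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Lucene's FieldCacheTermsFilter, TermsFilter or
// CachingWrapperFilter. All names here are made up for illustration.
class FilterSketch {

    /** Scan strategy: visit every document and test its term ordinal. */
    static BitSet filterByScan(int[] docToOrd, Set<Integer> wantedOrds) {
        BitSet bits = new BitSet(docToOrd.length);
        for (int doc = 0; doc < docToOrd.length; doc++) {
            if (wantedOrds.contains(docToOrd[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Postings strategy: walk only the doc lists of the requested terms. */
    static BitSet filterByPostings(Map<String, int[]> postings, List<String> terms, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) continue; // term not present in the index
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /**
     * Cached strategy: compute once per distinct term list, then reuse.
     * The returned BitSet is shared, so callers must treat it as read-only,
     * and the key list must not be modified after it has been used.
     */
    static BitSet cachedFilter(Map<List<String>, BitSet> cache, Map<String, int[]> postings,
                               List<String> terms, int maxDoc) {
        return cache.computeIfAbsent(terms, k -> filterByPostings(postings, k, maxDoc));
    }
}
```

With a selective filter, the postings variant does work proportional to
the number of matching documents, while the scan variant always does work
proportional to the total number of documents. Which one wins on a real
index depends on the index structure, as noted in the reply.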

