Posted to java-user@lucene.apache.org by Christoph Kaser <lu...@iconparc.de> on 2013/12/16 14:29:54 UTC

BooleanFilter vs BooleanQuery performance

Hi all,

From my tests on an index with 22 million entries, it seems that in many
cases a BooleanFilter is a lot slower than an equivalent BooleanQuery.
Is this the expected behaviour? I would have expected a Filter to be at
least as fast as a query, since it basically does the same thing, but
without scoring.

Is there a better alternative to using a BooleanFilter?

Regards
Christoph

-- 
Dipl.-Inf. Christoph Kaser

IconParc GmbH
Sophienstrasse 1
80333 München

www.iconparc.de

Tel +49 -89- 15 90 06 - 21
Fax +49 -89- 15 90 06 - 49

Geschäftsleitung: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer. HRB
121830, Amtsgericht München

  


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: BooleanFilter vs BooleanQuery performance

Posted by Christoph Kaser <lu...@iconparc.de>.
Hi,

Thank you for your explanation!
I used a wrapped BooleanQuery instead, which turned out to be a lot faster.

Christoph

On 16.12.2013 at 15:19, Uwe Schindler wrote:
> Hi,
>
> The problem with BooleanFilter is its implementation:
> It creates BitSets and AND/ORs them together. The BitSets are created because you can cache them for later use (the main use case for filters).
> In contrast, a query intersects the DocIdSetIterators directly. The good thing about this: if you have queries that match only a few documents, the other queries can advance and skip the doc IDs that do not match. BooleanFilter has to collect all matching doc IDs from all filters.
>
> If you want it fast, use BooleanQuery and wrap it with ConstantScoreQuery. Then no scoring is done either (in most cases; older versions of BooleanQuery sometimes still calculated the score).
>
> In general, BooleanFilter and ChainedFilter are, in my opinion, legacy code from older days and should no longer be used (unless you cache Filters and want to cache the BooleanFilter, too). This is why they are not part of Lucene's core classes.
>
> Uwe




RE: BooleanFilter vs BooleanQuery performance

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

The problem with BooleanFilter is its implementation:
It creates BitSets and AND/ORs them together. The BitSets are created because you can cache them for later use (the main use case for filters).
In contrast, a query intersects the DocIdSetIterators directly. The good thing about this: if you have queries that match only a few documents, the other queries can advance and skip the doc IDs that do not match. BooleanFilter has to collect all matching doc IDs from all filters.
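
To make the difference concrete, here is a minimal sketch in plain Java. It is not Lucene's actual code: the class and method names are made up for illustration, sorted int arrays stand in for the doc ID lists of two clauses, and a binary search stands in for the skipping a real DocIdSetIterator does when it advances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class IntersectionSketch {

    /** Filter-style: materialize every doc ID of each clause into a BitSet, then AND them. */
    static BitSet intersectWithBitSets(int[] a, int[] b, int maxDoc) {
        BitSet left = new BitSet(maxDoc);
        for (int doc : a) left.set(doc);    // touches every entry of a
        BitSet right = new BitSet(maxDoc);
        for (int doc : b) right.set(doc);   // touches every entry of b
        left.and(right);
        return left;
    }

    /** Query-style: the two sorted lists leapfrog each other and skip non-matching doc IDs. */
    static List<Integer> intersectByLeapfrog(int[] a, int[] b) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                hits.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);    // jump a forward to b's current doc ID
            } else {
                j = advance(b, j, a[i]);    // jump b forward to a's current doc ID
            }
        }
        return hits;
    }

    /** Returns the first index at or after 'from' whose value is >= target. */
    static int advance(int[] docs, int from, int target) {
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        int[] rare = {3, 8, 500};
        int[] common = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 400, 500, 600};
        System.out.println(intersectWithBitSets(rare, common, 601));
        System.out.println(intersectByLeapfrog(rare, common));
    }
}
```

With a clause that matches only a few documents, the leapfrog variant looks at a handful of entries of the large list, while the BitSet variant always walks both lists completely before it can combine them.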

If you want it fast, use BooleanQuery and wrap it with ConstantScoreQuery. Then no scoring is done either (in most cases; older versions of BooleanQuery sometimes still calculated the score).
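
As a rough sketch of what this wrapping could look like, assuming the Lucene 4.x API (mutable BooleanQuery built with add()); the class name, field names and values are hypothetical placeholders, not taken from the original question:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class WrappedBooleanQuerySketch {

    /** Builds a conjunction of two term clauses and wraps it so every hit gets the same score. */
    static Query buildConstantScoreConjunction() {
        // Lucene 4.x style: BooleanQuery has a public constructor and is filled via add().
        BooleanQuery bool = new BooleanQuery();
        // "category"/"books" and "language"/"de" are made-up example terms.
        bool.add(new TermQuery(new Term("category", "books")), Occur.MUST);
        bool.add(new TermQuery(new Term("language", "de")), Occur.MUST);
        // The wrapper assigns a constant score instead of the clauses' own scores.
        return new ConstantScoreQuery(bool);
    }
}
```

The resulting Query can be passed to IndexSearcher.search like any other query. In later Lucene versions BooleanQuery is built through BooleanQuery.Builder instead, so the construction part would need adjusting there.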

In general, BooleanFilter and ChainedFilter are, in my opinion, legacy code from older days and should no longer be used (unless you cache Filters and want to cache the BooleanFilter, too). This is why they are not part of Lucene's core classes.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

