Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/12/28 06:44:46 UTC

[jira] Commented: (LUCENE-2836) FieldCache rewrite method for MultiTermQueries

    [ https://issues.apache.org/jira/browse/LUCENE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975421#action_12975421 ] 

Robert Muir commented on LUCENE-2836:
-------------------------------------

Here are some results from my silly wildcard benchmarker (I think luceneutil doesn't yet have a keyword title or similar field for this):

(using 10M docs with a single-valued numeric field, so 10M terms too)

In general it's a stupid rewrite method, unless your users are typing in truly horrific queries, in which case it's better.

||Pattern||Matching docs||Avg ms (filter)||Avg ms (FieldCache)||
|N?N?N?N|1000|35.9|52.5|
|?NNNNNN|10|3.1|44.2|
|??NNNNN|100|5.5|45.6|
|???NNNN|1000|44.7|48.5|
|????NNN|10000|141.8|67.9|
|NN??NNN|100|3.6|41.5|
|NN?N\*|10000|5.3|42.7|
|?NN\*|100000|25.9|50.8|
|\*N|1000000|1639.2|446.8|
|\*N\*|5217031|2089.4|701.2|
|\*NN\*|590040|1811.6|674.8|


> FieldCache rewrite method for MultiTermQueries
> ----------------------------------------------
>
>                 Key: LUCENE-2836
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2836
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2836.patch
>
>
> For some MultiTermQueries, like RangeQuery, we have a FieldCacheRangeFilter etc. (in this case it's particularly optimized).
> But in the general case, since LUCENE-2784 we can now have a rewrite method to rewrite any MultiTermQuery 
> using the FieldCache, because MultiTermQuery's getEnum no longer takes an IndexReader but Terms, and all the 
> FilteredTermsEnums are now just real TermsEnum decorators.
> In cases like low-frequency queries this is actually slower (I think this has been shown for numeric ranges before too),
> but for the really high-frequency cases, especially ugly wildcards, regexes, fuzzies, etc., this can be several times faster 
> using the FieldCache instead, since all the terms are in RAM and the automaton can blast through them quicker.
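
The idea in the description above can be sketched roughly as follows. This is not the attached patch and uses no Lucene classes: TermsIndex is a hypothetical stand-in for the cached per-document term ordinals, and java.util.regex stands in for the automaton. It only illustrates the two passes, one over the in-RAM terms and one over the documents, under the single-valued-field assumption used in the benchmark.

```java
import java.util.BitSet;
import java.util.regex.Pattern;

class FieldCacheRewriteSketch {

    /** Hypothetical stand-in for a field cache entry: sorted unique terms plus one ordinal per doc. */
    static final class TermsIndex {
        final String[] terms;  // all unique terms of the field, held in RAM
        final int[] docToOrd;  // docToOrd[doc] = index into terms (single-valued field assumed)

        TermsIndex(String[] terms, int[] docToOrd) {
            this.terms = terms;
            this.docToOrd = docToOrd;
        }
    }

    /** Translate a wildcard ('?' = exactly one char, '*' = any run of chars) into a regex. */
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    /** Pass 1 marks matching term ordinals; pass 2 checks each doc's ordinal against that set. */
    static BitSet matchingDocs(TermsIndex index, String wildcard) {
        Pattern pattern = wildcardToRegex(wildcard);

        // Pass 1: run the matcher over every cached term, no postings are read.
        BitSet matchingOrds = new BitSet(index.terms.length);
        for (int ord = 0; ord < index.terms.length; ord++) {
            if (pattern.matcher(index.terms[ord]).matches()) {
                matchingOrds.set(ord);
            }
        }

        // Pass 2: one array lookup and one bit test per document.
        BitSet docs = new BitSet(index.docToOrd.length);
        for (int doc = 0; doc < index.docToOrd.length; doc++) {
            if (matchingOrds.get(index.docToOrd[doc])) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Made-up toy data, not taken from the benchmark.
        String[] terms = {"0000012", "0000112", "0001012", "0001234"};
        int[] docToOrd = {3, 0, 1, 2, 0};
        TermsIndex index = new TermsIndex(terms, docToOrd);
        System.out.println(matchingDocs(index, "*12"));
        System.out.println(matchingDocs(index, "???1???"));
    }
}
```

Both passes cost the same no matter how many terms match, which is consistent with the roughly flat 40-50 ms floor in the fieldcache column for selective patterns and the win on patterns like \*N\*.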
