Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/04/29 13:35:05 UTC
[jira] [Updated] (LUCENE-6458) MultiTermQuery's FILTER rewrite method should support skipping whenever possible
[ https://issues.apache.org/jira/browse/LUCENE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6458:
---------------------------------
Attachment: LUCENE-6458.patch
Here is a patch. It is quite similar to the old "auto" rewrite, except that it rewrites per segment and consumes the filtered terms enum only once. Queries are executed as regular disjunctions when there are 50 or fewer matching terms.
{noformat}
Task QPS baseline StdDev QPS patch StdDev Pct diff
Prefix3 113.17 (1.7%) 88.55 (2.7%) -21.8% ( -25% - -17%)
Wildcard 37.43 (2.0%) 36.26 (3.2%) -3.1% ( -8% - 2%)
HighSpanNear 4.30 (2.6%) 4.24 (4.0%) -1.6% ( -7% - 5%)
OrHighNotLow 71.52 (1.5%) 70.51 (3.1%) -1.4% ( -5% - 3%)
HighSloppyPhrase 20.60 (6.3%) 20.34 (7.6%) -1.3% ( -14% - 13%)
OrHighNotMed 96.14 (2.0%) 95.11 (2.8%) -1.1% ( -5% - 3%)
MedPhrase 23.49 (1.8%) 23.30 (3.5%) -0.8% ( -6% - 4%)
Respell 62.25 (8.9%) 62.01 (7.4%) -0.4% ( -15% - 17%)
AndHighHigh 52.43 (0.7%) 52.27 (1.1%) -0.3% ( -2% - 1%)
OrNotHighHigh 26.08 (3.5%) 26.02 (1.0%) -0.2% ( -4% - 4%)
OrHighNotHigh 61.96 (2.0%) 61.85 (2.1%) -0.2% ( -4% - 4%)
IntNRQ 8.03 (3.1%) 8.02 (2.6%) -0.2% ( -5% - 5%)
LowTerm 783.62 (4.9%) 783.25 (4.5%) -0.0% ( -9% - 9%)
MedSpanNear 18.77 (1.9%) 18.76 (3.6%) -0.0% ( -5% - 5%)
LowSpanNear 14.49 (2.5%) 14.49 (2.6%) -0.0% ( -4% - 5%)
MedTerm 237.81 (2.1%) 237.76 (3.0%) -0.0% ( -4% - 5%)
PKLookup 266.15 (2.5%) 266.38 (2.5%) 0.1% ( -4% - 5%)
OrHighMed 50.61 (6.0%) 50.68 (6.1%) 0.1% ( -11% - 13%)
Fuzzy2 19.87 (4.4%) 19.92 (7.8%) 0.2% ( -11% - 12%)
OrNotHighMed 90.03 (1.1%) 90.25 (0.8%) 0.2% ( -1% - 2%)
HighPhrase 15.56 (2.0%) 15.61 (2.7%) 0.3% ( -4% - 5%)
MedSloppyPhrase 252.97 (5.2%) 253.93 (4.3%) 0.4% ( -8% - 10%)
LowPhrase 8.16 (1.7%) 8.21 (1.9%) 0.6% ( -2% - 4%)
HighTerm 115.17 (2.4%) 116.05 (2.7%) 0.8% ( -4% - 6%)
OrHighHigh 25.19 (5.7%) 25.45 (6.4%) 1.0% ( -10% - 13%)
OrHighLow 42.12 (7.5%) 42.60 (6.9%) 1.1% ( -12% - 16%)
LowSloppyPhrase 129.20 (1.6%) 130.68 (2.0%) 1.2% ( -2% - 4%)
AndHighMed 231.64 (1.3%) 235.28 (2.1%) 1.6% ( -1% - 4%)
AndHighLow 733.51 (3.9%) 751.08 (3.5%) 2.4% ( -4% - 10%)
Fuzzy1 85.42 (17.0%) 91.04 (5.9%) 6.6% ( -13% - 35%)
OrNotHighLow 893.55 (2.9%) 962.35 (4.6%) 7.7% ( 0% - 15%)
{noformat}
I was hoping it would kick in for numeric range queries but unfortunately they often need to match hundreds of terms. I'm wondering if it would be different for auto-prefix.
Prefix3 and Wildcard are a bit slower because they actually get executed as regular disjunctions. I think the slowdown is fair given that the patch also requires less memory and provides true skipping support (which the benchmark doesn't use).
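For readers who want to see the shape of the per-segment decision described in this message, here is a minimal sketch in plain Java. It is not the code from LUCENE-6458.patch and does not use Lucene's API: the class `FilterRewriteSketch`, the types `TermPostings` and `Rewritten`, and the constant `MAX_DISJUNCTION_TERMS` are all hypothetical names chosen for illustration. The only details taken from the message are the threshold of 50 terms and the fact that the matching terms are iterated once.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;

/** Toy model of a per-segment rewrite. All names here are illustrative assumptions. */
final class FilterRewriteSketch {

    /** From the message: 50 or fewer matching terms run as a regular disjunction. */
    static final int MAX_DISJUNCTION_TERMS = 50;

    /** One matching term of a segment together with its postings (doc IDs). */
    static final class TermPostings {
        final String term;
        final int[] docIds;

        TermPostings(String term, int[] docIds) {
            this.term = term;
            this.docIds = docIds;
        }
    }

    /** Outcome of the rewrite: exactly one of the two fields is non-null. */
    static final class Rewritten {
        final List<TermPostings> disjunction; // few terms: postings are left unread
        final BitSet bitSet;                  // many terms: every postings list was consumed

        Rewritten(List<TermPostings> disjunction, BitSet bitSet) {
            this.disjunction = disjunction;
            this.bitSet = bitSet;
        }
    }

    /** Walks the matching terms of one segment a single time and picks a strategy. */
    static Rewritten rewrite(Iterator<TermPostings> matchingTerms) {
        List<TermPostings> collected = new ArrayList<>();
        // Buffer at most threshold + 1 terms; that is enough to know which side we are on.
        while (matchingTerms.hasNext() && collected.size() <= MAX_DISJUNCTION_TERMS) {
            collected.add(matchingTerms.next());
        }
        if (collected.size() <= MAX_DISJUNCTION_TERMS) {
            // Few terms: hand them over as a disjunction without touching their postings.
            return new Rewritten(collected, null);
        }
        // Too many terms: fall back to a bit set, continuing from where the iterator stopped.
        BitSet bits = new BitSet();
        for (TermPostings tp : collected) {
            addAll(bits, tp);
        }
        while (matchingTerms.hasNext()) {
            addAll(bits, matchingTerms.next());
        }
        return new Rewritten(null, bits);
    }

    private static void addAll(BitSet bits, TermPostings tp) {
        for (int doc : tp.docIds) {
            bits.set(doc);
        }
    }

    public static void main(String[] args) {
        List<TermPostings> terms = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            terms.add(new TermPostings("term" + i, new int[] {i, i + 10}));
        }
        Rewritten r = rewrite(terms.iterator());
        System.out.println(r.disjunction != null ? "disjunction" : "bit set");
    }
}
```

In this toy model the disjunction branch never reads `docIds`, which is the property that would let a real scorer skip through postings instead of consuming them fully. The bit set branch reads every postings list, as the existing FILTER rewrite does according to the issue description.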
> MultiTermQuery's FILTER rewrite method should support skipping whenever possible
> --------------------------------------------------------------------------------
>
> Key: LUCENE-6458
> URL: https://issues.apache.org/jira/browse/LUCENE-6458
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6458.patch
>
>
> Today MultiTermQuery's FILTER rewrite always builds a bit set from all matching terms. This means that we need to consume the entire postings lists of all matching terms. Instead we should try to execute like regular disjunctions when there are few terms.