Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/05/12 17:34:00 UTC
[jira] [Updated] (LUCENE-6458) MultiTermQuery's FILTER rewrite method should support skipping whenever possible
[ https://issues.apache.org/jira/browse/LUCENE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6458:
---------------------------------
Attachment: LUCENE-6458.patch
wikimedium.10M.nostopwords.tasks
I did some more benchmarking of the change with filters (see attached tasks file) and various thresholds (and a fixed seed):
{noformat}
16
Task       QPS baseline   StdDev  QPS patch   StdDev  Pct diff
MTQ               24.33   (7.5%)      20.67   (7.3%)  -15.1% ( -27% - 0%)
IntNRQ            20.38   (7.3%)      17.85  (11.9%)  -12.4% ( -29% - 7%)
IntNRQ_50          8.94  (10.1%)       8.67   (8.6%)   -3.0% ( -19% - 17%)
MTQ_50             9.05   (7.9%)       8.93   (5.3%)   -1.3% ( -13% - 12%)
IntNRQ_10         13.72  (12.7%)      13.60  (11.9%)   -0.9% ( -22% - 27%)
IntNRQ_1          17.53  (17.1%)      17.53  (16.3%)    0.0% ( -28% - 40%)
MTQ_10            13.70  (11.2%)      13.89   (8.7%)    1.4% ( -16% - 23%)
MTQ_1             19.11  (15.8%)      21.43  (18.0%)   12.1% ( -18% - 54%)
64
Task       QPS baseline   StdDev  QPS patch   StdDev  Pct diff
IntNRQ            20.53   (6.9%)      16.42   (5.3%)  -20.0% ( -30% - -8%)
MTQ               24.31   (7.3%)      20.34   (6.4%)  -16.3% ( -27% - -2%)
IntNRQ_50          8.87   (9.2%)       8.31   (6.5%)   -6.3% ( -20% - 10%)
IntNRQ_10         13.55  (12.7%)      12.80  (10.2%)   -5.6% ( -25% - 19%)
IntNRQ_1          17.27  (16.3%)      16.38  (13.1%)   -5.2% ( -29% - 28%)
MTQ_50             9.00   (7.6%)       9.02   (4.5%)    0.3% ( -10% - 13%)
MTQ_10            13.65  (11.1%)      14.73   (8.2%)    7.9% ( -10% - 30%)
MTQ_1             18.95  (15.1%)      25.32  (17.2%)   33.6% ( 1% - 77%)
256
Task       QPS baseline   StdDev  QPS patch   StdDev  Pct diff
IntNRQ            20.43   (9.4%)      12.69   (1.7%)  -37.9% ( -44% - -29%)
MTQ               24.13   (9.3%)      19.32   (5.3%)  -19.9% ( -31% - -5%)
IntNRQ_1          17.21  (19.5%)      13.90   (7.7%)  -19.2% ( -38% - 9%)
IntNRQ_10         13.49  (12.7%)      10.95   (5.7%)  -18.8% ( -33% - 0%)
IntNRQ_50          8.85  (10.5%)       7.40   (3.8%)  -16.4% ( -27% - -2%)
MTQ_50             8.94   (8.3%)       8.82   (4.4%)   -1.3% ( -12% - 12%)
MTQ_10            13.53  (12.6%)      14.64   (5.9%)    8.2% ( -9% - 30%)
MTQ_1             18.88  (15.6%)      26.52  (14.2%)   40.5% ( 9% - 83%)
1024
Task       QPS baseline   StdDev  QPS patch   StdDev  Pct diff
IntNRQ            20.40   (7.7%)       6.54   (1.5%)  -67.9% ( -71% - -63%)
IntNRQ_1          17.57  (17.2%)       8.27   (2.9%)  -52.9% ( -62% - -39%)
IntNRQ_10         13.66  (13.0%)       6.72   (2.4%)  -50.8% ( -58% - -40%)
IntNRQ_50          8.96  (10.4%)       5.01   (1.5%)  -44.1% ( -50% - -35%)
MTQ               24.41   (8.2%)      18.07   (4.4%)  -26.0% ( -35% - -14%)
MTQ_50             9.05   (8.1%)       8.65   (3.5%)   -4.5% ( -14% - 7%)
MTQ_10            13.60  (11.5%)      14.41   (3.9%)    6.0% ( -8% - 24%)
MTQ_1             19.11  (15.6%)      27.32  (12.9%)   43.0% ( 12% - 84%)
{noformat}
Rewriting to a BooleanQuery never helps when there is no filter. One thing the benchmark doesn't capture, though, is that BooleanQuery at least does not allocate O(maxDoc) memory, which can matter for large datasets.
When there are filters, it's more complicated: it depends on the density of the filter, on the number of terms, and apparently also on how the frequencies of the different terms compare (this is my current theory for why WildcardQuery performs better than NRQ).
Net/net I think this validates that 64 would be a good threshold to rewrite, with a minimal slowdown when filters are dense and interesting speedups when filters are sparse?
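For readers who want the trade-off in concrete terms, here is a rough, self-contained sketch of the two execution strategies being compared. It is not the attached patch and does not use Lucene's API: plain sorted int arrays stand in for postings lists and for the filter, and every name in it (FilterRewriteSketch, TERM_THRESHOLD, search, advance) is hypothetical. It also assumes the threshold is a cap on the number of matching terms.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative only: sorted int[] arrays play the role of postings lists.
class FilterRewriteSketch {
    // Assumed cutoff on the number of matching terms (64 is the value discussed above).
    static final int TERM_THRESHOLD = 64;

    // Strategy 1: consume every postings list in full into a bit set of maxDoc bits.
    static BitSet matchWithBitSet(int[][] postings, int maxDoc) {
        BitSet bits = new BitSet(maxDoc); // O(maxDoc) memory regardless of the filter
        for (int[] list : postings) {
            for (int doc : list) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Returns the first index >= from whose doc ID is >= target (binary search).
    static int advance(int[] list, int from, int target) {
        int lo = from, hi = list.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Strategy 2: let the filter lead and advance each postings list to the
    // candidate doc, so long stretches of postings are skipped when the filter is sparse.
    static List<Integer> matchAsDisjunction(int[][] postings, int[] filterDocs) {
        int[] cursors = new int[postings.length];
        List<Integer> hits = new ArrayList<>();
        for (int target : filterDocs) {
            boolean matched = false;
            for (int t = 0; t < postings.length; t++) {
                cursors[t] = advance(postings[t], cursors[t], target);
                if (cursors[t] < postings[t].length && postings[t][cursors[t]] == target) {
                    matched = true;
                }
            }
            if (matched) hits.add(target);
        }
        return hits;
    }

    // Picks a strategy from the number of terms; both return the same hits.
    static List<Integer> search(int[][] postings, int[] filterDocs, int maxDoc) {
        if (postings.length <= TERM_THRESHOLD) {
            return matchAsDisjunction(postings, filterDocs);
        }
        BitSet bits = matchWithBitSet(postings, maxDoc);
        List<Integer> hits = new ArrayList<>();
        for (int doc : filterDocs) {
            if (bits.get(doc)) hits.add(doc);
        }
        return hits;
    }
}
```

In this toy model the disjunction path does work proportional to the number of filter docs times the number of terms, while the bit set path always reads every posting. That is roughly why a sparse filter with few terms would favor the former and a dense filter or many terms the latter.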
> MultiTermQuery's FILTER rewrite method should support skipping whenever possible
> --------------------------------------------------------------------------------
>
> Key: LUCENE-6458
> URL: https://issues.apache.org/jira/browse/LUCENE-6458
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6458.patch, LUCENE-6458.patch, wikimedium.10M.nostopwords.tasks
>
>
> Today MultiTermQuery's FILTER rewrite always builds a bit set from all matching terms. This means that we need to consume the entire postings lists of all matching terms. Instead we should try to execute like regular disjunctions when there are few terms.