Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2018/05/18 08:22:00 UTC
[jira] [Commented] (LUCENE-8312) Leverage impacts for SynonymQuery
[ https://issues.apache.org/jira/browse/LUCENE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480331#comment-16480331 ]
Adrien Grand commented on LUCENE-8312:
--------------------------------------
Here is a patch that sums term frequencies for each unique norm value in the impacts. I also refactored how TermScorer leverages impacts by introducing a new {{ImpactsDISI}}, which abstracts how impacts are used to efficiently skip non-competitive documents. It is used by TermQuery, FeatureQuery and SynonymQuery, and maybe soon by PhraseQuery as well.
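To make the summing step concrete, here is a minimal sketch of one way to merge per-term impacts, not the code from the patch. The {{Impact}} class below is a hypothetical stand-in (a term frequency plus a norm), and {{SynonymImpactsSketch}} and {{merge}} are names invented for illustration. It assumes norms are non-negative, that a larger norm is less competitive, and that each per-term list bounds every document of that term from above. For each unique norm it sums, across terms, the largest frequency seen at that norm or a smaller one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SynonymImpactsSketch {

  /** Hypothetical stand-in for an impact: a (term frequency, norm) pair. */
  static final class Impact {
    final int freq;
    final long norm;

    Impact(int freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }

    @Override
    public String toString() {
      return "{freq=" + freq + ",norm=" + norm + "}";
    }
  }

  /** Merges the impacts of several synonym terms into one list of upper bounds. */
  static List<Impact> merge(List<List<Impact>> perTerm) {
    // Collect every norm that appears for any term, in ascending order.
    // Assumption: norms are non-negative, so signed ordering is sufficient here.
    TreeSet<Long> norms = new TreeSet<>();
    for (List<Impact> impacts : perTerm) {
      for (Impact impact : impacts) {
        norms.add(impact.norm);
      }
    }

    List<Impact> merged = new ArrayList<>();
    for (long norm : norms) {
      long sumTf = 0;
      for (List<Impact> impacts : perTerm) {
        // Largest freq this term can contribute to a document with this norm:
        // the best impact whose norm is not greater than the current one.
        int best = 0;
        for (Impact impact : impacts) {
          if (impact.norm <= norm) {
            best = Math.max(best, impact.freq);
          }
        }
        sumTf += best;
      }
      int freq = (int) Math.min(Integer.MAX_VALUE, sumTf);
      // Keep the entry only if it raises the frequency bound; otherwise an
      // earlier entry with a smaller norm already dominates it.
      if (merged.isEmpty() || merged.get(merged.size() - 1).freq < freq) {
        merged.add(new Impact(freq, norm));
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Impact> termA = Arrays.asList(new Impact(2, 10), new Impact(4, 12));
    List<Impact> termB = Arrays.asList(new Impact(3, 11));
    // prints [{freq=2,norm=10}, {freq=5,norm=11}, {freq=7,norm=12}]
    System.out.println(merge(Arrays.asList(termA, termB)));
  }
}
```

The nested loops keep the sketch short. A real implementation would more likely walk the sorted per-term lists in a single pass.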
I hacked luceneutil to run disjunctions as synonym queries to check the impact of this change when total hit counts are not tracked:
{noformat}
Task                   QPS baseline   StdDev  QPS patch   StdDev  Pct diff
HighTermMonthSort            158.74  (10.5%)     144.83  (10.2%)     -8.8%  ( -26% - 13%)
HighTerm                    1460.56   (5.3%)    1395.35   (3.5%)     -4.5%  ( -12% - 4%)
HighTermDayOfYearSort         66.81   (9.3%)      64.08  (11.7%)     -4.1%  ( -22% - 18%)
AndHighHigh                   33.33   (5.0%)      32.15   (3.5%)     -3.5%  ( -11% - 5%)
MedTerm                     1738.21   (4.9%)    1687.75   (3.2%)     -2.9%  ( -10% - 5%)
LowTerm                     3582.99   (3.4%)    3496.28   (3.9%)     -2.4%  ( -9% - 5%)
AndHighMed                   154.32   (3.7%)     151.61   (2.7%)     -1.8%  ( -7% - 4%)
Prefix3                       89.89   (5.0%)      89.15   (5.6%)     -0.8%  ( -10% - 10%)
IntNRQ                        34.35  (13.9%)      34.21  (15.0%)     -0.4%  ( -25% - 33%)
LowPhrase                   1815.14   (3.1%)    1809.71   (3.0%)     -0.3%  ( -6% - 6%)
MedPhrase                    163.59   (1.4%)     163.20   (1.3%)     -0.2%  ( -2% - 2%)
HighSloppyPhrase              12.22   (4.8%)      12.19   (4.8%)     -0.2%  ( -9% - 9%)
Respell                      195.28   (2.4%)     194.94   (1.9%)     -0.2%  ( -4% - 4%)
Wildcard                     103.19   (2.7%)     103.02   (2.9%)     -0.2%  ( -5% - 5%)
Fuzzy2                       159.47   (4.9%)     159.23   (7.6%)     -0.2%  ( -12% - 13%)
MedSloppyPhrase               58.26   (4.2%)      58.22   (4.5%)     -0.1%  ( -8% - 8%)
LowSloppyPhrase               61.14   (2.4%)      61.19   (2.6%)      0.1%  ( -4% - 5%)
LowSpanNear                   92.96   (3.7%)      93.13   (3.4%)      0.2%  ( -6% - 7%)
MedSpanNear                   48.08   (3.4%)      48.22   (3.3%)      0.3%  ( -6% - 7%)
Fuzzy1                       312.46   (6.6%)     313.81  (11.1%)      0.4%  ( -16% - 19%)
HighSpanNear                   7.00   (5.5%)       7.03   (5.6%)      0.4%  ( -10% - 12%)
HighPhrase                    27.40   (2.6%)      27.53   (2.9%)      0.5%  ( -4% - 6%)
AndHighLow                  1219.32   (3.6%)    1233.33   (4.1%)      1.1%  ( -6% - 9%)
OrHighMed                     30.41   (7.7%)     141.92  (13.6%)    366.6%  ( 320% - 420%)
OrHighHigh                    23.02   (7.3%)     145.78  (16.6%)    533.4%  ( 474% - 601%)
OrHighLow                     35.95   (7.7%)     234.72  (19.9%)    552.9%  ( 488% - 628%)
{noformat}
> Leverage impacts for SynonymQuery
> ---------------------------------
>
> Key: LUCENE-8312
> URL: https://issues.apache.org/jira/browse/LUCENE-8312
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-8312.patch
>
>
> Now that we expose raw impacts, we could leverage them for synonym queries.
> It would be a matter of summing up term frequencies for each unique norm value.