Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2018/05/18 08:22:00 UTC

[jira] [Commented] (LUCENE-8312) Leverage impacts for SynonymQuery

    [ https://issues.apache.org/jira/browse/LUCENE-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480331#comment-16480331 ] 

Adrien Grand commented on LUCENE-8312:
--------------------------------------

Here is a patch which sums up term frequencies for each unique norm value in the impacts. I also refactored the way TermScorer leverages impacts by introducing a new {{ImpactsDISI}}, which abstracts how impacts are used to efficiently skip non-competitive documents. It is used by TermQuery, FeatureQuery and SynonymQuery, and maybe soon by PhraseQuery as well.
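
To illustrate the idea of summing term frequencies per unique norm value, here is a minimal, self-contained sketch. It is not the code from the attached patch: the class {{SynonymImpactsSketch}}, its nested {{Impact}} type and the {{merge}} method are hypothetical names made up for this example, and nothing here depends on Lucene. The sketch assumes that an impact is a (freq, norm) pair, that norms compare as unsigned longs, and that a higher freq with a lower norm is more competitive. Under those assumptions, for each unique norm it takes, from every synonym, the highest freq among impacts whose norm is not greater, and sums those values to get an upper bound on the combined frequency.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of merging per-term impacts; not the patch's code. */
final class SynonymImpactsSketch {

  /** A (freq, norm) pair; a stand-in type defined only for this example. */
  static final class Impact {
    final int freq;
    final long norm;

    Impact(int freq, long norm) {
      this.freq = freq;
      this.norm = norm;
    }
  }

  /** Merges the impact lists of several synonyms into one list of upper bounds. */
  static List<Impact> merge(List<List<Impact>> perTerm) {
    // Collect every unique norm across all terms, in unsigned order (assumption).
    Comparator<Long> unsigned = Long::compareUnsigned;
    TreeSet<Long> norms = new TreeSet<>(unsigned);
    for (List<Impact> impacts : perTerm) {
      for (Impact impact : impacts) {
        norms.add(impact.norm);
      }
    }

    List<Impact> merged = new ArrayList<>();
    for (long norm : norms) {
      long sum = 0;
      for (List<Impact> impacts : perTerm) {
        // Highest freq this term can contribute to a document with this norm.
        int best = 0;
        for (Impact impact : impacts) {
          if (Long.compareUnsigned(impact.norm, norm) <= 0) {
            best = Math.max(best, impact.freq);
          }
        }
        sum += best;
      }
      // Clamp to int range, then keep the entry only if it raises the bound.
      int freq = (int) Math.min(sum, Integer.MAX_VALUE);
      if (merged.isEmpty() || merged.get(merged.size() - 1).freq < freq) {
        merged.add(new Impact(freq, norm));
      }
    }
    return merged;
  }
}
```

For example, merging the lists [(freq=2, norm=10), (freq=5, norm=20)] and [(freq=3, norm=10), (freq=4, norm=30)] with this sketch yields [(freq=5, norm=10), (freq=8, norm=20), (freq=9, norm=30)]. The nested loops keep the example short; a real implementation would likely walk the sorted lists with iterators instead.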

I hacked luceneutil to run disjunctions as synonym queries to check the impact of this change when total hit counts are not tracked:
  
{noformat}
                    Task  QPS baseline      StdDev   QPS patch      StdDev                Pct diff
       HighTermMonthSort      158.74     (10.5%)      144.83     (10.2%)   -8.8% ( -26% -   13%)
                HighTerm     1460.56      (5.3%)     1395.35      (3.5%)   -4.5% ( -12% -    4%)
   HighTermDayOfYearSort       66.81      (9.3%)       64.08     (11.7%)   -4.1% ( -22% -   18%)
             AndHighHigh       33.33      (5.0%)       32.15      (3.5%)   -3.5% ( -11% -    5%)
                 MedTerm     1738.21      (4.9%)     1687.75      (3.2%)   -2.9% ( -10% -    5%)
                 LowTerm     3582.99      (3.4%)     3496.28      (3.9%)   -2.4% (  -9% -    5%)
              AndHighMed      154.32      (3.7%)      151.61      (2.7%)   -1.8% (  -7% -    4%)
                 Prefix3       89.89      (5.0%)       89.15      (5.6%)   -0.8% ( -10% -   10%)
                  IntNRQ       34.35     (13.9%)       34.21     (15.0%)   -0.4% ( -25% -   33%)
               LowPhrase     1815.14      (3.1%)     1809.71      (3.0%)   -0.3% (  -6% -    6%)
               MedPhrase      163.59      (1.4%)      163.20      (1.3%)   -0.2% (  -2% -    2%)
        HighSloppyPhrase       12.22      (4.8%)       12.19      (4.8%)   -0.2% (  -9% -    9%)
                 Respell      195.28      (2.4%)      194.94      (1.9%)   -0.2% (  -4% -    4%)
                Wildcard      103.19      (2.7%)      103.02      (2.9%)   -0.2% (  -5% -    5%)
                  Fuzzy2      159.47      (4.9%)      159.23      (7.6%)   -0.2% ( -12% -   13%)
         MedSloppyPhrase       58.26      (4.2%)       58.22      (4.5%)   -0.1% (  -8% -    8%)
         LowSloppyPhrase       61.14      (2.4%)       61.19      (2.6%)    0.1% (  -4% -    5%)
             LowSpanNear       92.96      (3.7%)       93.13      (3.4%)    0.2% (  -6% -    7%)
             MedSpanNear       48.08      (3.4%)       48.22      (3.3%)    0.3% (  -6% -    7%)
                  Fuzzy1      312.46      (6.6%)      313.81     (11.1%)    0.4% ( -16% -   19%)
            HighSpanNear        7.00      (5.5%)        7.03      (5.6%)    0.4% ( -10% -   12%)
              HighPhrase       27.40      (2.6%)       27.53      (2.9%)    0.5% (  -4% -    6%)
              AndHighLow     1219.32      (3.6%)     1233.33      (4.1%)    1.1% (  -6% -    9%)
               OrHighMed       30.41      (7.7%)      141.92     (13.6%)  366.6% ( 320% -  420%)
              OrHighHigh       23.02      (7.3%)      145.78     (16.6%)  533.4% ( 474% -  601%)
               OrHighLow       35.95      (7.7%)      234.72     (19.9%)  552.9% ( 488% -  628%)
{noformat}

> Leverage impacts for SynonymQuery
> ---------------------------------
>
>                 Key: LUCENE-8312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8312
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8312.patch
>
>
> Now that we expose raw impacts, we could leverage them for synonym queries.
> It would be a matter of summing up term frequencies for each unique norm value.


