Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2013/01/23 09:28:12 UTC

[jira] [Commented] (LUCENE-4709) Nuke FacetResultNode.residue

    [ https://issues.apache.org/jira/browse/LUCENE-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560476#comment-13560476 ] 

Shai Erera commented on LUCENE-4709:
------------------------------------

BTW, some supporting evidence that we should nuke it comes from the following benchmark results (thanks Mike!). Base is trunk; comp is trunk with the residue computation removed:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Respell      111.64      (3.2%)      110.49      (3.2%)   -1.0% (  -7% -    5%)
              OrHighHigh        4.33      (2.8%)        4.30      (3.0%)   -0.7% (  -6% -    5%)
            HighSpanNear        2.98      (2.3%)        2.97      (2.0%)   -0.4% (  -4% -    3%)
        HighSloppyPhrase        0.89      (8.9%)        0.89      (8.2%)   -0.3% ( -15% -   18%)
                HighTerm        7.95      (2.3%)        7.93      (2.4%)   -0.2% (  -4% -    4%)
               OrHighLow        7.57      (2.2%)        7.55      (2.3%)   -0.2% (  -4% -    4%)
               OrHighMed        7.51      (2.7%)        7.51      (2.8%)    0.1% (  -5% -    5%)
                Wildcard       74.46      (3.6%)       74.54      (2.0%)    0.1% (  -5% -    5%)
                PKLookup      247.56      (2.1%)      247.85      (2.8%)    0.1% (  -4% -    5%)
             LowSpanNear        7.54      (4.6%)        7.59      (3.6%)    0.7% (  -7% -    9%)
             AndHighHigh       12.56      (0.9%)       12.68      (1.0%)    0.9% (  -1% -    2%)
             MedSpanNear       19.88      (1.5%)       20.08      (2.2%)    1.0% (  -2% -    4%)
         MedSloppyPhrase       18.45      (2.1%)       18.64      (2.1%)    1.0% (  -3% -    5%)
         LowSloppyPhrase       17.52      (3.7%)       17.71      (3.8%)    1.1% (  -6% -    8%)
                 Prefix3       45.70      (5.6%)       46.25      (2.7%)    1.2% (  -6% -   10%)
               LowPhrase       16.86      (3.4%)       17.07      (3.1%)    1.2% (  -5% -    8%)
                 MedTerm       23.00      (1.4%)       23.33      (1.8%)    1.4% (  -1% -    4%)
                  IntNRQ       17.97      (7.8%)       18.26      (4.7%)    1.6% ( -10% -   15%)
              HighPhrase       15.71      (7.0%)       15.98      (5.2%)    1.7% (  -9% -   15%)
                  Fuzzy1       33.30      (1.8%)       33.90      (1.3%)    1.8% (  -1% -    5%)
                  Fuzzy2       41.46      (2.2%)       42.26      (2.0%)    1.9% (  -2% -    6%)
                 LowTerm       40.47      (1.1%)       41.45      (1.7%)    2.4% (   0% -    5%)
              AndHighMed       49.38      (0.9%)       51.08      (1.3%)    3.4% (   1% -    5%)
               MedPhrase       55.65      (2.7%)       57.79      (2.5%)    3.8% (  -1% -    9%)
              AndHighLow       98.02      (1.5%)      104.36      (2.9%)    6.5% (   2% -   10%)
{noformat}
                
> Nuke FacetResultNode.residue
> ----------------------------
>
>                 Key: LUCENE-4709
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4709
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>
> The residue is the total count of all categories that did not make it into the top K. But this is a senseless statistic. Take, for example, the following case: two documents with categories [A/1, A/2, A/3] and [A/1, A/4, A/5]. If you ask for the top-1 category of "A", you'll get A (count=2) and A/1 (count=2), but A's residue will be 4!
> As a user, that number doesn't tell you anything, except maybe when you index only one category per document for a given dimension. But in that case, the residue is {{root.value - sum(topK.value)}}, which the application can compute if it needs to.
> In short, we're just wasting CPU cycles for that statistic, so I'm going to remove it.
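
To make the arithmetic in the description concrete, here is a minimal sketch in plain Java. It does not use the Lucene facet API: the class ResidueSketch and its methods are hypothetical names made up for illustration. It assumes that the residue is the summed count of the children outside the top K, that root.value is the number of documents carrying at least one category under the dimension, and that a document lists each category at most once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Lucene facet module's implementation.
class ResidueSketch {

    /** Counts how many documents carry each child category, e.g. "A/1". */
    static Map<String, Integer> childCounts(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String category : doc) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Child counts sorted from largest to smallest. */
    static List<Integer> sortedCounts(List<List<String>> docs) {
        List<Integer> sorted = new ArrayList<>(childCounts(docs).values());
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }

    /** Assumed residue definition: summed counts of all children outside the top K. */
    static int residue(List<List<String>> docs, int k) {
        List<Integer> sorted = sortedCounts(docs);
        int sum = 0;
        for (int i = Math.min(k, sorted.size()); i < sorted.size(); i++) {
            sum += sorted.get(i);
        }
        return sum;
    }

    /** root.value - sum(topK.value), with root.value = number of documents under the dimension. */
    static int rootMinusTopK(List<List<String>> docs, int k) {
        int root = 0;
        for (List<String> doc : docs) {
            if (!doc.isEmpty()) {
                root++;
            }
        }
        List<Integer> sorted = sortedCounts(docs);
        int topK = 0;
        for (int i = 0; i < Math.min(k, sorted.size()); i++) {
            topK += sorted.get(i);
        }
        return root - topK;
    }

    public static void main(String[] args) {
        // The multi-valued case from the issue description.
        List<List<String>> multi = List.of(
                List.of("A/1", "A/2", "A/3"),
                List.of("A/1", "A/4", "A/5"));
        // A made-up single-valued case: one category per document.
        List<List<String>> single = List.of(
                List.of("A/1"), List.of("A/1"), List.of("A/2"), List.of("A/3"));

        System.out.println("multi:  residue=" + residue(multi, 1)
                + " root-topK=" + rootMinusTopK(multi, 1));
        System.out.println("single: residue=" + residue(single, 1)
                + " root-topK=" + rootMinusTopK(single, 1));
    }
}
```

Under these assumptions, the multi-valued case gives a residue of 4 even though only 2 documents exist under "A", while in the single-valued case the residue and root.value - sum(topK.value) agree (both are 2), so the application can derive it on its own.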
