Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/02/06 19:57:19 UTC

[jira] [Commented] (LUCENE-4757) Cleanup FacetsAccumulator API path

    [ https://issues.apache.org/jira/browse/LUCENE-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572666#comment-13572666 ] 

Michael McCandless commented on LUCENE-4757:
--------------------------------------------

I tested the performance of the latest patch on wikibig, with 7 facet dimensions:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                  IntNRQ        4.15      (2.6%)        3.88      (2.8%)   -6.4% ( -11% -   -1%)
                HighTerm       22.40      (3.0%)       21.05      (3.3%)   -6.0% ( -11% -    0%)
                 Prefix3       14.92      (2.3%)       14.15      (2.4%)   -5.2% (  -9% -    0%)
                 MedTerm       53.74      (2.5%)       51.02      (2.9%)   -5.1% ( -10% -    0%)
               OrHighLow       19.23      (2.8%)       18.35      (3.0%)   -4.6% ( -10% -    1%)
               OrHighMed       18.62      (2.8%)       17.77      (3.0%)   -4.6% ( -10% -    1%)
              OrHighHigh        9.79      (3.0%)        9.35      (3.1%)   -4.5% ( -10% -    1%)
                Wildcard       30.48      (1.7%)       29.44      (2.1%)   -3.4% (  -7% -    0%)
                 LowTerm      114.24      (1.6%)      112.06      (1.8%)   -1.9% (  -5% -    1%)
             AndHighHigh       23.91      (0.8%)       23.54      (1.3%)   -1.5% (  -3% -    0%)
                  Fuzzy1       48.93      (2.0%)       48.30      (2.0%)   -1.3% (  -5% -    2%)
                  Fuzzy2       56.09      (3.0%)       55.38      (2.4%)   -1.3% (  -6% -    4%)
                 Respell       46.99      (3.7%)       46.39      (2.9%)   -1.3% (  -7% -    5%)
               MedPhrase      120.51      (5.7%)      119.16      (6.0%)   -1.1% ( -12% -   11%)
        HighSloppyPhrase        0.94      (4.5%)        0.93      (6.1%)   -1.1% ( -11% -    9%)
         MedSloppyPhrase       26.59      (1.4%)       26.37      (2.4%)   -0.8% (  -4% -    3%)
               LowPhrase       21.67      (5.6%)       21.52      (6.1%)   -0.7% ( -11% -   11%)
              HighPhrase       17.80     (10.0%)       17.70     (10.7%)   -0.6% ( -19% -   22%)
              AndHighMed      108.97      (0.6%)      108.48      (0.9%)   -0.4% (  -1% -    1%)
         LowSloppyPhrase       20.81      (2.0%)       20.74      (2.2%)   -0.3% (  -4% -    3%)
             MedSpanNear       29.10      (1.3%)       29.03      (1.1%)   -0.2% (  -2% -    2%)
            HighSpanNear        3.57      (1.6%)        3.57      (1.3%)   -0.0% (  -2% -    2%)
             LowSpanNear        8.46      (2.2%)        8.46      (2.0%)    0.0% (  -4% -    4%)
              AndHighLow      665.03      (1.5%)      668.55      (2.0%)    0.5% (  -2% -    4%)
{noformat}

Looks like things got a bit slower (the worst cases, IntNRQ and HighTerm, drop by about 6%) ... not sure why.
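
For readers unfamiliar with the table layout, here is a minimal Java sketch of how the "Pct diff" column and its range can be reproduced from the other columns. The formula is an assumption inferred from the rows above, not taken from the benchmark tool's source: the mean is the relative QPS change, and the two bounds compare the runs one standard deviation apart in opposite directions, truncated toward zero. The class and method names are hypothetical. Because the table shows rounded QPS values, the recomputed mean can differ from the printed one by about a tenth of a percent.

```java
// Hypothetical helper, not part of Lucene or its benchmark tooling.
final class PctDiffSketch {

    /** Mean percent change of the comparison run relative to the baseline. */
    static double mean(double baseQps, double compQps) {
        return 100.0 * (compQps - baseQps) / baseQps;
    }

    /** Pessimistic bound: comp one stddev low versus base one stddev high. */
    static int worst(double baseQps, double baseSdPct, double compQps, double compSdPct) {
        double baseHigh = baseQps * (1.0 + baseSdPct / 100.0);
        double compLow = compQps * (1.0 - compSdPct / 100.0);
        // The cast truncates toward zero (assumed to match the table's integers).
        return (int) (100.0 * (compLow - baseHigh) / baseHigh);
    }

    /** Optimistic bound: comp one stddev high versus base one stddev low. */
    static int best(double baseQps, double baseSdPct, double compQps, double compSdPct) {
        double baseLow = baseQps * (1.0 - baseSdPct / 100.0);
        double compHigh = compQps * (1.0 + compSdPct / 100.0);
        return (int) (100.0 * (compHigh - baseLow) / baseLow);
    }

    public static void main(String[] args) {
        // Values copied from the IntNRQ row of the table.
        System.out.printf("IntNRQ: %.1f%% (%d%% - %d%%)%n",
            mean(4.15, 3.88),
            worst(4.15, 2.6, 3.88, 2.8),
            best(4.15, 2.6, 3.88, 2.8));
    }
}
```

Under this reading, the IntNRQ row gives bounds of -11% and -1%, and the HighPhrase row gives -19% and 22%, both matching the table. A range that straddles zero (as for most of the phrase and span tasks) suggests the difference is within the noise, while ranges that stay at or below zero (IntNRQ, HighTerm, Prefix3, MedTerm, Wildcard) point to a real slowdown.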

                
> Cleanup FacetsAccumulator API path
> ----------------------------------
>
>                 Key: LUCENE-4757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4757
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4757.patch, LUCENE-4757.patch
>
>
> FacetsAccumulator and FacetRequest expose too many things to users, even when they are not needed, e.g. complements and partitions. Also, Aggregator is created per-FacetRequest, while in fact it is applied per category list. This is confusing, because if you want to do two aggregations, e.g. count and sum-score, you need to separate the two dimensions into two different category lists at indexing time.
> It's not so easy to refactor everything in one go, since there's a lot of code involved. So in this issue I will:
> * Remove complements from FacetRequest. It is only relevant to CountFacetRequest anyway. In the future, it should be a special Accumulator.
> * Make FacetsAccumulator a concrete class, and have StandardFacetsAccumulator extend it and handle everything that's relevant to sampling, complements and partitions. Gradually, these things will be migrated to the new API, and hopefully StandardFacetsAccumulator will go away.
> * Aggregator is per-document. I could not break its API b/c some features (e.g. complement) depend on it. So instead I created a new FacetsAggregator, with a bulk, per-segment API. So far I have migrated Counting and SumScore to that API.
> ** In the new API, you need to override FacetsAccumulator to define the Aggregator to use; the default is CountingFacetsAggregator.
> * Started to refactor FacetResultsHandler, whose API was guided by the use of partitions. I added a simple {{compute(FacetArrays)}} to it, which by default delegates to the nasty API, but is overridden by specific classes. This will get cleaned up further along too.
> * FacetRequest has a .getValueOf() which resolves an ordinal to its value (i.e. which of the two arrays to use). I added FacetRequest.FacetArraysSource and specialized the INT and FLOAT cases, creating a special FacetResultsHandler which does not go back to FR.getValueOf for every ordinal. I think that we can migrate other FacetResultsHandlers to behave like that ... at the expense of code duplication.
> ** I also added a TODO to get rid of getValueOf entirely .. will be done separately.
> * Got rid of CountingFacetsCollector and StandardFacetsCollector in favor of a single FacetsCollector which collects matching documents, and optionally scores, per-segment. I wrote a migration class from these per-segment MatchingDocs to ScoredDocIDs (which is global), so that the rest of the code works, but the new code works w/ the optimized per-segment API. I hope performance is still roughly the same w/ these changes too.
> There will be follow-on issues to migrate more features to the new API, and more cleanups ...
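
To illustrate the difference between a per-document aggregator and a bulk, per-segment one as described in the issue, here is a minimal self-contained Java sketch. All interfaces, class names and signatures below are hypothetical and chosen for illustration only; they are not the actual Aggregator or FacetsAggregator APIs from the patch. Category ordinals per document are modeled as a plain int[][] rather than being read from the index.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these types are Lucene's real API.
final class BulkAggregationSketch {

    /** Per-document style: the caller invokes this once for every matching doc. */
    interface PerDocAggregator {
        void aggregate(int docId);
    }

    /** Bulk style: one call per segment, given all of its matching docs at once. */
    interface PerSegmentAggregator {
        void aggregate(int[] matchingDocs, int[][] ordinalsByDoc, int[] counts);
    }

    /** Counting implemented against the bulk interface. */
    static final class CountingPerSegment implements PerSegmentAggregator {
        @Override
        public void aggregate(int[] matchingDocs, int[][] ordinalsByDoc, int[] counts) {
            // The whole loop lives inside the aggregator, so there is no
            // per-document call through the interface.
            for (int doc : matchingDocs) {
                for (int ord : ordinalsByDoc[doc]) {
                    counts[ord]++;
                }
            }
        }
    }

    /** Runs a bulk aggregator over one segment and returns the counts array. */
    static int[] countWith(PerSegmentAggregator agg, int[] matchingDocs,
                           int[][] ordinalsByDoc, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        agg.aggregate(matchingDocs, ordinalsByDoc, counts);
        return counts;
    }

    public static void main(String[] args) {
        // Three docs in one segment; docs 0 and 2 match the query.
        int[][] ordinalsByDoc = { {1, 2}, {2}, {0, 2} };
        int[] matchingDocs = {0, 2};

        // Per-document style for contrast: the driver owns the loop.
        int[] perDocCounts = new int[3];
        PerDocAggregator perDoc = docId -> {
            for (int ord : ordinalsByDoc[docId]) {
                perDocCounts[ord]++;
            }
        };
        for (int doc : matchingDocs) {
            perDoc.aggregate(doc);
        }

        int[] bulkCounts = countWith(new CountingPerSegment(), matchingDocs, ordinalsByDoc, 3);
        System.out.println(Arrays.toString(perDocCounts));
        System.out.println(Arrays.toString(bulkCounts));
    }
}
```

Both styles produce the same counts; the bulk form simply moves the document loop inside the aggregator, which is the kind of per-segment API the description has in mind.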
