Posted to issues@lucene.apache.org by "Gautam Worah (Jira)" <ji...@apache.org> on 2020/08/17 21:30:00 UTC

[jira] [Comment Edited] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields

    [ https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179235#comment-17179235 ] 

Gautam Worah edited comment on LUCENE-9450 at 8/17/20, 9:29 PM:
----------------------------------------------------------------

Benchmarks output:

 
{code:java}
                     Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff
                 Respell      389.13     (11.9%)      341.06     (17.2%)  -12.4% ( -37% -   18%)
        HighSloppyPhrase      354.72      (5.8%)      344.69      (4.6%)   -2.8% ( -12% -    8%)
         MedSloppyPhrase      754.17      (3.7%)      734.47      (9.1%)   -2.6% ( -14% -   10%)
               LowPhrase      422.80      (4.3%)      413.10      (7.4%)   -2.3% ( -13% -    9%)
               OrHighMed      666.99      (4.2%)      654.89      (6.0%)   -1.8% ( -11% -    8%)
                 Prefix3      501.95      (7.1%)      494.34      (6.9%)   -1.5% ( -14% -   13%)
             MedSpanNear      924.07      (4.4%)      914.55      (5.6%)   -1.0% ( -10% -    9%)
    HighIntervalsOrdered      913.74      (2.0%)      905.72      (2.2%)   -0.9% (  -4% -    3%)
                 MedTerm     3841.01      (3.0%)     3811.27      (3.2%)   -0.8% (  -6% -    5%)
               MedPhrase      912.31      (2.3%)      906.87      (2.7%)   -0.6% (  -5% -    4%)
             LowSpanNear      769.51     (13.1%)      766.22     (12.4%)   -0.4% ( -22% -   28%)
BrowseDayOfYearSSDVFacets     2028.19      (2.8%)     2022.17      (2.2%)   -0.3% (  -5% -    4%)
                  IntNRQ     1446.93      (2.4%)     1442.66      (1.6%)   -0.3% (  -4% -    3%)
       HighTermMonthSort     2631.06      (2.7%)     2628.64      (4.9%)   -0.1% (  -7% -    7%)
              AndHighLow     2713.80      (3.0%)     2711.42      (3.4%)   -0.1% (  -6% -    6%)
                HighTerm     2785.67      (2.7%)     2783.94      (3.6%)   -0.1% (  -6% -    6%)
            HighSpanNear      595.30     (11.4%)      595.48     (10.0%)    0.0% ( -19% -   24%)
   BrowseMonthSSDVFacets     2367.18      (2.2%)     2369.20      (2.1%)    0.1% (  -4% -    4%)
             AndHighHigh      885.75      (3.4%)      887.16      (3.9%)    0.2% (  -6% -    7%)
                Wildcard      730.34     (11.7%)      732.30     (12.2%)    0.3% ( -21% -   27%)
              HighPhrase      655.83      (3.5%)      658.13      (2.4%)    0.4% (  -5% -    6%)
   HighTermDayOfYearSort     1724.90      (6.1%)     1731.23      (5.8%)    0.4% ( -10% -   13%)
                 LowTerm     4271.57      (2.7%)     4290.84      (3.9%)    0.5% (  -5% -    7%)
                PKLookup      243.00      (2.9%)      244.62      (1.3%)    0.7% (  -3% -    5%)
         LowSloppyPhrase     1702.96      (2.8%)     1718.94      (3.4%)    0.9% (  -5% -    7%)
                  Fuzzy1      398.68      (7.4%)      403.04      (4.8%)    1.1% ( -10% -   14%)
              OrHighHigh      411.49      (9.1%)      417.82      (5.8%)    1.5% ( -12% -   18%)
               OrHighLow      707.11      (4.2%)      718.78      (6.0%)    1.7% (  -8% -   12%)
              AndHighMed     1110.43      (4.5%)     1134.33      (3.4%)    2.2% (  -5% -   10%)
                  Fuzzy2       46.38     (22.4%)       48.94     (19.1%)    5.5% ( -29% -   60%)
    BrowseDateTaxoFacets     2996.90      (4.3%)     3212.31      (5.2%)    7.2% (  -2% -   17%)
BrowseDayOfYearTaxoFacets     2594.45      (2.8%)     2785.41      (3.7%)    7.4% (   0% -   14%)
   BrowseMonthTaxoFacets     2920.16      (3.7%)     3139.41      (3.9%)    7.5% (   0% -   15%)

{code}
These numbers came from my [localrun.py|https://pastebin.com/YAXZQp4z] script.
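
As a reading aid for the table above: the leading figure in the Pct diff column appears to be the relative change in mean QPS from baseline to my_modified_version. The following is a minimal sketch under that assumption. The helper name `pct_diff` and the `rows` dictionary are illustrative only and are not part of the benchmark tooling; the QPS values are copied from four rows of the table. It does not attempt to reproduce the bracketed range, whose exact formula is not stated here.

```python
def pct_diff(qps_baseline: float, qps_modified: float) -> float:
    """Relative change in QPS, in percent, of the modified run versus baseline."""
    return 100.0 * (qps_modified - qps_baseline) / qps_baseline


# (baseline QPS, modified QPS) copied from the benchmark table
rows = {
    "Respell": (389.13, 341.06),
    "BrowseDateTaxoFacets": (2996.90, 3212.31),
    "BrowseDayOfYearTaxoFacets": (2594.45, 2785.41),
    "BrowseMonthTaxoFacets": (2920.16, 3139.41),
}

for task, (base, modified) in rows.items():
    # One decimal place, matching the precision used in the table
    print(f"{task:>26} {pct_diff(base, modified):6.1f}%")
```

Under this reading, the three *TaxoFacets tasks are the ones that exercise the taxonomy index, which fits their gains of roughly 7%. Most other rows move by about 2% or less, which is smaller than their reported standard deviations.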


> Taxonomy index should use DocValues not StoredFields
> ----------------------------------------------------
>
>                 Key: LUCENE-9450
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9450
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 8.5.2
>            Reporter: Gautam Worah
>            Priority: Minor
>              Labels: performance
>         Attachments: wip_taxonomy_patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The taxonomy index, which maps binning labels to ordinals, was created before Lucene added BinaryDocValues, so it still relies on StoredFields.
> I've attached a WIP patch (it does not currently pass tests).
> Issue suggested by [~mikemccand]


