Posted to issues@lucene.apache.org by "Gautam Worah (Jira)" <ji...@apache.org> on 2020/08/17 21:30:00 UTC
[jira] [Comment Edited] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields
[ https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179235#comment-17179235 ]
Gautam Worah edited comment on LUCENE-9450 at 8/17/20, 9:29 PM:
----------------------------------------------------------------
Benchmark output (baseline vs. my modified version):
{noformat}
Task                      QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff
Respell                         389.13   (11.9%)                   341.06   (17.2%)    -12.4%  ( -37% - 18%)
HighSloppyPhrase                354.72    (5.8%)                   344.69    (4.6%)     -2.8%  ( -12% - 8%)
MedSloppyPhrase                 754.17    (3.7%)                   734.47    (9.1%)     -2.6%  ( -14% - 10%)
LowPhrase                       422.80    (4.3%)                   413.10    (7.4%)     -2.3%  ( -13% - 9%)
OrHighMed                       666.99    (4.2%)                   654.89    (6.0%)     -1.8%  ( -11% - 8%)
Prefix3                         501.95    (7.1%)                   494.34    (6.9%)     -1.5%  ( -14% - 13%)
MedSpanNear                     924.07    (4.4%)                   914.55    (5.6%)     -1.0%  ( -10% - 9%)
HighIntervalsOrdered            913.74    (2.0%)                   905.72    (2.2%)     -0.9%  ( -4% - 3%)
MedTerm                        3841.01    (3.0%)                  3811.27    (3.2%)     -0.8%  ( -6% - 5%)
MedPhrase                       912.31    (2.3%)                   906.87    (2.7%)     -0.6%  ( -5% - 4%)
LowSpanNear                     769.51   (13.1%)                   766.22   (12.4%)     -0.4%  ( -22% - 28%)
BrowseDayOfYearSSDVFacets      2028.19    (2.8%)                  2022.17    (2.2%)     -0.3%  ( -5% - 4%)
IntNRQ                         1446.93    (2.4%)                  1442.66    (1.6%)     -0.3%  ( -4% - 3%)
HighTermMonthSort              2631.06    (2.7%)                  2628.64    (4.9%)     -0.1%  ( -7% - 7%)
AndHighLow                     2713.80    (3.0%)                  2711.42    (3.4%)     -0.1%  ( -6% - 6%)
HighTerm                       2785.67    (2.7%)                  2783.94    (3.6%)     -0.1%  ( -6% - 6%)
HighSpanNear                    595.30   (11.4%)                   595.48   (10.0%)      0.0%  ( -19% - 24%)
BrowseMonthSSDVFacets          2367.18    (2.2%)                  2369.20    (2.1%)      0.1%  ( -4% - 4%)
AndHighHigh                     885.75    (3.4%)                   887.16    (3.9%)      0.2%  ( -6% - 7%)
Wildcard                        730.34   (11.7%)                   732.30   (12.2%)      0.3%  ( -21% - 27%)
HighPhrase                      655.83    (3.5%)                   658.13    (2.4%)      0.4%  ( -5% - 6%)
HighTermDayOfYearSort          1724.90    (6.1%)                  1731.23    (5.8%)      0.4%  ( -10% - 13%)
LowTerm                        4271.57    (2.7%)                  4290.84    (3.9%)      0.5%  ( -5% - 7%)
PKLookup                        243.00    (2.9%)                   244.62    (1.3%)      0.7%  ( -3% - 5%)
LowSloppyPhrase                1702.96    (2.8%)                  1718.94    (3.4%)      0.9%  ( -5% - 7%)
Fuzzy1                          398.68    (7.4%)                   403.04    (4.8%)      1.1%  ( -10% - 14%)
OrHighHigh                      411.49    (9.1%)                   417.82    (5.8%)      1.5%  ( -12% - 18%)
OrHighLow                       707.11    (4.2%)                   718.78    (6.0%)      1.7%  ( -8% - 12%)
AndHighMed                     1110.43    (4.5%)                  1134.33    (3.4%)      2.2%  ( -5% - 10%)
Fuzzy2                           46.38   (22.4%)                    48.94   (19.1%)      5.5%  ( -29% - 60%)
BrowseDateTaxoFacets           2996.90    (4.3%)                  3212.31    (5.2%)      7.2%  ( -2% - 17%)
BrowseDayOfYearTaxoFacets      2594.45    (2.8%)                  2785.41    (3.7%)      7.4%  ( 0% - 14%)
BrowseMonthTaxoFacets          2920.16    (3.7%)                  3139.41    (3.9%)      7.5%  ( 0% - 15%)
{noformat}
The benchmarks were run with my [localrun.py|https://pastebin.com/YAXZQp4z] script.
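A quick way to sanity-check the "Pct diff" column: assuming it is the relative change in mean QPS, 100 * (QPS my_modified_version - QPS baseline) / QPS baseline, the short Java sketch below reproduces the reported values for the three taxonomy facet tasks and for Respell. The class and method names are illustrative, and the confidence range shown in parentheses in the table is not recomputed here.

```java
import java.util.Locale;

/** Illustrative check of the "Pct diff" column; the formula is an assumption that matches the table. */
public class PctDiff {

    /** Relative change of the modified version's mean QPS versus the baseline, in percent. */
    public static double pctDiff(double qpsBaseline, double qpsModified) {
        return 100.0 * (qpsModified - qpsBaseline) / qpsBaseline;
    }

    public static void main(String[] args) {
        // Task names and mean QPS values copied from the benchmark table.
        String[] tasks = {"BrowseDateTaxoFacets", "BrowseDayOfYearTaxoFacets", "BrowseMonthTaxoFacets", "Respell"};
        double[] baseline = {2996.90, 2594.45, 2920.16, 389.13};
        double[] modified = {3212.31, 2785.41, 3139.41, 341.06};

        for (int i = 0; i < tasks.length; i++) {
            // One decimal place, like the table; Locale.ROOT keeps '.' as the decimal separator.
            System.out.printf(Locale.ROOT, "%-26s %6.1f%%%n", tasks[i], pctDiff(baseline[i], modified[i]));
        }
    }
}
```

With these inputs the sketch prints 7.2%, 7.4%, 7.5% and -12.4%, matching the table.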
> Taxonomy index should use DocValues not StoredFields
> ----------------------------------------------------
>
> Key: LUCENE-9450
> URL: https://issues.apache.org/jira/browse/LUCENE-9450
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 8.5.2
> Reporter: Gautam Worah
> Priority: Minor
> Labels: performance
> Attachments: wip_taxonomy_patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The taxonomy index, which maps facet labels to ordinals, was created before Lucene added BinaryDocValues, so it stores the labels in StoredFields.
> I've attached a WIP patch (it does not pass tests yet).
> Issue suggested by [~mikemccand].
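To make the StoredFields vs. BinaryDocValues point concrete: in the taxonomy index each category's ordinal is the docID of the document that holds its label, so resolving an ordinal to a label is a per-document lookup. Stored fields are organized row-wise per document (and, in Lucene's default codec, compressed in blocks), while doc values are a column keyed by docID. The Java sketch below is a hypothetical, simplified in-memory model of such a binary column (one contiguous byte array plus an offsets array indexed by ordinal). It is not Lucene's codec format or API, and the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of an ordinal -> label column where ordinal == docID,
 * in the spirit of BinaryDocValues. Not Lucene's actual on-disk format.
 */
public class OrdinalLabelColumn {
    private final List<String> pending = new ArrayList<>(); // labels in ordinal order (write side)
    private byte[] data;    // all label bytes, concatenated
    private int[] offsets;  // label ord occupies data[offsets[ord], offsets[ord + 1])

    /** Appends a label and returns its ordinal, i.e. the docID it would get in the taxonomy index. */
    public int add(String label) {
        pending.add(label);
        return pending.size() - 1;
    }

    /** Packs the labels added so far into one byte array plus an offsets array. */
    public void freeze() {
        int n = pending.size();
        offsets = new int[n + 1];
        byte[][] encoded = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = pending.get(i).getBytes(StandardCharsets.UTF_8);
            offsets[i] = total;
            total += encoded[i].length;
        }
        offsets[n] = total;
        data = new byte[total];
        for (int i = 0; i < n; i++) {
            System.arraycopy(encoded[i], 0, data, offsets[i], encoded[i].length);
        }
    }

    /** Reads one label by ordinal, touching only that label's bytes. */
    public String label(int ord) {
        if (data == null) {
            throw new IllegalStateException("call freeze() first");
        }
        return new String(data, offsets[ord], offsets[ord + 1] - offsets[ord], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OrdinalLabelColumn taxo = new OrdinalLabelColumn();
        taxo.add("");                        // ordinal 0: the root category
        taxo.add("Date");                    // ordinal 1
        int ord = taxo.add("Date/2020/08");  // ordinal 2; "/" is an illustrative separator only
        taxo.freeze();
        System.out.println(ord + " -> " + taxo.label(ord)); // prints "2 -> Date/2020/08"
    }
}
```

The offsets array is the design point: looking up ordinal N costs two array reads and one copy of that label's bytes, independent of any other fields stored for the document.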