Posted to dev@lucene.apache.org by "Ankit Jain (JIRA)" <ji...@apache.org> on 2019/02/04 19:36:00 UTC
[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap
[ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760118#comment-16760118 ]
Ankit Jain edited comment on LUCENE-8635 at 2/4/19 7:35 PM:
------------------------------------------------------------
I have created a [pull request|https://github.com/apache/lucene-solr/pull/563] with the proposed changes. Surprisingly, I still see some impact on PKLookup performance. This does not make sense to me; it might be an artifact of my perf run setup.
{code:title=wikimedium10m|borderStyle=solid}
Task                       QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                         117.45   (2.2%)         108.72   (2.3%)     -7.4%  ( -11% - -3%)
OrHighNotMed                    1094.23   (2.5%)        1057.88   (2.7%)     -3.3%  ( -8% - 1%)
OrHighNotLow                    1047.30   (1.7%)        1012.91   (2.5%)     -3.3%  ( -7% - 1%)
Fuzzy2                            44.10   (2.3%)          42.71   (2.7%)     -3.2%  ( -7% - 1%)
OrNotHighLow                    1022.67   (2.5%)         992.28   (2.4%)     -3.0%  ( -7% - 1%)
BrowseDayOfYearTaxoFacets       7907.19   (2.0%)        7677.99   (2.7%)     -2.9%  ( -7% - 1%)
OrNotHighMed                     866.37   (1.9%)         843.10   (2.3%)     -2.7%  ( -6% - 1%)
LowTerm                         2103.58   (3.5%)        2048.98   (3.6%)     -2.6%  ( -9% - 4%)
BrowseMonthTaxoFacets           7883.86   (2.0%)        7692.48   (2.1%)     -2.4%  ( -6% - 1%)
Fuzzy1                            64.44   (1.9%)          62.88   (2.3%)     -2.4%  ( -6% - 1%)
OrNotHighHigh                    779.27   (2.0%)         761.04   (2.1%)     -2.3%  ( -6% - 1%)
Respell                           55.60   (2.6%)          54.34   (2.3%)     -2.3%  ( -7% - 2%)
OrHighNotHigh                    877.28   (2.2%)         858.10   (2.5%)     -2.2%  ( -6% - 2%)
BrowseMonthSSDVFacets             14.85   (7.9%)          14.57  (10.7%)     -1.9%  ( -18% - 18%)
MedTerm                         1984.26   (3.6%)        1947.76   (2.3%)     -1.8%  ( -7% - 4%)
AndHighLow                       718.71   (1.5%)         706.06   (1.6%)     -1.8%  ( -4% - 1%)
OrHighLow                        523.40   (2.5%)         515.56   (2.4%)     -1.5%  ( -6% - 3%)
HighTerm                        1381.10   (2.9%)        1360.80   (2.7%)     -1.5%  ( -6% - 4%)
HighTermMonthSort                120.45  (12.3%)         119.00  (16.4%)     -1.2%  ( -26% - 31%)
BrowseDayOfYearSSDVFacets         11.55   (9.7%)          11.45  (10.0%)     -0.8%  ( -18% - 20%)
AndHighMed                       155.15   (2.6%)         154.25   (2.4%)     -0.6%  ( -5% - 4%)
OrHighMed                         88.00   (2.5%)          87.85   (2.7%)     -0.2%  ( -5% - 5%)
LowPhrase                         80.53   (1.6%)          80.40   (1.4%)     -0.2%  ( -3% - 2%)
AndHighHigh                       41.91   (4.2%)          41.86   (2.9%)     -0.1%  ( -6% - 7%)
MedPhrase                         46.29   (1.4%)          46.33   (1.5%)      0.1%  ( -2% - 3%)
IntNRQ                           127.54   (0.4%)         127.76   (0.4%)      0.2%  ( 0% - 1%)
HighTermDayOfYearSort             48.59   (5.1%)          48.71   (6.0%)      0.2%  ( -10% - 12%)
LowSloppyPhrase                   13.04   (4.0%)          13.08   (4.3%)      0.3%  ( -7% - 8%)
MedSloppyPhrase                   19.48   (2.3%)          19.54   (2.4%)      0.3%  ( -4% - 5%)
OrHighHigh                        23.60   (3.0%)          23.68   (2.9%)      0.3%  ( -5% - 6%)
HighPhrase                        20.25   (2.4%)          20.32   (1.8%)      0.3%  ( -3% - 4%)
HighSloppyPhrase                   9.29   (3.3%)           9.32   (3.2%)      0.4%  ( -5% - 7%)
LowSpanNear                       25.70   (3.8%)          25.89   (3.9%)      0.7%  ( -6% - 8%)
MedSpanNear                       30.46   (4.1%)          30.69   (4.3%)      0.7%  ( -7% - 9%)
HighSpanNear                      14.41   (4.3%)          14.60   (4.7%)      1.3%  ( -7% - 10%)
Wildcard                          70.08  (10.3%)          71.09   (6.1%)      1.4%  ( -13% - 19%)
BrowseDateTaxoFacets               2.37   (0.2%)           2.41   (0.3%)      1.5%  ( 0% - 1%)
Prefix3                           86.71  (11.4%)          89.04   (6.8%)      2.7%  ( -13% - 23%)
{code}
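For anyone checking the numbers, the Pct diff column is consistent with the relative change in mean QPS from baseline to candidate. The sketch below recomputes it for two rows of the table; the class and method names are hypothetical and are not part of the benchmark tooling, and the range shown in parentheses in the table is not reproduced here.

```java
import java.util.Locale;

public class PctDiff {
    // Relative change of the candidate over the baseline, in percent.
    // Hypothetical helper for illustration; not taken from the benchmark tool.
    public static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // QPS values copied from the PKLookup and OrHighNotMed rows above.
        System.out.printf(Locale.ROOT, "PKLookup     %.1f%%%n", pctDiff(117.45, 108.72));
        System.out.printf(Locale.ROOT, "OrHighNotMed %.1f%%%n", pctDiff(1094.23, 1057.88));
    }
}
```

Running it prints -7.4% for PKLookup and -3.3% for OrHighNotMed, matching the table.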
> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/FSTs
> Environment: I used the setup below for the es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
> Reporter: Ankit Jain
> Priority: Major
> Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory when the index is opened. This causes frequent JVM OOM issues if the term data gets large. A better approach is to load the FST lazily using mmap, so that only the required terms get loaded into memory.
>
> Lucene can expose an API for providing the list of fields whose terms should be loaded offheap. I'm planning to take the following approach:
> # Add a boolean property fstOffHeap in FieldInfo
> # Pass the list of offheap fields to Lucene during index open (ALL can be a special keyword for loading all fields offheap)
> # Initialize the fstOffHeap property during Lucene index open
> # FieldReader invokes the default FST constructor or the OffHeap constructor based on the fstOffHeap field
>
> I created a patch (that loads all fields offheap), ran some benchmarks using es_rally, and the results look good.
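
To make the idea in the description concrete, here is a minimal sketch of the two pieces it relies on: deciding per field whether to go offheap, and memory-mapping a file so that bytes are paged in only when they are read. It uses plain java.nio rather than Lucene's own classes (Lucene's mmap support lives in MMapDirectory), and every name in it (OffHeapSketch, isOffHeap, mapReadOnly, the ALL constant) is an assumption made for illustration, not code from the patch or the pull request.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;

public class OffHeapSketch {
    // Hypothetical keyword meaning "load every field offheap".
    public static final String ALL = "ALL";

    // Per-field decision, mirroring the proposed fstOffHeap flag.
    public static boolean isOffHeap(Set<String> offHeapFields, String field) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(field);
    }

    // Map the whole file read-only. Nothing is copied onto the Java heap;
    // the OS pages bytes in when they are first touched.
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an on-disk FST: a small temp file with known bytes.
        Path tmp = Files.createTempFile("fst-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});

        Set<String> offHeapFields = Set.of("id");
        if (isOffHeap(offHeapFields, "id")) {
            MappedByteBuffer buf = mapReadOnly(tmp);
            System.out.println(buf.get(2)); // absolute read at offset 2
        }
        System.out.println(isOffHeap(Set.of(ALL), "title"));
    }
}
```

The trade-off this illustrates is the one under discussion: heap usage no longer grows with the size of the terms data, but each read may go through the page cache instead of an on-heap byte array, which may be relevant to the PKLookup numbers above.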