Posted to dev@lucene.apache.org by "Ankit Jain (JIRA)" <ji...@apache.org> on 2019/02/04 19:36:00 UTC
[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

    [ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760118#comment-16760118 ] 

Ankit Jain edited comment on LUCENE-8635 at 2/4/19 7:35 PM:
------------------------------------------------------------

I have created a [pull request|https://github.com/apache/lucene-solr/pull/563] with the proposed changes. Surprisingly, I still see some impact on PKLookup performance (about 7% lower QPS in the run below). This does not make sense to me; it might be an artifact of my perf run setup.

{code:title=wikimedium10m|borderStyle=solid}
                    Task QPS baseline      StdDev QPS candidate      StdDev                Pct diff
                PKLookup      117.45      (2.2%)      108.72      (2.3%)   -7.4% ( -11% -   -3%)
            OrHighNotMed     1094.23      (2.5%)     1057.88      (2.7%)   -3.3% (  -8% -    1%)
            OrHighNotLow     1047.30      (1.7%)     1012.91      (2.5%)   -3.3% (  -7% -    1%)
                  Fuzzy2       44.10      (2.3%)       42.71      (2.7%)   -3.2% (  -7% -    1%)
            OrNotHighLow     1022.67      (2.5%)      992.28      (2.4%)   -3.0% (  -7% -    1%)
BrowseDayOfYearTaxoFacets     7907.19      (2.0%)     7677.99      (2.7%)   -2.9% (  -7% -    1%)
            OrNotHighMed      866.37      (1.9%)      843.10      (2.3%)   -2.7% (  -6% -    1%)
                 LowTerm     2103.58      (3.5%)     2048.98      (3.6%)   -2.6% (  -9% -    4%)
   BrowseMonthTaxoFacets     7883.86      (2.0%)     7692.48      (2.1%)   -2.4% (  -6% -    1%)
                  Fuzzy1       64.44      (1.9%)       62.88      (2.3%)   -2.4% (  -6% -    1%)
           OrNotHighHigh      779.27      (2.0%)      761.04      (2.1%)   -2.3% (  -6% -    1%)
                 Respell       55.60      (2.6%)       54.34      (2.3%)   -2.3% (  -7% -    2%)
           OrHighNotHigh      877.28      (2.2%)      858.10      (2.5%)   -2.2% (  -6% -    2%)
   BrowseMonthSSDVFacets       14.85      (7.9%)       14.57     (10.7%)   -1.9% ( -18% -   18%)
                 MedTerm     1984.26      (3.6%)     1947.76      (2.3%)   -1.8% (  -7% -    4%)
              AndHighLow      718.71      (1.5%)      706.06      (1.6%)   -1.8% (  -4% -    1%)
               OrHighLow      523.40      (2.5%)      515.56      (2.4%)   -1.5% (  -6% -    3%)
                HighTerm     1381.10      (2.9%)     1360.80      (2.7%)   -1.5% (  -6% -    4%)
       HighTermMonthSort      120.45     (12.3%)      119.00     (16.4%)   -1.2% ( -26% -   31%)
BrowseDayOfYearSSDVFacets       11.55      (9.7%)       11.45     (10.0%)   -0.8% ( -18% -   20%)
              AndHighMed      155.15      (2.6%)      154.25      (2.4%)   -0.6% (  -5% -    4%)
               OrHighMed       88.00      (2.5%)       87.85      (2.7%)   -0.2% (  -5% -    5%)
               LowPhrase       80.53      (1.6%)       80.40      (1.4%)   -0.2% (  -3% -    2%)
             AndHighHigh       41.91      (4.2%)       41.86      (2.9%)   -0.1% (  -6% -    7%)
               MedPhrase       46.29      (1.4%)       46.33      (1.5%)    0.1% (  -2% -    3%)
                  IntNRQ      127.54      (0.4%)      127.76      (0.4%)    0.2% (   0% -    1%)
   HighTermDayOfYearSort       48.59      (5.1%)       48.71      (6.0%)    0.2% ( -10% -   12%)
         LowSloppyPhrase       13.04      (4.0%)       13.08      (4.3%)    0.3% (  -7% -    8%)
         MedSloppyPhrase       19.48      (2.3%)       19.54      (2.4%)    0.3% (  -4% -    5%)
              OrHighHigh       23.60      (3.0%)       23.68      (2.9%)    0.3% (  -5% -    6%)
              HighPhrase       20.25      (2.4%)       20.32      (1.8%)    0.3% (  -3% -    4%)
        HighSloppyPhrase        9.29      (3.3%)        9.32      (3.2%)    0.4% (  -5% -    7%)
             LowSpanNear       25.70      (3.8%)       25.89      (3.9%)    0.7% (  -6% -    8%)
             MedSpanNear       30.46      (4.1%)       30.69      (4.3%)    0.7% (  -7% -    9%)
            HighSpanNear       14.41      (4.3%)       14.60      (4.7%)    1.3% (  -7% -   10%)
                Wildcard       70.08     (10.3%)       71.09      (6.1%)    1.4% ( -13% -   19%)
    BrowseDateTaxoFacets        2.37      (0.2%)        2.41      (0.3%)    1.5% (   0% -    1%)
                 Prefix3       86.71     (11.4%)       89.04      (6.8%)    2.7% ( -13% -   23%)
{code}
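
To help judge whether the PKLookup difference is outside run-to-run noise, here is a minimal sketch of how the "Pct diff" column and its range can be reproduced from the QPS and StdDev columns. The class and method names are hypothetical, and the range formula (shift each side by one standard deviation in opposite directions) is an assumption that happens to match the PKLookup and OrHighNotMed rows; it is not taken from the benchmark tool's source.

```java
final class PctDiffSketch {
    private PctDiffSketch() {}

    /** Relative change of candidate QPS against baseline QPS, in percent. */
    static double pctDiff(double baseQps, double candQps) {
        return 100.0 * (candQps - baseQps) / baseQps;
    }

    /**
     * Assumed pessimistic bound: baseline one stddev faster,
     * candidate one stddev slower. StdDev arguments are percentages.
     */
    static double worstCase(double baseQps, double baseStdPct,
                            double candQps, double candStdPct) {
        double baseHigh = baseQps * (1.0 + baseStdPct / 100.0);
        double candLow = candQps * (1.0 - candStdPct / 100.0);
        return 100.0 * (candLow - baseHigh) / baseHigh;
    }

    /**
     * Assumed optimistic bound: baseline one stddev slower,
     * candidate one stddev faster. StdDev arguments are percentages.
     */
    static double bestCase(double baseQps, double baseStdPct,
                           double candQps, double candStdPct) {
        double baseLow = baseQps * (1.0 - baseStdPct / 100.0);
        double candHigh = candQps * (1.0 + candStdPct / 100.0);
        return 100.0 * (candHigh - baseLow) / baseLow;
    }

    public static void main(String[] args) {
        // PKLookup row from the table: 117.45 (2.2%) vs 108.72 (2.3%)
        double diff = pctDiff(117.45, 108.72);
        double worst = worstCase(117.45, 2.2, 108.72, 2.3);
        double best = bestCase(117.45, 2.2, 108.72, 2.3);
        System.out.printf("diff=%.1f%% worst=%.1f%% best=%.1f%%%n", diff, worst, best);
        // If even the optimistic bound is negative, the slowdown exceeds
        // one standard deviation of noise under this reading.
        System.out.println("outside noise: " + (best < 0.0));
    }
}
```

Under this reading, the PKLookup row still shows a slowdown in the optimistic case, whereas rows whose range straddles zero (for example HighTermMonthSort) are indistinguishable from noise.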


was (Author: akjain):
I have created [pull request|https://github.com/apache/lucene-solr/pull/563] with the proposed changes. Though surprisingly, I still see some impact on the PKLookup performance.


> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used the setup below for the es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all terms into heap memory when an index is opened. This causes frequent JVM OOM errors when the terms data grows large. A better approach would be to load the FST lazily using mmap, so that only the required terms are loaded into memory.
>  
> Lucene could expose an API that accepts the list of fields whose terms should be loaded off-heap. I plan to take the following approach:
>  # Add a boolean property fstOffHeap to FieldInfo
>  # Pass the list of off-heap fields to Lucene during index open (ALL could be a special keyword for loading all fields off-heap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the off-heap constructor based on the fstOffHeap property
>  
> I created a patch (which loads all fields off-heap) and ran some benchmarks using es_rally; the results look good.
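
The heap versus off-heap choice described in the quoted issue can be illustrated with plain java.nio. This is only a sketch under stated assumptions: the class OffHeapBytesSketch and its open method are hypothetical, the boolean parameter merely mirrors the proposed fstOffHeap property, and none of this is Lucene's actual FST or FieldReader code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class OffHeapBytesSketch {
    private OffHeapBytesSketch() {}

    /**
     * Hypothetical helper: exposes a file's bytes either as a heap copy
     * (eager) or as a read-only memory mapping (lazy), depending on the flag.
     */
    static ByteBuffer open(Path file, boolean offHeap) throws IOException {
        if (!offHeap) {
            // Eager path: the whole file is read into a heap byte[] up front.
            return ByteBuffer.wrap(Files.readAllBytes(file)).asReadOnlyBuffer();
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Lazy path: the mapping reserves address space only; the OS
            // pages data in when a given offset is first touched.
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fst-sketch", ".bin");
        tmp.toFile().deleteOnExit();

        // Stand-in for serialized FST bytes: 1 MiB with a marker at the end.
        byte[] data = new byte[1 << 20];
        data[data.length - 1] = 42;
        Files.write(tmp, data);

        ByteBuffer onHeap = open(tmp, false);
        ByteBuffer mapped = open(tmp, true);

        // Both views return the same byte; only the backing memory differs.
        System.out.println("heap direct=" + onHeap.isDirect()
                + " last=" + onHeap.get(data.length - 1));
        System.out.println("mmap direct=" + mapped.isDirect()
                + " last=" + mapped.get(data.length - 1));
    }
}
```

With the mapped variant, the JVM heap holds only the buffer object, and resident memory grows with the pages actually read. The trade-off is that the first access to a cold page can incur a page fault, which is one possible source of lookup overhead to check when profiling.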


