Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2014/03/10 17:52:43 UTC
[jira] [Updated] (LUCENE-3178) Native MMapDir

     [ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3178:
---------------------------------------

    Attachment: LUCENE-3178.patch

bq. I do think it'd be interesting to pair up a NativeMMapDir with a custom postings format that instead uses IndexInput.readLong (via Unsafe.getLong) to pull longs from disk

I was curious about this so I coded up a prototype patch.  It's a
NativeMMapDirectory.java/cpp that does the mmap/munmap in C, and then
a new postings format (NativeMMapPostingsFormat) which requires this
Directory impl and then uses Unsafe.getLong to read the longs for
packed int decode.
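
To make the decode step concrete, here is a minimal Java sketch of
pulling fixed-width packed ints straight out of whole longs.  The
class name PackedDecodeSketch, the pack/decode helpers and the simple
MSB-first layout are assumptions for illustration only; this is not
the code in the patch nor Lucene's actual packed ints format, and it
reads from a long[] rather than from the map via Unsafe.getLong.

```java
final class PackedDecodeSketch {

  /**
   * Packs non-negative values MSB-first into longs.
   * Assumes bitsPerValue divides 64 and every value fits in bitsPerValue bits.
   */
  static long[] pack(int[] values, int bitsPerValue) {
    int perLong = 64 / bitsPerValue;
    long[] out = new long[(values.length + perLong - 1) / perLong];
    for (int i = 0; i < values.length; i++) {
      int shift = 64 - bitsPerValue * (i % perLong + 1);
      out[i / perLong] |= ((long) values[i]) << shift;
    }
    return out;
  }

  /** Decodes count values directly from the longs, with no intermediate byte[]. */
  static int[] decode(long[] blocks, int count, int bitsPerValue) {
    int perLong = 64 / bitsPerValue;
    long mask = (1L << bitsPerValue) - 1;
    int[] out = new int[count];
    for (int i = 0; i < count; i++) {
      int shift = 64 - bitsPerValue * (i % perLong + 1);
      out[i] = (int) ((blocks[i / perLong] >>> shift) & mask);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] docDeltas = {3, 0, 15, 7, 1};
    long[] packed = pack(docDeltas, 4);
    System.out.println(java.util.Arrays.toString(decode(packed, docDeltas.length, 4)));
  }
}
```

In the prototype the long would come from the mapped region instead of
an array; the shift-and-mask step is the part that stays the same.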

This bypasses the extra step we do today of first reading into a
byte[], and then decoding from that, and instead pulls longs directly
from the map and decodes from those.  It requires that the byte-order
in the index matches the CPU; e.g. for x86 (little-endian) it's
opposite from the big-endian order that DataOutput.writeLong and
DataInput.readLong expect.
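
The byte-order mismatch can be illustrated with plain java.nio.  This
is only a sketch: ByteOrderSketch and its two helper methods are
made-up names, and it uses a heap ByteBuffer rather than Unsafe or a
real mapping.  It shows that the same 8 bytes yield byte-reversed
longs depending on the order they are read in.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {

  /** Assembles 8 bytes at off as a big-endian long, one byte at a time. */
  static long readLongBigEndian(byte[] b, int off) {
    long v = 0;
    for (int i = 0; i < 8; i++) {
      v = (v << 8) | (b[off + i] & 0xFFL);
    }
    return v;
  }

  /** Reads 8 bytes at off as a single long in the given byte order. */
  static long readLong(byte[] b, int off, ByteOrder order) {
    return ByteBuffer.wrap(b).order(order).getLong(off);
  }

  public static void main(String[] args) {
    byte[] bytes = {1, 2, 3, 4, 5, 6, 7, 8};
    long be = readLongBigEndian(bytes, 0);
    long le = readLong(bytes, 0, ByteOrder.LITTLE_ENDIAN);
    System.out.println(Long.toHexString(be));        // 102030405060708
    System.out.println(Long.toHexString(le));        // 807060504030201
    System.out.println(le == Long.reverseBytes(be)); // true
    System.out.println("native order: " + ByteOrder.nativeOrder());
  }
}
```

So an index written in big-endian order would need a
Long.reverseBytes on every read on x86, or else the longs must be
written in the CPU's order up front, which is the choice the prototype
makes.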

It does not align the long reads; doing so would increase the index
size somewhat because we'd need to insert pad bytes to align the long
reads to every 8 bytes.  But I think on recent x86 CPUs unaligned
reads are not adding much of a penalty...
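
For reference, the padding cost of aligning is easy to compute.  The
sketch below uses hypothetical helper names (padTo8, alignUp8) that
are not part of the patch; it shows how many pad bytes a writer would
have to insert before a block of longs starting at a given file
pointer.

```java
final class AlignSketch {

  /** Number of pad bytes (0..7) needed so that fp becomes a multiple of 8. */
  static long padTo8(long fp) {
    return (8 - (fp & 7)) & 7;
  }

  /** The smallest multiple of 8 that is >= fp. */
  static long alignUp8(long fp) {
    return (fp + 7) & ~7L;
  }

  public static void main(String[] args) {
    long fp = 13;
    System.out.println(padTo8(fp));   // 3
    System.out.println(alignUp8(fp)); // 16
  }
}
```

At up to 7 wasted bytes per aligned block, the overhead depends on how
many separately aligned blocks the postings contain.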

The patch is very unsafe / tons of nocommits, but seems to work
correctly.  Here are the results:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                  Fuzzy2       47.61      (3.1%)       46.98      (2.9%)   -1.3% (  -7% -    4%)
            HighSpanNear        8.34      (5.8%)        8.42      (5.9%)    0.9% ( -10% -   13%)
                 Respell       48.79      (4.1%)       50.00      (3.3%)    2.5% (  -4% -   10%)
                  IntNRQ        3.68      (1.5%)        3.78      (7.8%)    2.7% (  -6% -   12%)
            OrHighNotMed       37.79      (3.8%)       38.90      (2.8%)    3.0% (  -3% -    9%)
            OrHighNotLow       31.19      (4.2%)       32.13      (3.3%)    3.0% (  -4% -   10%)
                 Prefix3       91.92      (1.9%)       95.11      (6.2%)    3.5% (  -4% -   11%)
               OrHighMed       32.99      (4.0%)       34.15      (3.1%)    3.5% (  -3% -   11%)
                  Fuzzy1       60.40      (3.3%)       62.56      (3.4%)    3.6% (  -3% -   10%)
           OrNotHighHigh       11.17      (3.9%)       11.57      (2.7%)    3.6% (  -2% -   10%)
                HighTerm       69.60     (11.2%)       72.19     (15.5%)    3.7% ( -20% -   34%)
               LowPhrase       13.17      (2.1%)       13.67      (2.7%)    3.8% (   0% -    8%)
              AndHighMed       34.52      (1.0%)       35.85      (1.5%)    3.8% (   1% -    6%)
            OrNotHighLow       25.04      (3.5%)       26.00      (0.4%)    3.8% (   0% -    8%)
               OrHighLow       23.60      (4.2%)       24.50      (3.3%)    3.8% (  -3% -   11%)
                Wildcard       19.93      (2.8%)       20.73      (5.0%)    4.0% (  -3% -   12%)
         MedSloppyPhrase        3.52      (3.8%)        3.67      (4.5%)    4.2% (  -3% -   12%)
           OrHighNotHigh       13.88      (3.7%)       14.46      (2.5%)    4.2% (  -1% -   10%)
              OrHighHigh       10.23      (3.9%)       10.68      (3.1%)    4.4% (  -2% -   11%)
                 LowTerm      330.50      (6.7%)      345.35      (8.9%)    4.5% ( -10% -   21%)
             AndHighHigh       28.53      (1.1%)       29.82      (1.4%)    4.5% (   2% -    7%)
            OrNotHighMed       24.13      (3.4%)       25.23      (0.5%)    4.6% (   0% -    8%)
             LowSpanNear       10.55      (2.7%)       11.06      (3.6%)    4.8% (  -1% -   11%)
              HighPhrase        4.30      (6.7%)        4.55      (6.2%)    5.9% (  -6% -   20%)
                 MedTerm      106.81      (9.0%)      113.26     (12.9%)    6.0% ( -14% -   30%)
        HighSloppyPhrase        3.41      (4.2%)        3.67      (7.2%)    7.7% (  -3% -   19%)
             MedSpanNear       31.66      (3.0%)       34.15      (3.8%)    7.9% (   0% -   15%)
               MedPhrase      212.86      (6.1%)      233.14      (6.1%)    9.5% (  -2% -   23%)
         LowSloppyPhrase       44.91      (2.4%)       49.77      (2.3%)   10.8% (   6% -   15%)
              AndHighLow      404.75      (2.5%)      506.81      (3.6%)   25.2% (  18% -   32%)
{noformat}

Net net, a very minor improvement!  I think this is good news: it
means that the extra abstractions here, which are useful so we can be
safe (not use Unsafe) and agnostic to byte-order, are not costing us
too much.


> Native MMapDir
> --------------
>
>                 Key: LUCENE-3178
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3178
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>            Reporter: Michael McCandless
>              Labels: gsoc2014
>         Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178.patch
>
>
> Spinoff from LUCENE-2793.
> Just like we will create native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir.
> The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code "only" has to open the file handle.


