Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2014/03/10 17:52:43 UTC
[jira] [Updated] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-3178:
---------------------------------------
Attachment: LUCENE-3178.patch
bq. I do think it'd be interesting to pair up a NativeMMapDir with a custom postings format that instead uses IndexInput.readLong (via Unsafe.getLong) to pull longs from disk
I was curious about this so I coded up a prototype patch. It's a
NativeMMapDirectory.java/cpp that does the mmap/munmap in C, and then
a new postings format (NativeMMapPostingsFormat) which requires this
Directory impl and then uses Unsafe.getLong to read the longs for
packed int decode.
This bypasses the extra step we do today of first reading into a
byte[], and then decoding from that, and instead pulls longs directly
from the map and decodes from those. It requires that the byte order
in the index matches the CPU; e.g. for x86 (little-endian) it's
the opposite of the big-endian order that DataOutput.writeLong and
DataInput.readLong expect.
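To make the byte-order point concrete, here is a minimal sketch of the two read paths. It is not code from the patch: it uses a plain java.nio.ByteBuffer as a stand-in for the mapped region and for Unsafe.getLong, and the class and method names (ByteOrderReadSketch, readLongViaByteArray, readLongDirect) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; a ByteBuffer stands in for the native map.
final class ByteOrderReadSketch {

    /** Copy-then-decode: copy 8 bytes into a scratch byte[], then assemble a big-endian long. */
    static long readLongViaByteArray(ByteBuffer src, int offset) {
        byte[] scratch = new byte[8];
        for (int i = 0; i < scratch.length; i++) {
            scratch[i] = src.get(offset + i); // absolute get, independent of the buffer's byte order
        }
        long value = 0;
        for (byte b : scratch) {
            value = (value << 8) | (b & 0xFFL); // most significant byte first
        }
        return value;
    }

    /** Direct read: a single 8-byte load, interpreted in the given byte order. */
    static long readLongDirect(ByteBuffer src, int offset, ByteOrder order) {
        return src.duplicate().order(order).getLong(offset);
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;

        ByteBuffer bigEndian = ByteBuffer.allocate(8); // ByteBuffer defaults to BIG_ENDIAN
        bigEndian.putLong(0, value);

        ByteBuffer littleEndian = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        littleEndian.putLong(0, value);

        // Both of these recover the original value.
        System.out.println(Long.toHexString(readLongViaByteArray(bigEndian, 0)));
        System.out.println(Long.toHexString(readLongDirect(littleEndian, 0, ByteOrder.LITTLE_ENDIAN)));
        // A little-endian read over big-endian data yields the byte-reversed value.
        System.out.println(Long.toHexString(readLongDirect(bigEndian, 0, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

The last call shows why the index byte order has to match the CPU for the direct path: the same eight bytes read in the wrong order come back as Long.reverseBytes of the value that was written.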
It does not align the long reads; doing so would increase the index
size somewhat because we'd need to insert pad bytes so that each long
read starts on an 8-byte boundary. But I think on recent x86 CPUs
unaligned reads do not add much of a penalty...
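For a sense of what aligning would cost, here is a small sketch of the padding arithmetic. It assumes file offsets are plain non-negative longs; the names (AlignmentSketch, padBytes, alignUp) are hypothetical and nothing like this exists in the patch, which does not align at all.

```java
// Hypothetical illustration of 8-byte alignment padding; not part of the patch.
final class AlignmentSketch {

    /** Number of pad bytes (0..7) needed so the next write starts on an 8-byte boundary. */
    static int padBytes(long offset) {
        return (int) (-offset & 7);
    }

    /** Smallest multiple of 8 that is greater than or equal to offset. */
    static long alignUp(long offset) {
        return (offset + 7) & ~7L;
    }

    public static void main(String[] args) {
        long[] offsets = {0, 1, 13, 16, 1021};
        for (long offset : offsets) {
            System.out.println(offset + " -> pad " + padBytes(offset)
                + ", aligned " + alignUp(offset));
        }
    }
}
```

Each aligned block would therefore cost at most 7 extra bytes, which is where the index size increase would come from.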
The patch is very unsafe and has tons of nocommits, but it seems to
work correctly. Here are the results:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
Fuzzy2               47.61   (3.1%)     46.98   (2.9%)   -1.3% ( -7% -  4%)
HighSpanNear          8.34   (5.8%)      8.42   (5.9%)    0.9% (-10% - 13%)
Respell              48.79   (4.1%)     50.00   (3.3%)    2.5% ( -4% - 10%)
IntNRQ                3.68   (1.5%)      3.78   (7.8%)    2.7% ( -6% - 12%)
OrHighNotMed         37.79   (3.8%)     38.90   (2.8%)    3.0% ( -3% -  9%)
OrHighNotLow         31.19   (4.2%)     32.13   (3.3%)    3.0% ( -4% - 10%)
Prefix3              91.92   (1.9%)     95.11   (6.2%)    3.5% ( -4% - 11%)
OrHighMed            32.99   (4.0%)     34.15   (3.1%)    3.5% ( -3% - 11%)
Fuzzy1               60.40   (3.3%)     62.56   (3.4%)    3.6% ( -3% - 10%)
OrNotHighHigh        11.17   (3.9%)     11.57   (2.7%)    3.6% ( -2% - 10%)
HighTerm             69.60  (11.2%)     72.19  (15.5%)    3.7% (-20% - 34%)
LowPhrase            13.17   (2.1%)     13.67   (2.7%)    3.8% (  0% -  8%)
AndHighMed           34.52   (1.0%)     35.85   (1.5%)    3.8% (  1% -  6%)
OrNotHighLow         25.04   (3.5%)     26.00   (0.4%)    3.8% (  0% -  8%)
OrHighLow            23.60   (4.2%)     24.50   (3.3%)    3.8% ( -3% - 11%)
Wildcard             19.93   (2.8%)     20.73   (5.0%)    4.0% ( -3% - 12%)
MedSloppyPhrase       3.52   (3.8%)      3.67   (4.5%)    4.2% ( -3% - 12%)
OrHighNotHigh        13.88   (3.7%)     14.46   (2.5%)    4.2% ( -1% - 10%)
OrHighHigh           10.23   (3.9%)     10.68   (3.1%)    4.4% ( -2% - 11%)
LowTerm             330.50   (6.7%)    345.35   (8.9%)    4.5% (-10% - 21%)
AndHighHigh          28.53   (1.1%)     29.82   (1.4%)    4.5% (  2% -  7%)
OrNotHighMed         24.13   (3.4%)     25.23   (0.5%)    4.6% (  0% -  8%)
LowSpanNear          10.55   (2.7%)     11.06   (3.6%)    4.8% ( -1% - 11%)
HighPhrase            4.30   (6.7%)      4.55   (6.2%)    5.9% ( -6% - 20%)
MedTerm             106.81   (9.0%)    113.26  (12.9%)    6.0% (-14% - 30%)
HighSloppyPhrase      3.41   (4.2%)      3.67   (7.2%)    7.7% ( -3% - 19%)
MedSpanNear          31.66   (3.0%)     34.15   (3.8%)    7.9% (  0% - 15%)
MedPhrase           212.86   (6.1%)    233.14   (6.1%)    9.5% ( -2% - 23%)
LowSloppyPhrase      44.91   (2.4%)     49.77   (2.3%)   10.8% (  6% - 15%)
AndHighLow          404.75   (2.5%)    506.81   (3.6%)   25.2% ( 18% - 32%)
{noformat}
Net net, a very minor improvement! I think this is good news: it
means that the extra abstractions here, which are useful so we can be
safe (not use Unsafe) and agnostic to byte order, are not costing us
too much.
> Native MMapDir
> --------------
>
> Key: LUCENE-3178
> URL: https://issues.apache.org/jira/browse/LUCENE-3178
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/store
> Reporter: Michael McCandless
> Labels: gsoc2014
> Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178.patch
>
>
> Spinoff from LUCENE-2793.
> Just like we will create native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir.
> The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code "only" has to open the file handle.