Posted to dev@lucene.apache.org by "Greg Bowyer (JIRA)" <ji...@apache.org> on 2013/01/10 22:26:15 UTC

[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

    [ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550436#comment-13550436 ] 

Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 9:25 PM:
--------------------------------------------------------------

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't readByte(), readByte(), readByte() and hopefully avoids some of the bounds checks and so on that I think it helped with.
{quote}

Actually, there is still quite a lot of that. I wrote a Directory implementation locally that dumps out all of the called operations; I can share the file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer is largely reduced. It now just reads blocks of data, so MappedByteBuffer can do that efficiently using a memcpy(). Some MTQs are still faster because they read many more blocks for a large number of terms. I would have expected no significant speed up at all for, e.g., NRQ.
{quote}

Better still, the JVM doesn't do a memcpy in all cases but often uses CPU-aware operations that are faster.
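
For anyone following along, the two access patterns being compared can be sketched roughly as below. This is only an illustration, not code from the patch: the class name BulkReadSketch and its helper methods are made up for this example, and it says nothing about which copy strategy or intrinsics a given JVM actually picks. It only shows that both paths return the same bytes from a mapped file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class; not part of Lucene or of the attached patch.
final class BulkReadSketch {

    // Reads len bytes one at a time; every get() is a separate bounds-checked call.
    static byte[] readByteAtATime(MappedByteBuffer buf, int offset, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = buf.get(offset + i); // absolute single-byte get
        }
        return out;
    }

    // Reads len bytes with a single bulk get; how the copy is done is up to the JVM.
    static byte[] readBulk(MappedByteBuffer buf, int offset, int len) {
        byte[] out = new byte[len];
        ByteBuffer view = buf.duplicate(); // own position, same mapped memory
        view.position(offset);
        view.get(out, 0, len);
        return out;
    }

    // Writes data to a temp file and maps it read-only.
    static MappedByteBuffer mapTempFile(byte[] data) {
        try {
            Path tmp = Files.createTempFile("bulk-read-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, data);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // The mapping stays valid after the channel is closed.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, data.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[8192];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        MappedByteBuffer buf = mapTempFile(data);
        byte[] slow = readByteAtATime(buf, 128, 4096);
        byte[] bulk = readBulk(buf, 128, 4096);
        System.out.println("same bytes: " + Arrays.equals(slow, bulk));
    }
}
```

Any timing comparison between the two methods would need a proper benchmark harness; this sketch is not one.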

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer Java versions use intrinsics that may no longer be used with your directory impl.
{quote}

This is what I am leaning towards. So far the only speedups I have seen are when I ape most of the behaviors of the JVM. The biggest win really is that the code becomes a lot simpler (partly because we don't have to worry about the "cleaner", and partly because we are not bound to int32 sizes, so no more slice nonsense). Despite the simpler code, I don't think there is a sizable enough performance win to warrant this approach.
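
To illustrate the int32 point: a single java.nio buffer cannot be larger than Integer.MAX_VALUE bytes, so a pure-Java mmap of a big file has to be split into several mappings, and every long file position has to be translated into a (buffer, offset) pair. A minimal sketch of that bookkeeping follows, assuming a power-of-two chunk size. ChunkedMapSketch and all of its members are hypothetical names for this example, not the actual MMapDirectory code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; shows the chunk arithmetic only.
final class ChunkedMapSketch {
    private final ByteBuffer[] chunks;
    private final int chunkSizePower;
    private final long chunkSizeMask;
    private final long length;

    private ChunkedMapSketch(ByteBuffer[] chunks, int chunkSizePower, long length) {
        this.chunks = chunks;
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1;
        this.length = length;
    }

    // Maps the file as a series of read-only chunks of 2^chunkSizePower bytes each.
    static ChunkedMapSketch map(Path file, int chunkSizePower) {
        if (chunkSizePower < 1 || chunkSizePower > 30) {
            throw new IllegalArgumentException("chunkSizePower must be in 1..30");
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long length = ch.size();
            long chunkSize = 1L << chunkSizePower;
            int n = (int) ((length + chunkSize - 1) >>> chunkSizePower);
            ByteBuffer[] chunks = new ByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long start = (long) i << chunkSizePower;
                long size = Math.min(chunkSize, length - start); // last chunk may be short
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
            return new ChunkedMapSketch(chunks, chunkSizePower, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int chunkCount() {
        return chunks.length;
    }

    // Translates a long file position into (chunk index, offset within chunk).
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        int chunk = (int) (pos >>> chunkSizePower);
        int offset = (int) (pos & chunkSizeMask);
        return chunks[chunk].get(offset);
    }

    // Helper for the demo: writes data to a temp file.
    static Path writeTemp(byte[] data) {
        try {
            Path tmp = Files.createTempFile("chunked-map-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, data);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // A tiny chunk size (16 bytes) just to force several chunks in the demo.
        ChunkedMapSketch m = map(writeTemp(data), 4);
        System.out.println("chunks: " + m.chunkCount() + ", last byte: " + m.readByte(99));
    }
}
```

A native mapping addressed with a 64-bit pointer would not need the shift-and-mask step at all, which is the simplification referred to above.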

I am still poking at this for a bit longer, but I am leaning towards calling this a bust.

The other reason for this was to see if I get better behavior along the MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically provable there.

(This is all assuming that I don't have some gross oversight in my implementation that makes it stupidly slow by accident.)

{quote}
I would not provide a custom MMapDir at all; it is too risky and does not really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree. Even if this gave huge performance wins, I would still put it in the bucket of "it's in misc, it's not the default, and you're on your own if it breaks". The fact that it yields, AFAICT, no performance gains is both maddening for me and even more damning.
                
> Native MMapDir
> --------------
>
>                 Key: LUCENE-3178
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3178
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/store
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>         Attachments: LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch, LUCENE-3178-Native-MMap-implementation.patch
>
>
> Spinoff from LUCENE-2793.
> Just like we will create a native Dir impl (UnixDirectory) to pass the right OS-level IO flags depending on the IOContext, we could in theory do something similar with MMapDir.
> The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code "only" has to open the file handle.
