You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/12/17 09:36:01 UTC

[jira] Created: (LUCENE-2816) MMapDirectory speedups

MMapDirectory speedups
----------------------

                 Key: LUCENE-2816
                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Store
    Affects Versions: 3.1, 4.0
            Reporter: Robert Muir
            Assignee: Robert Muir


MMapDirectory has some performance problems:
# When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
Instead, like MMapIndexInput, it should rely upon the contract of these operations
in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
our own bounds checks just slows things down.
# the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
This isn't very important since we don't much use these, but I think there's no reason
users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972416#action_12972416 ] 

Uwe Schindler commented on LUCENE-2816:
---------------------------------------

One thing to add: When using readFloat & co, we should make sure that we set the endianness explicitely in the ctor. I just want to explicitely make sure that the endianness is correct and document it that it is big endian for Lucene.

> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Issue Comment Edited: (LUCENE-2816) MMapDirectory speedups

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972416#action_12972416 ] 

Uwe Schindler edited comment on LUCENE-2816 at 12/17/10 4:43 AM:
-----------------------------------------------------------------

-One thing to add: When using readFloat & co, we should make sure that we set the endianness explicitely in the ctor. I just want to explicitely make sure that the endianness is correct and document it that it is big endian for Lucene.-

We don't need that: "The initial order of a byte buffer is always BIG_ENDIAN."

      was (Author: thetaphi):
    One thing to add: When using readFloat & co, we should make sure that we set the endianness explicitely in the ctor. I just want to explicitely make sure that the endianness is correct and document it that it is big endian for Lucene.
  
> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972880#action_12972880 ] 

Robert Muir commented on LUCENE-2816:
-------------------------------------

committed revision 1050737. I'll wait a bit for branch_3x.

> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972398#action_12972398 ] 

Simon Willnauer commented on LUCENE-2816:
-----------------------------------------

Awesome results robert!! :)

> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972420#action_12972420 ] 

Michael McCandless commented on LUCENE-2816:
--------------------------------------------

Good grief!  What amazing gains, especially w/ PFor codec which of course makes super heavy use of .readInt().  Awesome Robert!

This will mean w/ the cutover to FORPFOR codec for 4.0, MMapDir will likely have a huge edge over NIOFSDir?

> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972422#action_12972422 ] 

Robert Muir commented on LUCENE-2816:
-------------------------------------

{quote}
Good grief! What amazing gains, especially w/ PFor codec which of course makes super heavy use of .readInt(). Awesome Robert!
This will mean w/ the cutover to FORPFOR codec for 4.0, MMapDir will likely have a huge edge over NIOFSDir?
{quote}

This isn't really a 'gain' for the bulkpostings branch?
This is just making DataInput.readInt() faster.
Currently the bulkpostings branch uses readByte(byte[]), then wraps into a ByteBuffer and processes an IntBuffer view of that.
I switched to just using readInt() from DataInputDirectly [FrameOfRefDataInput] and found it to be much slower than this IntBuffer method.

this whole benchmark is just benching DataInput.readInt()...

So, we shouldn't change anything in bulkpostings, this isn't faster than the intbuffer method in my tests, at best its equivalent... but we should fix this slowdown in our APIs.


> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Updated: (LUCENE-2816) MMapDirectory speedups

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2816:
--------------------------------

    Attachment: LUCENE-2816.patch

Here's the most important benchmark: speeding up the MultiMMap's readByte(s) in general:

MultiMMapIndexInput readByte(s) improvements [trunk, Standard codec]
||Query||QPS trunk||QPS patch||Pct diff||||
|spanFirst(unit, 5)|12.72|12.85|{color:green}1.0%{color}|
|+nebraska +state|137.47|139.33|{color:green}1.3%{color}|
|spanNear([unit, state], 10, true)|2.90|2.94|{color:green}1.4%{color}|
|"unit state"|5.88|5.99|{color:green}1.8%{color}|
|unit~2.0|7.06|7.20|{color:green}2.0%{color}|
|+unit +state|8.68|8.87|{color:green}2.2%{color}|
|unit state|8.00|8.23|{color:green}2.9%{color}|
|unit~1.0|7.19|7.41|{color:green}3.0%{color}|
|unit*|22.66|23.41|{color:green}3.3%{color}|
|uni*|12.54|13.12|{color:green}4.6%{color}|
|united~1.0|10.61|11.12|{color:green}4.8%{color}|
|united~2.0|2.52|2.65|{color:green}5.1%{color}|
|state|28.72|30.23|{color:green}5.3%{color}|
|un*d|44.84|48.06|{color:green}7.2%{color}|
|u*d|13.17|14.51|{color:green}10.2%{color}|

In the bulk postings branch, I've been experimenting with various techniques for FOR/PFOR 
and one thing i tried was simply decoding with readInt() from the DataInput. So I adapted For/PFOR
to just take DataInput and work on it directly, instead of reading into a byte[], wrapping it with a ByteBuffer,
and working on an IntBuffer view.

But when I did this, i found that MMap was slow for readInt(), etc. So we implement these primitives
with ByteBuffer.readInt(). This isn't very important since lucene doesn't much use these, and mostly theoretical 
but I still think things like readInt(), readShort(), readLong() should be fast... for example just earlier today 
someone posted an alternative PFOR implementation on LUCENE-1410 that uses DataInput.readInt().

MMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
||Query||QPS branch||QPS patch||Pct diff||||
|spanFirst(unit, 5)|12.14|11.99|{color:red}-1.2%{color}|
|united~1.0|11.32|11.33|{color:green}0.1%{color}|
|united~2.0|2.51|2.56|{color:green}2.1%{color}|
|unit~1.0|6.98|7.19|{color:green}3.0%{color}|
|unit~2.0|6.88|7.11|{color:green}3.3%{color}|
|spanNear([unit, state], 10, true)|2.81|2.96|{color:green}5.2%{color}|
|unit state|8.04|8.59|{color:green}6.8%{color}|
|+unit +state|10.97|12.12|{color:green}10.5%{color}|
|unit*|26.67|29.80|{color:green}11.7%{color}|
|"unit state"|5.59|6.27|{color:green}12.3%{color}|
|uni*|15.10|17.51|{color:green}15.9%{color}|
|state|33.20|38.72|{color:green}16.6%{color}|
|+nebraska +state|59.17|71.45|{color:green}20.8%{color}|
|un*d|35.98|47.14|{color:green}31.0%{color}|
|u*d|9.48|12.46|{color:green}31.4%{color}|

Here's the same benchmark of DataInput.readInt() but with the MultiMMapIndexInput

MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
||Query||QPS branch||QPS patch||Pct diff||||
|united~2.0|2.43|2.54|{color:green}4.3%{color}|
|united~1.0|10.78|11.39|{color:green}5.7%{color}|
|unit~1.0|6.81|7.21|{color:green}5.8%{color}|
|unit~2.0|6.62|7.05|{color:green}6.5%{color}|
|spanNear([unit, state], 10, true)|2.77|2.96|{color:green}6.6%{color}|
|unit state|7.85|8.53|{color:green}8.7%{color}|
|spanFirst(unit, 5)|10.50|11.71|{color:green}11.5%{color}|
|+unit +state|10.26|11.94|{color:green}16.3%{color}|
|"unit state"|5.39|6.31|{color:green}17.0%{color}|
|state|31.95|39.17|{color:green}22.6%{color}|
|unit*|24.39|31.02|{color:green}27.2%{color}|
|+nebraska +state|54.68|71.98|{color:green}31.6%{color}|
|u*d|9.53|12.62|{color:green}32.5%{color}|
|uni*|13.72|18.23|{color:green}32.9%{color}|
|un*d|35.87|48.19|{color:green}34.3%{color}|

Just to be sure, I ran this last one on sparc64 (bigendian) also.

MultiMMapIndexInput readInt() improvements [bulkpostings, FrameOfRefDataInput codec]
||Query||QPS branch||QPS patch||Pct diff||||
|united~2.0|2.23|2.26|{color:green}1.5%{color}|
|unit~2.0|6.37|6.47|{color:green}1.6%{color}|
|united~1.0|11.33|11.59|{color:green}2.3%{color}|
|unit~1.0|9.68|10.05|{color:green}3.7%{color}|
|spanNear([unit, state], 10, true)|15.60|17.54|{color:green}12.5%{color}|
|unit*|127.14|144.08|{color:green}13.3%{color}|
|unit state|44.93|51.30|{color:green}14.2%{color}|
|spanFirst(unit, 5)|58.42|68.37|{color:green}17.0%{color}|
|uni*|56.66|67.53|{color:green}19.2%{color}|
|+nebraska +state|215.62|262.99|{color:green}22.0%{color}|
|+unit +state|63.18|77.86|{color:green}23.2%{color}|
|"unit state"|32.24|40.05|{color:green}24.2%{color}|
|u*d|29.13|36.69|{color:green}26.0%{color}|
|state|145.99|188.33|{color:green}29.0%{color}|
|un*d|65.27|87.20|{color:green}33.6%{color}|

I think some of these benchmarks also show that MultiMMapIndexInput might now be
essentially just as fast as MMapIndexInput... but lets not go there yet and keep them separate for now.


> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Resolved: (LUCENE-2816) MMapDirectory speedups

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2816.
---------------------------------

    Resolution: Fixed

Committed revision 1052892 to branch_3x.

> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972413#action_12972413 ] 

Uwe Schindler commented on LUCENE-2816:
---------------------------------------

Awesome, Ro bert is changing to the MMap Performance Policeman!

I like the idea to simply delegate the methods and catch exception to fallback to manual read with boundary transition! I just wanted to be sure that the position pointer in the buffer does not partly go forward when you read request fails at a buffer boundary, but that seems to be the case.

> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2816) MMapDirectory speedups

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972418#action_12972418 ] 

Robert Muir commented on LUCENE-2816:
-------------------------------------

bq.  I just wanted to be sure that the position pointer in the buffer does not partly go forward when you read request fails at a buffer boundary, but that seems to be the case.

Yes, this is guaranteed in the APIs, and also tested well by TestMultiMMap, which uses random chunk sizes between 20 and 100 (including odd numbers etc)
Though we should enhance this test, i think it just retrieves documents at the moment... probably better if it did some searches too.


> MMapDirectory speedups
> ----------------------
>
>                 Key: LUCENE-2816
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2816
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: LUCENE-2816.patch
>
>
> MMapDirectory has some performance problems:
> # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
> which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
> Instead, like MMapIndexInput, it should rely upon the contract of these operations
> in ByteBuffer (which will do a bounds check always and throw BufferUnderflowException).
> Our 'buffer' is so large (Integer.MAX_VALUE) that its rare this happens and doing
> our own bounds checks just slows things down.
> # the readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.readInt(), etc
> This isn't very important since we don't much use these, but I think there's no reason
> users (e.g. codec writers) should have to readBytes() + wrap as bytebuffer + get an 
> IntBuffer view when readInt() can be almost as fast...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org