You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2011/06/14 00:39:47 UTC

[jira] [Created] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
-------------------------------------------------------------------------------------------

                 Key: LUCENE-3200
                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Uwe Schindler


Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.

We had the following ideas:
- MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
- Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
- Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
- the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.

We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049154#comment-13049154 ] 

Simon Willnauer commented on LUCENE-3200:
-----------------------------------------

+1 this looks awesome. Gute Arbeit Uwe :)

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-3200:
-------------------------------------

    Assignee: Uwe Schindler

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3200:
----------------------------------

    Attachment: LUCENE-3200.patch

New patch with some minor issues fixed:
- fixed the RuntimeException
- fixed readByte to throw EOF if we are at the end of the n-1 th buffer. as buffer n may be size 0, we will throw BufferUnderFlow in the chatch block. I added hasRemaining() there, so its consistent with readBytes.
- The check for an invalid power was bogus (0 is allowed, leads to buffer size 1)
- The check for RandomAccessFile too big for maximum buffer size did not respect the additional buffer. nrBuffers can then overflow easily


> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3200:
----------------------------------

    Attachment: LUCENE-3200.patch

Little cleanups & improvements:
- made readByte() consistent with readBytes() [catch block using remaining-while loop for size=0 buffers]
- renamed field names and variables to use chunkSize consistently

I think it's ready to commit, we should only wait for Mike to check on beast.

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049115#comment-13049115 ] 

Michael McCandless commented on LUCENE-3200:
--------------------------------------------

+1 to commit!

In my stress NRT test (runs optimize on a full Wiki index with ongoing
indexing / reopening), without this patch, I see performance drop
substantially (like 180 QPS down to 140 QPS) when the JVM cuts over to
the optimized segment.  With the patch I see it jump up a bit after
the optimize completes!  So this seems to make hotspot's job
easier...


> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3200:
----------------------------------

    Attachment: LUCENE-3200.patch

Here the patch.

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-3200.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048858#comment-13048858 ] 

Robert Muir commented on LUCENE-3200:
-------------------------------------

also, we can fix the issue Shai brought up for the 3.1 VOTE while we are here.

in seek(long pos) i think we should do:
{code}
try {
 ...
 position()
 ...
} catch (IllegalArgumentException e) {
  if (pos < 0) 
    throw exc;
  else 
    throw new IOException("read past EOF"); 
}
{code}

This would be more consistent with NIOFS/SimpleFS from an exception perspective.


> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155404#comment-13155404 ] 

Uwe Schindler commented on LUCENE-3200:
---------------------------------------

MMapDirectory missed a explicit nulling of curBuf, commited trunk revision: 1205152, 3x in revision: 1205153
                
> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3200:
----------------------------------

    Attachment: LUCENE-3200.patch

Same patch with Robert's tests included.

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3200.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   3.3

Committed trunk revision: 1135537
Committed 3.x revision: 1135538

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049166#comment-13049166 ] 

Uwe Schindler commented on LUCENE-3200:
---------------------------------------

Thanks to Robert for help debugging my stupid + vs | problem and lots of fruitful discussions about the whole stuff and how to improve :) Thanks to Mike for testing on beast!

Now you can refactor CFSIndexInput & Co!

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3200:
--------------------------------

    Attachment: LUCENE-3200.patch

same as uwe's patch, but i also nuked the previous hack in TestTermVectorsReader, as MMapDir returns read past EOF now like the others.

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049117#comment-13049117 ] 

Robert Muir commented on LUCENE-3200:
-------------------------------------

+1, great work Uwe.

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048923#comment-13048923 ] 

Robert Muir commented on LUCENE-3200:
-------------------------------------

at a glance the patch is looking really good overall! I'll help with some review and testing.

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3200) Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3200:
--------------------------------

    Attachment: LUCENE-3200_tests.patch

here are some additional stress tests for mmap

> Cleanup MMapDirectory to use only one MMapIndexInput impl with mapping sized of powers of 2
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3200
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3200
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>         Attachments: LUCENE-3200.patch, LUCENE-3200_tests.patch
>
>
> Robert and me discussed a little bit after Mike's investigations, that using SingleMMapIndexinput together with MultiMMapIndexInput leads to hotspot slowdowns sometimes.
> We had the following ideas:
> - MultiMMapIndexInput is almost as fast as SingleMMapIndexInput, as the switching between buffer boundaries is done in exception catch blocks. So normal code path is always the same like for Single*
> - Only the seek method uses strange calculations (the modulo is totally bogus, it could be simply: int bufOffset = (int) (pos % maxBufSize); - very strange way of calculating modulo in the original code)
> - Because of speed we suggest to no longer use arbitrary buffer sizes. We should pass only the power of 2 to the indexinput as size. All calculations in seek and anywhere else would be simple bit shifts and AND operations (the and masks for the modulo can be calculated in the ctor like NumericUtils does when calculating precisionSteps).
> - the maximum buffer size will now be 2^30, not 2^31-1. But thats not an issue at all. In my opinion, a buffer size of 2^31-1 is stupid in all cases, as it will no longer fit page boundaries and mmapping gets harder for the O/S.
> We will provide a patch with those cleanups.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org