Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/08/27 20:04:53 UTC

[jira] Created: (LUCENE-2627) MMapDirectory chunking is buggy

MMapDirectory chunking is buggy
-------------------------------

                 Key: LUCENE-2627
                 URL: https://issues.apache.org/jira/browse/LUCENE-2627
             Project: Lucene - Java
          Issue Type: Bug
          Components: Store
            Reporter: Robert Muir


MMapDirectory uses chunking with MultiMMapIndexInput.

Because Java's ByteBuffer uses an int to address its values, a file larger than Integer.MAX_VALUE bytes has to be accessed through multiple byte buffers.

But I noticed from the Clover report that the entire MultiMMapIndexInput class is completely untested: no surprise, since all tests make tiny indexes.
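
To make the chunking idea concrete, here is a minimal sketch in plain java.nio. It is not Lucene's MultiMMapIndexInput: the class name ChunkedMmapSketch and its methods are made up for illustration, and it only shows how a file can be mapped as a series of buffers of at most chunkSize bytes, with a long position split into a buffer index and an offset.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of chunked mapping; not the Lucene implementation. */
final class ChunkedMmapSketch {
    private final MappedByteBuffer[] buffers;
    private final int chunkSize;
    private final long length;

    ChunkedMmapSketch(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            this.length = channel.size();
            // Ceiling division: one buffer per full chunk, plus one for any remainder.
            int count = (int) ((length + chunkSize - 1) / chunkSize);
            this.buffers = new MappedByteBuffer[count];
            long offset = 0;
            for (int i = 0; i < count; i++) {
                long size = Math.min(chunkSize, length - offset);
                buffers[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
                offset += size;
            }
        }
        // The mappings stay valid after the channel is closed.
    }

    int bufferCount() {
        return buffers.length;
    }

    long length() {
        return length;
    }

    /** Reads one byte at an absolute file position. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        int index = (int) (pos / chunkSize);   // which buffer
        int offset = (int) (pos % chunkSize);  // where inside that buffer
        return buffers[index].get(offset);
    }
}
```

A small chunk size (for example 4 bytes on a 10-byte file) exercises the same index arithmetic as a multi-gigabyte file, which is presumably why a test with randomized small chunk sizes can cover this code without building a huge index.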


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2627:
--------------------------------

    Attachment: LUCENE-2627.patch

Off-by-one: here's a patch with the test case (I removed the seed).



[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903549#action_12903549 ] 

Robert Muir commented on LUCENE-2627:
-------------------------------------

For this to fail, your file size has to be an exact multiple of the chunk size (default=Integer.MAX_VALUE), so it's not a big deal, but I think we should fix it.
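
The arithmetic behind that condition can be sketched as follows. This is a hypothetical model, not the Lucene source or the attached patch: it assumes the buffer count is rounded up only when there is a partial trailing chunk, and that a seek picks its buffer as pos / chunkSize. Under those assumptions, seeking to the end of a file whose length is an exact multiple of the chunk size selects an index one past the last buffer, which would match the ArrayIndexOutOfBoundsException in seek (called from clone) shown in the stack trace elsewhere in this thread. All method names are made up for illustration.

```java
/** Hypothetical model of the chunk arithmetic; not the actual Lucene source. */
final class ChunkBoundarySketch {

    /** Buffer count when only a partial trailing chunk gets an extra buffer. */
    static int naiveBufferCount(long length, int chunkSize) {
        int count = (int) (length / chunkSize);
        if ((long) count * chunkSize < length) {
            count++;
        }
        return count;
    }

    /** One possible remedy: always reserve a (possibly empty) buffer for pos == length. */
    static int safeBufferCount(long length, int chunkSize) {
        return (int) (length / chunkSize) + 1;
    }

    /** Buffer index that a seek to pos would select. */
    static int bufferIndex(long pos, int chunkSize) {
        return (int) (pos / chunkSize);
    }

    public static void main(String[] args) {
        int chunkSize = 4;
        for (long length : new long[] {7, 8, 9}) {
            // A reader positioned at end of file seeks to pos == length.
            int index = bufferIndex(length, chunkSize);
            System.out.printf("length=%d naive=%d safe=%d seekIndex=%d%n",
                    length,
                    naiveBufferCount(length, chunkSize),
                    safeBufferCount(length, chunkSize),
                    index);
        }
    }
}
```

With a chunk size of 4, only the length-8 case has a seek index equal to the naive buffer count, so only exact multiples run off the end of the array.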



[jira] Resolved: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2627.
---------------------------------

    Fix Version/s: 2.9.4
                   3.0.3
                   3.1
                   4.0
       Resolution: Fixed

Committed revisions:

trunk: 990281
3.x: 990286
3.0: 990293
2.9: 990295



[jira] Assigned: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-2627:
-----------------------------------

    Assignee: Robert Muir



[jira] Updated: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2627:
--------------------------------

    Attachment: LUCENE-2627_test.patch

Attached is a random test case (wired to a value where it fails quickly for the Standard codec):

ant test-core -Dtestcase=TestMultiMMap -Dtests.codec=Standard

{noformat}
junit-sequential:
    [junit] Testsuite: org.apache.lucene.store.TestMultiMMap
    [junit] Testcase: testRandomChunkSizes(org.apache.lucene.store.TestMultiMMap):      Caused an ERROR
    [junit] 233
    [junit] java.lang.ArrayIndexOutOfBoundsException: 233
    [junit]     at org.apache.lucene.store.MMapDirectory$MultiMMapIndexInput.seek(MMapDirectory.java:371)
    [junit]     at org.apache.lucene.store.MMapDirectory$MultiMMapIndexInput.clone(MMapDirectory.java:394)
    [junit]     at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader$SegmentTermsEnum.<init>(StandardTermsDictReader.java:288)
    [junit]     at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader.iterator(StandardTermsDictReader.java:270)
    [junit]     at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$TermFieldsEnum.terms(StandardTermsDictReader.java:240)
    [junit]     at org.apache.lucene.index.MultiFieldsEnum.terms(MultiFieldsEnum.java:103)
    [junit]     at org.apache.lucene.index.codecs.FieldsConsumer.merge(FieldsConsumer.java:49)
    [junit]     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:657)
    [junit]     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154)
{noformat}
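
For readers without the attachment, the idea of a seeded random test over chunk sizes can be sketched like this. It is not the attached TestMultiMMap: it models only the buffer arithmetic (assumed here to round the buffer count up for a partial trailing chunk and to seek with pos / chunkSize), draws chunk sizes and file lengths from a java.util.Random with a fixed seed so that a failing run can be reproduced, and checks that the modeled seek to end of file overflows exactly when the length is a multiple of the chunk size. The class name, method names, and value ranges are hypothetical.

```java
import java.util.Random;

/** Hypothetical seeded check of the chunk arithmetic; not the attached Lucene test. */
final class RandomChunkSizeSketch {

    /** True if a seek to pos == length would index past the last buffer. */
    static boolean seekToEndOverflows(long length, int chunkSize) {
        int buffers = (int) (length / chunkSize);
        if ((long) buffers * chunkSize < length) {
            buffers++; // extra buffer only for a partial trailing chunk
        }
        int index = (int) (length / chunkSize);
        return index >= buffers;
    }

    /**
     * Runs the given number of random cases and returns how many overflowed.
     * Throws if an overflow ever disagrees with "length is an exact multiple".
     */
    static int run(long seed, int iterations) {
        Random random = new Random(seed); // fixed seed makes a failure reproducible
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            int chunkSize = 1 + random.nextInt(64);
            long length = 1 + random.nextInt(4096);
            boolean overflow = seekToEndOverflows(length, chunkSize);
            boolean exactMultiple = length % chunkSize == 0;
            if (overflow != exactMultiple) {
                throw new IllegalStateException(
                        "unexpected result for length=" + length + " chunkSize=" + chunkSize);
            }
            if (overflow) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary; any seed exercises the same property
        System.out.println("overflowing cases: " + run(seed, 10_000));
    }
}
```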



[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903646#action_12903646 ] 

Uwe Schindler commented on LUCENE-2627:
---------------------------------------

To avoid breaking Hudson on 32-bit JVMs, we should enable the MMap close hack when testing against that directory; otherwise we may run out of address space.



[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903647#action_12903647 ] 

Robert Muir commented on LUCENE-2627:
-------------------------------------

Good idea. I will have the test enable the unmap hack, if supported.



[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903604#action_12903604 ] 

Uwe Schindler commented on LUCENE-2627:
---------------------------------------

Thanks, Robert, for investigating! I was wondering why I had never seen this error with my 10 GB CFS file and MMapDirectory: it only happens when the file size is an exact multiple of the chunk size (Integer.MAX_VALUE, i.e. 2^31 - 1, by default) :-)

But we should fix this for 2.9 and 3.0, too; it's easy. Maybe we will have another release (we also have the NRQ bug).



[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903622#action_12903622 ] 

Robert Muir commented on LUCENE-2627:
-------------------------------------

> But we should fix this for 2.9 and 3.0, too - it's easy. Maybe we have another release (we also have the NRQ bug)

OK, I'll test each branch and backport as needed.

Additionally, all tests pass with -Dtests.directory=MMapDirectory, so I plan to commit shortly.
