Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/08/27 20:04:53 UTC
[jira] Created: (LUCENE-2627) MMapDirectory chunking is buggy
MMapDirectory chunking is buggy
-------------------------------
Key: LUCENE-2627
URL: https://issues.apache.org/jira/browse/LUCENE-2627
Project: Lucene - Java
Issue Type: Bug
Components: Store
Reporter: Robert Muir
MMapDirectory uses chunking with MultiMMapIndexInput.
Because Java's ByteBuffer uses an int to address the values, it's necessary to access a file > Integer.MAX_VALUE in size using multiple byte buffers.
But I noticed from the clover report that the entire MultiMMapIndexInput class is completely untested: no surprise, since all tests make tiny indexes.
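For readers unfamiliar with the chunking idea, here is a minimal sketch of the addressing arithmetic it implies. This is not the MultiMMapIndexInput code; the class and method names are hypothetical, and it only assumes that an absolute long file position is split into a buffer index and an int offset within that buffer.

```java
/** Hypothetical helper illustrating chunked addressing; not Lucene's actual code. */
final class ChunkAddressing {

    /** Index of the buffer that holds the absolute file position pos. */
    static int bufferIndex(long pos, int chunkSize) {
        return (int) (pos / chunkSize);
    }

    /** Offset of the absolute file position pos within its buffer. */
    static int bufferOffset(long pos, int chunkSize) {
        return (int) (pos % chunkSize);
    }

    public static void main(String[] args) {
        int chunkSize = Integer.MAX_VALUE;
        long pos = 5_000_000_000L; // too large for a single int-addressed ByteBuffer
        System.out.println("buffer " + bufferIndex(pos, chunkSize)
                + ", offset " + bufferOffset(pos, chunkSize));
    }
}
```

With the chunk size set to Integer.MAX_VALUE, position 5,000,000,000 lands in buffer 2 at offset 705,032,706, so every offset still fits in an int.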
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] Updated: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2627:
--------------------------------
Attachment: LUCENE-2627.patch
Off-by-one: here's a patch with the test case (I removed the seed).
[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903549#action_12903549 ]
Robert Muir commented on LUCENE-2627:
-------------------------------------
For this to fail, your file size has to be an exact multiple of the chunk size (default = Integer.MAX_VALUE), so it's not a big deal, but I think we should fix it.
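To illustrate why only exact multiples are affected, here is a minimal sketch of one way such an off-by-one can arise. It is not the actual MMapDirectory code or the attached patch; the class and method names are hypothetical. It assumes a reader may seek to position == length (end of file): a buffer count that covers only bytes 0..length-1 then has no slot for that position when length is an exact multiple of the chunk size.

```java
/** Hypothetical illustration of the off-by-one; not Lucene's actual code. */
final class ChunkCount {

    /** Buffer count that covers bytes 0..length-1 only (ceiling division). */
    static int buffersForBytes(long length, int chunkSize) {
        return (int) ((length + chunkSize - 1) / chunkSize);
    }

    /** Buffer count that also makes position == length addressable. */
    static int buffersForSeek(long length, int chunkSize) {
        return (int) (length / chunkSize) + 1;
    }

    /** Index of the buffer that a seek to pos would select. */
    static int bufferIndex(long pos, int chunkSize) {
        return (int) (pos / chunkSize);
    }

    public static void main(String[] args) {
        int chunkSize = 16; // tiny chunk size so the effect is easy to see
        for (long length : new long[] {40, 48}) {
            int buffers = buffersForBytes(length, chunkSize);
            int indexAtEnd = bufferIndex(length, chunkSize);
            System.out.println("length=" + length
                    + " buffers=" + buffers
                    + " indexAtEnd=" + indexAtEnd
                    + (indexAtEnd >= buffers ? " (out of bounds)" : ""));
        }
    }
}
```

With a chunk size of 16, a length of 40 needs 3 buffers and a seek to the end selects index 2, which is fine. A length of 48 also gets 3 buffers from the ceiling division, but a seek to the end selects index 3, which is past the array. That would be consistent with the ArrayIndexOutOfBoundsException in seek reported on this issue.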
[jira] Resolved: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-2627.
---------------------------------
Fix Version/s: 2.9.4, 3.0.3, 3.1, 4.0
Resolution: Fixed
Committed revisions:
trunk: 990281
3.x: 990286
3.0: 990293
2.9: 990295
[jira] Assigned: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir reassigned LUCENE-2627:
-----------------------------------
Assignee: Robert Muir
[jira] Updated: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2627:
--------------------------------
Attachment: LUCENE-2627_test.patch
Attached is a random test case (wired to a value where it fails quickly for the standard codec):
ant test-core -Dtestcase=TestMultiMMap -Dtests.codec=Standard
{noformat}
junit-sequential:
[junit] Testsuite: org.apache.lucene.store.TestMultiMMap
[junit] Testcase: testRandomChunkSizes(org.apache.lucene.store.TestMultiMMap): Caused an ERROR
[junit] 233
[junit] java.lang.ArrayIndexOutOfBoundsException: 233
[junit] at org.apache.lucene.store.MMapDirectory$MultiMMapIndexInput.seek(MMapDirectory.java:371)
[junit] at org.apache.lucene.store.MMapDirectory$MultiMMapIndexInput.clone(MMapDirectory.java:394)
[junit] at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader$SegmentTermsEnum.<init>(StandardTermsDictReader.java:288)
[junit] at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader.iterator(StandardTermsDictReader.java:270)
[junit] at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$TermFieldsEnum.terms(StandardTermsDictReader.java:240)
[junit] at org.apache.lucene.index.MultiFieldsEnum.terms(MultiFieldsEnum.java:103)
[junit] at org.apache.lucene.index.codecs.FieldsConsumer.merge(FieldsConsumer.java:49)
[junit] at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:657)
[junit] at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154)
{noformat}
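The idea behind a random chunk-size test can be sketched with the JDK alone. This is not the attached TestMultiMMap; the ChunkedReader class is hypothetical, and it wraps a byte array rather than a memory-mapped file. It picks small random chunk sizes so that exact multiples and multi-buffer reads occur often, then checks every byte and a seek to end of file.

```java
import java.nio.ByteBuffer;
import java.util.Random;

/** Hypothetical chunked reader over a byte array; a stand-in for mapped buffers. */
final class ChunkedReader {
    private final ByteBuffer[] buffers;
    private final int chunkSize;

    ChunkedReader(byte[] data, int chunkSize) {
        this.chunkSize = chunkSize;
        long length = data.length;
        // One extra (possibly empty) buffer keeps position == length addressable.
        int count = (int) (length / chunkSize) + 1;
        buffers = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            int start = (int) Math.min((long) i * chunkSize, length);
            int len = (int) Math.min(chunkSize, length - start);
            buffers[i] = ByteBuffer.wrap(data, start, len).slice();
        }
    }

    /** Reads the byte at absolute position pos. */
    byte readAt(long pos) {
        return buffers[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
    }

    /** Positions the selected buffer at pos and returns its remaining bytes. */
    int seek(long pos) {
        ByteBuffer buffer = buffers[(int) (pos / chunkSize)];
        buffer.position((int) (pos % chunkSize));
        return buffer.remaining();
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed only to make the sketch repeatable
        for (int iter = 0; iter < 1000; iter++) {
            int chunkSize = 1 + random.nextInt(64);
            byte[] data = new byte[random.nextInt(1024)];
            random.nextBytes(data);
            ChunkedReader reader = new ChunkedReader(data, chunkSize);
            for (int pos = 0; pos < data.length; pos++) {
                if (reader.readAt(pos) != data[pos]) {
                    throw new AssertionError("mismatch at " + pos + ", chunkSize=" + chunkSize);
                }
            }
            reader.seek(data.length); // must not throw, even on exact multiples
        }
        System.out.println("all chunk sizes passed");
    }
}
```

Small chunk sizes are the point of the design: with the default chunk size, a test would need a multi-gigabyte file to cross even one buffer boundary.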
[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903646#action_12903646 ]
Uwe Schindler commented on LUCENE-2627:
---------------------------------------
To avoid breaking Hudson on 32-bit JVMs, we should enable the MMap close hack when testing against that dir; otherwise we may run out of address space.
[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903647#action_12903647 ]
Robert Muir commented on LUCENE-2627:
-------------------------------------
Good idea. I will have the test enable the unmap hack, if supported.
[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903604#action_12903604 ]
Uwe Schindler commented on LUCENE-2627:
---------------------------------------
Thanks Robert for investigating! I was wondering why I have never seen this error with my 10 GB CFS file and MMapDir - it only happens on exact multiples of the chunk size (Integer.MAX_VALUE = 2^31 - 1 by default) :-)
But we should fix this for 2.9 and 3.0, too - it's easy. Maybe we have another release (we also have the NRQ bug).
[jira] Commented: (LUCENE-2627) MMapDirectory chunking is buggy
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903622#action_12903622 ]
Robert Muir commented on LUCENE-2627:
-------------------------------------
> But we should fix this for 2.9 and 3.0, too - it's easy. Maybe we have another release (we also have the NRQ bug)
OK, I'll test each branch and backport as needed.
Additionally, all tests pass with -Dtests.directory=MMapDirectory, so I plan to commit shortly.