Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2007/12/31 11:48:43 UTC
[jira] Created: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Document is partially indexed on an unhandled exception
-------------------------------------------------------
Key: LUCENE-1112
URL: https://issues.apache.org/jira/browse/LUCENE-1112
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.3
With LUCENE-843, it's now possible for a subset of a document's
fields/terms to be indexed or stored when an exception is hit. This
was not the case in the past (it was "all or none").
I plan to make it "all or none" again by immediately marking a
document as deleted if any exception is hit while indexing it.
Discussion leading up to this:
http://www.gossamer-threads.com/lists/lucene/java-dev/56103
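To make the "all or none" idea concrete, here is a minimal sketch of a toy in-memory indexer, not the actual DocumentsWriter code. The class AllOrNoneIndexer, its Tokenizer hook, and the numDocs()/search() helpers are all hypothetical names made up for illustration. It assumes the doc ID is consumed before indexing starts, so a failed document can only be hidden by marking it deleted, not by rolling back its postings.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: hypothetical names, not Lucene's API.
class AllOrNoneIndexer {
    /** Hypothetical analysis hook; it may throw part-way through a document. */
    interface Tokenizer {
        String[] tokenize(String fieldText);
    }

    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final BitSet deleted = new BitSet();
    private int maxDoc = 0;

    void addDocument(List<String> fields, Tokenizer tokenizer) {
        int docId = maxDoc++; // the doc ID is used up even if indexing fails
        try {
            for (String field : fields) {
                for (String term : tokenizer.tokenize(field)) {
                    List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
                    // Record each doc at most once per term.
                    if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                        docs.add(docId);
                    }
                }
            }
        } catch (RuntimeException e) {
            // Earlier fields may already be in the postings, so hide the
            // whole document by marking it deleted, then rethrow.
            deleted.set(docId);
            throw e;
        }
    }

    /** Number of live (non-deleted) documents. */
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    /** IDs of live documents containing the term. */
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, Collections.<Integer>emptyList())) {
            if (!deleted.get(docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }
}
```

With this sketch, a document whose second field throws leaves numDocs() unchanged, and terms from its first field no longer match, even though their postings were already written.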
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
[jira] Resolved: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-1112.
----------------------------------------
Resolution: Fixed
[jira] Commented: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555250#action_12555250 ]
Doron Cohen commented on LUCENE-1112:
-------------------------------------
{quote}
It's not "position increment" that's 3, it's "position" that's 3?
...
Well, invertField increments by positionIncrement minus 1, then addPosition increments by 1 (this mirrors how DocumentWriter used to work).
{quote}
Right, my mistake.
[jira] Updated: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-1112:
--------------------------------
Attachment: lucene-1112-test.patch
Patch demonstrating the problem: testWickedLongTerm() is modified to fail when numDocs grows even though addDocument() threw an exception.
[jira] Commented: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555119 ]
Michael McCandless commented on LUCENE-1112:
--------------------------------------------
Thanks Doron; I'll fold this in, though I'll move it to the testExceptionFromTokenStream case, since it looks like we will no longer throw an exception on hitting a wicked-long term.
[jira] Commented: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555225 ]
Doron Cohen commented on LUCENE-1112:
-------------------------------------
I skimmed the long-token part of the patch:
* In the test, why is the position increment of 'another' 3,
I think it should be 2?
* assertEquals("failed document should not be in the index",2,reader.numDocs());
should be "document with skipped token should be in the index"?
* I believe that "position++" in DocumentsWriter is not required because
invertField() already incremented the position before calling addPosition()?
(my fault, I suggested to still increment the position...)
[jira] Updated: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1112:
---------------------------------------
Attachment: LUCENE-1112.patch
Patch attached. All tests pass. I plan to commit in a day or two.
Here are the changes:
* No longer throw an exception when a massive term is hit. Instead,
we now print this message to infoStream, if set:
WARNING: document contains at least one immense term (longer than the max length 16383), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...'
* Still increment position when we hit a massive term
* An unhandled "non-aborting" exception immediately marks the
document that hit the exception as deleted. I added comments at
the top of DocumentsWriter to explain aborting vs non-aborting
exceptions. This change actually adds the infrastructure for
deleting by doc ID, which we've discussed adding to IW in the
past, but, I haven't exposed any public APIs for doing so.
* No longer log to infoStream how many docs were deleted on flush
since that deletion count is not accurate when mixing delete by
term and by docID.
[jira] Commented: (LUCENE-1112) Document is partially indexed on an
unhandled exception
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555226 ]
Michael McCandless commented on LUCENE-1112:
--------------------------------------------
{quote}
* In the test, why is the position increment of 'another' 3,
I think it should be 2?
{quote}
It's not "position increment" that's 3, it's "position" that's 3? And I think it should be 3 because this field is "abc xyz <massive-term> another term", so "another" should have position 3, since we count <massive-term> as one position.
{quote}
* assertEquals("failed document should not be in the index",2,reader.numDocs());
should be "document with skipped token should be in the index"?
{quote}
Whoops, yes, I'll fix the string.
{quote}
I believe that "position++" in DocumentsWriter is not required because
invertField() already incremented the position before calling addPosition()?
(my fault, I suggested to still increment the position...)
{quote}
Well, invertField increments by positionIncrement minus 1, then addPosition increments by 1 (this mirrors how DocumentWriter used to work).
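The position arithmetic above can be sketched as follows. This is a simplified stand-in, not the real DocumentsWriter: the class PositionSketch and its helpers are hypothetical, the position is assumed to start at 0 for the field, and the 16383 limit is taken from the warning message in the patch description. It shows why "another" ends up at position 3 when the massive term is skipped but still counted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: hypothetical names, not Lucene's API.
class PositionSketch {
    // Max term length quoted in the warning message in this thread.
    static final int MAX_TERM_LENGTH = 16383;

    private int position = 0; // assumed starting position for the field
    private int skippedTerms = 0;
    private final Map<String, List<Integer>> positions = new LinkedHashMap<>();

    /** Advances by (positionIncrement - 1), then lets addPosition add the final 1. */
    void invertField(String[] terms, int[] positionIncrements) {
        for (int i = 0; i < terms.length; i++) {
            position += positionIncrements[i] - 1;
            addPosition(terms[i]);
        }
    }

    /** Records the term at the current position unless it is too long; always advances by 1. */
    private void addPosition(String term) {
        if (term.length() > MAX_TERM_LENGTH) {
            skippedTerms++; // term is dropped, but it still occupies a position
        } else {
            positions.computeIfAbsent(term, k -> new ArrayList<>()).add(position);
        }
        position++;
    }

    List<Integer> positionsOf(String term) {
        return positions.getOrDefault(term, new ArrayList<>());
    }

    int skippedTerms() {
        return skippedTerms;
    }
}
```

Feeding it the field "abc xyz <massive-term> another term" with a position increment of 1 per token records abc at 0, xyz at 1, nothing at 2, another at 3, and term at 4.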