Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2007/12/31 11:48:43 UTC

[jira] Created: (LUCENE-1112) Document is partially indexed on an unhandled exception

Document is partially indexed on an unhandled exception
-------------------------------------------------------

                 Key: LUCENE-1112
                 URL: https://issues.apache.org/jira/browse/LUCENE-1112
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.3


With LUCENE-843, it's now possible for a subset of a document's
fields/terms to be indexed or stored when an exception is hit.  This
was not the case in the past (it was "all or none").

I plan to make it "all or none" again by immediately marking a
document as deleted if any exception is hit while indexing it.

Discussion leading up to this:

  http://www.gossamer-threads.com/lists/lucene/java-dev/56103


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1112) Document is partially indexed on an unhandled exception

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1112.
----------------------------------------

    Resolution: Fixed



[jira] Commented: (LUCENE-1112) Document is partially indexed on an unhandled exception

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555250#action_12555250 ] 

Doron Cohen commented on LUCENE-1112:
-------------------------------------

{quote}
It's not "position increment" that's 3, it's "position" that's 3?
...
Well, invertField increments by positionIncrement minus 1, then addPosition increments by 1 (this mirrors how DocumentWriter used to work).
{quote}
Right, my mistake.



[jira] Updated: (LUCENE-1112) Document is partially indexed on an unhandled exception

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1112:
--------------------------------

    Attachment: lucene-1112-test.patch

Patch demonstrating the problem: testWickedLongTerm() is modified to fail if numDocs grows even though addDocument() threw an exception.



[jira] Commented: (LUCENE-1112) Document is partially indexed on an unhandled exception

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555119 ] 

Michael McCandless commented on LUCENE-1112:
--------------------------------------------

Thanks Doron; I'll fold this in, though I'll move it to the testExceptionFromTokenStream case, since it looks like we will no longer throw an exception on hitting a wicked-long term.



[jira] Commented: (LUCENE-1112) Document is partially indexed on an unhandled exception

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555225 ] 

Doron Cohen commented on LUCENE-1112:
-------------------------------------

I skimmed the long-token part of the patch:
* In the test, why is the position increment of 'another' 3?
  I think it should be 2.
* assertEquals("failed document should not be in the index",2,reader.numDocs());
  should be "document with skipped token should be in the index"?
* I believe that "position++" in DocumentsWriter is not required because
  invertField() already incremented the position before calling addPosition()?
  (my fault, I suggested still incrementing the position...)




[jira] Updated: (LUCENE-1112) Document is partially indexed on an unhandled exception

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1112:
---------------------------------------

    Attachment: LUCENE-1112.patch

Patch attached.  All tests pass.  I plan to commit in a day or two.

Here are the changes:

  * No longer throw an exception when a massive term is hit.  Instead,
    we now print this message to infoStream, if set:

      WARNING: document contains at least one immense term (longer than the max length 16383), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...'

  * Still increment position when we hit a massive term

  * An unhandled "non-aborting" exception immediately marks the
    document that hit the exception as deleted.  I added comments at
    the top of DocumentsWriter to explain aborting vs non-aborting
    exceptions.  This change actually adds the infrastructure for
    deleting by doc ID, which we've discussed adding to IndexWriter
    in the past, but I haven't exposed any public APIs for doing so.

  * No longer log to infoStream how many docs were deleted on flush
    since that deletion count is not accurate when mixing delete by
    term and by docID.
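
To illustrate the third change above, here is a minimal sketch of the "all or none" idea. It is a toy model, not the code in LUCENE-1112.patch: the class AllOrNoneSketch, its methods, and the "bad" field convention are all hypothetical, and it only assumes that a failed document keeps its doc ID but is marked deleted so that readers never see the partial content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model only: none of these names are Lucene APIs.
class AllOrNoneSketch {
    // Buffered documents, indexed by doc ID.
    private final List<List<String>> docs = new ArrayList<>();
    // Doc IDs that were marked deleted after a failure.
    private final BitSet deletedDocIds = new BitSet();

    /** Adds a document; if indexing fails midway, the partial doc is marked deleted. */
    void addDocument(List<String> fields) {
        int docId = docs.size();
        List<String> indexedFields = new ArrayList<>();
        docs.add(indexedFields); // the doc ID is consumed even if indexing fails
        try {
            for (String field : fields) {
                // Hypothetical failure: any field starting with "bad" throws.
                if (field.startsWith("bad")) {
                    throw new IllegalStateException("simulated failure on field: " + field);
                }
                indexedFields.add(field);
            }
        } catch (RuntimeException e) {
            deletedDocIds.set(docId); // hide the partially indexed document
            throw e;                  // the caller still sees the exception
        }
    }

    int maxDoc() { return docs.size(); }
    int numDocs() { return docs.size() - deletedDocIds.cardinality(); }
    boolean isDeleted(int docId) { return deletedDocIds.get(docId); }

    public static void main(String[] args) {
        AllOrNoneSketch writer = new AllOrNoneSketch();
        writer.addDocument(Arrays.asList("title", "body"));
        try {
            writer.addDocument(Arrays.asList("title", "bad-field", "body"));
        } catch (IllegalStateException e) {
            System.out.println("addDocument failed: " + e.getMessage());
        }
        writer.addDocument(Arrays.asList("title", "body"));
        // prints "numDocs=2 maxDoc=3"
        System.out.println("numDocs=" + writer.numDocs() + " maxDoc=" + writer.maxDoc());
    }
}
```

In this sketch the failed document still counts toward maxDoc() but not toward numDocs(), which is the property the test patch checks for: numDocs must not grow when addDocument() throws.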




[jira] Commented: (LUCENE-1112) Document is partially indexed on an unhandled exception

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12555226 ] 

Michael McCandless commented on LUCENE-1112:
--------------------------------------------

{quote}
* In the test, why is the position increment of 'another' 3,
I think it should be 2?
{quote}

It's not "position increment" that's 3, it's "position" that's 3?  And I think it should be 3 because this field is "abc xyz <massive-term> another term", so 'another' should have position 3 (positions start at 0), since we count <massive-term> as one position?

{quote}
* assertEquals("failed document should not be in the index",2,reader.numDocs());
should be "document with skipped token should be in the index"?
{quote}

Whoops, yes, I'll fix the string.

{quote}
I believe that "position++" in DocumentsWriter is not required because
invertField() already incremented the position before calling addPosition()?
(my fault, I suggested to still increment the position...)
{quote}
Well, invertField increments by positionIncrement minus 1, then addPosition increments by 1 (this mirrors how DocumentWriter used to work).
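
The position arithmetic discussed in this comment can be sketched as follows. This is a standalone illustration, not the DocumentsWriter code: PositionSketch and its positions() method are hypothetical, every token is assumed to have the default positionIncrement of 1, and the first position is assumed to be 0. The only figure taken from the thread is the 16383 maximum term length quoted in the warning message.

```java
import java.util.Arrays;

// Standalone illustration; not taken from DocumentsWriter.
class PositionSketch {
    // Maximum term length quoted in the infoStream warning.
    static final int MAX_TERM_LENGTH = 16383;

    /**
     * Returns the position assigned to each token, or -1 for a token that is
     * skipped because it is longer than maxTermLength.
     */
    static int[] positions(String[] tokens, int maxTermLength) {
        int[] result = new int[tokens.length];
        int position = 0;
        for (int i = 0; i < tokens.length; i++) {
            int positionIncrement = 1;         // assumption: default increment
            position += positionIncrement - 1; // the "invertField" step
            if (tokens[i].length() > maxTermLength) {
                result[i] = -1; // term is skipped, not indexed
                position++;     // but it still occupies one position
                continue;
            }
            result[i] = position;
            position++;         // the "addPosition" step
        }
        return result;
    }

    public static void main(String[] args) {
        char[] big = new char[MAX_TERM_LENGTH + 1];
        Arrays.fill(big, 'x');
        String massiveTerm = new String(big);
        String[] tokens = {"abc", "xyz", massiveTerm, "another", "term"};
        // prints "[0, 1, -1, 3, 4]"
        System.out.println(Arrays.toString(positions(tokens, MAX_TERM_LENGTH)));
    }
}
```

Under these assumptions 'another' lands at position 3, because the skipped massive term still consumes position 2.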
