Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2010/12/13 13:06:01 UTC

[jira] Created: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

SegmentInfo should explicitly track whether that segment wrote term vectors
---------------------------------------------------------------------------

                 Key: LUCENE-2811
                 URL: https://issues.apache.org/jira/browse/LUCENE-2811
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 3.1, 4.0


Today SegmentInfo doesn't know if it has vectors, which means its files() method must check if the files exist.

This leads to subtle bugs, because SegmentInfo.files() caches the file list, but we then fail to invalidate that cache later when the term vectors files are created.

It also leads to sloppy code, e.g. TermVectorsReader "gracefully" handles being opened when the files do not exist.  I don't like that; it should only be opened if they exist.

This also fixes these intermittent failures we've been seeing:

{noformat}
junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file _1e.tvx
       at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
       at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
       at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:3633)
       at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3699)
       at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2407)
       at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2478)
       at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2460)
       at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2444)
       at org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads(TestIndexWriterExceptions.java:213)
{noformat}
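
A minimal sketch of the idea described above, not the actual patch: the segment metadata carries an explicit flag, and the cached file list is dropped whenever that flag changes, so files() never has to probe the directory. The class name SegmentInfoSketch, its fields, and the hard-coded extension lists are assumptions for illustration (tvx, tvd and tvf are the term vectors extensions; fnm, fdt and fdx stand in for the rest of the segment's files).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for SegmentInfo; not Lucene's real class.
public class SegmentInfoSketch {
    private final String name;
    private boolean hasVectors;        // explicit flag, recorded when the segment is written
    private List<String> cachedFiles;  // null means "not computed yet"

    public SegmentInfoSketch(String name, boolean hasVectors) {
        this.name = name;
        this.hasVectors = hasVectors;
    }

    public boolean getHasVectors() {
        return hasVectors;
    }

    public void setHasVectors(boolean hasVectors) {
        this.hasVectors = hasVectors;
        cachedFiles = null;  // invalidate, so the next files() call sees the change
    }

    // Builds the file list from the flag alone; no directory existence checks.
    public List<String> files() {
        if (cachedFiles == null) {
            List<String> files = new ArrayList<>();
            for (String ext : new String[] {"fnm", "fdt", "fdx"}) {
                files.add(name + "." + ext);
            }
            if (hasVectors) {
                for (String ext : new String[] {"tvx", "tvd", "tvf"}) {
                    files.add(name + "." + ext);
                }
            }
            cachedFiles = Collections.unmodifiableList(files);
        }
        return cachedFiles;
    }

    public static void main(String[] args) {
        SegmentInfoSketch si = new SegmentInfoSketch("_1e", false);
        System.out.println(si.files());
        si.setHasVectors(true);  // e.g. term vectors were written after the first files() call
        System.out.println(si.files());
    }
}
```

The point of the sketch is that the only way to change what files() reports is through setHasVectors, which is also the only place the cache needs to be cleared.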

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971327#action_12971327 ] 

Michael McCandless commented on LUCENE-2811:
--------------------------------------------

Good idea, will do...



[jira] Commented: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971303#action_12971303 ] 

Earwin Burrfoot commented on LUCENE-2811:
-----------------------------------------

I think SegmentInfo.hasVectors should be a boolean.

If this is an old index, we can check for the presence of the files in the SegmentInfo constructor and set the flag accordingly; on the next write the index is silently upgraded.
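
That suggestion might look roughly like the following sketch. The format constants, the class name, and the use of a Set<String> in place of a real Directory are all assumptions made to keep the example self-contained; none of them are taken from the patch.

```java
import java.util.Set;

// Hypothetical sketch: deciding hasVectors once, when a segment is loaded.
public class HasVectorsUpgradeSketch {
    // Assumed format versions: newer formats store the flag, older ones do not.
    public static final int FORMAT_WITHOUT_FLAG = 1;
    public static final int FORMAT_WITH_FLAG = 2;

    public final String name;
    public final boolean hasVectors;

    // storedFlag is only meaningful when format >= FORMAT_WITH_FLAG.
    public HasVectorsUpgradeSketch(String name, int format, boolean storedFlag,
                                   Set<String> filesInDirectory) {
        this.name = name;
        if (format >= FORMAT_WITH_FLAG) {
            this.hasVectors = storedFlag;
        } else {
            // Old index: probe for the file once, here, and never again.
            this.hasVectors = filesInDirectory.contains(name + ".tvx");
        }
    }

    public static void main(String[] args) {
        Set<String> dir = Set.of("_0.fdt", "_0.fdx", "_0.tvx", "_0.tvd", "_0.tvf");
        System.out.println(new HasVectorsUpgradeSketch("_0", FORMAT_WITHOUT_FLAG, false, dir).hasVectors);
        System.out.println(new HasVectorsUpgradeSketch("_1", FORMAT_WITHOUT_FLAG, false, dir).hasVectors);
    }
}
```

Once the flag is known it would be written out with the rest of the segment metadata on the next commit, which is the silent upgrade mentioned above.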



[jira] Updated: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2811:
---------------------------------------

    Attachment: LUCENE-2811.patch

Patch.



[jira] Commented: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971510#action_12971510 ] 

Earwin Burrfoot commented on LUCENE-2811:
-----------------------------------------

From IRC:
SegmentMerger.hasVectors carries no new information compared to OneMerge.hasVectors, and can be dropped.
OneMerge.hasVectors is initialized right next to OneMerge.info and is later used to set OneMerge.info.hasVectors, so we might as well set that from the get-go and drop OneMerge.hasVectors.
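
As a rough illustration of that simplification, here is a sketch in which the merge object keeps no flag of its own and the merged segment's info holds the single copy. The Info and Merge classes are hypothetical, stripped-down shapes, and the rule that the merged segment has vectors when any source segment does is an assumption of this sketch.

```java
// Hypothetical shapes; the real OneMerge and SegmentInfo carry far more state.
public class MergeFlagSketch {
    static class Info {
        boolean hasVectors;
    }

    static class Merge {
        // Metadata for the merged segment. There is deliberately no separate
        // hasVectors field on Merge: info.hasVectors is the single copy.
        final Info info = new Info();
    }

    // Sets the flag on the merged segment's info directly from the source segments.
    static Merge startMerge(Info... sources) {
        Merge merge = new Merge();
        for (Info source : sources) {
            if (source.hasVectors) {
                merge.info.hasVectors = true;
                break;
            }
        }
        return merge;
    }

    public static void main(String[] args) {
        Info plain = new Info();
        Info withVectors = new Info();
        withVectors.hasVectors = true;
        System.out.println(startMerge(plain, withVectors).info.hasVectors);
        System.out.println(startMerge(plain).info.hasVectors);
    }
}
```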



[jira] Commented: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970811#action_12970811 ] 

Michael McCandless commented on LUCENE-2811:
--------------------------------------------

bq. Accessing the filename extensions outside a codec seems very odd (I know TV and stored fields are not yet exposed, just saying)

I agree -- we gotta get this stuff under codec control!  No core code should be operating on file names.

bq. I looked at the patch and it looks good to me, except for the one System.out.println:

Whoops, thanks, I'll remove the System.out.println!



[jira] Resolved: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2811.
----------------------------------------

    Resolution: Fixed



[jira] Commented: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970981#action_12970981 ] 

Michael McCandless commented on LUCENE-2811:
--------------------------------------------

Committed to trunk...

I'll let this age a bit before backporting to 3.x.



[jira] Commented: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970791#action_12970791 ] 

Michael McCandless commented on LUCENE-2811:
--------------------------------------------

The patch also fixes an intermittent failure I hit in TestAddIndexes.testAddIndexesWithRollback.



[jira] Commented: (LUCENE-2811) SegmentInfo should explicitly track whether that segment wrote term vectors

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970794#action_12970794 ] 

Simon Willnauer commented on LUCENE-2811:
-----------------------------------------

Mike, good that you figured it out :D. Other than that, I think this part gets messier each time we change something. Your patch is a good indicator that we need to push this stuff into codecs and let the codecs decide whether a feature is present in a segment. Backwards-compatibility code should be handled in PreFlexCodec, and new state like hasVectors should be something a codec holds, or rather something that segmentCodecs encodes. Accessing the filename extensions outside a codec seems very odd (I know TV and stored fields are not yet exposed, just saying).

Also all the CFS and Compound Doc Store stuff should be pushed to codecs.
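
To make the direction concrete, here is a small sketch of core code asking a codec-like object for the term vectors files instead of building the names itself. The VectorsFormat interface and both class names are hypothetical and only illustrate the separation; Lucene's real codec API is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class CodecFilesSketch {
    // Hypothetical interface; Lucene's real codec API looks different.
    interface VectorsFormat {
        // Adds the term vectors files this segment has, if any, to out.
        void files(String segmentName, boolean hasVectors, List<String> out);
    }

    // One possible implementation; it alone knows the extensions.
    static class ThreeFileVectorsFormat implements VectorsFormat {
        private static final String[] EXTENSIONS = {"tvx", "tvd", "tvf"};

        @Override
        public void files(String segmentName, boolean hasVectors, List<String> out) {
            if (!hasVectors) {
                return;
            }
            for (String ext : EXTENSIONS) {
                out.add(segmentName + "." + ext);
            }
        }
    }

    // Core code asks the format and never touches an extension itself.
    static List<String> vectorFiles(String segmentName, boolean hasVectors, VectorsFormat format) {
        List<String> out = new ArrayList<>();
        format.files(segmentName, hasVectors, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(vectorFiles("_1e", true, new ThreeFileVectorsFormat()));
        System.out.println(vectorFiles("_1e", false, new ThreeFileVectorsFormat()));
    }
}
```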

I looked at the patch and it looks good to me, except for the one System.out.println:

{code}
    System.out.println("SI READ 2");
{code}

simon
