Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2011/08/26 08:37:29 UTC

[jira] [Created] (LUCENE-3403) Term vectors missing after addIndexes + optimize

Term vectors missing after addIndexes + optimize
------------------------------------------------

                 Key: LUCENE-3403
                 URL: https://issues.apache.org/jira/browse/LUCENE-3403
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/index
    Affects Versions: 3.3
            Reporter: Shai Erera
            Assignee: Shai Erera
            Priority: Blocker
             Fix For: 3.4, 4.0


I encountered a problem with addIndexes where term vectors disappear following optimize(). I wrote a simple test case that demonstrates the problem. The bug appears with both addIndexes() versions, but does not appear if addDocument is called twice with a commit in between.

I think I tracked the problem down to IndexWriter.mergeMiddle(): it calls setHasVectors before merger.merge() has run. In the addDocs case, merger.fieldInfos is already populated, while in the addIndexes case it is empty, hence fieldInfos.hasVectors returns false.

I will post a patch shortly.
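
The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: MergeOrderSketch and ToyMerger are hypothetical stand-ins invented for illustration and are not Lucene's IndexWriter or SegmentMerger. The sketch assumes, as in the addIndexes case, that the merger's field infos are empty until merge() runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, not Lucene's IndexWriter or SegmentMerger.
class MergeOrderSketch {

    /** Stand-in for a merger whose field infos stay empty until merge() runs. */
    static class ToyMerger {
        // One entry per merged field: true if that field stores term vectors.
        final List<Boolean> fieldStoresVectors = new ArrayList<>();
        final List<List<Boolean>> sourceSegments;

        ToyMerger(List<List<Boolean>> sourceSegments) {
            this.sourceSegments = sourceSegments;
        }

        boolean hasVectors() {
            return fieldStoresVectors.contains(Boolean.TRUE);
        }

        // Merging is what populates the field infos from the source segments.
        void merge() {
            for (List<Boolean> segment : sourceSegments) {
                fieldStoresVectors.addAll(segment);
            }
        }
    }

    /** Reads the flag first, then merges: the flag reflects the still-empty field infos. */
    static boolean flagReadBeforeMerge(ToyMerger merger) {
        boolean hasVectors = merger.hasVectors();
        merger.merge();
        return hasVectors;
    }

    /** Merges first, then reads the flag: the flag reflects the merged field infos. */
    static boolean flagReadAfterMerge(ToyMerger merger) {
        merger.merge();
        return merger.hasVectors();
    }

    public static void main(String[] args) {
        List<List<Boolean>> segments =
            Arrays.asList(Arrays.asList(true), Arrays.asList(false));
        System.out.println("read before merge: " + flagReadBeforeMerge(new ToyMerger(segments)));
        System.out.println("read after merge:  " + flagReadAfterMerge(new ToyMerger(segments)));
    }
}
```

In this model the only difference between the two methods is the position of the flag read relative to merge(), which is the kind of reordering the report points at.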

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3403) Term vectors missing after addIndexes + optimize

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-3403.
--------------------------------

    Resolution: Fixed

Committed revision 1162300 (3x).
Committed revision 1162301 (trunk -- tests only).



[jira] [Commented] (LUCENE-3403) Term vectors missing after addIndexes + optimize

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091660#comment-13091660 ] 

Simon Willnauer commented on LUCENE-3403:
-----------------------------------------

Good catch, Shai. Does this happen on 4.0 too? I don't think we have setHasVectors there anymore. I am just asking since you put 4.0 as a fix version.



[jira] [Commented] (LUCENE-3403) Term vectors missing after addIndexes + optimize

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091709#comment-13091709 ] 

Shai Erera commented on LUCENE-3403:
------------------------------------

You're right, it does not happen on trunk. I still want to commit the test cases to trunk too, so that we've got that covered there as well. Therefore I think I should keep the 4.0 fix version?

The problem is that SegmentMerger receives its FieldInfos from DocumentsWriter, and it knows whether to set hasVector according to what it receives. When you addDoc, DW has FieldInfos, but when you only addIndexes, DW doesn't.

In fact, the field infos are read only when the IW is opened, so even if I call addIndexes(), commit(), addIndexes(), the field infos would still be missing. A workaround I see for now is to call addIndexes(), close(), open a new IW, and then continue with addIndexes() or optimize(). It is ugly, but it is a workaround until we release a new version. I'll try that.
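
The reopen workaround can be sketched with a toy model. ToyIndex and ToyWriter below are hypothetical classes invented for illustration, not Lucene's API; the sketch only assumes what the comment above says, namely that the writer learns about term vectors once, when it is opened, and that addIndexes() does not refresh that knowledge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: ToyIndex and ToyWriter are hypothetical, not Lucene classes.
class ReopenWorkaroundSketch {

    /** Toy index state: one boolean per segment, true if the segment stores term vectors. */
    static class ToyIndex {
        final List<Boolean> segmentsWithVectors = new ArrayList<>();

        ToyIndex(Boolean... segments) {
            segmentsWithVectors.addAll(Arrays.asList(segments));
        }
    }

    static class ToyWriter {
        private final ToyIndex index;
        // Snapshot taken once at open, mimicking field infos that are read only on open.
        private final boolean knowsVectors;

        ToyWriter(ToyIndex index) {
            this.index = index;
            this.knowsVectors = index.segmentsWithVectors.contains(Boolean.TRUE);
        }

        void addIndexes(ToyIndex other) {
            // New segments arrive, but the snapshot taken at open is not refreshed.
            index.segmentsWithVectors.addAll(other.segmentsWithVectors);
        }

        /** Collapses all segments into one, flagged using the (possibly stale) snapshot. */
        void optimize() {
            if (index.segmentsWithVectors.isEmpty()) {
                return;
            }
            index.segmentsWithVectors.clear();
            index.segmentsWithVectors.add(knowsVectors);
        }
    }

    /** addIndexes() + optimize() on the same writer: the stale snapshot is used. */
    static boolean vectorsSurviveSameWriter(ToyIndex source) {
        ToyIndex target = new ToyIndex();
        ToyWriter writer = new ToyWriter(target);
        writer.addIndexes(source);
        writer.optimize();
        return target.segmentsWithVectors.contains(Boolean.TRUE);
    }

    /** addIndexes(), then a fresh writer (standing in for close() + reopen) before optimize(). */
    static boolean vectorsSurviveAfterReopen(ToyIndex source) {
        ToyIndex target = new ToyIndex();
        new ToyWriter(target).addIndexes(source);
        ToyWriter reopened = new ToyWriter(target); // snapshot now sees the added segments
        reopened.optimize();
        return target.segmentsWithVectors.contains(Boolean.TRUE);
    }

    public static void main(String[] args) {
        System.out.println("same writer:  " + vectorsSurviveSameWriter(new ToyIndex(true)));
        System.out.println("after reopen: " + vectorsSurviveAfterReopen(new ToyIndex(true)));
    }
}
```

The model says nothing about how Lucene stores field infos; it only shows why a snapshot taken at open time would explain both the failure and the workaround.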

If it's ok, I'll commit the fix to 3x and the tests-only to trunk.



[jira] [Commented] (LUCENE-3403) Term vectors missing after addIndexes + optimize

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091739#comment-13091739 ] 

Michael McCandless commented on LUCENE-3403:
--------------------------------------------

Phew, nice catch Shai!




[jira] [Commented] (LUCENE-3403) Term vectors missing after addIndexes + optimize

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091712#comment-13091712 ] 

Simon Willnauer commented on LUCENE-3403:
-----------------------------------------

bq. You're right, it does not happen on trunk. I still want to commit the test cases to trunk too, so that we've got that covered there as well. Therefore I think I should keep the 4.0 fix version?

Don't get me wrong, I was just double-checking because 4.0 was not in the affected versions. I don't wanna miss such a trap. :)

bq. The problem is that SegmentMerger receives its FieldInfos from DocumentsWriter, and it knows whether to set hasVector according to what it receives. When you addDoc, DW has FieldInfos, but when you only addIndexes, DW doesn't.

Maybe we should adopt what trunk does: check all the FIs to see whether one of them stores vectors, unless the FIs is read-only?

bq. If it's ok, I'll commit the fix to 3x and the tests-only to trunk.

+1, tests are great!
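
The suggestion of checking every field info, rather than trusting a flag set earlier, might look roughly like the sketch below. ToyFieldInfo and anyFieldStoresVectors are hypothetical names invented for illustration and are not taken from trunk; the read-only case mentioned above is deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: ToyFieldInfo is a hypothetical stand-in, not Lucene's FieldInfo.
class ScanFieldInfosSketch {

    static class ToyFieldInfo {
        final String name;
        final boolean storeTermVector;

        ToyFieldInfo(String name, boolean storeTermVector) {
            this.name = name;
            this.storeTermVector = storeTermVector;
        }
    }

    /** Derives the answer from the per-field data on each call instead of a cached flag. */
    static boolean anyFieldStoresVectors(List<ToyFieldInfo> infos) {
        for (ToyFieldInfo fi : infos) {
            if (fi.storeTermVector) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<ToyFieldInfo> infos = Arrays.asList(
            new ToyFieldInfo("id", false),
            new ToyFieldInfo("body", true));
        System.out.println(anyFieldStoresVectors(infos));
    }
}
```

Scanning costs a pass over the fields on each call, but the result cannot go stale the way a flag computed before the merge can.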





[jira] [Updated] (LUCENE-3403) Term vectors missing after addIndexes + optimize

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3403:
-------------------------------

    Attachment: LUCENE-3403.patch

Patch adds 3 test cases to TestTermVectors. If you don't apply the fix to IndexWriter, the tests which call addIndexes fail.

It also moves the setHasVectors call after merger.merge() in IndexWriter.

BTW, if you omit the optimize() call and the fix to IW, the tests pass.
