Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2011/08/26 08:37:29 UTC
[jira] [Created] (LUCENE-3403) Term vectors missing after
addIndexes + optimize
Term vectors missing after addIndexes + optimize
------------------------------------------------
Key: LUCENE-3403
URL: https://issues.apache.org/jira/browse/LUCENE-3403
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Affects Versions: 3.3
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Blocker
Fix For: 3.4, 4.0
I encountered a problem with addIndexes where term vectors disappear following optimize(). I wrote a simple test case that demonstrates the problem. The bug appears with both addIndexes() versions, but does not appear if addDocument is called twice, committing changes in between.
I think I tracked the problem down to IndexWriter.mergeMiddle() -- it sets the has-vectors flag before merger.merge() is called. In the addDocs case, merger.fieldInfos is already populated, while in the addIndexes case it is empty, hence fieldInfos.hasVectors returns false.
Will post a patch shortly.
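The ordering problem described above can be illustrated with a small stand-alone model. This is only a sketch and not Lucene code: ToyFieldInfos, ToyMerger and both helper methods are hypothetical names, and the assumption is simply that the merger's field infos become complete only once merge() has run.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, not Lucene code: every name in this file is hypothetical. */
class MergeOrderSketch {

    /** Stand-in for per-segment field metadata. */
    static final class ToyFieldInfos {
        private boolean hasVectors;

        boolean hasVectors() {
            return hasVectors;
        }

        void add(boolean storesVectors) {
            hasVectors |= storesVectors;
        }
    }

    /** Stand-in for the merger: its field infos are filled in by merge(). */
    static final class ToyMerger {
        final ToyFieldInfos fieldInfos = new ToyFieldInfos();
        private final List<Boolean> sourceSegments;

        ToyMerger(List<Boolean> sourceSegments) {
            this.sourceSegments = sourceSegments;
        }

        void merge() {
            for (boolean storesVectors : sourceSegments) {
                fieldInfos.add(storesVectors);
            }
        }
    }

    /** Problematic order: the flag is read while fieldInfos is still empty. */
    static boolean flagReadBeforeMerge(List<Boolean> segments) {
        ToyMerger merger = new ToyMerger(segments);
        boolean flag = merger.fieldInfos.hasVectors();
        merger.merge();
        return flag;
    }

    /** Safer order: the flag is read once merge() has populated fieldInfos. */
    static boolean flagReadAfterMerge(List<Boolean> segments) {
        ToyMerger merger = new ToyMerger(segments);
        merger.merge();
        return merger.fieldInfos.hasVectors();
    }

    public static void main(String[] args) {
        // One incoming segment stores term vectors, one does not.
        List<Boolean> segments = Arrays.asList(true, false);
        System.out.println("read before merge: " + flagReadBeforeMerge(segments));
        System.out.println("read after merge:  " + flagReadAfterMerge(segments));
    }
}
```

In this model, reading the flag first reports no vectors even though one incoming segment has them, which mirrors the addIndexes case where merger.fieldInfos starts out empty.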
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] [Resolved] (LUCENE-3403) Term vectors missing after
addIndexes + optimize
Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera resolved LUCENE-3403.
--------------------------------
Resolution: Fixed
Committed revision 1162300 (3x).
Committed revision 1162301 (trunk -- tests only).
[jira] [Commented] (LUCENE-3403) Term vectors missing after
addIndexes + optimize
Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091660#comment-13091660 ]
Simon Willnauer commented on LUCENE-3403:
-----------------------------------------
Good catch, Shai. Does this happen on 4.0 too? I don't think we have setHasVectors there anymore. I am just wondering since you put 4.0 as a fix version.
[jira] [Commented] (LUCENE-3403) Term vectors missing after
addIndexes + optimize
Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091709#comment-13091709 ]
Shai Erera commented on LUCENE-3403:
------------------------------------
You're right, it does not happen on trunk. I still want to commit the test cases to trunk too, so that we've got that covered there as well. Therefore I think I should keep the 4.0 fix version?
The problem is that SegmentMerger receives its FieldInfos from DocumentsWriter, and it decides whether to set hasVectors according to what it receives. When you addDoc, DW has FieldInfos, but when you only addIndexes, DW doesn't.
In fact, the field infos are read only on IW open ... so even if I addIndexes(), commit(), addIndexes(), the field infos would still be missing. A workaround I see for now is to addIndexes(), close(), new IW(), then continue with addIndexes() or optimize(). It's ugly, but it's a workaround until we release a new version. I'll try that.
If it's ok, I'll commit the fix to 3x and the tests-only to trunk.
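The reasoning behind the close-and-reopen workaround can be sketched with a toy writer. This is a simplified model of the explanation above, not the actual IndexWriter: ToyIndex, ToyWriter and their methods are hypothetical, and the only behaviour assumed is that the writer snapshots field metadata when it is opened and that addIndexes does not refresh that snapshot.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not Lucene code: every class and method here is hypothetical. */
class ReopenWorkaroundSketch {

    /** Stand-in for an index: one boolean per segment, true if it stores term vectors. */
    static final class ToyIndex {
        final List<Boolean> segments = new ArrayList<>();
    }

    /** Stand-in for a writer that reads field metadata only when it is opened. */
    static final class ToyWriter {
        private final ToyIndex index;
        private boolean knowsVectors; // snapshot taken at open time

        ToyWriter(ToyIndex index) {
            this.index = index;
            this.knowsVectors = index.segments.contains(Boolean.TRUE);
        }

        /** Adding a document updates the writer's own metadata. */
        void addDocument(boolean storesVectors) {
            index.segments.add(storesVectors);
            knowsVectors |= storesVectors;
        }

        /** Copies segments in without refreshing the writer's snapshot. */
        void addIndexes(ToyIndex other) {
            index.segments.addAll(other.segments);
        }

        /** Merges everything into one segment, trusting the snapshot. */
        void optimize() {
            index.segments.clear();
            index.segments.add(knowsVectors);
        }
    }

    static ToyIndex sourceWithVectors() {
        ToyIndex source = new ToyIndex();
        source.segments.add(true);
        return source;
    }

    /** addIndexes() then optimize() on the same writer: vectors are lost. */
    static boolean optimizeWithSameWriter() {
        ToyIndex target = new ToyIndex();
        ToyWriter writer = new ToyWriter(target);
        writer.addIndexes(sourceWithVectors());
        writer.optimize();
        return target.segments.get(0);
    }

    /** addIndexes(), "close", open a new writer, then optimize(): vectors survive. */
    static boolean optimizeAfterReopen() {
        ToyIndex target = new ToyIndex();
        ToyWriter first = new ToyWriter(target);
        first.addIndexes(sourceWithVectors());
        ToyWriter reopened = new ToyWriter(target); // fresh snapshot sees the added segment
        reopened.optimize();
        return target.segments.get(0);
    }

    public static void main(String[] args) {
        System.out.println("same writer:  " + optimizeWithSameWriter());
        System.out.println("after reopen: " + optimizeAfterReopen());
    }
}
```

The model only shows why a fresh writer sees what the old one missed; whether the workaround holds in a real 3.3 index still has to be tried, as noted above.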
[jira] [Commented] (LUCENE-3403) Term vectors missing after
addIndexes + optimize
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091739#comment-13091739 ]
Michael McCandless commented on LUCENE-3403:
--------------------------------------------
Phew, nice catch, Shai!
[jira] [Commented] (LUCENE-3403) Term vectors missing after
addIndexes + optimize
Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091712#comment-13091712 ]
Simon Willnauer commented on LUCENE-3403:
-----------------------------------------
> You're right, it does not happen on trunk. I still want to commit the test cases to trunk too, so that we've got that covered there as well. Therefore I think I should keep the 4.0 fix version?
Don't get me wrong, I was just double-checking because 4.0 was not in the affected versions. I don't want to miss such a trap. :)
> The problem is that SegmentMerger receives its FieldInfos from DocumentsWriter, and it knows whether to set hasVectors according to what it receives. When you addDoc, DW has FieldInfos, but when you only addIndexes, DW doesn't.
Maybe we should adopt what trunk does: check all the FieldInfos to see whether any of them stores vectors, unless the FieldInfos are read-only?
> If it's ok, I'll commit the fix to 3x and the tests-only to trunk.
+1, tests are great!
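The idea of deriving the answer from the field infos, instead of caching a flag that has to be set at the right moment, might look roughly like this. It is a sketch only: ToyFieldInfo, its storeTermVector field, and the field names "id" and "body" are hypothetical and do not reproduce trunk's actual classes.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, not Lucene code: the class, field, and method names are hypothetical. */
class DerivedHasVectorsSketch {

    /** Stand-in for one field's metadata. */
    static final class ToyFieldInfo {
        final String name;
        final boolean storeTermVector;

        ToyFieldInfo(String name, boolean storeTermVector) {
            this.name = name;
            this.storeTermVector = storeTermVector;
        }
    }

    /** Computes the answer on demand instead of trusting a flag set earlier. */
    static boolean hasVectors(List<ToyFieldInfo> fieldInfos) {
        for (ToyFieldInfo fi : fieldInfos) {
            if (fi.storeTermVector) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<ToyFieldInfo> merged = Arrays.asList(
            new ToyFieldInfo("id", false),
            new ToyFieldInfo("body", true));
        System.out.println("has vectors: " + hasVectors(merged));
    }
}
```

Because the result is recomputed from the current field infos each time, there is no separate flag whose value depends on when it was set.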
[jira] [Updated] (LUCENE-3403) Term vectors missing after
addIndexes + optimize
Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-3403:
-------------------------------
Attachment: LUCENE-3403.patch
Patch adds 3 test cases to TestTermVectors. If you don't apply the fix to IndexWriter, the tests which call addIndexes fail.
It also moves the setHasVectors call after merger.merge() in IndexWriter.
BTW, if you omit the optimize() call, the tests pass even without the fix to IW.