Posted to dev@lucene.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2010/05/17 13:37:42 UTC
[jira] Issue Comment Edited: (LUCENE-2455) Some house cleaning in
addIndexes*
[ https://issues.apache.org/jira/browse/LUCENE-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868177#action_12868177 ]
Andrzej Bialecki edited comment on LUCENE-2455 at 5/17/10 7:36 AM:
--------------------------------------------------------------------
FYI, I'm working on a different version of IndexSplitter that uses the logic in SegmentMerger directly, without going through IW.addIndexes(FilterIndexReader).
However, there are other applications for which this API is crucial, e.g. LUCENE-1812 or IndexSorter (in Nutch). In short, any client app that wants to merge in index data that does not correspond 1:1 to a Directory needs it. For this reason I think the pairing of IndexWriter.addIndexes(IndexReader...) with the FilterIndexReader abstraction is extremely useful, and IndexWriter.addIndexes(Directory...) is not a sufficient replacement.
(edit: unless there is a better user-level API based on the flex producers/consumers...)
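To make the "does not correspond 1:1 to a Directory" point concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: DocSource, ListSource, FilteredSource and ToyWriter are hypothetical stand-ins for IndexReader, an on-disk index, FilterIndexReader and IndexWriter respectively, and they model only live-document filtering (the real classes also deal with terms, postings and segments). The sketch splits one source into two targets by wrapping it in two filtered views, neither of which ever exists as a stored index of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: none of these types are Lucene classes.
final class FilteredMergeSketch {

    /** Hypothetical stand-in for a reader over one index; documents are addressed by id. */
    interface DocSource {
        int maxDoc();
        boolean isDeleted(int docId);
        String document(int docId);
    }

    /** A source backed by an in-memory list, playing the role of a stored index. */
    static final class ListSource implements DocSource {
        private final List<String> docs;
        ListSource(List<String> docs) { this.docs = List.copyOf(docs); }
        public int maxDoc() { return docs.size(); }
        public boolean isDeleted(int docId) { return false; }
        public String document(int docId) { return docs.get(docId); }
    }

    /** Wraps another source and reports the documents it should hide as deleted. */
    static final class FilteredSource implements DocSource {
        private final DocSource in;
        private final IntPredicate keep;
        FilteredSource(DocSource in, IntPredicate keep) { this.in = in; this.keep = keep; }
        public int maxDoc() { return in.maxDoc(); }
        public boolean isDeleted(int docId) { return in.isDeleted(docId) || !keep.test(docId); }
        public String document(int docId) { return in.document(docId); }
    }

    /** Hypothetical writer that copies every live document from the given sources. */
    static final class ToyWriter {
        private final List<String> docs = new ArrayList<>();
        void addIndexes(DocSource... sources) {
            for (DocSource s : sources) {
                for (int i = 0; i < s.maxDoc(); i++) {
                    if (!s.isDeleted(i)) docs.add(s.document(i));
                }
            }
        }
        List<String> docs() { return List.copyOf(docs); }
    }

    /** Splits one source into two targets without storing either half anywhere first. */
    static List<List<String>> splitEvenOdd(DocSource original) {
        ToyWriter even = new ToyWriter();
        ToyWriter odd = new ToyWriter();
        even.addIndexes(new FilteredSource(original, id -> id % 2 == 0));
        odd.addIndexes(new FilteredSource(original, id -> id % 2 == 1));
        return List.of(even.docs(), odd.docs());
    }

    public static void main(String[] args) {
        DocSource original = new ListSource(List.of("a", "b", "c", "d", "e"));
        System.out.println(splitEvenOdd(original)); // prints [[a, c, e], [b, d]]
    }
}
```

A Directory-only entry point would have no way to accept the two FilteredSource views above, because each is just a wrapper around the same underlying data; that is the gap the reader-based overload fills in this toy model.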
> Some house cleaning in addIndexes*
> ----------------------------------
>
> Key: LUCENE-2455
> URL: https://issues.apache.org/jira/browse/LUCENE-2455
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Priority: Trivial
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2455_3x.patch
>
>
> Today, the use of addIndexes and addIndexesNoOptimize is confusing,
> especially regarding when to invoke each. Also, addIndexes calls optimize() at
> the beginning, but only on the target index. It also includes the
> following jdoc statement, which, from how I understand the code, is
> wrong: _After this completes, the index is optimized._ -- optimize() is
> called at the beginning and not at the end.
> On the other hand, addIndexesNoOptimize does not call optimize(), and
> relies on the MergeScheduler and MergePolicy to handle the merges.
> After a short discussion about that on the list (Thanks Mike for the
> clarifications!) I understand that there are really two core differences
> between the two:
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of
> calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize()
> beforehand on any of the Directories/Indexes if that is a concern.
> That way, we maintain all the flexibility in the API:
> addIndexes(IndexReader...) allows for using IR extensions, while
> addIndexes(Directory...) is considered more efficient, since it allows the
> merges to happen concurrently (depending on the MS) and also factors in the
> MP. So unless you have an IR extension, addIndexes(Directory...) is really the one
> you should be using. And you have the freedom to call optimize() before
> each if you care about it, or skip it if you don't. Either way,
> incurring the cost of optimize() is entirely in the user's hands.
> BTW, addIndexes(IndexReader...) uses neither the MergeScheduler
> nor the MergePolicy, but rather calls SegmentMerger directly. This might be
> another place for improvement. I'll look into it, and if it's not too
> complicated, I may cover it by this issue as well. If you have any hints
> that can give me a good head start on that, please don't be shy :).