Posted to dev@lucene.apache.org by "Chuck Williams (JIRA)" <ji...@apache.org> on 2009/09/01 00:23:33 UTC

[jira] Commented: (LUCENE-600) ParallelWriter companion to ParallelReader

    [ https://issues.apache.org/jira/browse/LUCENE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749656#action_12749656 ] 

Chuck Williams commented on LUCENE-600:
---------------------------------------

The version attached here is over three years old.  Our version has evolved along with Lucene, and the whole apparatus is fully functional with the latest Lucene.

The fields in each subindex are disjoint.  A logical Document is the collection of all fields from each real Document in each real subindex with the same doc-id (i.e., the model Doug started with ParallelReader).  There is no issue with deletion by query or term, as it deletes the whole logical Document.  Field updates in our scheme don't use deletion.
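
A minimal sketch of that logical-Document model, assuming a toy in-memory representation rather than the Lucene API: each subindex is a list indexed by doc-id, and each entry maps field names to values.  The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Lucene code: a sub-index is a List indexed by
// doc-id, and each real Document is a Map of field name to value.
final class LogicalDocuments {

    /** Assembles the logical Document for docId from field-disjoint sub-indexes. */
    static Map<String, String> logicalDocument(
            List<List<Map<String, String>>> subIndexes, int docId) {
        Map<String, String> logical = new LinkedHashMap<>();
        for (List<Map<String, String>> subIndex : subIndexes) {
            for (Map.Entry<String, String> field : subIndex.get(docId).entrySet()) {
                // Fields are disjoint across sub-indexes, so a clash means
                // the model's precondition has been violated.
                if (logical.putIfAbsent(field.getKey(), field.getValue()) != null) {
                    throw new IllegalStateException(
                            "field present in two sub-indexes: " + field.getKey());
                }
            }
        }
        return logical;
    }
}
```

Calling logicalDocument(List.of(bodies, tags), 1) returns the union of the fields stored under doc-id 1 in both subindexes; deleting a logical Document then amounts to deleting doc-id 1 everywhere.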

Merge-by-size is only an issue if you allow it to be decided independently in each subindex.  In practice that is not very important since one subindex is size-dominant (the one containing the document body field).  One can merge-by-size that subindex and force the others to merge consistently.
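
One way to picture this, as a sketch only: the size-dominant subindex picks which adjacent segments to merge, and every other subindex merges the same segment positions.  The selection rule (smallest adjacent pair), the byte-size lists, and all names below are assumptions for illustration, not how any Lucene merge policy works.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each sub-index is described only by the byte size of
// its segments; position i in every list is the "corresponding" segment.
final class DominantMerge {

    /** Start index of the adjacent pair with the smallest combined size. */
    static int chooseMergeStart(List<Long> dominantSegmentBytes) {
        if (dominantSegmentBytes.size() < 2) {
            throw new IllegalArgumentException("need at least two segments");
        }
        int best = 0;
        long bestSize = Long.MAX_VALUE;
        for (int i = 0; i + 1 < dominantSegmentBytes.size(); i++) {
            long combined = dominantSegmentBytes.get(i) + dominantSegmentBytes.get(i + 1);
            if (combined < bestSize) {
                best = i;
                bestSize = combined;
            }
        }
        return best;
    }

    /** Lets the dominant sub-index decide, then applies the same merge to all. */
    static List<List<Long>> mergeConsistently(List<List<Long>> subIndexSegmentBytes, int dominant) {
        int start = chooseMergeStart(subIndexSegmentBytes.get(dominant));
        List<List<Long>> result = new ArrayList<>();
        for (List<Long> segments : subIndexSegmentBytes) {
            List<Long> merged = new ArrayList<>(segments);
            // Remove segments start and start+1, then insert their sum at start.
            long combined = merged.remove(start) + merged.remove(start);
            merged.add(start, combined);
            result.add(merged);
        }
        return result;
    }
}
```

Because only the dominant subindex's sizes are consulted, the smaller subindexes may merge at sizes their own policy would not have chosen, but segment boundaries stay aligned across all of them.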

The only reason for the corresponding-segment constraint is that deletion changes doc-ids by purging deleted documents.  I know some Lucene apps address this by never purging deleted documents, which is ok in some domains where deletion is rare.  I think there are other ways to resolve it as well.
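
To illustrate why purging matters, here is a small sketch on a toy model (a subindex as a plain list, deletions as a BitSet; all names hypothetical, no Lucene calls).  Purging compacts the list, so every survivor after a deleted entry gets a smaller doc-id.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy model: a document's doc-id is its position in the list.
final class PurgeRenumbering {

    /** Returns the surviving entries in order; a survivor's new doc-id is its new position. */
    static <T> List<T> purge(List<T> docs, BitSet deleted) {
        List<T> survivors = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            if (!deleted.get(docId)) {
                survivors.add(docs.get(docId));
            }
        }
        return survivors;
    }
}
```

With bodies [b0, b1, b2], tags [t0, t1, t2], and doc-id 1 deleted, purging only the bodies leaves b2 at doc-id 1 while t1 still sits at doc-id 1 in the tags, so the logical Document is wrong.  Purging both subindexes over the same range keeps b2 and t2 paired.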



> ParallelWriter companion to ParallelReader
> ------------------------------------------
>
>                 Key: LUCENE-600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-600
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Chuck Williams
>            Priority: Minor
>         Attachments: ParallelWriter.patch
>
>
> A new class ParallelWriter is provided that serves as a companion to ParallelReader.  ParallelWriter meets all of the doc-id synchronization requirements of ParallelReader, subject to:
>     1.  ParallelWriter.addDocument() is synchronized, which might have an adverse effect on performance.  The writes to the sub-indexes are, however, done in parallel.
>     2.  The application must ensure that the ParallelReader is never reopened inside ParallelWriter.addDocument(), else it might find the sub-indexes out of sync.
>     3.  The application must deal with recovery from ParallelWriter.addDocument() exceptions.  Recovery must restore the synchronization of doc-ids, e.g. by deleting any trailing document(s) in one sub-index that were not successfully added to all sub-indexes, and then optimizing all sub-indexes.
> A new interface, Writable, is provided to abstract IndexWriter and ParallelWriter.  This is in the same spirit as the existing Searchable and Fieldable classes.
> This implementation uses java 1.5.  The patch applies against today's svn head.  All tests pass, including the new TestParallelWriter.
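
The synchronized-add and recovery points in the description above can be sketched on a toy model.  This is not the attached patch: subindexes are in-memory lists, a null part simulates a failed write, and the writer rolls back trailing entries itself, whereas the description leaves recovery (and the subsequent optimize) to the application.  All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a synchronized add with parallel sub-index writes.
final class ParallelWriterSketch {
    private final List<List<String>> subIndexes = new ArrayList<>();
    private final ExecutorService pool;

    ParallelWriterSketch(int subIndexCount) {
        for (int i = 0; i < subIndexCount; i++) {
            subIndexes.add(new ArrayList<>());
        }
        pool = Executors.newFixedThreadPool(subIndexCount);
    }

    /** Writes parts.get(i) to sub-index i; a null part simulates a write failure. */
    synchronized int addDocument(List<String> parts) {
        int docId = subIndexes.get(0).size();
        List<CompletableFuture<Void>> writes = new ArrayList<>();
        for (int i = 0; i < subIndexes.size(); i++) {
            List<String> target = subIndexes.get(i);
            String part = parts.get(i);
            // Each sub-index write runs on its own pool thread.
            writes.add(CompletableFuture.runAsync(() -> {
                if (part == null) {
                    throw new IllegalArgumentException("simulated write failure");
                }
                target.add(part);
            }, pool));
        }
        boolean failed = false;
        for (CompletableFuture<Void> write : writes) {
            try {
                write.join(); // wait for every write before deciding anything
            } catch (CompletionException e) {
                failed = true;
            }
        }
        if (failed) {
            // Recovery: drop trailing entries so all sub-indexes are the same length again.
            for (List<String> subIndex : subIndexes) {
                while (subIndex.size() > docId) {
                    subIndex.remove(subIndex.size() - 1);
                }
            }
            throw new IllegalStateException("add failed; rolled back to " + docId + " docs");
        }
        return docId;
    }

    int docCount(int subIndex) {
        return subIndexes.get(subIndex).size();
    }

    void close() {
        pool.shutdown();
    }
}
```

Because addDocument() is synchronized and waits for every write before returning, no caller of this sketch sees the subindexes at different lengths between calls; a reader reopened while a call is in flight still could, which is the second caveat in the description.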
