Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2014/02/07 13:51:19 UTC

[jira] [Updated] (LUCENE-5438) add near-real-time replication

     [ https://issues.apache.org/jira/browse/LUCENE-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5438:
---------------------------------------

    Attachment: LUCENE-5438.patch

Initial, very exploratory patch; it contains only a test case
(TestNRTReplication) showing how NRT replication could work.  It is
not yet integrated at all with the replication module's existing
APIs, and I'm not sure how to do that.

But the test doesn't cheat, i.e. all index changes are pushed via
byte[] / file copy from master to replica, and it does pass.  However,
the CheckIndex that MockDirectoryWrapper (MDW) runs on close is very
slow, I think because of the term vector / postings cross checking.
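
To make the "no cheating" point concrete, here is a minimal sketch of
pushing changes by byte[] copy.  It is not code from the patch:
FakeDirectory and ByteCopyReplicator are hypothetical names, the
"directory" is just an in-memory map, and the sketch assumes segment
files are write-once, so a file whose name already exists on the
replica does not need copying again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index directory:
// file name -> file contents.  Not a Lucene class.
final class FakeDirectory {
    final Map<String, byte[]> files = new HashMap<>();
}

// Hypothetical helper, for illustration only.
final class ByteCopyReplicator {

    // Returns the sorted names of files present on the master but
    // missing on the replica.  Assumes files are write-once, so a
    // matching name means the file was already copied.
    static List<String> filesToCopy(FakeDirectory master, FakeDirectory replica) {
        List<String> missing = new ArrayList<>();
        for (String name : master.files.keySet()) {
            if (!replica.files.containsKey(name)) {
                missing.add(name);
            }
        }
        Collections.sort(missing);
        return missing;
    }

    // Copies every missing file to the replica as a byte[] and
    // returns how many files were copied.
    static int sync(FakeDirectory master, FakeDirectory replica) {
        List<String> missing = filesToCopy(master, replica);
        for (String name : missing) {
            // clone() so the replica never shares the master's array
            replica.files.put(name, master.files.get(name).clone());
        }
        return missing.size();
    }
}
```

A second sync with no new files on the master copies nothing, which
is the property that keeps incremental replication cheap.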

Flushed segments are "immediately" pushed to the replica; merged
segments are first "warmed" by pre-copying them to the replica at
lower priority.  I also created a simple
ReferenceManager<IndexSearcher> that reopens from a provided
SegmentInfos; the app on the replica side would use it to obtain
fresh searchers.  From that point it can use SearcherLifetimeManager
"as usual" to track/expire past searchers.
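
The two copy priorities could be sketched roughly as below.  This is
only an illustration, not the patch's code: CopyJob and CopyQueue are
hypothetical names, and a single priority queue is just one possible
way to let flushed segments jump ahead of merged-segment pre-copies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical copy job, for illustration only.
final class CopyJob implements Comparable<CopyJob> {
    // Declaration order doubles as priority: FLUSHED sorts first.
    enum Kind { FLUSHED, MERGED }

    private static final AtomicLong SEQ = new AtomicLong();

    final Kind kind;
    final String segmentName;
    // Submission order, used to keep FIFO order within one kind.
    final long seq = SEQ.getAndIncrement();

    CopyJob(Kind kind, String segmentName) {
        this.kind = kind;
        this.segmentName = segmentName;
    }

    @Override
    public int compareTo(CopyJob other) {
        int byKind = kind.compareTo(other.kind);
        return byKind != 0 ? byKind : Long.compare(seq, other.seq);
    }
}

// Hypothetical queue that a copy thread on the master would poll.
final class CopyQueue {
    private final PriorityBlockingQueue<CopyJob> queue =
        new PriorityBlockingQueue<>();

    void submit(CopyJob.Kind kind, String segmentName) {
        queue.add(new CopyJob(kind, segmentName));
    }

    // Removes all queued jobs and returns their segment names in the
    // order they would be copied.
    List<String> drainInCopyOrder() {
        List<String> order = new ArrayList<>();
        CopyJob job;
        while ((job = queue.poll()) != null) {
            order.add(job.segmentName);
        }
        return order;
    }
}
```

With this ordering, a merged segment submitted before two flushed
segments is still copied after both of them, so a large merge does
not delay visibility of fresh flushes.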


> add near-real-time replication
> ------------------------------
>
>                 Key: LUCENE-5438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5438
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/replicator
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.7
>
>         Attachments: LUCENE-5438.patch
>
>
> Lucene's replication module makes it easy to incrementally sync index
> changes from a master index to any number of replicas, and it
> handles/abstracts all the underlying complexity of holding a
> time-expiring snapshot, finding which files need copying, syncing more
> than one index (e.g., taxo + index), etc.
>
> But today you must first commit on the master, and then again the
> replica's copied files are fsync'd, because the code operates on
> commit points.  But this isn't "technically" necessary, and it mixes
> up durability and fast turnaround time.
>
> Long ago we added near-real-time readers to Lucene, for the same
> reason: you shouldn't have to commit just to see the new index
> changes.
>
> I think we should do the same for replication: allow the new segments
> to be copied out to replica(s), and new NRT readers to be opened, to
> fully decouple committing from visibility.  This way apps can then
> separately choose when to replicate (for freshness), and when to
> commit (for durability).
>
> I think for some apps this could be a compelling alternative to the
> "re-index all documents on each shard" approach that Solr Cloud /
> ElasticSearch implement today, and it may also mean that the
> transaction log can remain external to / above the cluster.


