Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/05/26 19:29:13 UTC

[jira] [Commented] (LUCENE-7302) IndexWriter should tell you the order of indexing operations

    [ https://issues.apache.org/jira/browse/LUCENE-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302757#comment-15302757 ] 

Michael McCandless commented on LUCENE-7302:
--------------------------------------------

I've been pushing changes to this branch:

  https://github.com/mikemccand/lucene-solr/tree/sequence_numbers

I think it's close ... I've resolved all nocommits and added some fun
tests in which threads update the same doc at once and run concurrent
commits, then verify that what the sequence numbers claim turns out to
be true.

The changes are relatively minor: IW already "knows" the order in
which operations were applied, but the methods that apply them return
{{void}} today; this changes them to return {{long}} instead.  Callers
who don't care can just ignore the returned long.
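
As a rough illustration of what a caller could do with the returned
long, here is a small self-contained Java sketch.  It does not use
Lucene at all: {{ToyWriter}}, {{Versioned}} and {{LatestBySeqNo}} are
hypothetical names invented for this example, and the toy writer only
models "each operation gets the next sequence number", not the real IW
internals.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class SeqNoSketch {

    /** Hypothetical stand-in for a writer whose update method returns a sequence number. */
    static final class ToyWriter {
        private final AtomicLong nextSeqNo = new AtomicLong(1);

        long updateDocument(String id, String body) {
            // A real writer would apply the update here; this toy only hands out the order.
            return nextSeqNo.getAndIncrement();
        }
    }

    /** A document body tagged with the sequence number of the update that wrote it. */
    static final class Versioned {
        final long seqNo;
        final String body;

        Versioned(long seqNo, String body) {
            this.seqNo = seqNo;
            this.body = body;
        }
    }

    /** Keeps, per id, the body whose update has the highest sequence number. */
    static final class LatestBySeqNo {
        private final ConcurrentHashMap<String, Versioned> latest = new ConcurrentHashMap<>();

        void record(String id, long seqNo, String body) {
            // merge() is atomic per key, so racing callers cannot lose the higher seqNo.
            latest.merge(id, new Versioned(seqNo, body),
                (current, candidate) -> candidate.seqNo > current.seqNo ? candidate : current);
        }

        String get(String id) {
            Versioned v = latest.get(id);
            return v == null ? null : v.body;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyWriter writer = new ToyWriter();
        LatestBySeqNo latest = new LatestBySeqNo();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            final String body = "body from thread " + i;
            threads[i] = new Thread(() -> {
                // No per-id lock: the returned sequence number decides which update "won".
                long seqNo = writer.updateDocument("doc1", body);
                latest.record("doc1", seqNo, body);
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("winner: " + latest.get("doc1"));
    }
}
```

The point of the sketch is only that the caller never has to serialize
updates to the same id itself; it records each result with its
sequence number and lets the highest one win, even if the results
arrive out of order.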

It also lets us remove the wrapper class {{TrackingIndexWriter}} which
was doing basically the same thing (returning a long for each op) but
with weaker guarantees.

These sequence numbers are transient: they are not saved into commit
points, and they are only meaningful within one IW instance (they
reset back to 1 on the next IW instance).

I'll build a patch that can be applied and post it here ...

> IndexWriter should tell you the order of indexing operations
> ------------------------------------------------------------
>
>                 Key: LUCENE-7302
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7302
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 6.1, master (7.0)
>
>
> Today, when you use multiple threads to concurrently index, Lucene
> knows the effective order that those operations were applied to the
> index, but doesn't return that information back to you.
> But this is important to know, if you want to build a reliable search
> API on top of Lucene.  Combined with the recently added NRT
> replication (LUCENE-5438) it can be a strong basis for an efficient
> distributed search API.
> I think we should return this information, since we already have it,
> and since it could simplify servers (ES/Solr) on top of Lucene:
>   - They would not need locking to prevent the same id from being
>     indexed concurrently, since they could instead check the returned
>     sequence number to know which update "won", for features like
>     "realtime get".  (Locking is probably still needed for features
>     like optimistic concurrency).
>   - When re-applying operations from a prior commit point, e.g. on
>     recovering after a crash from a transaction log, they can know
>     exactly which operations made it into the commit and which did
>     not, and replay only the truly missing operations.
> Not returning this just hurts people who try to build servers on top
> with clear semantics on crashing/recovering ... I also struggled with
> this when building a simple "server wrapper" on top of Lucene
> (LUCENE-5376).
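
The replay idea in the description above can be sketched in plain
Java.  This is an illustration under stated assumptions, not the
patch: {{LogEntry}} and {{missingOps}} are hypothetical names, the
application is assumed to have stored each operation's sequence number
in its own transaction log, and it is assumed to know the sequence
number of the last operation that made it into the commit.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplaySketch {

    /** Hypothetical transaction-log entry: an op tagged with the sequence number returned for it. */
    static final class LogEntry {
        final long seqNo;
        final String op;

        LogEntry(long seqNo, String op) {
            this.seqNo = seqNo;
            this.op = op;
        }
    }

    /**
     * Returns the ops to replay, in sequence-number order.
     * Assumption: every op with seqNo <= commitSeqNo is already in the commit.
     */
    static List<String> missingOps(List<LogEntry> log, long commitSeqNo) {
        List<LogEntry> missing = new ArrayList<>();
        for (LogEntry entry : log) {
            if (entry.seqNo > commitSeqNo) {
                missing.add(entry);
            }
        }
        // Log order may differ from apply order when several threads write to the log.
        missing.sort((a, b) -> Long.compare(a.seqNo, b.seqNo));
        List<String> ops = new ArrayList<>();
        for (LogEntry entry : missing) {
            ops.add(entry.op);
        }
        return ops;
    }

    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        log.add(new LogEntry(3, "update doc2"));
        log.add(new LogEntry(1, "add doc1"));
        log.add(new LogEntry(4, "delete doc1"));
        log.add(new LogEntry(2, "add doc2"));
        System.out.println(missingOps(log, 2));
    }
}
```

Since the sequence numbers are only meaningful within one IW
instance, a scheme like this has to compare numbers that all came from
the writer instance that produced the commit.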



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
