Posted to solr-user@lucene.apache.org by Gili Nachum <gi...@gmail.com> on 2015/10/04 20:54:51 UTC

Re: How can I get a monotonically increasing field value for docs?

Glad I made that silly statement.
I came to know cursorMark after noticing how inefficient native deep
paging is in Solr, where each shard returns start+rows worth of data to
the shard servicing the query. I then *wrongly* assumed that cursorMark
records the returned doc # of the result set for *each shard*, so that on
the next request each shard would return the next rows' worth of
documents from where it previously left off.

I now see that the cursorMark value encodes the sort field values of the
last returned document, so that on the next request each shard fetches
documents after that point (with Lucene's searchAfter()) - just like in my
own custom implementation.
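
To make that concrete, here is a minimal sketch in plain Python (not Solr
or Lucene code) of paging by "last seen sort values". It assumes each doc
is a (timestamp, id) tuple sorted ascending with id as a unique
tie-breaker; the names shard_search_after, cursor_page and walk are
hypothetical. The real cursorMark is an opaque string; a plain tuple
stands in for it here.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

Doc = Tuple[int, str]  # (timestamp, id); id is assumed unique


def shard_search_after(shard: Iterable[Doc], after: Optional[Doc], rows: int) -> List[Doc]:
    """One shard: up to `rows` docs that sort strictly after `after`."""
    candidates = (d for d in shard if after is None or d > after)
    return heapq.nsmallest(rows, candidates)


def cursor_page(shards: List[List[Doc]], after: Optional[Doc], rows: int):
    """Merge the per-shard results and derive the next cursor.

    The cursor is only the sort values of the last doc returned; it
    carries no per-shard offsets.
    """
    per_shard = (d for s in shards for d in shard_search_after(s, after, rows))
    page = heapq.nsmallest(rows, per_shard)
    next_cursor = page[-1] if page else after
    return page, next_cursor


def walk(shards: List[List[Doc]], rows: int, after: Optional[Doc] = None) -> List[Doc]:
    """Follow the cursor until a page comes back empty."""
    seen: List[Doc] = []
    while True:
        page, after = cursor_page(shards, after, rows)
        if not page:
            return seen
        seen.extend(page)


# Ten docs with tied timestamps, split two different ways.
docs = [(i // 2, f"id{i:02d}") for i in range(10)]
two_shards = [docs[0::2], docs[1::2]]
three_shards = [docs[0::3], docs[1::3], docs[2::3]]

# Take the first page from two shards, then continue on three shards
# with the same cursor.
first_page, cursor = cursor_page(two_shards, None, 4)
remainder = walk(three_shards, 4, after=cursor)
```

Because the cursor holds only sort values, first_page + remainder covers
every doc exactly once in sort order, even though the shard layout
changed between requests.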

Thanks for clarifying.

On Wed, Sep 30, 2015 at 8:46 PM, Chris Hostetter <ho...@fucit.org>
wrote:

>
> : Small potato: I assume cursor mark breaks when the number of shards
> : changes while keeping the original values doesn't, since the relative
> : position is encoded per shard...But that's an edge case.
>
> I don't understand your question ... the encoded cursorMark values don't
> know or care anything about shards.  A cursorMark only encodes
> information about the *relative* position where you left off according to
> the specified sort -- that position is relative to the abstract ordering
> of all possible values, not relative to any particular shard(s).
>
> in your use case it would function *exactly* the same as keeping track of
> the exact timestamp and uniqueKey of the last doc you received, and
> passing that cursorMark value back on the next query would be exactly the
> same as specifying a "fq=timestamp:{X TO *] OR (timestamp:X AND id:{Y TO
> *])" on the next request, except that under the covers the way a
> cursorMark is passed down to the IndexSearcher as a "searchAfter"
> structure should be more efficient than using an fq.
>
> adding shards, removing shards, adding documents, removing documents ...
> cursorMark doesn't care ... what you get back is any doc that, at the
> moment you sent that cursorMark value, has sort values which would place
> that doc *after* the last doc you received with the previous request when
> you got that value as the nextCursorMark.
>
> changing the value of a sort field in a document in the middle of
> iteration might affect whether it is ever seen, or whether it's seen more
> than once (see previously mentioned URL for detailed examples) but
> splitting shards or whatnot is not going to affect the results of
> iterating a cursor in any way.
>
>
> -Hoss
> http://www.lucidworks.com/
>
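
The filter-query equivalence described in the quoted message can be
sketched as a small helper. This is only an illustration: the function
names are hypothetical, the field names timestamp and id are taken from
the example in the message, values are double-quoted without any further
escaping, and the lower bound on id is exclusive so that the last doc
received is not returned again.

```python
from typing import Tuple


def equivalent_fq(last_timestamp: str, last_id: str) -> str:
    """Build a filter query matching docs that sort after the last doc seen.

    Assumes a sort of "timestamp asc, id asc". Values are wrapped in double
    quotes; embedded quotes or backslashes are NOT escaped in this sketch.
    """
    ts = f'"{last_timestamp}"'
    doc_id = f'"{last_id}"'
    return f"timestamp:{{{ts} TO *] OR (timestamp:{ts} AND id:{{{doc_id} TO *])"


def sorts_after(doc: Tuple[int, str], last: Tuple[int, str]) -> bool:
    """The same predicate in plain Python, for (timestamp, id) tuples."""
    (ts, doc_id), (x, y) = doc, last
    return ts > x or (ts == x and doc_id > y)


fq = equivalent_fq("2015-09-30T20:46:00Z", "doc42")
```

Here fq holds the string to pass as the fq parameter of the next request.
The sorts_after predicate is the ordinary "strictly greater" comparison on
(timestamp, id) pairs, which is the relation a cursor walks.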