Posted to solr-user@lucene.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2017/08/03 10:47:55 UTC

Solr Pagination

Hi all,

I have a collection that is frequently updated. Is it possible that a
SolrCloud query returns duplicate documents while paginating?

Just to be clear, the collection holds about 3M documents, and a Solr query
selects about 500K of them, sorted by id. The results are retrieved by
simply paginating with the start, rows and sort parameters.

The query is like this one:

http://localhost:8983/solr/collection1/select?q=idCat:1&start=0&rows=20000&sort=id asc

To be honest, I have not verified this personally, but the consumer of this
query claims that, after a few trials, duplicate documents were returned.

Given that the collection is frequently updated, I suppose that adding a
large batch of new documents during pagination can change the index and
therefore the order of the results.

In other words, suppose the 500K documents are returned by 25 queries (20K
documents per request) and, during the iteration, 1000 new documents are
inserted.
Given that the query is sorted by id, I think it is possible that the
documents returned reflect the new order, so a document returned by a
previous query may also be present in the current results.
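
A minimal sketch of the effect described above, using a plain in-memory
list as a stand-in for the sorted result set. The ids and the page size are
made up for illustration, and fetch_page is a hypothetical helper, not a
Solr API:

```python
import bisect


def fetch_page(index, start, rows):
    """Mimic start/rows paging over a result list kept sorted by id."""
    return index[start:start + rows]


# Hypothetical ids; the real query would match about 500K documents.
index = [10, 20, 30, 40, 50, 60]
rows = 3

page1 = fetch_page(index, 0, rows)

# Between the two requests, a document whose id sorts before the
# current offset is indexed, shifting every later document by one.
bisect.insort(index, 15)

page2 = fetch_page(index, rows, rows)

# Ids that the consumer receives twice.
duplicates = sorted(set(page1) & set(page2))
print(duplicates)  # [30]
```

The same shift works in the other direction: if a document before the
current offset is deleted between requests, one document is skipped
instead of repeated.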

So I'm now trying to solve this problem using deep paging.

I have read that "unlike basic pagination, Cursor pagination does not rely
on using an absolute "offset" into the completed sorted list of matching
documents.  Instead, the cursorMark specified in a request encapsulates
information about the relative position of the last document returned,
based on the absolute sort values of that document.  This means that the
impact of index modifications is much smaller when using a cursor compared
to basic pagination."
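
To illustrate the idea in that passage, here is a rough model in which each
page resumes from the last sort value seen rather than from an offset. This
is only a sketch of the concept on an in-memory list, not how Solr
implements cursorMark; fetch_after and the ids are hypothetical:

```python
import bisect


def fetch_after(index, last_id, rows):
    """Return up to `rows` ids strictly greater than last_id.

    last_id=None means "start from the beginning".
    """
    pos = 0 if last_id is None else bisect.bisect_right(index, last_id)
    return index[pos:pos + rows]


# Same hypothetical ids as a small stand-in for the sorted result set.
index = [10, 20, 30, 40, 50, 60]
rows = 3

page1 = fetch_after(index, None, rows)

# A new document is indexed behind the current position.
bisect.insort(index, 15)

# The next page resumes after the last id seen, so nothing shifts.
page2 = fetch_after(index, page1[-1], rows)

duplicates = sorted(set(page1) & set(page2))
print(page2, duplicates)  # [40, 50, 60] []
```

Note that the document with id 15 is never returned in this run: it was
added behind the position already passed, so it is missed rather than
causing a duplicate.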

What do you think, am I right? Can deep paging help to solve this
problem?

Best regards and thanks for your time,
Vincenzo

Re: Solr Pagination

Posted by Vincenzo D'Amore <v....@gmail.com>.
Don't spend your time on this; I've just found an answer in the
documentation:


> One way to ensure that a document will never be returned more than once,
> is to use the uniqueKey field as the primary (and therefore: only
> significant) sort criterion. In this situation, you will be guaranteed
> that each document is only returned once, no matter how it may be
> modified during the use of the cursor.


https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
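
For reference, a cursor loop along those lines might look like the sketch
below. It assumes that id is the collection's uniqueKey field and reuses
the URL and query from the original message; http_fetch and
iterate_with_cursor are hypothetical helper names. The cursorMark and
nextCursorMark parameters are the ones described on the page linked above.

```python
import json
import urllib.parse
import urllib.request


def http_fetch(base_url, params):
    """Send one select request and return the parsed JSON response."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iterate_with_cursor(fetch, query, rows, sort="id asc"):
    """Yield every document matching `query`, one cursor page at a time.

    `fetch` is any callable that takes the request parameters and
    returns the parsed JSON response. The sort is assumed to be on the
    uniqueKey field (here: id); no start parameter is sent.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        data = fetch({
            "q": query,
            "rows": rows,
            "sort": sort,
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            # Solr echoes the same mark once there is nothing left.
            break
        cursor = next_cursor


# Possible usage against the collection from the original message:
#
#   from functools import partial
#   fetch = partial(http_fetch,
#                   "http://localhost:8983/solr/collection1/select")
#   for doc in iterate_with_cursor(fetch, "idCat:1", 20000):
#       ...
```

Passing the fetch function in as an argument is just a convenience so the
loop can be tried against a stub without a running Solr instance.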


