Posted to solr-user@lucene.apache.org by Webster Homer <we...@sial.com> on 2018/01/15 18:56:45 UTC

cursorMark and Solrcloud

I have noticed strange behavior using cursorMark for deep paging in an
application. We use solrcloud for searching. We have several clouds for
development. For our development systems we have two different clouds. One
cloud has 2 shards with 1 replica per shard. All of our other clouds are
set up with 2 shards and 2 replicas per shard.

The application sorts the data by score descending, and the schema's unique
id ascending. According to the documentation, cursor mark requires that the
tie breaker be the schema's unique id.

When I run against the first cloud, I always get consistent results for the
same query. That is not the case with the second cloud. Some queries return
a different number of results each time they are run. In the code I return
the number found from Solr, and I count the number of results over all
iterations against the cursor mark. Sometimes it returns more rows than
numFound and sometimes fewer.

I figured that the problem was in my code or in the data. To make it easier
to find the problem, I changed the sort to just the unique id from the
schema. The problem went away.

1. The Number Found from solr was always the same
2. It worked when there was only 1 replica per shard
3. From debug statements it appears to return different total counts from
different replicas. When there were 2 replicas per shard I saw 4 different
values being returned.
4. Not sorting on score, and only on the unique id provides consistent
results.

So it appears that we should not include score in the sort when using
cursor mark and solrcloud.

We use solrj and CloudSolrClient. We are currently using the Solr 6.2 solrj
client with Solr 7.2 in our dev environment. We are in the process of
moving completely to 7.2.

Is this a known issue with cursorMark and SolrCloud?
For debugging purposes, can I determine which Solr node CloudSolrClient
is using for a particular query?

I have not yet created a standalone test case for the issue. I'm still not
100% convinced that it is SolrCloud, but it certainly looks like it is.

Thanks,
Webster

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

Re: cursorMark and Solrcloud

Posted by Webster Homer <we...@sial.com>.

count is queryResponse.getResults().getNumFound()

The code stops when the cursorMark is equal to the nextCursorMark, so how
can it exceed the numFound?
Setting the sort order to just the unique id makes the code work.

I would try to create an example case, but I'm under a deadline and have to
get this working, and I found that the normal start/rows iteration
seems to work, if less efficiently.

Re: cursorMark and Solrcloud

Posted by Webster Homer <we...@sial.com>.

Sorry, solr_returned is the total count of the documents retrieved from the
queryResponse. So if I ask for 200 rows at a time, it is incremented by the
size of each page of up to 200:

numberRetrieved += queryResponse.getResults().size();

Where queryResponse is a solrj QueryResponse

Re: cursorMark and Solrcloud

Posted by Shawn Heisey <ap...@elyograg.org>.

On 1/15/2018 12:52 PM, Webster Homer wrote:
> When I don't have score in the sort, the solr_returned and count are the
> same

I don't know what "solr_returned" means.  I haven't encountered that 
before, and nothing useful turns up in a google search.

If you're getting different numFound values for the same query and the 
index hasn't changed, there are two possible causes that I know of.  One 
is replicas out of sync as already described, the other is having 
documents with the same uniqueKey value in more than one shard.  If the 
count is always the same with one sort, then I am leaning towards the 
latter cause.

Which router does your collection use?  If it's implicit, how are you 
deciding which shard gets which document?  If it's compositeId, have you 
changed your hash ranges without deleting everything and building the 
index again?

Thanks,
Shawn

Re: cursorMark and Solrcloud

Posted by Erick Erickson <er...@gmail.com>.

bq: When I don't have score in the sort, the solr_returned and count
are the same.

Hmmm, I don't know the inner workings of cursor mark all that well. But can you
tell what the score of one of the omitted documents is and how it compares
against the score of the mark returned on the previous call?

Say the mark returned was 10.333 and an omitted doc's score was 10.332. That
would be a hint of there being an issue with scoring being used as a
primary sort.

You could further nail it down if you fired a query like
solr/collection/collection_shard1_replica1/select?q=(your original query
here)&fq=docId:(the doc IDs in question)&distrib=false
at both replicas in the shard once you've found one that's omitted.
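
A minimal sketch of building that per-replica request in Java, so the same
query can be sent to each replica and the returned scores compared. The host
names, the core name coll_shard1_replica1, and the unique key field docId are
placeholders, not values from this thread; q, fq, fl and distrib are standard
Solr request parameters, and fl is added here only so the score comes back
with each id.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ReplicaQueryUrls {

    /**
     * Builds a select URL that is answered by one core only (distrib=false).
     * coreBaseUrl is the full URL of a single replica's core (hypothetical here).
     * The ids are assumed to contain no Solr query-syntax characters.
     */
    static String build(String coreBaseUrl, String query, String idField, List<String> ids) {
        // Restrict the result to the documents under investigation.
        String fq = idField + ":(" + String.join(" OR ", ids) + ")";
        return coreBaseUrl + "/select"
                + "?q=" + enc(query)
                + "&fq=" + enc(fq)
                + "&fl=" + enc(idField + ",score")   // return id and score only
                + "&distrib=false";                  // do not fan out to other shards/replicas
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical replica locations for shard1; substitute the real ones.
        List<String> replicas = List.of(
                "http://host1:8983/solr/coll_shard1_replica1",
                "http://host2:8983/solr/coll_shard1_replica2");
        for (String replica : replicas) {
            System.out.println(build(replica, "title:aspirin", "docId", List.of("A1", "B2")));
        }
    }
}
```

If the same document comes back with different scores from the two URLs, that
would support the idea that the replicas rank it differently.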


The other possibility is to use distributed IDF; see "Configuring
statsCache" here:
https://lucene.apache.org/solr/guide/7_0/distributed-requests.html
I'm not entirely sure that'd fix the problem, but if it did it would be
another bit of evidence.

I'm assuming there's no indexing going on.

Best,
Erick

Re: cursorMark and Solrcloud

Posted by Webster Homer <we...@sial.com>.

When I don't have score in the sort, the solr_returned and count are the
same

Re: cursorMark and Solrcloud

Posted by Webster Homer <we...@sial.com>.

The problem is that the cursor mark query returns different numbers of
documents each time it is called when the collection has multiple replicas
per shard.

I meant collection. The same collection is on different clouds. The
collection in cloud #1 has 2 shards with 1 replica per shard. In the
second cloud the collection has 2 shards with 2 replicas per shard.

The same query using cursorMark against the second cloud returns different
numbers of documents. It appears that each replica returns a slightly
different number of documents. When run against cloud #1 it always returns
the same documents.
Here is a little bit from my debug statements.
count is the number found; solr_returned is a counter for all the
documents actually returned over all the calls to the cursor mark. Why are
they different?
Each of these represents a search against our collection.

    "count": 1382,
    "solr_returned": 1281,

    "count": 1382,
    "solr_returned": 1366,

    "count": 1382,
    "solr_returned": 1225,

    "count": 1382,
    "solr_returned": 1397,

Taking score out of the sort, cloud #2 will return consistent result sets.

Re: cursorMark and Solrcloud

Posted by Shawn Heisey <ap...@elyograg.org>.

On 1/15/2018 11:56 AM, Webster Homer wrote:
> I have noticed strange behavior using cursorMark for deep paging in an
> application. We use solrcloud for searching. We have several clouds for
> development. For our development systems we have two different clouds. One
> cloud has 2 shards with 1 replica per shard. All of our other clouds are
> set up with 2 shards and 2 replicas per shard.

A cloud doesn't get set up with shards and replicas.  A collection 
does.  One SolrCloud cluster can contain many collections.

When you say "cloud" are you referring to a collection, or are you 
referring to a set of servers running ZooKeeper and Solr? The latter is 
what I would expect cloud to mean.

> When I run against the first cloud, I always get consistent results for the
> same query. That is not the case with the second cloud. Some queries return
> a different number of results each time they are run. In the code I return
> the number found from Solr, and I count the number of results over all
> iterations against the cursor mark. Sometimes it returns more rows than
> numFound and sometimes fewer.
>
> I figured that the problem was in my code or in the data. To make it easier
> to find the problem, I changed the sort to just the unique id from the
> schema. The problem went away.
>
> 1. The Number Found from solr was always the same
> 2. It worked when there was only 1 replica per shard
> 3. From debug statements it appears to return different total counts from
> different replicas. When there were 2 replicas per shard I saw 4 different
> values being returned.
> 4. Not sorting on score, and only on the unique id provides consistent
> results.

When you have multiple replicas, each replica may have different numbers 
of deleted documents.  Deleted documents will almost always affect 
scoring.  Because SolrCloud load balances across replicas, one page of 
your cursorMark query can be served by a different replica than the next 
one, so the order of results can differ.

When sorting by unique ID, deleted documents will not affect sort 
order.  When there is only one replica, then sorting by score will 
always produce the same order, unless the index gets modified.

Thanks,
Shawn
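
Shawn's explanation can be illustrated with a small self-contained
simulation. It is a simplified model and not Solr code: one shard, two
replicas holding the same four documents, with document "c" scored
differently on the second replica. All ids, scores and names below are made
up. Each page is served by the next replica in turn, the cursor is the
(score, id) of the last hit returned, and paging stops when a page comes back
empty, which stands in for nextCursorMark being equal to cursorMark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CursorDriftDemo {

    /** One hit as seen by the replica that served it. */
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    // Same documents on both replicas; only the score of "c" differs (assumed values).
    static final Map<String, Double> REPLICA_A = Map.of("a", 10.0, "b", 9.0, "c", 8.0, "d", 7.0);
    static final Map<String, Double> REPLICA_B = Map.of("a", 10.0, "b", 9.0, "c", 9.2, "d", 7.0);

    /** Sort order: score descending then id ascending, or id ascending only. */
    static int compare(Hit x, Hit y, boolean byScore) {
        if (byScore) {
            int c = Double.compare(y.score, x.score);
            if (c != 0) return c;
        }
        return x.id.compareTo(y.id);
    }

    /** Returns up to rows hits that sort strictly after the cursor on this replica. */
    static List<Hit> page(Map<String, Double> replica, Hit cursor, int rows, boolean byScore) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Double> e : replica.entrySet()) {
            Hit h = new Hit(e.getKey(), e.getValue());
            if (cursor == null || compare(cursor, h, byScore) < 0) hits.add(h);
        }
        hits.sort((x, y) -> compare(x, y, byScore));
        return new ArrayList<>(hits.subList(0, Math.min(rows, hits.size())));
    }

    /** Pages through all results, alternating replicas per request; returns ids as retrieved. */
    static List<String> fetchAll(List<Map<String, Double>> replicas, int first, int rows, boolean byScore) {
        List<String> ids = new ArrayList<>();
        Hit cursor = null;
        for (int request = first; ; request++) {
            List<Hit> hits = page(replicas.get(request % replicas.size()), cursor, rows, byScore);
            if (hits.isEmpty()) return ids;          // nothing after the cursor: done
            for (Hit h : hits) ids.add(h.id);
            cursor = hits.get(hits.size() - 1);      // last hit becomes the new cursor
        }
    }

    public static void main(String[] args) {
        List<Map<String, Double>> both = List.of(REPLICA_A, REPLICA_B);
        System.out.println("numFound on either replica: " + REPLICA_A.size());
        System.out.println("score,id sort, first page from A: " + fetchAll(both, 0, 2, true));
        System.out.println("score,id sort, first page from B: " + fetchAll(both, 1, 2, true));
        System.out.println("id-only sort,  first page from A: " + fetchAll(both, 0, 2, false));
        System.out.println("id-only sort,  first page from B: " + fetchAll(both, 1, 2, false));
    }
}
```

With these made-up numbers, the run whose first page comes from replica A
retrieves 3 documents (c is skipped), and the run whose first page comes from
replica B retrieves 5 (c is returned twice), while numFound is 4 in both
cases. Sorting on id alone retrieves each of the 4 documents exactly once,
whichever replica answers.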