Posted to users@solr.apache.org by Mark Hieber <hi...@gmail.com> on 2023/02/17 19:52:06 UTC

Solr Query not always returning correct results

We have a cluster of hosts running Solr 8.4. Each host has an application
which listens to an external source for updated documents. When it gets a
document we care about, it indexes that document into the correct Solr core
(we are not running cloud).

In our API service, when we get a request to put this type of document, we
first query Solr to see if the document exists. If it does not, we then
create a new document in our database and the document is sent to the
application to be indexed into Solr. If the document we are trying to *put* (in
the API Service) exists in Solr, then we throw an exception back to the
user if they have not specified the existing version (not the _version_
field from Solr, rather an increasing counter).

As part of our write logic, after we put the document into the database, we
query the Solr stack until we get the response containing the newly written
document. So we know the document was written at this point.
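
For readers following along, the post-write visibility check described above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: `wait_until_visible` and `find_in_solr` are hypothetical names, and `find_in_solr` stands in for whatever query the application really sends to Solr.

```python
import time
from typing import Callable, Optional


def wait_until_visible(
    doc_id: str,
    find_in_solr: Callable[[str], Optional[dict]],  # hypothetical lookup, returns None on a miss
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the document is returned by the lookup or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if find_in_solr(doc_id) is not None:
            return True  # the document is visible to this lookup
        if clock() >= deadline:
            return False  # gave up; the document never became visible in time
        sleep(interval_s)
```

Note that a successful poll only shows that the document was visible to whichever host answered that particular request.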

Some time later (maybe 5-10 minutes), we get another put request for the
same document id. We query Solr, and in some cases, we get no documents
returned, even though just before we actually found the document. The
document has not been deleted in the interim.

We use the same query for both checking for existence at the beginning of
the logic, and for checking for eventual consistency after writing to the
database.

I could add a retry to the first part of the logic (retry if we don't find
a document), but the question is why the existence check at the start of
the second put does not find the document on its first attempt.

If I query for the document (using the same query), I find the document on
each host.
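
A per-host check like the one described above could be scripted along these lines. This is a sketch under assumptions: the host names, the core name, and the unique key field `id` are placeholders, and `count_on_host` is a hypothetical callable that would issue the request and read `response.numFound` from the JSON reply. The URL uses Solr's standard `/select` handler.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlencode


def select_url(host: str, core: str, doc_id: str) -> str:
    """Build a /select URL that only counts matches for one id (rows=0)."""
    # Assumes the unique key field is named "id"; adjust to the real schema.
    params = urlencode({"q": f'id:"{doc_id}"', "rows": 0, "wt": "json"})
    return f"http://{host}/solr/{core}/select?{params}"


def hosts_missing_document(
    doc_id: str,
    hosts: Iterable[str],
    count_on_host: Callable[[str, str], int],  # hypothetical: returns numFound for (host, doc_id)
) -> List[str]:
    """Return the hosts whose core reports zero matches for the document."""
    return [host for host in hosts if count_on_host(host, doc_id) == 0]
```

Running this at the moment the API service sees a miss, rather than afterwards, would show whether one particular host is behind the others.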

Why are we not seeing documents which are actually there?

Re: Solr Query not always returning correct results

Posted by Mark Hieber <hi...@gmail.com>.
Could this be related to thread issues in Solr?

Re: Solr Query not always returning correct results

Posted by Mark Hieber <hi...@gmail.com>.
I agree. However, what we are seeing is that a query *will return the
document* when we check for eventual consistency, but when we check some
time later (say 5 minutes), then we do not get the document returned. So it
got the document correctly, and returned the results, but then later it did
not return the result for the same query.

Re: Solr Query not always returning correct results

Posted by Walter Underwood <wu...@wunderwood.org>.
Query the database to see whether the document is in the database.

The problem happens when you don’t follow the design pattern “single source of truth”. Solr has a delayed version of the true state, so it will sometimes give wrong answers.

https://en.wikipedia.org/wiki/Single_source_of_truth

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
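
To illustrate the suggestion above, here is a minimal sketch of a put that checks existence and the version counter against the database instead of Solr. Everything in it is hypothetical: a plain dict stands in for the database, and `put_document` and `VersionConflict` are made-up names. In a real database the check and the write would normally need to happen in one transaction.

```python
from typing import Dict, Optional


class VersionConflict(Exception):
    """Raised when the caller did not supply the current version counter."""


def put_document(
    db: Dict[str, dict],  # stand-in for the real database (the source of truth)
    doc_id: str,
    body: dict,
    expected_version: Optional[int] = None,
) -> int:
    """Create or update a document, deciding existence from the database only."""
    current = db.get(doc_id)
    if current is None:
        new_version = 1  # document does not exist yet, so create it
    else:
        if expected_version != current["version"]:
            raise VersionConflict(
                f"{doc_id}: expected version {current['version']}, got {expected_version}"
            )
        new_version = current["version"] + 1
    db[doc_id] = {"version": new_version, "body": body}
    return new_version  # indexing into Solr would follow, but is not consulted here
```

With this shape, a search index that is slow to reflect a write cannot cause an existing document to be treated as new.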
