Posted to solr-user@lucene.apache.org by Andreas Nilsson <an...@atex.com> on 2019/01/23 12:58:53 UTC

Adding and deleting documents in the same update request

Hi all,

I am updating a Solr collection (Solr 7.3.1 in Cloud mode, using the SolrJ Java API) with requests that both add new documents and delete existing ones (by query). The deletion is meant to make sure any earlier revisions of the indexed source are removed as part of the index update. This has worked well for a long time, but in some rare cases there have been issues where the update process returns success, yet the added document(s) are nowhere to be found in the collection.

After some investigation, I suspect that there is an edge case where the delete query can actually overlap the documents added in the same update. Obviously the first suspect to look at here is the delete query, but I also had to start looking into what the documented semantics (if any) of the multi-command update API (JSON update command) actually are. I cannot find any documentation that even touches on this subject.

I've looked through most of the online Solr documentation chapters (https://lucene.apache.org/solr/guide/7_3/), though only as an overview. The documentation detailing multi-operation JSON update requests (https://lucene.apache.org/solr/guide/7_3/uploading-data-with-index-handlers.html#solr-style-json - JSON Update Command) doesn't seem to have any details or even link to further reading. I've also read the javadoc for org.apache.solr.client.solrj.request.UpdateRequest (part of SolrJ).

Is there a specific order in which operations in an update request will be executed? Is the order guaranteed for any of the possible operations (add, delete by id / query, optimize, commit) in a single update command? Since I cannot find any details, I have to assume it's undefined and that I should never rely on any order.

I suspect that the developers that did this part of our code either assumed it would always be performed in the same order or that the delete query could never overlap. Or perhaps it was just an oversight and we've been lucky so far.

Related: in the case where I cannot rely on the operations order in a single update request, is there a recommended way to do these kinds of updates "atomically" in a single request? Ideally, I obviously don't want the collection to be left in a state where the deletion has happened but not the additions or the other way around.

Thanks in advance,
Andreas
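
The suspected edge case can be illustrated with a toy in-memory model. This is only a sketch of the reasoning, not Solr's actual behavior or API: the class OverlapDemo, the document ids "rev1"/"rev2" and the value "article-42" are all hypothetical. It shows that when a delete-by-query matches the document added alongside it, the final state depends entirely on which operation is applied last.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class OverlapDemo {
    // Toy index (NOT Solr): document id -> value of a hypothetical "source" field.
    public static Map<String, String> existingIndex() {
        Map<String, String> index = new LinkedHashMap<>();
        index.put("rev1", "article-42"); // earlier revision already indexed
        return index;
    }

    // Models an "add" operation.
    public static void add(Map<String, String> index, String id, String source) {
        index.put(id, source);
    }

    // Models a delete-by-query: removes every document whose source matches.
    public static void deleteByQuery(Map<String, String> index, Predicate<String> sourceMatches) {
        index.values().removeIf(sourceMatches);
    }

    // Delete applied first: the old revision goes away, the new one survives.
    public static Map<String, String> deleteThenAdd() {
        Map<String, String> index = existingIndex();
        deleteByQuery(index, "article-42"::equals);
        add(index, "rev2", "article-42");
        return index;
    }

    // Add applied first: the overlapping delete also removes the new revision.
    public static Map<String, String> addThenDelete() {
        Map<String, String> index = existingIndex();
        add(index, "rev2", "article-42");
        deleteByQuery(index, "article-42"::equals);
        return index;
    }

    public static void main(String[] args) {
        System.out.println(deleteThenAdd()); // prints {rev2=article-42}
        System.out.println(addThenDelete()); // prints {}
    }
}
```

In the second ordering both operations "succeed", yet the added document is gone, which matches the symptom described above. Narrowing the delete query so that it cannot match the new revision removes the dependency on ordering.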


Re: Adding and deleting documents in the same update request

Posted by Luiz Armesto <lu...@gmail.com>.
You're correct. It's not a good idea to mix different operation types in the
same request. You can't rely on the order of the operations. There is a
presentation about SolrJ where they explain it:

https://youtu.be/ACPUR_GL5zM?t=1985



On Sun, Jan 27, 2019, 09:14 Andreas Nilsson <anilsson@atex.com> wrote:

> Thanks for the suggestions, Shawn.
>
>
> Unfortunately in this case, I don't think there is a natural key to use as
> the primary key due to the requirements of having multiple versions of the
> source indexed at the same time.
>
>
> I have now found a way to tweak the delete query in order for it to not
> overlap the added documents. I will go with either that or sending the
> deletes as separate requests.
>
>
> Just as a clarification, however: am I correct to assume that the
> multi-update operations are executed in an undefined order and can fail
> partially when sent like this? It's my leading theory for a bug I am
> investigating at the moment, and seems very likely given what I've seen,
> but it's also very hard to reproduce.
>
>
> Regards,
>
> Andreas Nilsson
>
>
> ________________________________
> From: Shawn Heisey <ap...@elyograg.org>
> Sent: Wednesday, January 23, 2019 3:33 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding and deleting documents in the same update request
>
> On 1/23/2019 5:58 AM, Andreas Nilsson wrote:
> > Related: in the case where I cannot rely on the operations order in a
> single update request, is there a recommended way to do these kinds of
> updates "atomically" in a single request? Ideally, I obviously don't want
> the collection to be left in a state where the deletion has happened but
> not the additions or the other way around.
>
> Assuming that you have a uniqueKey field and that you are replacing an
> existing document, do not issue a delete for that document at all.  When
> you index a document with the same value in the uniqueKey field as an
> existing document, Solr will handle the delete of the existing document
> for you.
>
> When a uniqueKey is present, you should only issue delete commands for
> documents that will be permanently deleted.
>
> Alternatively, send deletes in their own request, separate from
> inserts.  If you take this route, wait for acknowledgement from the
> delete before sending the insert.
>
> Thanks,
> Shawn
>
>

Re: Adding and deleting documents in the same update request

Posted by Andreas Nilsson <an...@atex.com>.
Thanks for the suggestions, Shawn.


Unfortunately in this case, I don't think there is a natural key to use as the primary key due to the requirements of having multiple versions of the source indexed at the same time.


I have now found a way to tweak the delete query in order for it to not overlap the added documents. I will go with either that or sending the deletes as separate requests.


Just as a clarification, however: am I correct to assume that the multi-update operations are executed in an undefined order and can fail partially when sent like this? It's my leading theory for a bug I am investigating at the moment, and seems very likely given what I've seen, but it's also very hard to reproduce.


Regards,

Andreas Nilsson


________________________________
From: Shawn Heisey <ap...@elyograg.org>
Sent: Wednesday, January 23, 2019 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding and deleting documents in the same update request

On 1/23/2019 5:58 AM, Andreas Nilsson wrote:
> Related: in the case where I cannot rely on the operations order in a single update request, is there a recommended way to do these kinds of updates "atomically" in a single request? Ideally, I obviously don't want the collection to be left in a state where the deletion has happened but not the additions or the other way around.

Assuming that you have a uniqueKey field and that you are replacing an
existing document, do not issue a delete for that document at all.  When
you index a document with the same value in the uniqueKey field as an
existing document, Solr will handle the delete of the existing document
for you.

When a uniqueKey is present, you should only issue delete commands for
documents that will be permanently deleted.

Alternatively, send deletes in their own request, separate from
inserts.  If you take this route, wait for acknowledgement from the
delete before sending the insert.

Thanks,
Shawn


Re: Adding and deleting documents in the same update request

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/23/2019 5:58 AM, Andreas Nilsson wrote:
> Related: in the case where I cannot rely on the operations order in a single update request, is there a recommended way to do these kinds of updates "atomically" in a single request? Ideally, I obviously don't want the collection to be left in a state where the deletion has happened but not the additions or the other way around.

Assuming that you have a uniqueKey field and that you are replacing an 
existing document, do not issue a delete for that document at all.  When 
you index a document with the same value in the uniqueKey field as an 
existing document, Solr will handle the delete of the existing document 
for you.

When a uniqueKey is present, you should only issue delete commands for 
documents that will be permanently deleted.

Alternatively, send deletes in their own request, separate from 
inserts.  If you take this route, wait for acknowledgement from the 
delete before sending the insert.

Thanks,
Shawn
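
The "separate requests" suggestion could be sketched roughly as follows, using only the JDK's HTTP client against the JSON update endpoint rather than SolrJ. Everything specific here is an assumption for illustration: the class name SeparateRequests, the update URL, the field name in the delete query and the document JSON are hypothetical. The delete is sent first, and the add is sent only after the delete has been acknowledged with HTTP 200.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SeparateRequests {
    // Builds a JSON update body containing a single delete-by-query command.
    public static String deleteBody(String query) {
        return "{\"delete\":{\"query\":\"" + escape(query) + "\"}}";
    }

    // Minimal JSON string escaping: handles backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Posts one JSON body and throws unless the server answers HTTP 200.
    static void post(HttpClient http, String updateUrl, String json)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        // send() blocks until the response arrives, i.e. until the request is acknowledged.
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Update failed: HTTP " + response.statusCode()
                    + " " + response.body());
        }
    }

    // Sends the delete in its own request, then the documents in a second one.
    public static void replaceRevisions(String updateUrl, String deleteQuery, String docsJson)
            throws IOException, InterruptedException {
        HttpClient http = HttpClient.newHttpClient();
        post(http, updateUrl, deleteBody(deleteQuery)); // step 1: delete, wait for the reply
        post(http, updateUrl, docsJson);                // step 2: add, only after step 1 succeeded
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical collection URL, field name and document; adjust to your setup.
        replaceRevisions("http://localhost:8983/solr/mycollection/update",
                "source_id:42",
                "[{\"id\":\"42-rev2\",\"source_id\":\"42\"}]");
    }
}
```

Note that this makes the order explicit but does not make the pair atomic: if the second request fails, the delete has already been applied, so the caller needs its own retry or recovery logic.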