Posted to solr-user@lucene.apache.org by christopher palm <cp...@gmail.com> on 2014/01/25 19:21:53 UTC

How to handle multiple sub second updates to same SOLR Document

I have a scenario where the same SOLR document is being updated several
times within a few ms of each other due to how the source system is sending
in field updates on the document.

The problem I am trying to solve is that the order of these updates isn’t
guaranteed once the multi-threaded SOLRJ client starts sending them to
SOLR, so older updates end up overwriting newer updates to the same
document.

I would like to use timestamp-based versioning so that an older document
change is never applied in SOLR, but I didn’t see any automated way of
doing this based on the document timestamp.

Is there a good way to handle this scenario in SOLR 4.6?

It seems that we would also need soft auto-commits at a sub-second
interval. Is that even possible?

Thanks,

Chris

Re: How to handle multiple sub second updates to same SOLR Document

Posted by Bram Van Dam <br...@intix.eu>.
On 01/25/2014 07:21 PM, christopher palm wrote:
> The problem I am trying to solve is that the order of these updates isn’t
> guaranteed once the multi threaded SOLRJ client starts sending them to
> SOLR, and older updates are overlaying the newer updates on the same
> document.

Don't do that. There is no way to guarantee what the document will look
like after concurrent updates. We deal with this by keeping a list of
document IDs that are currently being updated. If the ID is already in
the list, do nothing. If it isn't, go ahead. Some synchronization
required :-)

If you can think of a better way, I'd love to hear it, but I haven't 
found one.
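
A minimal sketch of the "list of IDs currently being updated" idea described above, using only the Java standard library. The class name InFlightIds and its methods are hypothetical and are not part of SolrJ; the actual send is left as a placeholder.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a SolrJ class): tracks document IDs that
// currently have an update in flight.
class InFlightIds {
    // Concurrent set, so add/remove are safe from multiple sender threads.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may send an update for this ID now. */
    boolean tryAcquire(String docId) {
        // add() is atomic: for a given ID only one thread gets true
        // until release() is called.
        return inFlight.add(docId);
    }

    /** Call once the update has been sent (or has failed). */
    void release(String docId) {
        inFlight.remove(docId);
    }

    public static void main(String[] args) {
        InFlightIds ids = new InFlightIds();
        String docId = "doc-1"; // example ID, assumed for illustration
        if (ids.tryAcquire(docId)) {
            try {
                // Placeholder: send the update with your SolrJ client here.
                System.out.println("sending update for " + docId);
            } finally {
                ids.release(docId);
            }
        } else {
            System.out.println("update for " + docId + " already in flight, skipping");
        }
    }
}
```

Note that "do nothing" means the skipped update is dropped. If the skipped update may be the newest one, the caller would presumably need to requeue or retry it rather than discard it.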


Re: How to handle multiple sub second updates to same SOLR Document

Posted by Elisabeth Benoit <el...@gmail.com>.
yutz

Sent from my iPhone

On 26 Jan 2014, at 06:13, Shalin Shekhar Mangar <sh...@gmail.com> wrote:

> There is no timestamp versioning as such in Solr but there is a new
> document based versioning which will allow you to specify your own
> (externally assigned) versions.
> 
> See the "Document Centric Versioning Constraints" section at
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
> 
> Sub-second soft auto commit can be expensive but it is hard to say if
> it will be too expensive for your use-case. You must benchmark it
> yourself.

Re: How to handle multiple sub second updates to same SOLR Document

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Solr has no timestamp versioning as such, but there is a new
document-based versioning feature that lets you specify your own
(externally assigned) versions.

See the "Document Centric Versioning Constraints" section at
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
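
To illustrate the general idea of externally assigned versions (an update is applied only if its version is newer than the one already recorded for that document), here is a small client-side sketch in plain Java. It is not the Solr feature itself and does not use any Solr API; the class name VersionGate and the use of a source-system timestamp as the version are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical client-side gate (not a Solr API): remembers the highest
// externally assigned version seen so far for each document ID.
class VersionGate {
    private final ConcurrentMap<String, Long> latest = new ConcurrentHashMap<>();

    /** Returns true only if this version is newer than any seen for the ID. */
    boolean accept(String docId, long version) {
        boolean[] accepted = {false};
        // compute() runs atomically per key, so two threads cannot both
        // win with an older and a newer version of the same document.
        latest.compute(docId, (id, current) -> {
            if (current == null || version > current) {
                accepted[0] = true;
                return version;
            }
            return current; // stale or duplicate version: keep what we have
        });
        return accepted[0];
    }

    public static void main(String[] args) {
        VersionGate gate = new VersionGate();
        // Versions here stand in for source-system timestamps (assumed).
        System.out.println(gate.accept("doc-1", 1390674113000L)); // true
        System.out.println(gate.accept("doc-1", 1390674112990L)); // false, older
    }
}
```

A gate like this only filters updates that reach the client after a newer one has already been accepted. It does not stop two accepted updates from arriving at Solr in the wrong order, which is why a check on the Solr side, as described in the linked section, is the more robust place for it.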

Sub-second soft auto commit can be expensive but it is hard to say if
it will be too expensive for your use-case. You must benchmark it
yourself.

-- 
Regards,
Shalin Shekhar Mangar.