Posted to solr-user@lucene.apache.org by Brian Yee <by...@wayfair.com> on 2018/02/12 16:02:04 UTC

DocValues and in-place updates

I asked a question here last week about fast inventory updates, and it was recommended that I use docValues with partial in-place updates. I think this will work well, but there is a problem I can't think of a good solution for.

Consider this scenario:
InStock = 1 for a product.
InStock changes to 0, which triggers a fast in-place update with docValues.
But it also triggers a slow update that will rebuild the entire document. Let's say that takes 10 minutes because we do updates in batches.
During those 10 minutes, InStock changes back to 1, which triggers another fast update to Solr. So in Solr InStock=1, which is correct.
The slow update then finishes and overwrites the field with InStock=0, which is incorrect.

How can we deal with this situation?

Re: DocValues and in-place updates

Posted by Charlie Hull <ch...@flax.co.uk>.
It's a slightly crazy idea, but in the past we've solved a similar 
problem by building a custom Lucene codec that is backed by a Redis 
database. You change the stock value in Redis and Lucene doesn't 
actually notice and re-index.
http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

Not sure if this is a better way than DocValues, it was quite a while 
ago and Lucene has moved on a bit since then....

Cheers

Charlie

-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

RE: DocValues and in-place updates

Posted by Chris Hostetter <ho...@fucit.org>.
: True, I could remove the trigger to rebuild the entire document. But 
: what if a different field changes and the whole document is triggered 
: for update for a different field. We have the same problem.

At a high level, your concern is really completely orthogonal to the
question of in-place updates. It's a broader question of having two different
systems that might want to modify the same document in Solr, where one
system is "slower" than the other (because it has to fetch more external
data, or only operates in batches, etc.).

This is where things like optimistic concurrency are really powerful.

When you trigger your "slow" updates (or any updates, for that matter),
keep track of the current (aka "expected") _version_ field of the Solr
document when your updater starts processing, and pass that in along
with the new update. Solr will reject an update if the specified
_version_ doesn't match what's in the index.

https://lucene.apache.org/solr/guide/updating-parts-of-documents.html#optimistic-concurrency
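As a rough sketch of the "pass that in along with the new update" step: one way to do it is to include the expected _version_ as a field of the document you send. The Python helper below only builds the JSON body for a POST to the collection's update handler; it does not contact Solr. The helper name, the "id" uniqueKey and the field values are hypothetical.

```python
import json


def build_versioned_update(doc, expected_version):
    """Build the JSON body for a full-document update that carries the
    _version_ the updater saw when it started working.

    Assumption: the body is POSTed to the collection's /update handler
    as JSON. A positive _version_ asks Solr to apply the update only if
    the indexed document still has exactly that version.
    """
    payload = dict(doc)                      # do not mutate the caller's dict
    payload["_version_"] = expected_version
    return json.dumps([payload])


# Hypothetical product document rebuilt by the slow updater.
body = build_versioned_update({"id": "sku-1", "name": "Blue Sofa"}, 42)
print(body)
```

If the version in the index has moved on by the time this body arrives, the update is rejected rather than silently overwriting the newer data.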

So imagine the current instock=1 document for your product has
_version_=42, and you start a "slow" update to change the "name" field ...
while that's in progress a "fast" update sets instock=0 and now you have a
new _version_=666.  When the "slow" updater is done building up the entire
document, and sends it to Solr along with the _version_=42 assumption,
Solr will reject the update with a "Conflict (409)" HTTP status, and your
slow update code can say "ok ... I must have stale data, let's try again".
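To make the retry idea concrete, here is a small in-memory simulation in Python. It is not Solr client code: FakeIndex, VersionConflict and slow_rebuild are made-up names, and the version numbers are a simple counter rather than real Solr versions. It only illustrates the pattern of "read version, rebuild, send with expected version, retry on conflict".

```python
class VersionConflict(Exception):
    """Stands in for Solr's HTTP 409 Conflict response."""


class FakeIndex:
    """Toy stand-in for a Solr collection that enforces version checks."""

    def __init__(self):
        self.docs = {}
        self._counter = 0

    def _next_version(self):
        self._counter += 1
        return self._counter

    def add(self, doc, expected_version=None):
        """Replace the whole document; reject it if the version is stale."""
        current = self.docs.get(doc["id"])
        if expected_version is not None:
            actual = current["_version_"] if current else None
            if actual != expected_version:
                raise VersionConflict(f"expected {expected_version}, found {actual}")
        stored = dict(doc)
        stored["_version_"] = self._next_version()
        self.docs[doc["id"]] = stored
        return stored["_version_"]

    def set_field(self, doc_id, field, value):
        """The 'fast' update: change one field and bump the version."""
        self.docs[doc_id][field] = value
        self.docs[doc_id]["_version_"] = self._next_version()


def slow_rebuild(index, doc_id, build_doc, on_snapshot=None, max_attempts=3):
    """Rebuild a document from a snapshot, retrying if it went stale."""
    for attempt in range(max_attempts):
        snapshot = dict(index.docs[doc_id])
        if on_snapshot:
            on_snapshot(attempt)          # hook to simulate a concurrent fast update
        new_doc = build_doc(snapshot)
        new_doc.pop("_version_", None)
        try:
            return index.add(new_doc, expected_version=snapshot["_version_"])
        except VersionConflict:
            continue                      # stale data: take a fresh snapshot
    raise RuntimeError("gave up after repeated version conflicts")


index = FakeIndex()
index.add({"id": "p1", "name": "old name", "instock": 1})


def fast_update_during_first_attempt(attempt):
    if attempt == 0:
        index.set_field("p1", "instock", 0)


slow_rebuild(index, "p1", lambda d: {**d, "name": "new name"},
             on_snapshot=fast_update_during_first_attempt)
print(index.docs["p1"])
```

The first attempt is rejected because the fast update changed the version in the meantime; the second attempt starts from a fresh snapshot, so the rebuilt document keeps instock=0 as well as the new name.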




-Hoss
http://www.lucidworks.com/

RE: DocValues and in-place updates

Posted by Brian Yee <by...@wayfair.com>.
True, I could remove the trigger that rebuilds the entire document. But what if a different field changes and that triggers an update of the whole document? Then we have the same problem.

Re: DocValues and in-place updates

Posted by Erick Erickson <er...@gmail.com>.
"But it also triggers a slow update that will rebuild the entire document..."

Why do you think this? The whole _point_ of in-place updates is that
they don't have to re-index the whole document.... And the only way to
rebuild it effectively would be if all the fields were stored, which is
not a requirement for in-place updates.
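For reference, an in-place update is sent with the same atomic-update syntax as any other partial update. A minimal Python sketch follows, assuming the uniqueKey field is "id" and InStock is a single-valued, non-indexed, non-stored numeric docValues field (the schema conditions under which Solr can apply the change in place). The helper name and document id are hypothetical, and the function only builds the JSON body; it does not talk to Solr.

```python
import json


def build_inplace_update(doc_id, field, value):
    """Build the JSON body for a partial update that sets one field.

    Only the uniqueKey and the changed field are sent; the "set"
    modifier tells Solr to replace the field's value.
    """
    return json.dumps([{"id": doc_id, field: {"set": value}}])


print(build_inplace_update("sku-1", "InStock", 0))
```

Whether Solr handles this as an in-place docValues update or as a regular atomic update depends on the schema definition of the field, not on the request itself.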

Best,
Erick
