Posted to solr-user@lucene.apache.org by kshitij tyagi <ks...@gmail.com> on 2018/01/08 11:05:50 UTC

In-place update vs Atomic updates

Hi,

What are the major differences between atomic and in-place updates? I have
gone through the documentation, but it does not give detailed internal
information.

1. Does an in-place update prevent a Solr cache burst or not? What are
the benefits of using in-place updates?

I want to update one of the fields of the document, but I do not want to
burst my cache.

What is the best approach to achieve this?

Thanks,
Kshitij

Re: In-place update vs Atomic updates

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/8/2018 10:17 PM, kshitij tyagi wrote:
> 1. Does an in-place update open a new searcher by itself or not?
> 2. As the entire segment is rewritten, does that mean frequent in-place
> updates are expensive, since each in-place update will rewrite the entire
> segment again? Correct me if my understanding is not correct.

Opening a new searcher is not related to the update.  It's something 
that happens at commit time, if the commit has openSearcher=true (which 
is the default setting).
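A minimal sketch of that separation, which only builds request URLs and sends nothing. The host, port, and collection name are placeholders, and it assumes the update handler accepts `commit` and `openSearcher` as request parameters. Whether the update is atomic or in-place does not matter here; a searcher (and with it, its caches) is replaced only when a commit opens one. A hard commit with openSearcher=false makes changes durable without making them visible.

```python
from urllib.parse import urlencode

# Placeholder base URL and collection name (assumptions); adjust for your install.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def update_url(commit: bool = False, open_searcher: bool = True) -> str:
    """Build an /update URL. The update itself never opens a searcher;
    only a commit does, and only when openSearcher is true (the default)."""
    params = {}
    if commit:
        params["commit"] = "true"
        if not open_searcher:
            # Hard commit that flushes to disk but keeps the current searcher,
            # so existing caches stay in place and new changes are not yet visible.
            params["openSearcher"] = "false"
    query = "?" + urlencode(params) if params else ""
    return f"{SOLR_BASE}/{COLLECTION}/update{query}"


print(update_url())                                  # no commit: no new searcher
print(update_url(commit=True))                       # commit + new searcher
print(update_url(commit=True, open_searcher=False))  # commit, same searcher
```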

In-place updates don't rewrite the entire segment, they only rewrite 
part of the docValues information for the segment -- only the portion 
for the fields that got updated.  The information is written into a new 
file, and the original file is untouched.
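A toy Python model of that behavior, a deliberate simplification rather than Lucene's actual file format or API: each docValues field in a segment keeps a list of file generations, and an in-place update appends a new generation for the updated field only. That new generation holds the value of every document in the segment, while earlier generations are left as they were. `ToySegment` and its methods are hypothetical names.

```python
from typing import Dict, List


class ToySegment:
    """Toy model of one segment's docValues files (not Lucene's real format).
    Each "file" is just a list holding one value per document in the segment."""

    def __init__(self, columns: Dict[str, List[int]]):
        # field name -> list of file generations; generation 0 is the original file
        self.generations: Dict[str, List[List[int]]] = {
            field: [list(values)] for field, values in columns.items()
        }

    def current(self, field: str) -> List[int]:
        # The newest generation is the one searches would read.
        return self.generations[field][-1]

    def update_in_place(self, field: str, doc: int, value: int) -> int:
        """Write a NEW file for `field` containing the value of every document
        in the segment; older files are untouched. Returns values written."""
        new_file = list(self.current(field))
        new_file[doc] = value
        self.generations[field].append(new_file)
        return len(new_file)


seg = ToySegment({"popularity": [1, 2, 3, 4], "price": [10, 20, 30, 40]})
written = seg.update_in_place("popularity", doc=2, value=99)
print(written)                        # 4: one value per doc, not just one
print(seg.generations["popularity"])  # [[1, 2, 3, 4], [1, 2, 99, 4]]
print(len(seg.generations["price"]))  # 1: the untouched field gets no new file
```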

If there are multiple fields with docValues and not all of them are 
updated, then it would not be possible to delete the old file until the 
segment gets merged.  I am not sure what happens if *every* field
with docValues is eligible for in-place updates and all of them get 
updated.  If that were the case, then it would be possible to have an 
optimization that removes the old docValues file, but I have no idea 
whether Lucene actually has that as an optimization.  I would not expect 
most indexes to be eligible for the optimization even if Lucene can do it.

Yes, frequent in-place updates can be expensive, and can make the index 
larger, because the values in the updated field for every document in 
the segment will be written to a new file.  If you never optimize the 
index and mostly update recently added documents, then the segments 
involved will probably be small, and performance would be pretty good.
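A back-of-the-envelope sketch of that cost. It assumes, as above, that each update rewrites the updated field's values for every document in the target segment, and that every update is written out separately; that makes it an upper bound, since Lucene may buffer several docValues updates and write them together at the next flush or commit. The segment names and sizes are hypothetical.

```python
from typing import Dict, List


def values_written(updates: List[str], segment_sizes: Dict[str, int]) -> int:
    """Upper-bound estimate of docValues values written for a series of
    in-place updates. `updates` lists, per update, the (hypothetical) name of
    the segment holding the target document; each update is assumed to rewrite
    the whole column for that segment."""
    return sum(segment_sizes[seg] for seg in updates)


# Hypothetical index: one large merged segment and one small recent segment.
sizes = {"_big": 10_000_000, "_recent": 5_000}

print(values_written(["_recent"] * 100, sizes))  # 500000
print(values_written(["_big"] * 100, sizes))     # 1000000000
```

The same 100 updates cost 2,000 times more when they land in the large merged segment, which is why mostly updating recently added documents (in small segments) keeps in-place updates cheap.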

Thanks,
Shawn

Re: In-place update vs Atomic updates

Posted by kshitij tyagi <ks...@gmail.com>.
Hi Shawn,

Thanks for the information,

1. Does an in-place update open a new searcher by itself or not?
2. As the entire segment is rewritten, does that mean frequent in-place
updates are expensive, since each in-place update will rewrite the entire
segment again? Correct me if my understanding is not correct.

Thanks,
Kshitij

On Mon, Jan 8, 2018 at 9:19 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 1/8/2018 4:05 AM, kshitij tyagi wrote:
>
>> What are the major differences between atomic and in-place updates? I have
>> gone through the documentation, but it does not give detailed internal
>> information.
>>
>
> Atomic updates are nearly identical to simple indexing, except that the
> existing document is read from the index to populate a new document along
> with whatever updates were requested, then the new document is indexed and
> the old one is deleted.
>
>> 1. Does an in-place update prevent a Solr cache burst or not? What are
>> the benefits of using in-place updates?
>>
>
> In-place updates are only possible on a field where only docValues is
> true.  The settings for things like indexed and stored must be false.
>
> An in-place update finds the segment containing the document and writes a
> whole new file containing the value of every document in the segment for
> the updated field.  If the segment contains ten million documents, then
> information for ten million values will be written for a single document
> update.
>
>> I want to update one of the fields of the document, but I do not want to
>> burst my cache.
>>
>
> When the index changes for ANY reason, no matter how the change is
> accomplished, caches must be thrown away when a new searcher is built.
> Lucene and Solr have no way of knowing that a change doesn't affect some
> cache entries, so the only thing it can do is assume that all the
> information in the cache is now invalid.  What you are asking for here is
> not possible at the moment, and chances are that if code was written to do
> it, that it would be far slower than simply invalidating the caches and
> doing autowarming.
>
> Thanks,
> Shawn
>

Re: In-place update vs Atomic updates

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/14/2020 12:21 PM, raj.yadav wrote:
> According to the statement above, an atomic update reindexes the entire
> document and deletes the old one. But while reading the Solr documentation
> on updating parts of documents
> (<https://lucene.apache.org/solr/guide/8_5/updating-parts-of-documents.html>),
> I found these two contradictory statements:
> 
> 1. /The first is atomic updates. This approach allows changing only one or
> more fields of a document without having to reindex the entire document./

Here is how I would rewrite that paragraph to make it correct.  The 
asterisks represent bold text:

1. The first is atomic updates.  This approach allows the indexing 
request to contain *only* the desired changes, instead of the entire 
document.

> 2. /In regular atomic updates, the entire document is reindexed internally
> during the application of the update./

This is correct as written.
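A minimal sketch of what "only the desired changes" looks like in Solr's JSON update format: the request names the document's uniqueKey and carries only the changed fields, each wrapped in a modifier such as `set` or `inc`. The uniqueKey field `id`, the field names, and the helper `atomic_update_body` are assumptions for illustration. Even though the body is small, Solr still rebuilds and reindexes the full document on the server, unless the fields qualify for an in-place update.

```python
import json


def atomic_update_body(doc_id: str, changes: dict) -> str:
    """Build a JSON body for Solr's /update handler carrying ONLY the requested
    changes. `changes` maps field name -> (modifier, value), e.g. ("set", 99).
    Assumes the collection's uniqueKey field is named "id"."""
    doc = {"id": doc_id}
    for field, (modifier, value) in changes.items():
        doc[field] = {modifier: value}
    return json.dumps([doc])


body = atomic_update_body("mydoc", {"price": ("set", 99), "popularity": ("inc", 20)})
print(body)
# [{"id": "mydoc", "price": {"set": 99}, "popularity": {"inc": 20}}]
```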

Thanks,
Shawn

Re: In-place update vs Atomic updates

Posted by "raj.yadav" <ra...@cse.ism.ac.in>.
Shawn Heisey-2 wrote
> Atomic updates are nearly identical to simple indexing, except that the 
> existing document is read from the index to populate a new document 
> along with whatever updates were requested, then the new document is 
> indexed and the old one is deleted.

According to the statement above, an atomic update reindexes the entire
document and deletes the old one. But while reading the Solr documentation
on updating parts of documents
(<https://lucene.apache.org/solr/guide/8_5/updating-parts-of-documents.html>),
I found these two contradictory statements:

1. /The first is atomic updates. This approach allows changing only one or
more fields of a document without having to reindex the entire document./

2. /In regular atomic updates, the entire document is reindexed internally
during the application of the update./

Is there something I'm missing here?

Regards,
Raj

Re: In-place update vs Atomic updates

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/8/2018 4:05 AM, kshitij tyagi wrote:
> What are the major differences between atomic and in-place updates? I have
> gone through the documentation, but it does not give detailed internal
> information.

Atomic updates are nearly identical to simple indexing, except that the 
existing document is read from the index to populate a new document 
along with whatever updates were requested, then the new document is 
indexed and the old one is deleted.
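A toy Python model of those steps (hypothetical helper, not Solr's actual code): read the existing document, copy it into a new one, apply the requested modifiers, then index the complete new document in place of the old one. Because the whole document is rebuilt from what is already in the index, Solr requires the other fields to be retrievable (stored, or docValues usable as stored) for atomic updates to work.

```python
import copy
from typing import Dict


def apply_atomic_update(index: Dict[str, dict], update: dict) -> dict:
    """Toy atomic update: read, copy, modify, reindex the whole document.
    Only the "set", "inc", and "add" modifiers are modeled here."""
    doc_id = update["id"]
    old_doc = index[doc_id]           # 1. read the existing document
    new_doc = copy.deepcopy(old_doc)  # 2. populate a new document from it
    for field, op in update.items():
        if field == "id":
            continue
        modifier, value = next(iter(op.items()))  # e.g. {"inc": 20}
        if modifier == "set":
            new_doc[field] = value
        elif modifier == "inc":
            new_doc[field] = new_doc.get(field, 0) + value
        elif modifier == "add":
            # Assumes multi-valued fields are represented as lists.
            new_doc.setdefault(field, []).append(value)
        else:
            raise ValueError(f"modifier not modeled in this sketch: {modifier}")
    index[doc_id] = new_doc           # 3. index the new document; 4. the old one is gone
    return new_doc


index = {"mydoc": {"id": "mydoc", "title": "Solr", "price": 10, "popularity": 5}}
apply_atomic_update(index, {"id": "mydoc", "popularity": {"inc": 20}})
print(index["mydoc"])
# {'id': 'mydoc', 'title': 'Solr', 'price': 10, 'popularity': 25}
```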

> 1. Does an in-place update prevent a Solr cache burst or not? What are
> the benefits of using in-place updates?

In-place updates are only possible on a field where only docValues is 
true.  The settings for things like indexed and stored must be false.
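A rough eligibility check, written as a hedged sketch. The Solr Reference Guide (7.x/8.x) spells the conditions out more completely: the updated field must be a non-indexed, non-stored, single-valued, numeric docValues field; the `_version_` field must itself be a non-indexed, non-stored, single-valued docValues field; and any copyField targets of the updated field must meet the same conditions. The function below checks only the per-field part. Its input is a hypothetical dict of schema attributes (not a Solr API object), the attributes must be spelled out explicitly (real Solr can inherit them from the field type), and the numeric type names are assumed to follow Solr's default configset.

```python
from typing import Dict

# Assumed numeric point types, named as in Solr's default configset.
NUMERIC_TYPES = {"pint", "plong", "pfloat", "pdouble"}


def in_place_eligible(field: Dict[str, object]) -> bool:
    """Per-field check only: docValues only (not indexed, not stored),
    single-valued, numeric. Does NOT check _version_ or copyField targets."""
    return (
        field.get("docValues") is True
        and field.get("indexed") is False
        and field.get("stored") is False
        and field.get("multiValued", False) is False
        and field.get("type") in NUMERIC_TYPES
    )


popularity = {"type": "pint", "indexed": False, "stored": False, "docValues": True}
title = {"type": "text_general", "indexed": True, "stored": True}
print(in_place_eligible(popularity))  # True
print(in_place_eligible(title))       # False
```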

An in-place update finds the segment containing the document and writes 
a whole new file containing the value of every document in the segment 
for the updated field.  If the segment contains ten million documents, 
then information for ten million values will be written for a single 
document update.

> I want to update one of the fields of the document, but I do not want to
> burst my cache.

When the index changes for ANY reason, no matter how the change is 
accomplished, caches must be thrown away when a new searcher is built. 
Lucene and Solr have no way of knowing that a change doesn't affect some 
cache entries, so the only thing they can do is assume that all the
information in the cache is now invalid.  What you are asking for here
is not possible at the moment, and chances are that if code were written
to do it, it would be far slower than simply invalidating the caches
and doing autowarming.
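A toy model of why caches are discarded rather than patched. The classes are hypothetical, not Solr's real searcher or cache implementation: each searcher owns a cache tied to one fixed view of the index, and opening a new searcher starts with an empty cache. Autowarming then re-executes the most recently used old queries against the new index instead of trusting any old result, which roughly mirrors the `autowarmCount` setting on Solr's caches.

```python
from collections import OrderedDict
from typing import Dict, List


class ToySearcher:
    """Toy searcher: a frozen view of the index plus its own LRU result cache."""

    def __init__(self, index: Dict[str, List[str]], capacity: int = 100):
        self.index = index          # term -> matching doc ids, fixed for this searcher
        self.cache = OrderedDict()  # term -> cached result, oldest first
        self.capacity = capacity

    def search(self, term: str) -> List[str]:
        if term in self.cache:
            self.cache.move_to_end(term)  # mark as most recently used
            return self.cache[term]
        result = list(self.index.get(term, []))
        self.cache[term] = result
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return result


def open_new_searcher(old: ToySearcher, new_index: Dict[str, List[str]],
                      autowarm_count: int) -> ToySearcher:
    """After a commit: start with an EMPTY cache (no old entry is trusted), then
    autowarm by re-running the most recently used old queries on the new index."""
    new = ToySearcher(new_index, old.capacity)
    keys = list(old.cache)  # oldest first, most recently used last
    for term in keys[max(0, len(keys) - autowarm_count):]:
        new.search(term)  # result recomputed from new_index, not copied
    return new


s1 = ToySearcher({"solr": ["1", "2"], "lucene": ["3"]})
s1.search("solr")
s1.search("lucene")

# An update added doc 4 for "solr"; the commit opens a new searcher.
new_index = {"solr": ["1", "2", "4"], "lucene": ["3"]}
s2 = open_new_searcher(s1, new_index, autowarm_count=1)
print(list(s2.cache))     # ['lucene']
print(s2.search("solr"))  # ['1', '2', '4']
```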

Thanks,
Shawn