Posted to solr-user@lucene.apache.org by kshitij tyagi <ks...@gmail.com> on 2018/01/08 11:05:50 UTC
In-place update vs Atomic updates
Hi,
What are the major differences between atomic and in-place updates? I have
gone through the documentation, but it does not give detailed internal
information.
1. Does an in-place update prevent the Solr caches from being flushed or
not? What are the benefits of using in-place updates?
I want to update one of the fields of the document, but I do not want to
flush my caches.
What is the best approach to achieve this?
Thanks,
Kshitij
Re: In-place update vs Atomic updates
Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/8/2018 10:17 PM, kshitij tyagi wrote:
> 1. Do in-place updates open a new searcher by themselves or not?
> 2. As the entire segment is rewritten, does that mean frequent in-place
> updates are expensive, since each in-place update will rewrite the entire
> segment again? Correct me if my understanding is wrong.
Opening a new searcher is not related to the update. It's something
that happens at commit time, if the commit has openSearcher=true (which
is the default setting).
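A minimal Python sketch of the commit/searcher relationship described above. It only builds the URL for an explicit commit and sends nothing. The host, port, and collection name (`localhost:8983`, `mycollection`) are placeholders, and `build_commit_url` is a hypothetical helper, not part of any Solr client library. `commit` and `openSearcher` are request parameters of Solr's update handler.

```python
from urllib.parse import urlencode


def build_commit_url(base_url, collection, open_searcher=True):
    """Build the URL for an explicit hard commit (hypothetical helper)."""
    params = {
        "commit": "true",
        # openSearcher=true is the default; "false" flushes changes to disk
        # without building a new searcher.
        "openSearcher": "true" if open_searcher else "false",
    }
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
print(build_commit_url("http://localhost:8983/solr", "mycollection",
                       open_searcher=False))
```

With openSearcher=false the existing searcher and its caches stay in use, but the committed changes are not visible to queries until a later commit does open a new searcher.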
In-place updates don't rewrite the entire segment, they only rewrite
part of the docValues information for the segment -- only the portion
for the fields that got updated. The information is written into a new
file, and the original file is untouched.
If there are multiple fields with docValues and not all of them are
updated, then it would not be possible to delete the old file until the
segment gets merged. I am not sure about what happens if *every* field
with docValues is eligible for in-place updates and all of them get
updated. If that were the case, then it would be possible to have an
optimization that removes the old docValues file, but I have no idea
whether Lucene actually has that as an optimization. I would not expect
most indexes to be eligible for the optimization even if Lucene can do it.
Yes, frequent in-place updates can be expensive, and can make the index
larger, because the values in the updated field for every document in
the segment will be written to a new file. If you never optimize the
index and mostly update recently added documents, then the segments
involved will probably be small, and performance would be pretty good.
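The cost argument above can be sketched with a toy model in Python. This is not Lucene's on-disk format: `ToySegment`, its "files" list, and the field names are all made up for illustration. The model only captures the two points made here, that one in-place update writes a value for every document in the segment, and that the older file stays around.

```python
class ToySegment:
    """Toy model of one segment; not Lucene's actual on-disk format."""

    def __init__(self, num_docs, dv_fields):
        self.num_docs = num_docs
        # One docValues "file" per generation; generation 0 holds every field.
        self.files = [{"gen": 0, "fields": set(dv_fields)}]
        self.values_written = num_docs * len(dv_fields)

    def in_place_update(self, field):
        """Update one document: the field's values are rewritten for the
        whole segment into a new file, and the old file is left untouched."""
        gen = len(self.files)
        self.files.append({"gen": gen, "fields": {field}})
        self.values_written += self.num_docs
        return self.num_docs  # values written for this single update


small = ToySegment(num_docs=1_000, dv_fields=["popularity", "price"])
large = ToySegment(num_docs=10_000_000, dv_fields=["popularity", "price"])

print(small.in_place_update("popularity"))  # values written in a small segment
print(large.in_place_update("popularity"))  # values written in a large segment
print(len(large.files))                     # old file plus the new one
```

In this model the cost of a single update depends only on the size of the segment that holds the document, which is why updating recently added documents in small segments is the cheap case.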
Thanks,
Shawn
Re: In-place update vs Atomic updates
Posted by kshitij tyagi <ks...@gmail.com>.
Hi Shawn,
Thanks for the information,
1. Do in-place updates open a new searcher by themselves or not?
2. As the entire segment is rewritten, does that mean frequent in-place
updates are expensive, since each in-place update will rewrite the entire
segment again? Correct me if my understanding is wrong.
Thanks,
Kshitij
On Mon, Jan 8, 2018 at 9:19 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 1/8/2018 4:05 AM, kshitij tyagi wrote:
>
>> What are the major differences between atomic and in-place updates? I have
>> gone through the documentation, but it does not give detailed internal
>> information.
>>
>
> Atomic updates are nearly identical to simple indexing, except that the
> existing document is read from the index to populate a new document along
> with whatever updates were requested, then the new document is indexed and
> the old one is deleted.
>
>> 1. Does an in-place update prevent the Solr caches from being flushed or
>> not? What are the benefits of using in-place updates?
>>
>
> In-place updates are only possible on a field where only docValues is
> true. The settings for things like indexed and stored must be false.
>
> An in-place update finds the segment containing the document and writes a
> whole new file containing the value of every document in the segment for
> the updated field. If the segment contains ten million documents, then
> information for ten million values will be written for a single document
> update.
>
>> I want to update one of the fields of the document, but I do not want to
>> flush my caches.
>>
>
> When the index changes for ANY reason, no matter how the change is
> accomplished, caches must be thrown away when a new searcher is built.
> Lucene and Solr have no way of knowing that a change doesn't affect some
> cache entries, so the only thing they can do is assume that all the
> information in the cache is now invalid. What you are asking for here is
> not possible at the moment, and chances are that if code were written to do
> it, it would be far slower than simply invalidating the caches and
> doing autowarming.
>
> Thanks,
> Shawn
>
Re: In-place update vs Atomic updates
Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/14/2020 12:21 PM, raj.yadav wrote:
> As per the above statement, an atomic update reindexes the entire document
> and deletes the old one.
> But I was going through the Solr documentation on updating parts of
> documents
> (https://lucene.apache.org/solr/guide/8_5/updating-parts-of-documents.html)
> and found these two contradictory statements:
>
> 1. /The first is atomic updates. This approach allows changing only one or
> more fields of a document without having to reindex the entire document./
Here is how I would rewrite that paragraph to make it correct. The
asterisks represent bold text:
1. The first is atomic updates. This approach allows the indexing
request to contain *only* the desired changes, instead of the entire
document.
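As a rough illustration of a request that contains only the desired changes, this Python sketch builds a JSON body for an atomic update. `atomic_update_doc` is a hypothetical helper, and the unique key name `id` and the field names `price` and `popularity` are assumptions. `set` and `inc` are two of the modifiers Solr accepts in atomic updates.

```python
import json


def atomic_update_doc(doc_id, **changes):
    """Build one atomic-update document: the unique key plus one
    {modifier: value} mapping per changed field (hypothetical helper)."""
    doc = {"id": doc_id}  # assumes the schema's uniqueKey field is "id"
    for field, (modifier, value) in changes.items():
        doc[field] = {modifier: value}
    return doc


# Only the two changed fields are sent; every other field of the
# document is left out of the request.
payload = json.dumps([
    atomic_update_doc("book-1", price=("set", 12.5), popularity=("inc", 1)),
])
print(payload)
```

The request is small, but as the second quoted statement says, the complete document is still reindexed internally when the update is applied.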
> 2. /In regular atomic updates, the entire document is reindexed internally
> during the application of the update./
This is correct as written.
Thanks,
Shawn
Re: In-place update vs Atomic updates
Posted by "raj.yadav" <ra...@cse.ism.ac.in>.
Shawn Heisey-2 wrote
> Atomic updates are nearly identical to simple indexing, except that the
> existing document is read from the index to populate a new document
> along with whatever updates were requested, then the new document is
> indexed and the old one is deleted.
As per the above statement, an atomic update reindexes the entire document
and deletes the old one.
But I was going through the Solr documentation on updating parts of
documents
(https://lucene.apache.org/solr/guide/8_5/updating-parts-of-documents.html)
and found these two contradictory statements:
1. /The first is atomic updates. This approach allows changing only one or
more fields of a document without having to reindex the entire document./
2. /In regular atomic updates, the entire document is reindexed internally
during the application of the update./
Is there something I'm missing here?
Regards,
Raj
Re: In-place update vs Atomic updates
Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/8/2018 4:05 AM, kshitij tyagi wrote:
> What are the major differences between atomic and in-place updates? I have
> gone through the documentation, but it does not give detailed internal
> information.
Atomic updates are nearly identical to simple indexing, except that the
existing document is read from the index to populate a new document
along with whatever updates were requested, then the new document is
indexed and the old one is deleted.
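The sequence just described (read the existing document, apply the changes, index the new document, delete the old one) can be sketched in Python with a toy append-only index. `ToyIndex` and its methods are invented for illustration and only model a plain "set" of field values; this is not how Solr or Lucene is implemented.

```python
class ToyIndex:
    """Toy append-only index: documents are only ever added or deleted."""

    def __init__(self):
        self.docs = []        # every document version ever indexed
        self.deleted = set()  # positions of deleted versions

    def add(self, doc):
        self.docs.append(dict(doc))
        return len(self.docs) - 1

    def find_live(self, doc_id):
        for pos, doc in enumerate(self.docs):
            if pos not in self.deleted and doc["id"] == doc_id:
                return pos
        raise KeyError(doc_id)

    def atomic_update(self, doc_id, changes):
        old_pos = self.find_live(doc_id)
        new_doc = dict(self.docs[old_pos])  # 1. read the existing document
        new_doc.update(changes)             # 2. apply the requested changes
        new_pos = self.add(new_doc)         # 3. index the complete new document
        self.deleted.add(old_pos)           # 4. delete the old one
        return new_pos


index = ToyIndex()
index.add({"id": "1", "title": "Solr", "price": 10})
pos = index.atomic_update("1", {"price": 12})
print(index.docs[pos])                  # the full document, with the new price
print(len(index.docs), index.deleted)   # two versions stored, the first deleted
```

The old version is marked deleted rather than modified, so from the index's point of view an atomic update looks like any other add of a whole document.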
> 1. Does an in-place update prevent the Solr caches from being flushed or
> not? What are the benefits of using in-place updates?
In-place updates are only possible on a field where only docValues is
true. The settings for things like indexed and stored must be false.
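A small Python sketch of that eligibility rule, written as a check over a field definition given as a plain dict. `supports_in_place_update` is a hypothetical helper. The docValues, indexed, and stored conditions are the ones stated above; the single-valued and numeric conditions come from the Solr Reference Guide rather than from this thread. The type names in `NUMERIC_TYPES` and the choice to treat a missing `indexed` or `stored` attribute as true are assumptions.

```python
# Assumed names of numeric field types; adjust to the types in your schema.
NUMERIC_TYPES = {"pint", "plong", "pfloat", "pdouble"}


def supports_in_place_update(field):
    """Return True if a field definition looks eligible for in-place updates."""
    return bool(
        field.get("docValues", False)
        and not field.get("indexed", True)    # missing attribute treated as true
        and not field.get("stored", True)     # missing attribute treated as true
        and not field.get("multiValued", False)
        and field.get("type") in NUMERIC_TYPES
    )


popularity = {"name": "popularity", "type": "pint", "docValues": True,
              "indexed": False, "stored": False}
price = {"name": "price", "type": "pfloat", "docValues": True,
         "indexed": True, "stored": True}

print(supports_in_place_update(popularity))  # docValues only
print(supports_in_place_update(price))       # also indexed and stored
```

A field that fails the check can still be changed with an atomic update; it just goes through the full reindex path instead.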
An in-place update finds the segment containing the document and writes
a whole new file containing the value of every document in the segment
for the updated field. If the segment contains ten million documents,
then information for ten million values will be written for a single
document update.
> I want to update one of the fields of the document, but I do not want to
> flush my caches.
When the index changes for ANY reason, no matter how the change is
accomplished, caches must be thrown away when a new searcher is built.
Lucene and Solr have no way of knowing that a change doesn't affect some
cache entries, so the only thing they can do is assume that all the
information in the cache is now invalid. What you are asking for here
is not possible at the moment, and chances are that if code were written
to do it, it would be far slower than simply invalidating the
caches and doing autowarming.
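The relationship between searchers and caches can be sketched in Python with a toy model. `ToySearcher` and `open_new_searcher` are invented names, the "query" is a plain substring match, and the autowarming step simply re-runs a few of the old cache keys; Solr's real caches and autowarming are considerably more involved.

```python
class ToySearcher:
    """Toy searcher: a fixed view of the index plus its own private cache."""

    def __init__(self, index_version, documents):
        self.index_version = index_version
        self.documents = list(documents)  # snapshot; never changes
        self.cache = {}

    def search(self, term):
        if term not in self.cache:
            self.cache[term] = [d for d in self.documents if term in d]
        return self.cache[term]


def open_new_searcher(old, documents, autowarm_count=2):
    """Build a searcher for the changed index and warm its empty cache."""
    new = ToySearcher(old.index_version + 1, documents)
    for term in list(old.cache)[:autowarm_count]:
        new.search(term)  # autowarming: recompute, never copy old results
    return new


docs = ["solr cache", "lucene segment"]
s1 = ToySearcher(1, docs)
s1.search("solr")

s2 = open_new_searcher(s1, docs + ["solr commit"])
print(s1.search("solr"))  # old searcher still answers from its old view
print(s2.search("solr"))  # new searcher recomputed the entry during warming
```

Copying the old cache entry into the new searcher would have returned a stale result here, which is the reason the whole cache is discarded rather than patched.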
Thanks,
Shawn