Posted to solr-user@lucene.apache.org by Bill Au <bi...@gmail.com> on 2014/07/08 23:30:50 UTC

Solr atomic updates question

Solr atomic updates allow changing one or more fields of a document
without having to re-index the entire document.  But what about the case
where I am sending in the entire document?  In that case the whole
document will be re-indexed anyway, right?  So I assume that there will be
no savings.  I am actually thinking that there will be a performance
penalty, since an atomic update requires Solr to retrieve all the fields
before updating.

Bill

Re: Solr atomic updates question

Posted by Steve McKay <st...@b.abbies.us>.
Right. Without atomic updates, the client needs to fetch the document (or rebuild it from the system of record), apply changes, and send the entire document to Solr, including fields that haven't changed. With atomic updates, the client sends a list of changes to Solr and the server handles the read/modify/write steps internally. That's the closest Solr can get to updating a doc in place.
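
To make the contrast concrete, here is a minimal Python sketch of the two request bodies a client might build. It assumes Solr's JSON update syntax, where a field value of the form {"set": ...} or {"add": ...} marks an atomic modifier. The function names are hypothetical, the employeeId/office/skills/salary fields are borrowed from the example elsewhere in this thread, and the sample values are made up. Nothing here talks to a Solr server.

```python
import json


def full_document_payload(current_doc, changes):
    """Client-side read/modify/write: merge the changes into a copy of the
    document and resend every field, changed or not."""
    merged = dict(current_doc)
    merged.update(changes)
    return [merged]


def atomic_update_payload(key_field, key_value, set_fields=None, add_fields=None):
    """Atomic update: send only the key plus a modifier per changed field.
    The server is left to do the read/modify/write."""
    doc = {key_field: key_value}
    for name, value in (set_fields or {}).items():
        doc[name] = {"set": value}
    for name, value in (add_fields or {}).items():
        doc[name] = {"add": value}
    return [doc]


if __name__ == "__main__":
    # Hypothetical current state of the document in the system of record.
    current = {
        "employeeId": "05991",
        "employeeName": "Steve McKay",
        "office": "Seattle",
        "salary": 50000,
    }
    print(json.dumps(full_document_payload(current, {"office": "Walla Walla"})))
    print(json.dumps(atomic_update_payload(
        "employeeId", "05991",
        set_fields={"office": "Walla Walla"},
        add_fields={"skills": "Python"},
    )))
```

In the first form the client has to know the salary in order to keep it. In the second form the salary is never mentioned, and it is up to the server to carry it over.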

Steve

On Jul 8, 2014, at 10:42 PM, Bill Au <bi...@gmail.com> wrote:

> I see what you mean now.  Thanks for the example.  It makes things very
> clear.
> 
> I have been thinking more about the explanation in the original response.
> According to that, both a regular update with the entire doc and an atomic
> update involve a delete by id followed by an add.  But the Solr reference doc
> (https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents)
> says that:
> 
> "The first is *atomic updates*. This approach allows changing only one or
> more fields of a document without having to re-index the entire document."
> 
> But since Solr is doing a delete by id followed by an add, does "without
> having to re-index the entire document" apply to the client side only?  On
> the server side the add means that the entire document is re-indexed, right?
> 
> Bill

Re: Solr atomic updates question

Posted by Bill Au <bi...@gmail.com>.
I see what you mean now.  Thanks for the example.  It makes things very
clear.

I have been thinking more about the explanation in the original response.
According to that, both a regular update with the entire doc and an atomic
update involve a delete by id followed by an add.  But the Solr reference doc
(https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents)
says that:

"The first is *atomic updates*. This approach allows changing only one or
more fields of a document without having to re-index the entire document."

But since Solr is doing a delete by id followed by an add, does "without
having to re-index the entire document" apply to the client side only?  On
the server side the add means that the entire document is re-indexed, right?

Bill


On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay <st...@b.abbies.us> wrote:

> Take a look at this update XML:
>
> <add>
>   <doc>
>     <field name="employeeId">05991</field>
>     <field name="employeeName">Steve McKay</field>
>     <field name="office" update="set">Walla Walla</field>
>     <field name="skills" update="add">Python</field>
>   </doc>
> </add>
>
> Let's say employeeId is the key. If there's a fourth field, salary, on the
> existing doc, should it be deleted or retained? With this update it will
> obviously be deleted:
>
> <add>
>   <doc>
>     <field name="employeeId">05991</field>
>     <field name="employeeName">Steve McKay</field>
>   </doc>
> </add>
>
> With this XML it will be retained:
>
> <add>
>   <doc>
>     <field name="employeeId">05991</field>
>     <field name="office" update="set">Walla Walla</field>
>     <field name="skills" update="add">Python</field>
>   </doc>
> </add>
>
> I'm not willing to guess what will happen in the case where non-atomic and
> atomic updates are present on the same add because I haven't looked at that
> code since 4.0, but I think I could make a case for retaining salary or for
> discarding it. That by itself reeks--and it's also not well documented.
> Relying on iffy, poorly-documented behavior is asking for pain at upgrade
> time.
>
> Steve

Re: Solr atomic updates question

Posted by Steve McKay <st...@b.abbies.us>.
Take a look at this update XML:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

Let's say employeeId is the key. If there's a fourth field, salary, on the existing doc, should it be deleted or retained? With this update it will obviously be deleted:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
  </doc>
</add>

With this XML it will be retained:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

I'm not willing to guess what will happen in the case where non-atomic and atomic updates are present on the same add because I haven't looked at that code since 4.0, but I think I could make a case for retaining salary or for discarding it. That by itself reeks--and it's also not well documented. Relying on iffy, poorly-documented behavior is asking for pain at upgrade time.
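
One way to stay clear of the ambiguous mixed case is to reject it on the client before anything is sent. The sketch below is an illustration under stated assumptions, not something Solr provides: classify_add is a hypothetical helper that parses an <add> message with Python's standard xml.etree.ElementTree and labels each <doc> as "full", "atomic", or "mixed". It treats the key field as neutral, since the key appears without a modifier in both styles.

```python
import xml.etree.ElementTree as ET


def classify_add(xml_text, key_field):
    """Label each <doc> in an <add> message as 'full', 'atomic', or 'mixed'.

    'atomic' means every non-key field carries an update="..." attribute,
    'full' means none does, and 'mixed' means both kinds are present.
    """
    root = ET.fromstring(xml_text)
    labels = []
    for doc in root.iter("doc"):
        has_modifier = False
        has_plain = False
        for field in doc.findall("field"):
            if field.get("name") == key_field:
                continue  # the key is written the same way in both styles
            if field.get("update") is not None:
                has_modifier = True
            else:
                has_plain = True
        if has_modifier and has_plain:
            labels.append("mixed")
        elif has_modifier:
            labels.append("atomic")
        else:
            labels.append("full")
    return labels


if __name__ == "__main__":
    mixed = (
        '<add><doc>'
        '<field name="employeeId">05991</field>'
        '<field name="employeeName">Steve McKay</field>'
        '<field name="office" update="set">Walla Walla</field>'
        '<field name="skills" update="add">Python</field>'
        '</doc></add>'
    )
    print(classify_add(mixed, "employeeId"))
```

A client could refuse to send any message that contains a "mixed" doc, which sidesteps the question of what the server does with it.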

Steve

On Jul 8, 2014, at 7:02 PM, Bill Au <bi...@gmail.com> wrote:

> Thanks for that under-the-cover explanation.
> 
> I am not sure what you mean by "mix atomic updates with regular field
> values".  Can you give an example?
> 
> Thanks.
> 
> Bill


Re: Solr atomic updates question

Posted by Bill Au <bi...@gmail.com>.
Thanks for that under-the-cover explanation.

I am not sure what you mean by "mix atomic updates with regular field
values".  Can you give an example?

Thanks.

Bill


On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay <st...@b.abbies.us> wrote:

> Atomic updates fetch the doc with RealTimeGet, apply the updates to the
> fetched doc, then reindex. Whether you use atomic updates or send the
> entire doc to Solr, it has to deleteById then add. The perf difference
> between the atomic updates and "normal" updates is likely minimal.
>
> Atomic updates are for when you have changes and want to apply them to a
> document without affecting the other fields. A regular add will replace an
> existing document completely. AFAIK Solr will let you mix atomic updates
> with regular field values, but I don't think it's a good idea.
>
> Steve

Re: Solr atomic updates question

Posted by Steve McKay <st...@b.abbies.us>.
Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and "normal" updates is likely minimal.

Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea.
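
The read/modify/write cycle described above can be sketched as a toy in-memory model. This is only an illustration of the idea, not Solr's implementation: the index is a plain dict keyed by document id, modifiers use the {"set": ...} / {"add": ...} dict form, apply_update is a hypothetical name, and mixed requests are rejected rather than guessing at their behavior.

```python
def apply_update(index, key_field, incoming):
    """Apply a regular or atomic update to a dict-based toy index.

    Both paths end the same way: the old document is deleted by id and a
    complete new document is added. Only the way the new document is
    assembled differs.
    """
    key = incoming[key_field]
    modifiers, plain = {}, {}
    for name, value in incoming.items():
        if name == key_field:
            continue
        (modifiers if isinstance(value, dict) else plain)[name] = value

    if modifiers and plain:
        raise ValueError("mixed atomic and regular fields are not modeled here")

    if modifiers:
        # Atomic path: fetch the stored document, then apply each modifier.
        # This sketch requires the document to exist (KeyError otherwise).
        doc = dict(index[key])
        for name, op in modifiers.items():
            if "set" in op:
                doc[name] = op["set"]
            elif "add" in op:
                current = doc.get(name, [])
                if not isinstance(current, list):
                    current = [current]
                doc[name] = current + [op["add"]]
            else:
                raise ValueError("unsupported modifier for field " + name)
    else:
        # Regular path: the incoming document is the whole new document.
        doc = dict(incoming)

    index.pop(key, None)  # delete by id
    index[key] = doc      # add the complete document
    return doc


if __name__ == "__main__":
    index = {"05991": {"employeeId": "05991", "employeeName": "Steve McKay",
                       "office": "Seattle", "skills": ["Java"], "salary": 50000}}
    apply_update(index, "employeeId", {"employeeId": "05991",
                                       "office": {"set": "Walla Walla"},
                                       "skills": {"add": "Python"}})
    print(index["05991"])
```

In this model the atomic path costs one extra lookup before the same delete and add, which is in line with the expectation that the performance difference is small.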

Steve

On Jul 8, 2014, at 5:30 PM, Bill Au <bi...@gmail.com> wrote:

> Solr atomic updates allow changing one or more fields of a document
> without having to re-index the entire document.  But what about the case
> where I am sending in the entire document?  In that case the whole
> document will be re-indexed anyway, right?  So I assume that there will be
> no savings.  I am actually thinking that there will be a performance
> penalty, since an atomic update requires Solr to retrieve all the fields
> before updating.
> 
> Bill