Posted to users@solr.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2022/07/30 09:41:37 UTC

Solr update only if field differs

Hi all,

As far as I know it is not possible, but just to be sure I'm asking from
your experience: do you know if there is any way, on the Solr side, to update a
document only if one or more fields differ from the stored document?

Best regards,
Vincenzo


-- 
Vincenzo D'Amore

Re: Solr update only if field differs

Posted by Mikhail Khludnev <mk...@apache.org>.
Sorry, Vincenzo, I have no idea. Don't hesitate to post the answer if you
find out.

On Tue, Aug 2, 2022 at 1:50 AM Vincenzo D'Amore <v....@gmail.com> wrote:

> Thanks for sharing this Mikhail.
> Do you know how big is the overhead for Solr in handling documents that do
> not have a new version?
> For example, we have to update ten thousand documents, but only 100 of them
> have a newer version.
> How does Solr behave?
>
> On Sun, Jul 31, 2022 at 2:16 AM Mikhail Khludnev <mk...@apache.org> wrote:
>
> > Hi, Vincenzo.
> > I can only remember version control via checking a particular field.
> >
> >
> https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#document-centric-versioning-constraints
> >
> > On Sun, Jul 31, 2022 at 2:52 AM Vincenzo D'Amore <v....@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > As far as I know it is not possible, but just to be sure I'm asking
> from
> > > your experience, do you know if there is any way, on Solr side, to
> > update a
> > > document only if one or more fields differs from the stored document?
> > >
> > > Best regards,
> > > Vincenzo
> > >
> > >
> > > --
> > > Vincenzo D'Amore
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev

Re: Solr update only if field differs

Posted by Vincenzo D'Amore <v....@gmail.com>.
Thanks for sharing this Mikhail.
Do you know how big the overhead is for Solr when handling documents that do
not have a new version?
For example, we have to update ten thousand documents, but only 100 of them
have a newer version.
How does Solr behave?

On Sun, Jul 31, 2022 at 2:16 AM Mikhail Khludnev <mk...@apache.org> wrote:

> Hi, Vincenzo.
> I can only remember version control via checking a particular field.
>
> https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#document-centric-versioning-constraints
>
> On Sun, Jul 31, 2022 at 2:52 AM Vincenzo D'Amore <v....@gmail.com>
> wrote:
>
> > Hi all,
> >
> > As far as I know it is not possible, but just to be sure I'm asking from
> > your experience, do you know if there is any way, on Solr side, to
> update a
> > document only if one or more fields differs from the stored document?
> >
> > Best regards,
> > Vincenzo
> >
> >
> > --
> > Vincenzo D'Amore
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
Vincenzo D'Amore

Re: Solr update only if field differs

Posted by Mikhail Khludnev <mk...@apache.org>.
Hi, Vincenzo.
The only thing I can think of is version control via checking a particular
field (document-centric versioning constraints):
https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#document-centric-versioning-constraints
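
A rough illustration of the rule behind document-centric versioning: an
incoming document replaces the stored one only when its version field carries
a higher value. This is only a sketch in plain Python, not Solr code. The
field name `my_version_l`, the in-memory `store` dict, and the function
`apply_if_newer` are made up for illustration.

```python
def apply_if_newer(store, doc, version_field="my_version_l"):
    """Replace the stored doc only if the incoming version is higher.

    `store` is a plain dict standing in for the index (an assumption of
    this sketch). Returns True when the document was written.
    """
    existing = store.get(doc["id"])
    if existing is not None and doc[version_field] <= existing[version_field]:
        return False  # not newer: skip, nothing is written
    store[doc["id"]] = doc
    return True


store = {"a": {"id": "a", "my_version_l": 5, "title": "old"}}
written = [
    apply_if_newer(store, {"id": "a", "my_version_l": 5, "title": "same"}),
    apply_if_newer(store, {"id": "a", "my_version_l": 6, "title": "new"}),
    apply_if_newer(store, {"id": "b", "my_version_l": 1, "title": "first"}),
]
```

Note that this compares a version value only; it does not compare the other
fields, so it helps only when the sender can supply a meaningful version.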

On Sun, Jul 31, 2022 at 2:52 AM Vincenzo D'Amore <v....@gmail.com> wrote:

> Hi all,
>
> As far as I know it is not possible, but just to be sure I'm asking from
> your experience, do you know if there is any way, on Solr side, to update a
> document only if one or more fields differs from the stored document?
>
> Best regards,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
>


-- 
Sincerely yours
Mikhail Khludnev

Re: Solr update only if field differs

Posted by Vincenzo D'Amore <v....@gmail.com>.
Unfortunately, in my architecture I cannot rely on a database or on an
updated/created time field. There is a potentially infinite stream of
documents with a possibly huge amount of duplication.
So avoiding the indexing of duplicate documents should (I suppose) improve
performance.
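
One way this could look on the client side, without a database: keep a bounded
in-memory map from document id to a hash of the relevant fields, and send a
document to Solr only when its hash changed. This is a sketch under
assumptions, not something from Solr itself. The names `fingerprint` and
`ChangeFilter`, the `max_ids` bound, and the sample fields are all
hypothetical.

```python
import hashlib
import json
from collections import OrderedDict


def fingerprint(doc, fields):
    """Stable SHA-256 hash over the chosen fields of a document."""
    payload = json.dumps([[f, doc.get(f)] for f in sorted(fields)], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ChangeFilter:
    """Remembers the last fingerprint per id, evicting the oldest ids."""

    def __init__(self, fields, max_ids=100_000):
        self.fields = fields
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def should_index(self, doc):
        fp = fingerprint(doc, self.fields)
        doc_id = doc["id"]
        if self._seen.get(doc_id) == fp:
            self._seen.move_to_end(doc_id)
            return False  # unchanged since last seen: skip the update
        self._seen[doc_id] = fp
        self._seen.move_to_end(doc_id)
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # drop the least recently seen id
        return True


stream = [
    {"id": "1", "title": "a", "price": 10},
    {"id": "1", "title": "a", "price": 10},  # exact duplicate
    {"id": "1", "title": "a", "price": 12},  # price changed
    {"id": "2", "title": "b", "price": 7},
]
change_filter = ChangeFilter(fields=["title", "price"])
to_index = [doc for doc in stream if change_filter.should_index(doc)]
```

An id that has been evicted from the map is simply indexed again the next time
it shows up, so the bound costs some redundant updates but never drops a
change.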

On Fri, 5 Aug 2022 at 01:10, Dave <ha...@gmail.com> wrote:

> ——
>
> At this point it would be interesting to see how this Processor would
> increase the indexing performance when you have many duplicates
>
> - when it comes to indexing performance with duplicates, there isn’t any
> difference than a new document. It’s mark as original destroyed, and new
> one replaces.  Update isn’t a real thing, and the first operation is pretty
> much a joke speed wise and the second is as fast as indexing, and solr will
> manage the segments as needed when it determines to do so.  Your best bet
> is to manage this code wise. Have an updated/created time field and when
> indexing only run on those that fits your automated schedule against such
> fields.  In a database this takes like 5 minutes to write into your
> indexer, and I can promise you will be faster than trying to use a built in
> solr operation to figure it out for you.
>
> If I’m wrong I would love to know, but indexing code logic will always be
> faster than relying on a built in server function for these sorts of
> things.
>
>
>
>
>
> > On Aug 4, 2022, at 6:41 PM, Vincenzo D'Amore <v....@gmail.com> wrote:
> >
> >
> > At this point it would be interesting to see how this Processor would
> > increase the indexing performance when you have many duplicates
>
-- 
Vincenzo D'Amore

Re: Solr update only if field differs

Posted by Dave <ha...@gmail.com>.
> At this point it would be interesting to see how this Processor would
> increase the indexing performance when you have many duplicates

When it comes to indexing performance, a duplicate is no different from a new document. The original is marked as deleted and the new one replaces it. Update isn't a real thing: the first operation is trivially cheap speed-wise and the second is as fast as ordinary indexing, and Solr will manage the segments as needed when it decides to do so.

Your best bet is to manage this in code. Have an updated/created time field and, when indexing, only process the records that fit your automated schedule against those fields. With a database this takes about 5 minutes to write into your indexer, and I can promise it will be faster than trying to use a built-in Solr operation to figure it out for you.

If I'm wrong I would love to know, but indexing code logic will always be faster than relying on a built-in server function for these sorts of things.





> On Aug 4, 2022, at 6:41 PM, Vincenzo D'Amore <v....@gmail.com> wrote:
> 
> 
> At this point it would be interesting to see how this Processor would
> increase the indexing performance when you have many duplicates

Re: Solr update only if field differs

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Koji, thank you so much for the details.
At first glance, looking at the Javadoc, I didn't realize two things: I can use
SignatureUpdateProcessorFactory with a signatureField different from 'id',
and also, very importantly, that there is an "overwriteDupes" parameter.
In my current schema I cannot change the id field, and there are also
other fields I need to take into account to calculate the document
signature.
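
To make the idea concrete, here is a small Python sketch of a signature
computed from several fields and stored in a field other than `id`. It is not
Solr's implementation: the field name `signature_s`, the helper
`add_signature`, and the choice of SHA-256 over a sorted field list are
assumptions for illustration only.

```python
import hashlib


def add_signature(doc, fields, signature_field="signature_s"):
    """Return a copy of doc with a hash of the chosen fields added."""
    digest = hashlib.sha256()
    for name in sorted(fields):
        value = doc.get(name)
        # Treat single values and multi-valued fields the same way.
        values = value if isinstance(value, list) else [value]
        digest.update(name.encode("utf-8"))
        for v in values:
            digest.update(str(v).encode("utf-8"))
    signed = dict(doc)
    signed[signature_field] = digest.hexdigest()
    return signed


a = add_signature({"id": "1", "name": "phone", "cat": ["x", "y"]}, ["name", "cat"])
b = add_signature({"id": "2", "name": "phone", "cat": ["x", "y"]}, ["name", "cat"])
c = add_signature({"id": "3", "name": "phone", "cat": ["x"]}, ["name", "cat"])
```

Here documents 1 and 2 end up with the same signature although their ids
differ, while document 3 gets a different one; the id field itself is left
untouched.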

Again, in my case I have to set overwriteDupes="false", but reading the
Solr guide I see a lot of caveats when overwriteDupes="true". When is there
a need to calculate a signature and still overwrite the document? This
should be a niche behavior.

At this point it would be interesting to see how this Processor would
increase the indexing performance when you have many duplicates.

I think this is the part of the Solr Reference Guide you were looking for:
https://solr.apache.org/guide/8_11/de-duplication.html
There is also a very useful example that explains how to implement
deduplication with all the SolrCloud caveats (my case).

Thanks again for sharing this with me, best regards
Vincenzo

On Thu, 4 Aug 2022 at 08:31, Koji Sekiguchi <ko...@rondhuit.com>
wrote:

> Hi Vincenzo,
>
> I see. then I still think SignatureUpdateProcessorFactory is the one you
> are looking for.
> I tried to look for the explanation how it works in its javadoc and Solr
> Ref Guide, but no luck.
> Then I found the good one which was written by the contributor when
> SignatureUpdateProcessorFactory
> was contributed.
>
> Please read:
>
> Add support for hash based exact/near duplicate document handling
> https://issues.apache.org/jira/browse/SOLR-799
>
> Deduplication
> https://cwiki.apache.org/confluence/display/solr/Deduplication
>
> Koji
>
> On 2022/08/03 23:40, Vincenzo D'Amore wrote:
> > I mean, the problem I need to solve is how to avoid a second update when
> > there are no changes in the document, in other words to update a document
> > only if one or more fields differs from the stored document.
> >
> > On Tue, Aug 2, 2022 at 6:16 AM Koji Sekiguchi <
> koji.sekiguchi@rondhuit.com>
> > wrote:
> >
> >> Hi Vincenzo,
> >>
> >> I cannot understand what "the second update" means...
> >>
> >> Koji
> >>
> >> On 2022/08/02 0:39, Vincenzo D'Amore wrote:
> >>> Koji, on second thought, this SignatureUpdateProcessorFactory does not
> >>> avoid the second update...
> >>>
> >>> On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore <v....@gmail.com>
> >> wrote:
> >>>
> >>>> Hi Koji, thanks! It is exactly what I was looking for!
> >>>>
> >>>> On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <
> >> koji.sekiguchi@rondhuit.com>
> >>>> wrote:
> >>>>
> >>>>> Hi Vincenzo,
> >>>>>
> >>>>> I think SignatureUpdateProcessor is what you are looking for.
> >>>>>
> >>>>>
> >>>>>
> >>
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
> >>>>>
> >>>>> Koji
> >>>>>
> >>>>> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> As far as I know it is not possible, but just to be sure I'm asking
> >> from
> >>>>>> your experience, do you know if there is any way, on Solr side, to
> >>>>> update a
> >>>>>> document only if one or more fields differs from the stored
> document?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Vincenzo
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Vincenzo D'Amore
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
-- 
Vincenzo D'Amore

Re: Solr update only if field differs

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Vincenzo,

I see. Then I still think SignatureUpdateProcessorFactory is the one you are looking for.
I tried to find an explanation of how it works in its Javadoc and in the Solr Ref Guide, but had no luck.
Then I found a good one, written by the contributor when SignatureUpdateProcessorFactory
was contributed.

Please read:

Add support for hash based exact/near duplicate document handling
https://issues.apache.org/jira/browse/SOLR-799

Deduplication
https://cwiki.apache.org/confluence/display/solr/Deduplication

Koji

On 2022/08/03 23:40, Vincenzo D'Amore wrote:
> I mean, the problem I need to solve is how to avoid a second update when
> there are no changes in the document, in other words to update a document
> only if one or more fields differs from the stored document.
> 
> On Tue, Aug 2, 2022 at 6:16 AM Koji Sekiguchi <ko...@rondhuit.com>
> wrote:
> 
>> Hi Vincenzo,
>>
>> I cannot understand what "the second update" means...
>>
>> Koji
>>
>> On 2022/08/02 0:39, Vincenzo D'Amore wrote:
>>> Koji, on second thought, this SignatureUpdateProcessorFactory does not
>>> avoid the second update...
>>>
>>> On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore <v....@gmail.com>
>> wrote:
>>>
>>>> Hi Koji, thanks! It is exactly what I was looking for!
>>>>
>>>> On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <
>> koji.sekiguchi@rondhuit.com>
>>>> wrote:
>>>>
>>>>> Hi Vincenzo,
>>>>>
>>>>> I think SignatureUpdateProcessor is what you are looking for.
>>>>>
>>>>>
>>>>>
>> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
>>>>>
>>>>> Koji
>>>>>
>>>>> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> As far as I know it is not possible, but just to be sure I'm asking
>> from
>>>>>> your experience, do you know if there is any way, on Solr side, to
>>>>> update a
>>>>>> document only if one or more fields differs from the stored document?
>>>>>>
>>>>>> Best regards,
>>>>>> Vincenzo
>>>>>
>>>>
>>>>
>>>> --
>>>> Vincenzo D'Amore
>>>>
>>>>
>>>
>>
> 
> 

Re: Solr update only if field differs

Posted by Vincenzo D'Amore <v....@gmail.com>.
I mean, the problem I need to solve is how to avoid a second update when
there are no changes in the document; in other words, to update a document
only if one or more fields differ from the stored document.

On Tue, Aug 2, 2022 at 6:16 AM Koji Sekiguchi <ko...@rondhuit.com>
wrote:

> Hi Vincenzo,
>
> I cannot understand what "the second update" means...
>
> Koji
>
> On 2022/08/02 0:39, Vincenzo D'Amore wrote:
> > Koji, on second thought, this SignatureUpdateProcessorFactory does not
> > avoid the second update...
> >
> > On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore <v....@gmail.com>
> wrote:
> >
> >> Hi Koji, thanks! It is exactly what I was looking for!
> >>
> >> On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <
> koji.sekiguchi@rondhuit.com>
> >> wrote:
> >>
> >>> Hi Vincenzo,
> >>>
> >>> I think SignatureUpdateProcessor is what you are looking for.
> >>>
> >>>
> >>>
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
> >>>
> >>> Koji
> >>>
> >>> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
> >>>> Hi all,
> >>>>
> >>>> As far as I know it is not possible, but just to be sure I'm asking
> from
> >>>> your experience, do you know if there is any way, on Solr side, to
> >>> update a
> >>>> document only if one or more fields differs from the stored document?
> >>>>
> >>>> Best regards,
> >>>> Vincenzo
> >>>
> >>
> >>
> >> --
> >> Vincenzo D'Amore
> >>
> >>
> >
>


-- 
Vincenzo D'Amore

Re: Solr update only if field differs

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Vincenzo,

I cannot understand what "the second update" means...

Koji

On 2022/08/02 0:39, Vincenzo D'Amore wrote:
> Koji, on second thought, this SignatureUpdateProcessorFactory does not
> avoid the second update...
> 
> On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore <v....@gmail.com> wrote:
> 
>> Hi Koji, thanks! It is exactly what I was looking for!
>>
>> On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <ko...@rondhuit.com>
>> wrote:
>>
>>> Hi Vincenzo,
>>>
>>> I think SignatureUpdateProcessor is what you are looking for.
>>>
>>>
>>> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
>>>
>>> Koji
>>>
>>> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
>>>> Hi all,
>>>>
>>>> As far as I know it is not possible, but just to be sure I'm asking from
>>>> your experience, do you know if there is any way, on Solr side, to
>>> update a
>>>> document only if one or more fields differs from the stored document?
>>>>
>>>> Best regards,
>>>> Vincenzo
>>>
>>
>>
>> --
>> Vincenzo D'Amore
>>
>>
> 

Re: Solr update only if field differs

Posted by Vincenzo D'Amore <v....@gmail.com>.
Koji, on second thought, this SignatureUpdateProcessorFactory does not
avoid the second update...

On Mon, Aug 1, 2022 at 5:36 PM Vincenzo D'Amore <v....@gmail.com> wrote:

> Hi Koji, thanks! It is exactly what I was looking for!
>
> On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <ko...@rondhuit.com>
> wrote:
>
>> Hi Vincenzo,
>>
>> I think SignatureUpdateProcessor is what you are looking for.
>>
>>
>> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
>>
>> Koji
>>
>> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
>> > Hi all,
>> >
>> > As far as I know it is not possible, but just to be sure I'm asking from
>> > your experience, do you know if there is any way, on Solr side, to
>> update a
>> > document only if one or more fields differs from the stored document?
>> >
>> > Best regards,
>> > Vincenzo
>>
>
>
> --
> Vincenzo D'Amore
>
>

-- 
Vincenzo D'Amore

Re: Solr update only if field differs

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Koji, thanks! It is exactly what I was looking for!

On Mon, Aug 1, 2022 at 4:28 AM Koji Sekiguchi <ko...@rondhuit.com>
wrote:

> Hi Vincenzo,
>
> I think SignatureUpdateProcessor is what you are looking for.
>
>
> https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java
>
> Koji
>
> On 2022/07/30 18:41, Vincenzo D'Amore wrote:
> > Hi all,
> >
> > As far as I know it is not possible, but just to be sure I'm asking from
> > your experience, do you know if there is any way, on Solr side, to
> update a
> > document only if one or more fields differs from the stored document?
> >
> > Best regards,
> > Vincenzo
>


-- 
Vincenzo D'Amore

Re: Solr update only if field differs

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
Hi Vincenzo,

I think SignatureUpdateProcessor is what you are looking for.

https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.java

Koji

On 2022/07/30 18:41, Vincenzo D'Amore wrote:
> Hi all,
> 
> As far as I know it is not possible, but just to be sure I'm asking from
> your experience, do you know if there is any way, on Solr side, to update a
> document only if one or more fields differs from the stored document?
> 
> Best regards,
> Vincenzo