Posted to solr-user@lucene.apache.org by Srinivas Kashyap <sr...@bamboorose.com.INVALID> on 2020/10/29 04:49:52 UTC

Avoiding duplicate entry for a multivalued field

Hello,

Say I have a schema field that is multivalued. Is there a way to keep the values of that field distinct even though I continue to add duplicate values through atomic updates via SolrJ?

Is there a property setting that keeps only unique values in a multivalued field?

Thanks,
Srinivas
________________________________
DISCLAIMER:
E-mails and attachments from Bamboo Rose, LLC are confidential.
If you are not the intended recipient, please notify the sender immediately by replying to the e-mail, and then delete it without making copies or using it in any way.
No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.

Disclaimer

The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been automatically archived by Mimecast Ltd, an innovator in Software as a Service (SaaS) for business. Providing a safer and more useful place for your human generated data. Specializing in; Security, archiving and compliance. To find out more visit the Mimecast website.

Re: Avoiding duplicate entry for a multivalued field

Posted by Munendra S N <sn...@gmail.com>.
add-distinct is similar to add, but it performs a contains check before
adding the value. In general, the performance overhead should be minimal.

Regards,
Munendra S N
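
The behaviour described in the reply above can be modelled in a few lines of plain Java. This is only a sketch of the semantics (append versus check-then-append), not Solr's implementation; the class name, method names, and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddDistinctSketch {
    // Plain "add": every incoming value is appended, duplicates included.
    public static List<String> add(List<String> existing, List<String> incoming) {
        List<String> result = new ArrayList<>(existing);
        result.addAll(incoming);
        return result;
    }

    // "add-distinct" as described in the thread: check whether the value is
    // already present and append it only if it is not.
    public static List<String> addDistinct(List<String> existing, List<String> incoming) {
        List<String> result = new ArrayList<>(existing);
        for (String value : incoming) {
            if (!result.contains(value)) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("red", "green");
        List<String> update = Arrays.asList("green", "blue");
        System.out.println(add(stored, update));         // [red, green, green, blue]
        System.out.println(addDistinct(stored, update)); // [red, green, blue]
    }
}
```

In this model the contains check is a linear scan, so its cost grows with the number of values already in the field. That is consistent with the overhead being small for fields holding a modest number of values.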




RE: Avoiding duplicate entry for a multivalued field

Posted by Srinivas Kashyap <sr...@bamboorose.com.INVALID>.
Thanks Munendra, this will really help me. Is there any performance overhead with this?

Thanks,
Srinivas



Re: Avoiding duplicate entry for a multivalued field

Posted by Munendra S N <sn...@gmail.com>.
Srinivas,

For atomic updates, you could use the add-distinct operation to avoid
duplicates:
https://lucene.apache.org/solr/guide/8_6/updating-parts-of-documents.html
This operation is available from Solr 7.3.

Regards,
Munendra S N
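
As a rough illustration of the suggestion above, the sketch below builds the nested structure of an atomic update that uses the add-distinct modifier and renders it as a JSON request body. It uses only the Java standard library so it runs without Solr on the classpath. The field name "tags", the document id "doc1", the assumption that "id" is the unique key, and the tiny JSON renderer are all hypothetical and for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AtomicUpdatePayloadSketch {
    // Builds {"id": <id>, <field>: {"add-distinct": [values...]}} as nested maps.
    public static Map<String, Object> addDistinctUpdate(String id, String field, List<String> values) {
        Map<String, Object> modifier = new LinkedHashMap<>();
        modifier.put("add-distinct", values);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id); // assumes the schema's unique key field is called "id"
        doc.put(field, modifier);
        return doc;
    }

    // Minimal JSON rendering for the shapes used here: maps, lists, and strings.
    public static String toJson(Object value) {
        if (value instanceof Map) {
            return ((Map<?, ?>) value).entrySet().stream()
                    .map(e -> quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        }
        if (value instanceof List) {
            return ((List<?>) value).stream()
                    .map(AtomicUpdatePayloadSketch::toJson)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        return quote(String.valueOf(value));
    }

    // Escapes backslashes and double quotes only; enough for this sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> doc = addDistinctUpdate("doc1", "tags", Arrays.asList("green", "blue"));
        // Prints: [{"id":"doc1","tags":{"add-distinct":["green","blue"]}}]
        System.out.println(toJson(Arrays.asList(doc)));
    }
}
```

With SolrJ, the same idea is usually expressed by passing the modifier map as the field value of a SolrInputDocument, for example doc.addField("tags", modifier), and then sending the document with the client's add method. Check the linked reference guide page for the details that apply to your Solr version.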




Re: Avoiding duplicate entry for a multivalued field

Posted by Walter Underwood <wu...@wunderwood.org>.
Since you are already taking the performance hit of atomic updates, 
I doubt you’ll see any impact from field types or update request processors.
The extra cost of atomic updates will be much greater than indexing cost.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Avoiding duplicate entry for a multivalued field

Posted by Michael Gibney <mi...@michaelgibney.net>.
If I understand correctly what you're trying to do, docValues for a
number of field types are (at least in their multivalued incarnation)
backed by SortedSetDocValues, which inherently deduplicate values
per-document. In your case it sounds like you could maybe rely on that
behavior as a feature, set stored=false, docValues=true,
useDocValuesAsStored=true, and achieve the desired behavior?
Michael
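
The per-document deduplication mentioned above can be pictured with a sorted set. The sketch below is only a model of that behaviour using java.util.TreeSet, not Lucene's SortedSetDocValues API; the class name, method name, and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SortedSetModel {
    // Models what a sorted-set-backed field keeps for one document:
    // each distinct value once, in sorted order.
    public static List<String> asReturnedFromDocValues(List<String> indexedValues) {
        return new ArrayList<>(new TreeSet<>(indexedValues));
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("green", "red", "green", "blue");
        System.out.println(asReturnedFromDocValues(indexed)); // [blue, green, red]
    }
}
```

One consequence worth noting in this model is that the values come back sorted rather than in insertion order, so the approach fits best when the order of values in the field does not matter.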


RE: Avoiding duplicate entry for a multivalued field

Posted by Srinivas Kashyap <sr...@bamboorose.com.INVALID>.
Thanks Dwane,

I have a question: according to the Javadoc, the duplicates still continue to exist in the field. Maybe the field returns only unique values at query time? Is my assumption right?

Also, what is the performance overhead of this UniqFieldsUpdateProcessorFactory?

Thanks,
Srinivas


Re: Avoiding duplicate entry for a multivalued field

Posted by Dwane Hall <dw...@hotmail.com>.
Srinivas, this is possible by adding a unique-fields update processor to the update processor chain you are using to perform your updates (/update, /update/json, /update/json/docs, .../a_custom_one).

The Javadoc explains its use nicely
(https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html), and there are articles on Stack Overflow addressing this exact problem (https://stackoverflow.com/questions/37005747/how-to-remove-duplicates-from-multivalued-fields-in-solr#37006655).

Thanks,

Dwane
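
To make the idea of a unique-fields processing step concrete, here is a small plain-Java model of removing duplicate values from selected fields of a document before it is indexed. It is a sketch of the concept only, not the code of UniqFieldsUpdateProcessorFactory; the class name, the map-based document, and the field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UniqFieldsSketch {
    // For each named field, keep the first occurrence of every value and
    // drop later duplicates. Fields that are not named are left untouched.
    public static void dedupeFields(Map<String, List<Object>> doc, Set<String> fieldNames) {
        for (String name : fieldNames) {
            List<Object> values = doc.get(name);
            if (values != null) {
                doc.put(name, new ArrayList<>(new LinkedHashSet<>(values)));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("tags", new ArrayList<>(Arrays.asList("a", "b", "a")));
        doc.put("title", new ArrayList<>(Arrays.asList("x", "x")));

        dedupeFields(doc, Collections.singleton("tags"));
        System.out.println(doc); // {tags=[a, b], title=[x, x]}
    }
}
```

The sketch only looks at the values present in the document handed to it. It makes no claim about how the real processor interacts with values already stored in the index during an atomic update; the Javadoc linked above is the place to confirm that.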