Posted to solr-user@lucene.apache.org by Anupam Bhattacharya <an...@gmail.com> on 2013/10/30 11:05:07 UTC

Atomic Updates in SOLR

I am working on an offline tagging capability to tag records with a
thesaurus dictionary of key concepts. I am able to use the update="add"
option in XML and JSON update calls to update a specific field of a
document. However, if I run the same atomic update query twice, the
multivalued string field ends up with duplicate values.
For example, a field named tag initially holds copper, iron, steel.
After running the atomic update query with <field name="tag"
update="add">steel</field>, the tag field holds copper, iron, steel,
steel (steel gets added twice).
I looked at RemoveDuplicatesTokenFilterFactory, but it removes duplicate
tokens, not duplicate values in a multivalued field. Is there an
updateProcessor that stops incoming duplicate values from being indexed?

Thanks in advance for any help.

Regards
Anupam

Re: Atomic Updates in SOLR

Posted by Anshum Gupta <an...@anshumgupta.net>.
I am not sure optimistic concurrency would help with deduplication, but
yes, as Shalin points out, it will help you spot issues in your client
code.







-- 

Anshum Gupta
http://www.anshumgupta.net

Re: Atomic Updates in SOLR

Posted by Anshum Gupta <an...@anshumgupta.net>.
I think it would be a good thing to have.
I just created a JIRA issue for it:
https://issues.apache.org/jira/browse/SOLR-5403

Will try and get to it soon.





-- 

Anshum Gupta
http://www.anshumgupta.net

Re: Atomic Updates in SOLR

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Ah, I misread your email. You are actually sending the update twice and
asking how to dedup the multivalued field values.

No, I don't think we have an update processor that can do that.





-- 
Regards,
Shalin Shekhar Mangar.

Re: Atomic Updates in SOLR

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Perhaps you are running the update request more than once accidentally?

Can you try sending the update with optimistic concurrency, using
_version_? That way, if some part of your code makes a duplicate request,
Solr will throw an error.

See
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
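
For illustration, a minimal sketch of what such a request might look like.
It assumes a unique key field named id, a document ID of doc1, and a
made-up version number; all three are placeholders, not values from this
thread. As I understand it, when _version_ is greater than 1 Solr applies
the update only if the stored document's version matches exactly, and
otherwise rejects it with a version conflict (HTTP 409). A blind replay of
the same request should therefore fail rather than append steel again.

```xml
<add>
  <doc>
    <!-- "id" and "doc1" are hypothetical; use your schema's unique key -->
    <field name="id">doc1</field>
    <!-- Placeholder: the _version_ value read back from the current document -->
    <field name="_version_">1450000000000000000</field>
    <field name="tag" update="add">steel</field>
  </doc>
</add>
```

This only detects replayed requests. If the client re-reads the current
_version_ before each send, the duplicate value would still be appended.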





-- 
Regards,
Shalin Shekhar Mangar.

Re: Atomic Updates in SOLR

Posted by Jack Krupansky <ja...@basetechnology.com>.
Oops... I need to note that the parameters changed after Solr 4.4. I
gave the link for 4.5.1, but for 4.4 and earlier, use:

http://lucene.eu.apache.org/solr/4_4_0/solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html

(My book covers 4.4 and hasn't been updated for 4.5 yet, but the gist of
the examples is the same.)
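
As a rough sketch of the 4.4-style syntax, assuming the field list is given
as a "fields" list as in the 4.4 Javadoc linked above: the chain name
uniq-fields is made up, and tag is the field from the original question.
The placement after solr.DistributedUpdateProcessorFactory is an assumption
to verify on your version. It is based on atomic updates being merged with
the stored document at that stage, so an earlier processor would likely see
only the incoming add instruction.

```xml
<!-- solrconfig.xml, Solr 4.4 and earlier (sketch, not tested here) -->
<updateRequestProcessorChain name="uniq-fields">
  <!-- Assumption: the atomic "add" is merged with stored values here -->
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <lst name="fields">
      <str>tag</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain would then be selected per request with update.chain=uniq-fields,
or made the default chain for the update handler.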

-- Jack Krupansky



Re: Atomic Updates in SOLR

Posted by Jack Krupansky <ja...@basetechnology.com>.
Unfortunately, atomic "add" appends to a "list" rather than adding to a
"set" (unique values only). However, you can use the unique fields update
processor (solr.UniqFieldsUpdateProcessorFactory) to de-dupe specified
multivalued fields.

See:
http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html

My e-book has more examples as well.
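
A minimal sketch of how this might be wired up in solrconfig.xml on 4.5 and
later, where the processor takes field selectors such as fieldName. The
chain name uniq-fields is made up for illustration, and tag is the field
from the original question. Putting the processor after
solr.DistributedUpdateProcessorFactory is an assumption worth testing: as
far as I know, atomic updates are merged with the stored document at that
stage, so a processor placed earlier would likely see only the incoming
add instruction and have nothing to de-dupe.

```xml
<!-- solrconfig.xml, Solr 4.5+ (sketch, not tested here) -->
<updateRequestProcessorChain name="uniq-fields">
  <!-- Assumption: the atomic "add" is merged with stored values here -->
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <str name="fieldName">tag</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

To use the chain, pass update.chain=uniq-fields on the update request or
configure it as the default chain. With this in place, a second add of
steel should leave the field as copper, iron, steel.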

-- Jack Krupansky
