Posted to solr-user@lucene.apache.org by Nikhil Kumar <ni...@hashedin.com> on 2013/05/06 10:14:15 UTC

solr adding unique values

Hey,
   I have recently started using Solr. I have a list of users, each of whom
is subscribed to some lists, e.g.
user a[
    id:a
    lists[
     list_a
   ]
]
user b[
    id:b
    lists[
     list_a
   ]
]
I am using {"id": "a", "lists": {"add": "list_a"}} to add a particular list
to a user. But if I send the same command again, it adds the same list a
second time, which I want to avoid:
user a[
    id:a
    lists[
     list_a,
     list_a
   ]
]
I searched the documentation and tutorials and found these options:

   - overwrite = "true" | "false": default is "true", meaning newer
     documents will replace previously added documents with the same uniqueKey.
   - commitWithin = "(milliseconds)": if the "commitWithin" attribute is
     present, the document will be added within that time (Solr1.4
     <http://wiki.apache.org/solr/Solr1.4>). See CommitWithin
     <http://wiki.apache.org/solr/CommitWithin>.
   - (deprecated) allowDups = "true" | "false": default is "false"
   - (deprecated) overwritePending = "true" | "false": default is negation
     of allowDups
   - (deprecated) overwriteCommitted = "true" | "false": default is negation
     of allowDups

But using overwrite and allowDups didn't solve the problem either, apparently
because the list entries are plain values with no unique id of their own.

So the question is: how do I solve this problem?

-- 
Thank You and Regards,
Nikhil Kumar
+91-9916343619
Technical Analyst
Hashed In Technologies Pvt. Ltd.

Re: solr adding unique values

Posted by Erick Erickson <er...@gmail.com>.
Depends on your goal here. I'm guessing you're using
atomic updates, in which case you need to use "set"
rather than "add" as the former replaces the contents.
See: http://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example

If you're simply re-indexing the documents, just send the entire
fresh document to solr and it'll replace the earlier document
completely.

Best
Erick
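
To make the difference between the two modifiers concrete, here is a minimal
sketch in Python that models how "add" and "set" affect a multi-valued field.
The apply_atomic_update helper is hypothetical and only simulates the
semantics described above on a local dict; it does not talk to Solr.

```python
import json


def apply_atomic_update(doc, update):
    """Simulate "add"/"set" modifiers on a local dict (illustration only)."""
    # Copy list values so the original document is left untouched
    result = {k: (list(v) if isinstance(v, list) else v) for k, v in doc.items()}
    for field, value in update.items():
        if field == "id":
            continue  # the uniqueKey selects the document; it is not modified
        if not isinstance(value, dict):
            result[field] = value
            continue
        for modifier, operand in value.items():
            values = operand if isinstance(operand, list) else [operand]
            if modifier == "add":
                existing = result.get(field, [])
                if not isinstance(existing, list):
                    existing = [existing]
                # "add" appends, so repeating the request duplicates the value
                result[field] = existing + values
            elif modifier == "set":
                # "set" replaces whatever the field held before
                result[field] = values
            else:
                raise ValueError(f"unsupported modifier: {modifier}")
    return result


user = {"id": "a", "lists": ["list_a"]}
added = apply_atomic_update(user, {"id": "a", "lists": {"add": "list_a"}})
replaced = apply_atomic_update(user, {"id": "a", "lists": {"set": ["list_a"]}})

print(json.dumps(added))
print(json.dumps(replaced))
```

In this model the "add" request leaves list_a in the field twice, while the
"set" request leaves exactly the values that were sent.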


Re: solr adding unique values

Posted by Nikhil Kumar <ni...@hashedin.com>.
Thanks Erick,
   I had a look at de-duplication
<http://docs.lucidworks.com/display/solr/De-Duplication>.
I added:
     <updateRequestProcessorChain name="dedupe">
       <processor class="solr.processor.SignatureUpdateProcessorFactory">
         <bool name="enabled">true</bool>
         <str name="signatureField">listed_id</str>
         <bool name="overwriteDupes">true</bool>
         <str name="fields">listed</str>
         <str name="signatureClass">solr.processor.Lookup3Signature</str>
       </processor>
       <processor class="solr.LogUpdateProcessorFactory" />
       <processor class="solr.RunUpdateProcessorFactory" />
     </updateRequestProcessorChain>

 <requestHandler name="/update" class="solr.UpdateRequestHandler" >
    <lst name="defaults">
      <str name="update.chain">dedupe</str>
    </lst>
  </requestHandler>

in solrconfig.xml, and I added

  <field name="listed" type="comaSplit" indexed="true" stored="true" multiValued="true"/>
  <field name="listed_id" type="comaSplit" indexed="true" stored="true" multiValued="true"/>

to schema.xml. Can I achieve it this way? So far I could not get it to work.
Or should I use a different approach?
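
A rough illustration of why this configuration may not give the desired
result: as far as I understand the de-duplication feature, the signature is
computed per document from the configured "fields", so it identifies duplicate
documents rather than duplicate values inside one multi-valued field. The
Python sketch below models that idea. The document_signature helper is
hypothetical, and hashlib.md5 is only a stand-in for Lookup3Signature.

```python
import hashlib


def document_signature(doc, fields):
    """Hash the chosen fields of one document (md5 stands in for Lookup3Signature)."""
    digest = hashlib.md5()
    for name in fields:
        value = doc.get(name, [])
        values = value if isinstance(value, list) else [value]
        for v in values:
            digest.update(str(v).encode("utf-8"))
            digest.update(b"\x00")  # separator so adjacent values cannot run together
    return digest.hexdigest()


user_a = {"id": "a", "listed": ["list_a", "list_a"]}
user_b = {"id": "b", "listed": ["list_a", "list_a"]}
user_c = {"id": "c", "listed": ["list_a"]}

sig_a = document_signature(user_a, ["listed"])
sig_b = document_signature(user_b, ["listed"])
sig_c = document_signature(user_c, ["listed"])

# Two different users with identical "listed" values share a signature,
# so document-level dedupe would treat them as duplicates of each other.
print(sig_a == sig_b)
# The repeated value inside user_a's field is untouched; hashing does not remove it.
print(sig_a == sig_c, user_a["listed"])
```

If that reading is right, overwriteDupes set to true could cause one user
document to replace another user that happens to have the same lists, which is
probably not what is wanted here.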


On Tue, May 7, 2013 at 10:59 PM, Erick Erickson <er...@gmail.com> wrote:

> Ah. OK. There's no "dedupe values" that I know of, I think you'd need to
> implement that yourself by fetching the field in question and doing a "set"
> on the field.
>
> You might be able to do that better in a custom update handler.
>
> Best
> Erick
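
The fetch-then-"set" approach suggested above could look roughly like the
following Python sketch. It only builds the update payload. Fetching the
current values and posting the JSON to the update handler are left out, and
the build_set_update helper and the sample data are assumptions made for
illustration.

```python
import json


def build_set_update(doc_id, current_values, new_values, field="lists"):
    """Merge new values into the current ones, dropping duplicates but keeping order."""
    merged = list(dict.fromkeys(list(current_values) + list(new_values)))
    return {"id": doc_id, field: {"set": merged}}


# Values as they might come back from a query for user "a" (made-up data)
current = ["list_a", "list_b"]

# Adding a new list extends the field
payload_c = build_set_update("a", current, ["list_c"])
# Adding an existing list again leaves the field unchanged
payload_a = build_set_update("a", payload_c["lists"]["set"], ["list_a"])

# Wrapped in a JSON array of documents, as in the UpdateJSON wiki examples
print(json.dumps([payload_c]))
print(json.dumps([payload_a]))
```

Two clients doing this read-modify-write at the same time could overwrite each
other's change, so some care is needed if updates are concurrent. Later Solr
releases also document an "add-distinct" atomic update modifier for
multi-valued fields; whether it is available depends on the version in use.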



Re: solr adding unique values

Posted by Nikhil Kumar <ni...@hashedin.com>.
Thanks Erick,
 for the reply! I know about "set", but that's not my goal; I should have
given a better example.
Say I have this and want to add another list, list_c:
user a[
    id:a
    lists[
     list_a,
     list_b
   ]
]
It should look like:
user a[
    id:a
    lists[
     list_a,
     list_b,
     list_c
   ]
]
However, if I add list_a again, it should *not* become:
user a[
    id:a
    lists[
     list_a,
     list_b,
     list_c,
     list_a
   ]
]
I am *not* reindexing the documents.
