Posted to solr-user@lucene.apache.org by Thibaut Colar <tc...@colar.net> on 2011/10/28 19:04:55 UTC

Updating a document multi-value field (no dup values) without needing it to be already committed

Sorry for the lengthy text, it's a bit difficult to explain:

We are using Solr to index some user info like username, email (among 
other things).

I'm also trying to use facets for search, so, for example, I added a
multi-value field to the user document called "organizations", where I
store the names of the organizations that user works for.

That way I can use the field for faceted search and filter a user
search result by the organizations the user works for.

The issue is that my code does something like this:

  1) Add user documents to Solr.
  2) When a user is assigned an organization membership (role), update
     the user doc to set the organizations field.

With step 2, if I just call addField("organizations", "BigCorp") on the
user doc, the value is added regardless of whether organizations already
contains "BigCorp", but I want each org name to appear only once.

The only way I have found to get that behavior is to query the user
document, get the values of "organizations", and add the new value only
if it is not already there:

  if !userDoc.getValues("organizations").contains(value)
  {... add the value to the doc and save it ...}
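The check described above can be sketched with plain Java collections. This is only an illustration, not SolrJ code: `AddIfAbsentSketch` and `withValue` are hypothetical names, and the `Collection<String>` argument stands in for whatever values were read from the user document (treated here as possibly null when the field is absent).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: not part of Solr or SolrJ.
final class AddIfAbsentSketch {
    // Returns the field values with `value` present exactly once,
    // keeping the original order of the existing values.
    static List<String> withValue(Collection<String> current, String value) {
        Set<String> unique = new LinkedHashSet<>();
        if (current != null) {
            unique.addAll(current); // also collapses duplicates already stored
        }
        unique.add(value); // no effect if the value is already there
        return new ArrayList<>(unique);
    }
}
```

The returned list would then be set as the complete value of "organizations" before the document is saved. If the batch code already holds the user's current values in memory, the same helper could be applied there without querying Solr first, though that is a suggestion rather than something confirmed in this thread.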

That works well, but only if I commit all the time (between steps 1 and
2 at least), because the document query does not find the document until
it has been committed. Committing that often is bad for performance, and
it is impractical since I process those inserts in batches.

*So I guess the main issue would be:*

  * Is there a way to update a multi-value field, without allowing
    duplicates, that would not require querying the doc to manually
    prevent duplicates?

  * Maybe some better way to do this?

Thanks.


Re: Updating a document multi-value field (no dup values) without needing it to be already committed

Posted by Erick Erickson <er...@gmail.com>.
Before going too far down that path, let's check something.

I assume you're storing *all* the fields for each document,
right? Because unless you are, you'll lose data if you're reading
the document from Solr and then updating it.

When you fetch a document from Solr, only the *stored*
fields are returned, so the cycle is lossy.

If you have access to all of the original information from
the system-of-record, a reasonable approach is to re-get
all the original data and simply replace the entire document.
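A minimal sketch of that approach, under stated assumptions: `UserRecord` is a hypothetical row from the system-of-record, the field names are illustrative, and a plain `Map` stands in for the Solr input document so the example runs without Solr on the classpath. Every field is rebuilt from source data, and the organization names are de-duplicated while the document is built, so no query against the index is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical system-of-record row; the fields are illustrative only.
final class UserRecord {
    final String username;
    final String email;
    final List<String> memberships; // may contain repeated org names

    UserRecord(String username, String email, List<String> memberships) {
        this.username = username;
        this.email = email;
        this.memberships = memberships;
    }
}

final class FullDocumentRebuild {
    // Builds the complete document from source data (nothing is read back
    // from the index), with each organization name listed only once.
    static Map<String, Object> buildDocument(UserRecord user) {
        Set<String> unique = new LinkedHashSet<>(user.memberships);
        List<String> organizations = new ArrayList<>(unique);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("username", user.username);
        doc.put("email", user.email);
        doc.put("organizations", organizations);
        return doc;
    }
}
```

With SolrJ, the entries of this map would be copied into a SolrInputDocument and added as usual; adding a document whose unique key already exists replaces the old one, so no prior commit or lookup is required for the de-duplication itself.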

Best
Erick


On Fri, Oct 28, 2011 at 1:22 PM, Thibaut Colar <tc...@colar.net> wrote:
> A related question:
> Is there a way to update a doc to remove a specific value from a
> multi-value field (in my case, to remove a role)?
>
> I managed to do that by querying the doc, reading all the other values
> "manually", and then saving, but that has the same issues and is
> inefficient.

Re: Updating a document multi-value field (no dup values) without needing it to be already committed

Posted by Thibaut Colar <tc...@colar.net>.
A related question:
Is there a way to update a doc to remove a specific value from a
multi-value field (in my case, to remove a role)?

I managed to do that by querying the doc, reading all the other values
"manually", and then saving, but that has the same issues and is
inefficient.
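The "manual" removal described here amounts to rebuilding the value list without the unwanted entry. A small sketch in plain Java follows; `RemoveValueSketch` and `withoutValue` are hypothetical names, and the input collection stands in for the values obtained from the document or from the source data.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: not part of Solr or SolrJ.
final class RemoveValueSketch {
    // Returns a copy of the field values with every occurrence of
    // `value` dropped; the remaining values keep their order.
    static List<String> withoutValue(Collection<String> current, String value) {
        List<String> kept = new ArrayList<>();
        if (current != null) {
            for (String v : current) {
                if (!v.equals(value)) {
                    kept.add(v);
                }
            }
        }
        return kept;
    }
}
```

The resulting list would replace the whole field when the document is saved again, so this has the same dependency on having the full document available as the duplicate check does.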
