Posted to solr-user@lucene.apache.org by Thibaut Colar <tc...@colar.net> on 2011/10/28 19:04:55 UTC
Updating a document multi-value field (no dup values) without needing
it to be already committed
Sorry for the lengthy text, it's a bit difficult to explain:
We are using Solr to index some user info like username, email (among
other things).
I'm also trying to use facets for search, so, for example, I added a
multi-value field called "organizations" to the user document, where I
store the names of the organizations that the user works for.
That way I can use the field for faceted search and filter a user
search result by the organizations the user works for.
My code does something like this:
1) Add user documents to Solr.
2) When a user is assigned an organization membership (role), update
the user doc to set the organizations field.
The issue is with step 2: if I just do
addField("organizations", "BigCorp") on the user doc, it adds that
value regardless of whether organizations already contains "BigCorp",
but I want each org name to appear only once.
The only way I have found to get that behavior is to query the user
document, get the values of "organizations", and only add the new value
if it's not already in there:
if (!userDoc.getValues("organizations").contains(value))
{... add the value to the doc and save it ...}
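For illustration, the check-then-add step described above can be sketched in plain Java, independent of any Solr client API. The class and method names (MultiValueDedup, withValue) are hypothetical; in the scenario above, the existing values would come from the fetched user document. This is only a sketch of the de-duplication itself and does not remove the need to read the current values first.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

class MultiValueDedup {

    /**
     * Returns the existing values plus the candidate, with duplicates removed
     * and first-seen order preserved. A null "existing" collection (field not
     * set yet) is treated as empty.
     */
    static Set<String> withValue(Collection<String> existing, String candidate) {
        Set<String> merged = new LinkedHashSet<>();
        if (existing != null) {
            merged.addAll(existing);
        }
        merged.add(candidate); // no effect if the value is already present
        return merged;
    }

    public static void main(String[] args) {
        Set<String> orgs = withValue(Arrays.asList("BigCorp", "SmallCo"), "BigCorp");
        System.out.println(orgs); // [BigCorp, SmallCo]
    }
}
```

The merged set would then replace the whole "organizations" field on the document, rather than being appended to it, so that values already present are not written a second time.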
That works well, but only if I commit all the time (between steps 1 and
2 at least), because the document query will not find the document
unless it has already been committed. Committing that often is bad for
performance, and it is impractical since I process those inserts in
batches.
*So I guess the main question is:*
Is there a way to update a multi-value field, without allowing
duplicates, that would not require querying the doc to manually
prevent duplicates?
Or maybe there is a better way to do this?
Thanks.
Re: Updating a document multi-value field (no dup values) without
needing it to be already committed
Posted by Erick Erickson <er...@gmail.com>.
Before going too far down that path, let's check something.
I assume you're storing *all* the fields for each document,
right? Because unless you are, you'll lose data if you're reading
the document from Solr and then updating it.
When you fetch a document from Solr, only the *stored*
fields are returned, so the cycle is lossy.
If you have access to all of the original information from
the system-of-record, a reasonable approach is to re-get
all the original data and simply replace the entire document.
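As a rough sketch of that replace-the-whole-document approach, assuming a hypothetical UserRecord type for the system-of-record row and an assumed "id" unique key (neither is named in this thread), every field can be rebuilt from the source data. Duplicate organization names are dropped on the way, so nothing needs to be read back from the index:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class FullDocumentRebuild {

    /** Hypothetical system-of-record row; its shape is assumed for illustration. */
    static final class UserRecord {
        final String id;
        final String username;
        final String email;
        final List<String> organizations;

        UserRecord(String id, String username, String email, List<String> organizations) {
            this.id = id;
            this.username = username;
            this.email = email;
            this.organizations = organizations;
        }
    }

    /**
     * Builds every field of the document from the source record, so the result
     * does not depend on what is stored or committed in the index.
     */
    static Map<String, Object> toDocumentFields(UserRecord user) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", user.id); // "id" as the unique key is an assumption
        fields.put("username", user.username);
        fields.put("email", user.email);
        // LinkedHashSet drops repeated organization names, keeping first-seen order.
        List<String> orgs = new ArrayList<>(new LinkedHashSet<>(user.organizations));
        fields.put("organizations", orgs);
        return fields;
    }

    public static void main(String[] args) {
        UserRecord user = new UserRecord("42", "jdoe", "jdoe@example.com",
                Arrays.asList("BigCorp", "SmallCo", "BigCorp"));
        System.out.println(toDocumentFields(user).get("organizations")); // [BigCorp, SmallCo]
    }
}
```

The resulting field map would be copied into whatever document object the client library uses and sent as a full replacement. Removing a value works the same way: the record simply no longer lists it when the document is rebuilt.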
Best
Erick
On Fri, Oct 28, 2011 at 1:22 PM, Thibaut Colar <tc...@colar.net> wrote:
> A related question:
> Is there a way to update a doc to remove a specific value from a multi-value
> field (in my case, remove a role)?
>
> I managed to do that by querying the doc, reading all the other values
> "manually", and then saving, but that has the same issues and is inefficient.
Re: Updating a document multi-value field (no dup values) without
needing it to be already committed
Posted by Thibaut Colar <tc...@colar.net>.
A related question:
Is there a way to update a doc to remove a specific value from a
multi-value field (in my case, remove a role)?
I managed to do that by querying the doc, reading all the other values
"manually", and then saving, but that has the same issues and is
inefficient.