Posted to solr-user@lucene.apache.org by Thomas Kellerer <sp...@gmx.net> on 2010/10/15 15:14:03 UTC

Term is duplicated when updating a document

Hi,

We update our documents (which represent products in our shop) whenever a dealer modifies them, by calling
SolrServer.add(SolrInputDocument) with the updated document.

My understanding is that there is no other way of updating an existing document.

However, we also use a term query to autocomplete the search field for the user, and each time a document is updated (re-added) the term count is incremented. With a new index the count starts at e.g. 1; after the document that contains the term is updated, the count is 2, the next update sets it to 3, and so on.

Once the index is optimized (by calling SolrServer.optimize()), the count is correct again.

Am I missing something or is this a bug in Solr/Lucene?

Thanks in advance
Thomas


Re: Term is duplicated when updating a document

Posted by Thomas Kellerer <sp...@gmx.net>.
Thanks.

Not really the answer I wanted to hear, but at least I know this is not my fault ;)

Regards
Thomas

Erick Erickson, 15.10.2010 20:42:
> This is actually known behavior. The problem is that when you update
> a document, it's deleted and re-added, but the original is marked as
> deleted. However, the terms aren't touched, both the original and the new
> document's terms are counted. It'd be hard, very hard, to remove
> the terms from the inverted index efficiently.
>
> But when you optimize, all the deleted documents (and their associated
> terms) are physically removed from the files, thus your term counts change.
>
> HTH
> Erick



Re: Term is duplicated when updating a document

Posted by Erick Erickson <er...@gmail.com>.
This is actually known behavior. When you update a document, the original
is deleted and the new version is added, but the original is only marked as
deleted, not physically removed. Its terms aren't touched, so the terms of
both the original and the new document are counted. It'd be hard, very hard,
to remove the terms from the inverted index efficiently.

But when you optimize, all the deleted documents (and their associated
terms) are physically removed from the files, which is why your term counts
change.

HTH
Erick
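
To make the behavior described above concrete, here is a minimal sketch in Java. It is a toy model, not Lucene's actual implementation or API: the class TinyIndex and its methods add, docFreq and optimize are hypothetical names chosen for illustration. It assumes only what the explanation states: a replaced document is marked as deleted, its postings stay in place, and optimizing rewrites the postings without the deleted documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Lucene's real data structures.
class TinyIndex {
    // term -> internal document numbers that contain it (the postings)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // unique key -> internal document number of the live version
    private final Map<String, Integer> liveByKey = new HashMap<>();
    // document numbers marked as deleted but not yet purged
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDocNum = 0;

    // Add or replace: an existing document with the same key is only
    // marked as deleted; its postings are left untouched.
    void add(String key, String... terms) {
        Integer old = liveByKey.get(key);
        if (old != null) {
            deleted.add(old);
        }
        int docNum = nextDocNum++;
        liveByKey.put(key, docNum);
        // De-duplicate terms so each document is counted once per term.
        for (String term : new LinkedHashSet<>(Arrays.asList(terms))) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docNum);
        }
    }

    // Raw count taken from the postings, deleted documents included.
    int docFreq(String term) {
        List<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Rewrite the postings without the deleted documents.
    void optimize() {
        for (List<Integer> docs : postings.values()) {
            docs.removeAll(deleted);
        }
        postings.values().removeIf(List::isEmpty);
        deleted.clear();
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("product-1", "shoe"); // initial version
        index.add("product-1", "shoe"); // first update
        index.add("product-1", "shoe"); // second update
        System.out.println(index.docFreq("shoe")); // prints 3
        index.optimize();
        System.out.println(index.docFreq("shoe")); // prints 1
    }
}
```

In this model the count grows by one per update and drops back to the number of live documents only after optimize(), which matches the 1, 2, 3 sequence reported in the original question.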


Re: Term is duplicated when updating a document

Posted by Thomas Kellerer <sp...@gmx.net>.
Thanks for the answer.

> Which fields are modified when the document is updated/replaced?

Only one field was changed, but it was not the one the auto-suggest term comes from.

> Are there any differences in the content of the fields that you are using
> for the AutoSuggest?

No.

> Have you changed your schema.xml file recently? If you have, then there may
> have been changes in the way these fields are analyzed and broken down into
> terms.

No, I did a complete index rebuild to rule out things like that.
Then, after startup, I ran a search, updated the document, and ran the search again.

Regards
Thomas
  



Re: Term is duplicated when updating a document

Posted by Israel Ekpo <is...@gmail.com>.
Which fields are modified when the document is updated/replaced?

Are there any differences in the content of the fields that you are using
for the AutoSuggest?

Have you changed your schema.xml file recently? If you have, then there may
have been changes in the way these fields are analyzed and broken down into
terms.

This may be a bug if you changed neither the field nor the schema file but the
term count is still changing.



-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/