Posted to solr-user@lucene.apache.org by Johannes Siegert <jo...@marktjagd.de> on 2014/07/22 13:26:16 UTC
wrong docFreq while executing query based on uniqueKey-field
Hi.
My Solr index (version 4.7.2) has an id field:
<field name="id" type="string" indexed="true" stored="true"/>
...
<uniqueKey>id</uniqueKey>
The index is updated once per hour.
I use the following query to retrieve some documents:
"q=id:2^2 id:1^1"
I would expect document(2) to always rank before document(1). But after
many index updates, document(1) ranks before document(2).
With debug=true I could see the problem: document(1) has docFreq=2,
while document(2) has docFreq=1.
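For context, a minimal sketch of how docFreq feeds into the score, assuming Lucene 4.x's DefaultSimilarity (classic TF-IDF). It computes idf = 1 + ln(maxDoc / (docFreq + 1)), and a single term clause scores roughly tf * idf^2 * boost * queryNorm * fieldNorm (then scaled by coord inside a BooleanQuery). In Lucene 4.x, maxDoc also counts deleted documents. The function names and the maxDoc value below are hypothetical, for illustration only.

```python
import math


def classic_idf(doc_freq: int, max_doc: int) -> float:
    """idf as in Lucene 4.x DefaultSimilarity: 1 + ln(maxDoc / (docFreq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))


def term_weight(boost: float, doc_freq: int, max_doc: int) -> float:
    """boost * idf^2: the part of a term clause's classic score that depends on
    the boost and docFreq (tf, field norm, queryNorm and coord are left out)."""
    idf = classic_idf(doc_freq, max_doc)
    return boost * idf * idf


if __name__ == "__main__":
    max_doc = 100_000  # hypothetical index size, not taken from this index
    for df in (1, 2, 3):
        # idf shrinks as docFreq grows
        print(df, round(classic_idf(df, max_doc), 4))
```

Because idf decreases as docFreq grows, every extra (possibly deleted) posting for an id term changes that clause's weight, even though only one live document carries that id.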
How can the docFreq of the uniqueKey field be higher than 1? Could
anyone explain this behavior to me?
Thanks!
Johannes
Re: wrong docFreq while executing query based on uniqueKey-field
Posted by Jack Krupansky <ja...@basetechnology.com>.
Deleted documents remain in the Lucene index until an "optimize" or a segment
merge removes them, so they are still counted in document frequency. An
update is a delete of the old document plus an add of a fresh one, so every
update of a document leaves a deleted copy of its id behind until that copy
is merged away.
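A toy model of the explanation above, as a minimal sketch: it is not Lucene's API, and the class and method names are hypothetical. Each batch of adds becomes a segment, an update marks the old copy deleted and adds a fresh one, docFreq counts every posting (deleted or not), and a forced merge rewrites the index without the deleted copies.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a Lucene segment: document ids plus deleted positions."""
    ids: list[str] = field(default_factory=list)
    deleted: set[int] = field(default_factory=set)


class ToyIndex:
    """Hypothetical model: update = delete + add; docFreq ignores deletions."""

    def __init__(self) -> None:
        self.segments: list[Segment] = []

    def flush(self, ids: list[str]) -> None:
        """Each batch of adds (e.g. one hourly update run) becomes a new segment."""
        self.segments.append(Segment(ids=list(ids)))

    def update(self, ids: list[str]) -> None:
        """Mark existing copies of these ids as deleted, then add fresh copies."""
        wanted = set(ids)
        for seg in self.segments:
            for pos, doc_id in enumerate(seg.ids):
                if doc_id in wanted:
                    seg.deleted.add(pos)
        self.flush(ids)

    def doc_freq(self, doc_id: str) -> int:
        """Counts all postings for the id term, including deleted documents."""
        return sum(seg.ids.count(doc_id) for seg in self.segments)

    def live_count(self, doc_id: str) -> int:
        """Counts only non-deleted documents (what a search can return)."""
        return sum(
            1
            for seg in self.segments
            for pos, d in enumerate(seg.ids)
            if d == doc_id and pos not in seg.deleted
        )

    def force_merge(self) -> None:
        """Like an optimize: one segment containing only live documents."""
        live = [
            d
            for seg in self.segments
            for pos, d in enumerate(seg.ids)
            if pos not in seg.deleted
        ]
        self.segments = [Segment(ids=live)]


if __name__ == "__main__":
    idx = ToyIndex()
    idx.flush(["1", "2"])
    idx.update(["1"])  # document 1 is re-indexed, document 2 is untouched
    print(idx.doc_freq("1"), idx.live_count("1"))  # 2 1
    print(idx.doc_freq("2"))  # 1
    idx.force_merge()
    print(idx.doc_freq("1"))  # 1
```

In a real index the merge policy also merges segments in the background, so docFreq can drop back without an explicit optimize; the toy model only shows the full merge.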
-- Jack Krupansky
Re: wrong docFreq while executing query based on uniqueKey-field
Posted by Apoorva Gaurav <ap...@myntra.com>.
I faced the same issue some time back; the root cause is documents being
deleted and re-created without the index being optimized. Here is the discussion:
http://www.signaldump.org/solr/qpod/22731/docfreq-coming-to-be-more-than-1-for-unique-id-field
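A minimal sketch for triggering the cleanup by hand, assuming a Solr 4.x core reachable over HTTP; the base URL and core name in the usage line are hypothetical. It builds either an `optimize=true` request or a `commit=true&expungeDeletes=true` request against the core's `/update` handler; sending it is left to a separate call. An optimize rewrites the whole index and can be expensive on a large one.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def cleanup_url(base_url: str, core: str, full_optimize: bool) -> str:
    """Build an /update URL that purges deleted documents.

    full_optimize=True  -> optimize=true (merge down, dropping deleted docs)
    full_optimize=False -> commit=true&expungeDeletes=true (merge segments that
                           contain deletions, subject to the merge policy)
    """
    if full_optimize:
        params = {"optimize": "true"}
    else:
        params = {"commit": "true", "expungeDeletes": "true"}
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


def send(url: str) -> bytes:
    """Issue the request (GET) and return the raw response body."""
    with urlopen(url) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical host and core; only prints the URL, does not contact Solr.
    print(cleanup_url("http://localhost:8983/solr", "collection1", True))
```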
--
Thanks & Regards,
Apoorva