Posted to solr-user@lucene.apache.org by Johannes Siegert <jo...@marktjagd.de> on 2014/07/22 13:26:16 UTC

wrong docFreq while executing query based on uniqueKey-field

Hi.

My Solr index (version 4.7.2) has an id field:

<field  name="id"  type="string"  indexed="true"  stored="true"/>
...
<uniqueKey>id</uniqueKey>

The index is updated once per hour.

I use the following query to retrieve some documents:

"q=id:2^2 id:1^1"

I would expect document(2) to always rank before document(1). But after
many index updates, document(1) ranks before document(2).
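A minimal sketch of sending this query with the scoring debug output switched on. It assumes a hypothetical core URL (`http://localhost:8983/solr/collection1`) and only builds the request URL with the Python standard library; it does not contact a server. `fl=id,score` is added so the two scores can be compared directly.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust host, port and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

def build_query_url(base: str) -> str:
    """Build the select URL for 'id:2^2 id:1^1' with score explanations."""
    params = {
        "q": "id:2^2 id:1^1",  # two optional clauses: id 2 boosted by 2, id 1 by 1
        "fl": "id,score",      # return the score so the ranking can be compared
        "debug": "true",       # ask Solr to include the per-document explanation
        "wt": "json",
    }
    # urlencode escapes ':' and '^' and turns the space into '+'
    return base + "?" + urlencode(params)

print(build_query_url(SOLR_SELECT))
```

The explanation for each hit in the debug section shows the `idf(docFreq=..., maxDocs=...)` values being discussed in this thread.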

With debug=true I could see the problem: document(1) has docFreq=2,
while document(2) has docFreq=1.
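For context on why docFreq matters for ranking: with Lucene 4.x's default similarity (`DefaultSimilarity`), the inverse document frequency of a term is `1 + ln(maxDoc / (docFreq + 1))`. The sketch below reproduces only that idf factor, not the full score. The `max_doc` value of 100 is an arbitrary example, and Lucene computes the value in float rather than double precision.

```python
import math

def classic_idf(doc_freq: int, max_doc: int) -> float:
    """idf as in Lucene 4.x DefaultSimilarity: 1 + ln(maxDoc / (docFreq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))

max_doc = 100  # arbitrary example; maxDoc also counts deleted-but-unmerged docs
for df in (1, 2):
    print(f"docFreq={df}: idf={classic_idf(df, max_doc):.4f}")
```

A term whose docFreq has grown to 2 therefore gets a smaller idf than one with docFreq 1, so two id lookups that "should" be equivalent no longer contribute identical idf values to the score.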

How can the docFreq of the uniqueKey field be higher than 1? Could
anyone explain this behavior to me?

Thanks!

Johannes


Re: wrong docFreq while executing query based on uniqueKey-field

Posted by Jack Krupansky <ja...@basetechnology.com>.
Deleted documents remain in the Lucene index until an "optimize" or a
segment merge removes them, so they are still counted in the document
frequency. An update is a delete of the old document combined with an add
of a fresh one.
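The mechanism described above can be illustrated with a toy model of a segmented index. This is a simplification, not Lucene's actual data structures or API: each segment keeps its own id list plus a set of positions marked deleted, an update marks the old copy deleted and writes the new copy to a fresh segment, docFreq counts entries in every segment without consulting deletions, and a merge rewrites only the live documents.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    ids: list                                  # id value of each doc in this segment
    deleted: set = field(default_factory=set)  # positions marked deleted, still on disk

class ToyIndex:
    """Toy segmented index; class and method names are illustrative only."""

    def __init__(self):
        self.segments = []

    def add_batch(self, ids):
        # Each flush/commit writes a new segment.
        self.segments.append(Segment(list(ids)))

    def update_batch(self, ids):
        # An update = mark the old copy deleted + add a fresh copy.
        targets = set(ids)
        for seg in self.segments:
            for pos, doc_id in enumerate(seg.ids):
                if doc_id in targets:
                    seg.deleted.add(pos)
        self.add_batch(ids)

    def doc_freq(self, term):
        # Term statistics ignore deletions, so stale copies are counted.
        return sum(seg.ids.count(term) for seg in self.segments)

    def merge_all(self):
        # A merge (e.g. an optimize) copies only live docs into one segment.
        live = [d for seg in self.segments
                for pos, d in enumerate(seg.ids) if pos not in seg.deleted]
        self.segments = [Segment(live)]

index = ToyIndex()
index.add_batch(["1", "2"])
index.update_batch(["1"])                          # an hourly update touches only doc 1
print(index.doc_freq("1"), index.doc_freq("2"))   # 2 1
index.merge_all()
print(index.doc_freq("1"), index.doc_freq("2"))   # 1 1
```

In the model, documents that are updated more often accumulate more stale copies, which matches the observation that docFreq drifts apart only "after many index updates".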

-- Jack Krupansky


Re: wrong docFreq while executing query based on uniqueKey-field

Posted by Apoorva Gaurav <ap...@myntra.com>.
I faced the same issue some time back. The root cause is documents being
deleted and re-created without the index being optimized. Here is the
discussion:
http://www.signaldump.org/solr/qpod/22731/docfreq-coming-to-be-more-than-1-for-unique-id-field
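If the inflated docFreq matters for ranking, the deleted documents can be purged by forcing merges. A hedged sketch, again with a hypothetical core URL: Solr's update handler accepts `optimize=true` (merge the index down to a single segment, which is expensive on a large index) or `commit=true&expungeDeletes=true` (ask the merge policy to merge away segments that contain deletions). The code only builds the URLs; check the behaviour against the documentation for your Solr version before running either one after every hourly update.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust for your installation.
SOLR_UPDATE = "http://localhost:8983/solr/collection1/update"

def maintenance_urls(base: str) -> dict:
    """Return update URLs that purge deleted documents from the index."""
    return {
        # Merge the whole index down to one segment (costly on big indexes).
        "optimize": base + "?" + urlencode({"optimize": "true"}),
        # Commit and let the merge policy rewrite segments containing deletions.
        "expunge": base + "?" + urlencode({"commit": "true", "expungeDeletes": "true"}),
    }

for name, url in maintenance_urls(SOLR_UPDATE).items():
    print(name, url)
```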


-- 
Thanks & Regards,
Apoorva