Posted to solr-user@lucene.apache.org by "Tang, Rebecca" <Re...@ucsf.edu> on 2014/12/19 19:22:01 UTC

Old facet value doesn't go away after index update

Hi there,

I have an index that has a field called collection_facet.

There was a value 'Ness Motley Law Firm Documents' that we wanted to update to 'Ness Motley Law Firm'.  There were 36,132 records with this value.  So I re-indexed just the 36,132 records.  After the update, I ran a facet query (q=*:*&facet=true&facet.field=collection_facet) to see if the value got updated and I saw
Ness Motley Law Firm 36,132  -- as expected
Ness Motley Law Firm Documents 0 -- Why is this value still here, even though there are clearly no records with this value anymore? I thought it might be cached, so I restarted Solr, but I still got the same results.

"facet_fields": { "collection_facet": [
… "Ness Motley Law Firm", 36132,
… "Ness Motley Law Firm Documents", 0 ]
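With Solr's default JSON settings (json.nl=flat), facet_fields comes back as one flat list that alternates term and count, which is why the stale value shows up as a bare `0` next to its name. Below is a minimal Python sketch that pairs the entries up. It assumes the response has already been parsed into Python lists, and the helper name `facet_pairs` is hypothetical.

```python
from typing import List, Tuple, Union


def facet_pairs(flat: List[Union[str, int]]) -> List[Tuple[str, int]]:
    """Pair up a flat Solr facet list: [term1, count1, term2, count2, ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("flat facet list must have an even number of items")
    # Even positions hold the terms, odd positions hold the matching counts.
    return list(zip(flat[0::2], flat[1::2]))


# Values copied from the response above; the elided "…" entries are left out.
collection_facet = ["Ness Motley Law Firm", 36132,
                    "Ness Motley Law Firm Documents", 0]

for term, count in facet_pairs(collection_facet):
    print(f"{term}: {count}")
```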



Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library <legacy.library.ucsf.edu/>
E: rebecca.tang@ucsf.edu

Re: Old facet value doesn't go away after index update

Posted by "Tang, Rebecca" <Re...@ucsf.edu>.
Thank you for the explanation!

Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.tang@ucsf.edu

On 12/19/14 12:37 PM, "Shawn Heisey" <ap...@elyograg.org> wrote:



Re: Old facet value doesn't go away after index update

Posted by Shawn Heisey <ap...@elyograg.org>.

Updating a document in Solr is actually a delete of the old document
followed by indexing a new version.

When a document is deleted from an index, Lucene (the search library that
Solr uses) does not actually remove that document from the index
segment; it just writes the document's ID to a file that tracks deletes.
That document is still in the index, and its terms are still present, but
the software can remove it from any results when it sees that ID in
the delete-tracking file(s).  Only a segment merge can eliminate the
document and remove its terms from the inverted index.
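The delete-then-merge behaviour described above can be modelled in a few lines of Python. This is a toy sketch only: the `Segment` class and the `update`/`merge` helpers are hypothetical, and a real Lucene segment stores an inverted index plus a per-segment deletions file rather than a Python dict and set.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Segment:
    """Toy stand-in for a Lucene segment (hypothetical, not Lucene's real layout)."""
    docs: Dict[int, str] = field(default_factory=dict)  # doc id -> collection_facet value
    deleted: Set[int] = field(default_factory=set)      # ids marked deleted, still stored

    def terms(self) -> Set[str]:
        # Every value still physically present in the segment, deleted or not.
        return set(self.docs.values())


def update(old: Segment, doc_id: int, new: Segment, value: str) -> None:
    # An "update" marks the old copy deleted and indexes a new copy in another segment.
    old.deleted.add(doc_id)
    new.docs[len(new.docs)] = value


def merge(*segments: Segment) -> Segment:
    # A merge copies only live documents, so deleted docs and their terms are dropped.
    merged = Segment()
    for seg in segments:
        for doc_id, value in seg.docs.items():
            if doc_id not in seg.deleted:
                merged.docs[len(merged.docs)] = value
    return merged


old = Segment(docs={0: "Ness Motley Law Firm Documents"})
new = Segment()
update(old, 0, new, "Ness Motley Law Firm")

print(sorted(old.terms() | new.terms()))  # old value is still present alongside the new one
print(sorted(merge(old, new).terms()))    # only the new value survives the merge
```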

When you facet on that field, Lucene still sees "Ness Motley Law
Firm Documents" in the inverted index, because nothing has actually
removed it.  The upper layers of the Solr faceting code are aware that all
the documents containing that term have been deleted, so they report a
correct document count of zero.
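To make the counting step concrete, here is a hedged Python sketch of faceting over a term dictionary that still contains a term whose documents are all deleted. The postings map, the document IDs and the `facet_counts` helper are hypothetical. The `mincount` argument mimics Solr's facet.mincount, whose default for facet.field is 0; that default is why the stale term is listed at all. Terms are visited in index order here, whereas Solr sorts by count by default.

```python
from typing import Dict, List, Set, Tuple


def facet_counts(postings: Dict[str, List[int]], deleted: Set[int],
                 mincount: int = 0) -> List[Tuple[str, int]]:
    """Count live documents per term, visiting every term in the term dictionary."""
    counts = []
    for term in sorted(postings):            # the dictionary still holds every indexed term
        live = sum(1 for doc in postings[term] if doc not in deleted)
        if live >= mincount:                  # mincount=0 keeps terms whose docs are all deleted
            counts.append((term, live))
    return counts


# Hypothetical index: docs 0-2 held the old value; after re-indexing they are
# marked deleted and docs 3-5 hold the new value.
postings = {
    "Ness Motley Law Firm Documents": [0, 1, 2],
    "Ness Motley Law Firm": [3, 4, 5],
}
deleted = {0, 1, 2}

print(facet_counts(postings, deleted))              # stale term reported with count 0
print(facet_counts(postings, deleted, mincount=1))  # stale term filtered out
```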

To eliminate it from the results, you have two choices.  One is to set
facet.mincount=1 as a parameter on your query; the other is to run an
optimize (also known as a forceMerge down to one segment) on the index.
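Both remedies are plain HTTP requests. The Python sketch below only builds the URLs and does not send anything. The host, port and core name in `SOLR_CORE_URL` are hypothetical placeholders, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL and core name; substitute your own.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def facet_query_url(field: str, mincount: int = 1) -> str:
    # Option 1: hide zero-count terms at query time with facet.mincount.
    params = {"q": "*:*", "facet": "true", "facet.field": field,
              "facet.mincount": mincount, "wt": "json"}
    return f"{SOLR_CORE_URL}/select?{urlencode(params)}"


def optimize_url() -> str:
    # Option 2: ask the update handler to optimize (forceMerge to one segment),
    # which rewrites the index without the deleted documents and their terms.
    return f"{SOLR_CORE_URL}/update?{urlencode({'optimize': 'true'})}"


print(facet_query_url("collection_facet"))
print(optimize_url())
```

facet.mincount is cheap and applies per query, while an optimize rewrites the whole index and can be I/O heavy on a large one, so the query parameter is usually the lighter fix.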

Thanks,
Shawn


Re: Old facet value doesn't go away after index update

Posted by Bill Bell <bi...@gmail.com>.
Set facet.mincount=1

Bill Bell
Sent from mobile
