Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2012/06/06 01:09:41 UTC

Re: TermComponent and Optimize

: It seems that TermComponent is looking at all versions of documents in the index.
: 
: Is this the expected behavior for TermComponent? Any suggestions on how to solve this?

Yes...

http://wiki.apache.org/solr/TermsComponent
"The doc frequencies returned are the number of documents that match the 
term, including any documents that have been marked for deletion but not 
yet removed from the index."

If you delete/replace a document in the index, it still contributes to 
the doc freq for that term until the "deletion" is expunged (either 
because of a natural segment merge, or a forced merge due to optimize).

TermsComponent is so fast because it only looks at the raw terms. If you 
want to "fix" the counts to represent visible documents, you have to use 
something like faceting, which will be slower because it checks the 
actual (live) document counts.
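
To make the difference concrete, here is a minimal sketch that builds the two 
request URLs side by side. The base URL, the field name "title", and the 
/terms handler path are assumptions for illustration (the handler has to be 
registered in your solrconfig.xml); the request parameters themselves are the 
standard TermsComponent and faceting ones.

```python
from urllib.parse import urlencode

# Assumed values for illustration only: adjust to your own core and field.
SOLR_BASE = "http://localhost:8983/solr"
FIELD = "title"


def terms_url(base, field, limit=10):
    """TermsComponent request: fast, but doc freqs include deleted docs."""
    params = {"terms.fl": field, "terms.limit": limit, "wt": "json"}
    return f"{base}/terms?{urlencode(params)}"


def facet_url(base, field, limit=10):
    """Facet request: slower, but counts only live (visible) documents."""
    params = {
        "q": "*:*",            # match every live document
        "rows": 0,             # we only want the counts, not the docs
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,   # skip terms that only survive in deleted docs
        "wt": "json",
    }
    return f"{base}/select?{urlencode(params)}"


print(terms_url(SOLR_BASE, FIELD))
print(facet_url(SOLR_BASE, FIELD))
```

Comparing the counts returned by the two URLs for the same term is a quick 
way to see how many deleted documents are still contributing to the doc freq.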


-Hoss

Re: TermComponent and Optimize

Posted by lboutros <bo...@gmail.com>.
You can use the "expungeDeletes" option in the commit; that could
solve your problem.

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22
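
As a rough sketch, the commit message with that attribute could be built and
posted like this. The update URL is an assumption for illustration; the
request is only prepared here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed update endpoint for illustration; adjust host and core.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit(expunge_deletes=True):
    """Build the XML commit message, optionally with expungeDeletes."""
    attrs = {"expungeDeletes": "true"} if expunge_deletes else {}
    return ET.tostring(ET.Element("commit", attrs))  # returns bytes


def commit_request(url, expunge_deletes=True):
    """Prepare (but do not send) the POST request carrying the commit."""
    return urllib.request.Request(
        url,
        data=build_commit(expunge_deletes),
        headers={"Content-Type": "text/xml"},
    )


req = commit_request(UPDATE_URL)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```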

Sadly, there is currently a bug with the TieredMergePolicy:
https://issues.apache.org/jira/browse/SOLR-2725 (SOLR-2725).

But you can use another merge policy (LogMergePolicy for instance).

Your updates will be (a bit) slower if you use this solution.

Ludovic.

-----
Jouve
France.