Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2012/06/06 01:09:41 UTC
Re: TermComponent and Optimize
: It seems that TermComponent is looking at all versions of documents in the index.
:
: Is this the expected behavior for TermComponent? Any suggestions on how to solve this?
Yes...
http://wiki.apache.org/solr/TermsComponent
"The doc frequencies returned are the number of documents that match the
term, including any documents that have been marked for deletion but not
yet removed from the index."
If you delete/replace a document in the index, it still contributes to
the doc freq for that term until the "deletion" is expunged (either
because of a natural segment merge, or a forced merge due to optimize).
The reason TermsComponent is so fast is that it only looks at the raw
terms. If you want to "fix" the counts to represent visible documents, you
have to use something like faceting, which will be slower because it
checks the actual (live) document counts.
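
To make the trade-off concrete, here is a rough sketch that builds the two
kinds of request side by side. The base URL, the field name "text", and the
"/terms" and "/select" handler paths are assumptions for illustration (they
match the stock example config, but your solrconfig.xml may differ). The
query parameters themselves are the standard TermsComponent and faceting
ones.

```python
from urllib.parse import urlencode

# Assumed values for illustration only; adjust to your own core and schema.
SOLR_BASE = "http://localhost:8983/solr"
FIELD = "text"


def terms_url(base, field, limit=10):
    """TermsComponent request: fast, but doc freqs include deleted docs."""
    params = {"terms": "true", "terms.fl": field, "terms.limit": limit}
    return base + "/terms?" + urlencode(params)


def facet_url(base, field, limit=10):
    """Field-faceting request: slower, but counts only live documents."""
    params = {
        "q": "*:*",            # match every visible document
        "rows": 0,             # we only want the facet counts, not the docs
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,   # skip terms that no live document contains
    }
    return base + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(terms_url(SOLR_BASE, FIELD))
    print(facet_url(SOLR_BASE, FIELD))
```

Fetching both URLs against an index with pending deletes should show the
TermsComponent counts running higher than the facet counts for the affected
terms.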
-Hoss
Re: TermComponent and Optimize
Posted by lboutros <bo...@gmail.com>.
You can use the "expungeDeletes" option on the commit, which could
solve your problem.
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22
Sadly, there is currently a bug with the TieredMergePolicy:
https://issues.apache.org/jira/browse/SOLR-2725
But you can use another merge policy (LogMergePolicy for instance).
Your updates will be (a bit) slower if you use this solution.
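
For reference, a commit with expungeDeletes is just an XML update message
with one extra attribute. The sketch below only builds the message and an
unsent HTTP request for it. The update URL is an assumption (default example
port and path), and nothing is actually sent until you pass the request to
urllib.request.urlopen yourself.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed update endpoint for illustration; adjust host, port and core.
UPDATE_URL = "http://localhost:8983/solr/update"


def commit_message(expunge_deletes=True):
    """Return the XML body for a commit, optionally with expungeDeletes."""
    commit = ET.Element("commit")
    if expunge_deletes:
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit, encoding="unicode")


def commit_request(url=UPDATE_URL, expunge_deletes=True):
    """Build (but do not send) a POST request carrying the commit message."""
    body = commit_message(expunge_deletes).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml"}
    )


if __name__ == "__main__":
    print(commit_message())
    req = commit_request()
    print(req.get_method(), req.full_url)
```

Given the TieredMergePolicy bug mentioned above, whether this actually
drops the deleted documents depends on the merge policy in use.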
Ludovic.
-----
Jouve
France.