Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2013/01/21 18:30:15 UTC
[jira] [Commented] (SOLR-4260) Inconsistent docFreq and docCount
before and after forceMerge/optimize
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558929#comment-13558929 ]
Yonik Seeley commented on SOLR-4260:
------------------------------------
bq. There is a small variation in maxDoc, which is expected, but there is also a variation in docFreq, which is very unexpected; docFreq must not change at all if I reindex the same data.
Unfortunately, deletions don't change index statistics like docFreq (this has been the case since the first version of Lucene). Reindexing a document marks the old copy as deleted and adds a new one, so the old copy still counts toward docFreq. This means that reindexing can artificially increase the docFreq until the deleted document is actually removed via merging/optimize.
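To make the effect concrete, here is a minimal sketch in Python of a toy segmented index. It is not Lucene's API: the ToyIndex class and its methods (add, doc_freq, max_doc, num_docs, force_merge) are hypothetical names chosen for illustration, and the model assumes only what is described above, namely that a reindexed document leaves a deleted copy behind that still counts in the statistics until a merge drops it.

```python
class ToyIndex:
    """Toy model of an append-only, segmented index (NOT Lucene's API)."""

    def __init__(self):
        # Each segment is a list of document records.
        self.segments = [[]]

    def add(self, doc_id, terms):
        """Add a document; any older copy with the same id is only marked deleted."""
        for segment in self.segments:
            for doc in segment:
                if doc["id"] == doc_id:
                    doc["deleted"] = True
        self.segments[-1].append(
            {"id": doc_id, "terms": set(terms), "deleted": False}
        )

    def doc_freq(self, term):
        """Count documents containing the term, including deleted copies."""
        return sum(
            1
            for segment in self.segments
            for doc in segment
            if term in doc["terms"]
        )

    def max_doc(self):
        """Count all document slots, including deleted copies."""
        return sum(len(segment) for segment in self.segments)

    def num_docs(self):
        """Count live (non-deleted) documents only."""
        return sum(
            1
            for segment in self.segments
            for doc in segment
            if not doc["deleted"]
        )

    def force_merge(self):
        """Rewrite everything into one segment, dropping deleted copies."""
        live = [
            doc
            for segment in self.segments
            for doc in segment
            if not doc["deleted"]
        ]
        self.segments = [live]


if __name__ == "__main__":
    index = ToyIndex()
    index.add("a", ["solr"])
    index.add("b", ["solr", "lucene"])
    index.add("a", ["solr"])  # reindex the same document
    print(index.doc_freq("solr"), index.max_doc(), index.num_docs())
    index.force_merge()
    print(index.doc_freq("solr"), index.max_doc(), index.num_docs())
```

In this model, two replicas that hold the same live documents but have merged at different times would report different docFreq and maxDoc values for the same data, which matches the kind of drift described in the report.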
> Inconsistent docFreq and docCount before and after forceMerge/optimize
> ----------------------------------------------------------------------
>
> Key: SOLR-4260
> URL: https://issues.apache.org/jira/browse/SOLR-4260
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 5.0
> Environment: 5.0.0.2013.01.04.15.31.51
> Reporter: Markus Jelsma
> Priority: Critical
> Fix For: 5.0
>
>
> After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer we see inconsistencies between the leader and replica for some shards.
> Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small deviation in the number of documents. The leader and slave deviate by roughly 10-20 documents, not more.
> Results hopping ranks in the result set for identical queries got my attention: there were small IDF differences for exactly the same record, causing a record to shift positions in the result set. During those tests no records were indexed. Consecutive catch-all queries also return different values for numDocs.
> We're running a 10 node test cluster with 10 shards and a replication factor of two and frequently reindex using a fresh build from trunk. I've not seen this issue for quite some time until a few days ago.