Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2013/08/07 21:53:48 UTC
[jira] [Commented] (SOLR-5122) "ArithmeticException: / by zero" using spellcheck.collateMaxCollectDocs

    [ https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732654#comment-13732654 ] 

Hoss Man commented on SOLR-5122:
--------------------------------

The problematic line is in the handler that catches EarlyTerminatingCollectorException and estimates the hit count from the last doc id collected...

{noformat}
hits = maxDocId / ((etce.getLastDocId() + 1) / docCollectionLimit);
{noformat}

Unless I'm missing something, the problem comes up when {{(etce.getLastDocId() + 1) < docCollectionLimit}}: the integer division then yields 0, which becomes the denominator under {{maxDocId}}.
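To make the failure concrete, here is a standalone Java sketch of the quoted expression with made-up inputs. It assumes int operands. The class and method names and the sample numbers are hypothetical, not taken from SpellCheckCollator.

```java
public class DivByZeroRepro {
    // Same shape as the estimate expression quoted above. Using int for
    // every operand is an assumption made for this illustration.
    static int estimate(int maxDocId, int lastDocId, int docCollectionLimit) {
        return maxDocId / ((lastDocId + 1) / docCollectionLimit);
    }

    public static void main(String[] args) {
        // (9 + 1) / 5 == 2, so the estimate is 1000 / 2 == 500
        System.out.println(estimate(1000, 9, 5));
        try {
            // (2 + 1) / 5 == 0 under integer division, so 1000 / 0 throws
            estimate(1000, 2, 5);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```

The first call prints 500. The second throws with the same "/ by zero" message the test reported.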

It would be trivial to toss another "1+" in there to eliminate the divide by zero, but I'm confused about the basic assumption being made here, and it smells fishy: any estimate based on getLastDocId() only seems useful if we know docs are being collected in order. When the collateMaxCollectDocs option was added in r1479638, it did force in-order collection when using the early termination...
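For comparison, here is a hedged sketch of a variant that cannot divide by zero. It multiplies by docCollectionLimit before dividing, so the only divisor is lastDocId + 1. This is an illustration only, not the change in SOLR-5122.patch, and the class and method names are hypothetical. The sketch also shows why the in-order assumption matters. When docs arrive in increasing order, lastDocId + 1 >= docCollectionLimit and the result can never exceed maxDocId. The clamp only comes into play when that assumption is violated.

```java
public class GuardedEstimate {
    // Hypothetical alternative to the quoted formula: multiply first (in long,
    // to avoid int overflow) so the only divisor is lastDocId + 1, which is
    // >= 1 for any non-negative doc id.
    static long estimate(int maxDocId, int lastDocId, int docCollectionLimit) {
        long raw = (long) maxDocId * docCollectionLimit / (lastDocId + 1);
        // With in-order collection, lastDocId + 1 >= docCollectionLimit, so
        // raw <= maxDocId already; the clamp only hides out-of-order cases.
        return Math.min(raw, maxDocId);
    }

    public static void main(String[] args) {
        System.out.println(estimate(1000, 9, 5)); // 5000 / 10 == 500
        System.out.println(estimate(1000, 8, 5)); // 5000 / 9 == 555 (quoted formula: 1000 / 1 == 1000)
        System.out.println(estimate(1000, 2, 5)); // 5000 / 3 == 1666, clamped to 1000
    }
}
```

Even without the exception, the integer truncation in the quoted formula distorts the estimate: with inputs (1000, 8, 5) it reports 1000 where the proportional extrapolation is about 555.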

https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java?r1=1479638&r2=1479637&pathrev=1479638

...but in r1479645 that use of FORCE_INORDER_COLLECTION was eliminated with the commit message "removing dead code"...

https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java?r1=1479645&r2=1479644&pathrev=1479645

But without FORCE_INORDER_COLLECTION, how can any estimation based on lastDocId ever be meaningful?
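Here is a small simulation of the ordering concern, under stated assumptions. A toy collector stops after `limit` docs and reports the last doc id it saw. This is not Solr's actual EarlyTerminatingCollector. The collector is fed the same ten hypothetical matching doc ids in two different orders. The extrapolation uses a multiply-first form of the formula so that it cannot throw.

```java
public class OrderDependence {
    // Toy early-terminating collector: stop after `limit` docs and return the
    // id of the last doc seen. Purely illustrative.
    static int lastDocIdAfter(int[] collectionOrder, int limit) {
        return collectionOrder[Math.min(limit, collectionOrder.length) - 1];
    }

    // Extrapolated hit count: `limit` matches were found among the first
    // lastDocId + 1 ids, scaled up to maxDocId.
    static long estimate(int maxDocId, int lastDocId, int limit) {
        return (long) maxDocId * limit / (lastDocId + 1);
    }

    public static void main(String[] args) {
        int maxDocId = 1000;
        int limit = 3;
        // Ten matching docs spread across the index (hypothetical data).
        int[] inOrder = {40, 95, 180, 260, 390, 470, 555, 690, 820, 990};
        // The same matches visited in a different order, as could happen
        // when in-order collection is not forced.
        int[] outOfOrder = {555, 690, 40, 95, 180, 260, 390, 470, 820, 990};

        int a = lastDocIdAfter(inOrder, limit);    // 180
        int b = lastDocIdAfter(outOfOrder, limit); // 40
        System.out.println(a + " -> " + estimate(maxDocId, a, limit)); // 180 -> 16
        System.out.println(b + " -> " + estimate(maxDocId, b, limit)); // 40 -> 73
    }
}
```

Both runs see exactly the same 10 matches. The extrapolated counts (16 vs 73) differ only because of the order in which the docs were visited.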


[~jdyer] can you take a look at this?

                
> "ArithmeticException: / by zero" using spellcheck.collateMaxCollectDocs
> -----------------------------------------------------------------------
>
>                 Key: SOLR-5122
>                 URL: https://issues.apache.org/jira/browse/SOLR-5122
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.4
>            Reporter: Hoss Man
>         Attachments: SOLR-5122.patch
>
>
> As part of SOLR-4952, SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts.
> As far as I can tell, the test assumes that specific values will be returned as the _estimated_ "hits" for a collation, but the change in MergePolicy resulted in different segments with different term stats, causing the estimation code to produce different values than expected.
> I made a quick attempt to improve the test to:
>  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the "estimate" should actually be exact (i.e. collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the number of docs in the index)
>  * randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the number of docs in the index
> This led to an odd "ArithmeticException: / by zero" error in the test, which suggests a genuine bug in the code that only gets tickled by certain merge policy/segment/collateMaxCollectDocs combinations.
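The two test expectations described above reduce to a single check. The sketch below uses a hypothetical helper: the name estimateIsAcceptable and its parameters are made up for illustration. The real assertions live in SpellCheckCollatorTest and may be structured differently.

```java
public class EstimateInvariant {
    // Hypothetical helper: hits must be exact when collateMaxCollectDocs is 0
    // or exceeds numDocs; otherwise the estimate must not exceed numDocs.
    static boolean estimateIsAcceptable(long reportedHits, long exactHits,
                                        int collateMaxCollectDocs, int numDocs) {
        boolean shouldBeExact = collateMaxCollectDocs == 0 || collateMaxCollectDocs > numDocs;
        if (shouldBeExact) {
            return reportedHits == exactHits;
        }
        return reportedHits <= numDocs;
    }

    public static void main(String[] args) {
        System.out.println(estimateIsAcceptable(7, 7, 0, 100));   // true: exact expected, matches
        System.out.println(estimateIsAcceptable(9, 7, 500, 100)); // false: exact expected, mismatch
        System.out.println(estimateIsAcceptable(40, 7, 5, 100));  // true: estimate within bound
        System.out.println(estimateIsAcceptable(150, 7, 5, 100)); // false: estimate exceeds numDocs
    }
}
```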
