Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2013/08/08 00:42:47 UTC
[jira] [Updated] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to "ArithmeticException: / by zero"

     [ https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-5122:
---------------------------

    Description: 
As part of SOLR-4952, SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts.

As far as I can tell, the test assumes that specific values will be returned as the _estimated_ "hits" for a collation, and it appears that the change in MergePolicy resulted in different segments with different term stats, causing the estimation code to produce different values than what is expected.

I made a quick attempt to improve the test to:
 * expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the "estimate" should actually be exact (i.e., collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs in the index)
 * randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the num docs in the index
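
The two checks listed above can be sketched roughly as follows. This is only an illustration of the intended invariants, not code from SpellCheckCollatorTest: the class name, method names, and parameters are all hypothetical, and it assumes the exact hit count, the index size, and the collateMaxCollectDocs value are already available as ints.

```java
// Hypothetical helper, not part of SpellCheckCollatorTest.
final class EstimateChecks {

    // Assumed rule from the list above: the reported hit count should be exact
    // when collection is unlimited (0) or the limit exceeds the index size.
    static boolean expectExact(int maxCollectDocs, int numDocs) {
        return maxCollectDocs == 0 || maxCollectDocs > numDocs;
    }

    // Throws AssertionError if reportedHits breaks either expectation.
    static void check(int maxCollectDocs, int numDocs, int exactHits, int reportedHits) {
        if (reportedHits > numDocs) {
            throw new AssertionError("estimate " + reportedHits + " exceeds numDocs " + numDocs);
        }
        if (expectExact(maxCollectDocs, numDocs) && reportedHits != exactHits) {
            throw new AssertionError("expected exact " + exactHits + " but got " + reportedHits);
        }
    }
}
```

A randomized test could call check(...) once per randomly chosen collateMaxCollectDocs value; only the upper bound is enforced when the value falls between 1 and the index size.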

This led to an odd "ArithmeticException: / by zero" error in the test, which seems to suggest that there is a genuine bug in the code for estimating the hits that only gets tickled in certain mergepolicy/segment/collateMaxCollectDocs combinations.

*Update:* This appears to be a general problem with collecting docs out of order and the estimation of hits -- I believe that even if there is no divide by zero error, the estimates are largely meaningless since the docs are collected out of order.
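
To make the failure mode described in the update concrete, here is a minimal sketch. It is not the Solr implementation: it assumes, purely for illustration, that collection stops after a fixed number of docs and that the total is extrapolated from the ID of the last collected doc using integer arithmetic. The class, method, and parameter names are made up.

```java
// Hypothetical sketch of an extrapolating hit estimate; not Solr code.
final class HitEstimateSketch {

    // Assumes docs were collected in increasing ID order: if docsCollected hits
    // were seen by the time lastDocId was reached, roughly one doc in every
    // (lastDocId + 1) / docsCollected matched, so scale that up to maxDoc.
    static int estimate(int maxDoc, int lastDocId, int docsCollected) {
        int docsPerHit = (lastDocId + 1) / docsCollected; // integer division, may truncate to 0
        return maxDoc / docsPerHit; // ArithmeticException: / by zero when docsPerHit == 0
    }
}
```

With in-order collection, lastDocId + 1 is at least docsCollected, so the divisor cannot be zero: estimate(100, 19, 5) yields 25. If the last ID seen is smaller than the number of docs collected, which under this sketch's assumptions could happen when docs arrive out of order, the divisor truncates to zero: estimate(100, 2, 5) throws. Even when it does not throw, the result depends on whichever doc happened to be collected last, so it says little about the real hit count.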

  was:
As part of SOLR-4952, SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts.

As far as I can tell, the test assumes that specific values will be returned as the _estimated_ "hits" for a collation, and it appears that the change in MergePolicy resulted in different segments with different term stats, causing the estimation code to produce different values than what is expected.

I made a quick attempt to improve the test to:
 * expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the "estimate" should actually be exact (i.e., collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs in the index)
 * randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the num docs in the index

This led to an odd "ArithmeticException: / by zero" error in the test, which seems to suggest that there is a genuine bug in the code that only gets tickled in certain mergepolicy/segment/collateMaxCollectDocs combinations.

        Summary: spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to "ArithmeticException: / by zero"  (was: "ArithmeticException: / by zero" using spellcheck.collateMaxCollectDocs)

FYI: I attempted to do a simple revert of r1479645 and the test still fails -- but reviewing the diff, I think that's because nothing seems to pay attention to the FORCE_INORDER_COLLECTION flag at collection time, so it's effectively useless.

I'm at a loss to really understand what the correct fix should be at this point.
                
> spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to "ArithmeticException: / by zero"
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5122
>                 URL: https://issues.apache.org/jira/browse/SOLR-5122
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.4
>            Reporter: Hoss Man
>         Attachments: SOLR-5122.patch
>
