Posted to solr-dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2009/12/21 12:19:18 UTC

[jira] Commented: (SOLR-1676) spellcheck.count has confusing default and documentation

    [ https://issues.apache.org/jira/browse/SOLR-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793158#action_12793158 ] 

Shalin Shekhar Mangar commented on SOLR-1676:
---------------------------------------------

Although it is not documented anywhere, SpellCheckComponent passes max(spellcheck.count, 5) to the Lucene spellchecker; see AbstractLuceneSpellChecker, line 141 in trunk.
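
To make the arithmetic concrete, here is a minimal sketch in Python (not the actual Solr or Lucene code). It combines the floor of 5 described above with the multiplier of 10 mentioned in the issue description; the function and constant names are hypothetical and chosen only for illustration.

```python
# Illustrative only: names are hypothetical, not Solr/Lucene identifiers.
MIN_SUGGESTIONS = 5    # floor applied before calling the Lucene spellchecker
POOL_MULTIPLIER = 10   # multiplier described in the issue report


def effective_num_sug(spellcheck_count: int) -> int:
    """Value handed to the spellchecker: max(spellcheck.count, 5)."""
    return max(spellcheck_count, MIN_SUGGESTIONS)


def candidate_pool_size(spellcheck_count: int) -> int:
    """Number of candidates considered internally, per the issue report."""
    return effective_num_sug(spellcheck_count) * POOL_MULTIPLIER


if __name__ == "__main__":
    for count in (1, 5, 10):
        print(count, effective_num_sug(count), candidate_pool_size(count))
```

Under these assumptions, spellcheck.count values of 1 and 5 both examine 50 candidates, while a value of 10 examines 100.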

bq. The effect is that with a low value for spellcheck.count you might miss good hits. In other words, the first item with spellcheck.count==1 is not always the same item as with e.g. spellcheck.count==10. 

That is true. It is a trade-off between accuracy and performance. We cannot avoid this without internally fetching all results (or a large number of them) and scoring every one with a distance metric, which can make the spellchecker very slow.
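
The trade-off can be illustrated with a small Python sketch. It is a simplified model, not the Lucene implementation: it assumes a first pass returns candidates in some index order, keeps only the first pool_size of them, and then re-ranks that pool by edit distance. The candidate list and its order are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, retrieved, pool_size, count):
    """Truncate the first-pass results to pool_size, then rank by distance."""
    pool = retrieved[:pool_size]
    ranked = sorted(pool, key=lambda cand: levenshtein(word, cand))
    return ranked[:count]


# Hypothetical first-pass order: the closest match happens to come last.
RETRIEVED = ["sapling", "peeling", "spelling"]

if __name__ == "__main__":
    print(suggest("speling", RETRIEVED, pool_size=2, count=1))
    print(suggest("speling", RETRIEVED, pool_size=3, count=1))
```

With the smaller pool the closest candidate never reaches the distance ranking, so the top suggestion changes when the pool grows. Enlarging the pool fixes this only at the cost of more distance computations per query.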

Do you have any suggestions on how we could improve the documentation?



> spellcheck.count has confusing default and documentation
> --------------------------------------------------------
>
>                 Key: SOLR-1676
>                 URL: https://issues.apache.org/jira/browse/SOLR-1676
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.4
>            Reporter: Daniel Naber
>            Priority: Minor
>
> It seems spellcheck.count does not just limit the number of results returned, as the documentation claims. Instead, this value is given to the Lucene SpellChecker class which multiplies it by 10 and then only fetches the first spellcheck.count*10 candidates, ignoring all others. The effect is that with a low value for spellcheck.count you might miss good hits. In other words, the first item with spellcheck.count==1 is not always the same item as with e.g. spellcheck.count==10.
> The fix could be to fix the documentation (the comments in the sample solrconfig.xml) to mention this and use a better default.
> The Lucene SpellChecker class says about the numSug parameter: "Thus, you should set this value to *at least* 5 for a good suggestion."
