Posted to solr-user@lucene.apache.org by Carrie Coy <cc...@ssww.com> on 2012/06/28 15:20:08 UTC
WordBreakSolrSpellChecker ignores MinBreakWordLength?
I set MinBreakWordLength = 3 thinking it would prevent
WordBreakSolrSpellChecker from suggesting corrections made up of
subwords shorter than 3 characters, but I still get suggestions like this:
query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)
Can someone help me understand why? Here is the relevant portion of
solrconfig.xml:
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>
Re: Solved: WordBreakSolrSpellChecker ignores MinBreakWordLength?
Posted by Carrie Coy <cc...@ssww.com>.
Thanks! The combination of these two suggestions (relocating the
wordbreak parameters to the spellchecker configuration and correcting
the spelling of the parameter to "minBreakLength") fixed the problem I
was having.
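For anyone finding this thread later, here is a rough sketch of what the request handler side might look like after the change. The handler name "/select" and the component name "spellcheck" are assumptions for illustration; the parameter values are the ones from the original post, minus the two wordbreak settings that moved to the spellchecker configuration.

```xml
<!-- Sketch only: handler name and component name are assumed, not from the thread. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- Both dictionaries are still selected here, by name. -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">15</str>
    <str name="spellcheck.maxCollationTries">100</str>
    <str name="spellcheck.alternativeTermCount">4</str>
    <str name="spellcheck.collateParam.mm">100%</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <!-- spellcheck.MinBreakWordLength and spellcheck.maxChanges are removed:
         they now live in the wordbreak spellchecker definition as
         minBreakLength and maxChanges. -->
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```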
On 06/28/2012 10:22 AM, Dyer, James wrote:
> Carrie,
>
> Try taking the "wordbreak" parameters out of the request handler configuration and put them in the spellchecker configuration instead. You also need to remove the "spellcheck." prefix. Also, the correct name for this parameter is "minBreakLength". Here's an example:
>
> <lst name="spellchecker">
>   <str name="name">wordbreak</str>
>   <str name="classname">solr.WordBreakSolrSpellChecker</str>
>   <str name="field">{your field name here}</str>
>   <str name="combineWords">true</str>
>   <str name="breakWords">true</str>
>   <int name="maxChanges">3</int>
>   <int name="minBreakLength">3</int>
> </lst>
RE: WordBreakSolrSpellChecker ignores MinBreakWordLength?
Posted by "Dyer, James" <Ja...@ingrambook.com>.
Carrie,
Try taking the "wordbreak" parameters out of the request handler configuration and put them in the spellchecker configuration instead. You also need to remove the "spellcheck." prefix. Also, the correct name for this parameter is "minBreakLength". Here's an example:
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">{your field name here}</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">3</int>
  <int name="minBreakLength">3</int>
</lst>
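For context, here is a minimal sketch of where that block might sit in solrconfig.xml. The component name "spellcheck", the field name "spell", and the use of solr.DirectSolrSpellChecker for the "default" dictionary are assumptions for illustration, not details of Carrie's setup; substitute your own.

```xml
<!-- Sketch only: component name, field name, and the "default" checker class are assumed. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <!-- The conventional dictionary, referenced as spellcheck.dictionary=default -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">spell</str>
  </lst>

  <!-- The word-break dictionary, referenced as spellcheck.dictionary=wordbreak.
       Its settings are declared here, without the "spellcheck." prefix. -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">3</int>
    <int name="minBreakLength">3</int>
  </lst>

</searchComponent>
```

The request handler then only selects the dictionaries by name and carries the request-level options such as spellcheck.count and spellcheck.collate.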
All of the parameters defined in the following source file go in the spellchecker configuration in the same way:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java
Descriptions of each of these parameters can be found in this source file:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java
Let me know if this works out for you. Any more feedback you can provide on the newer spellcheck features you're using is appreciated. Thanks.
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311