Posted to solr-user@lucene.apache.org by Carrie Coy <cc...@ssww.com> on 2012/06/28 15:20:08 UTC

WordBreakSolrSpellChecker ignores MinBreakWordLength?

I set MinBreakWordLength = 3 thinking it would prevent 
WordBreakSolrSpellChecker from suggesting corrections made up of 
subwords shorter than 3 characters, but I still get suggestions like this:

query: Touch N' Match
suggestion: (t o u ch) 'n (m a t ch)

Can someone help me understand why?  Here is the relevant portion of 
solrconfig.xml:

<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.MinBreakWordLength">3</str>
<str name="spellcheck.maxChanges">3</str>


Re: Solved: WordBreakSolrSpellChecker ignores MinBreakWordLength?

Posted by Carrie Coy <cc...@ssww.com>.
Thanks! The combination of these two suggestions (relocating the 
wordbreak parameters to the spellchecker configuration and correcting 
the spelling of the parameter to "minBreakLength") fixed the problem I 
was having.
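
As a rough sketch of that fix, the request handler parameters from the original post would presumably reduce to the list below, with the two wordbreak-specific lines removed (they move to the spellchecker definition as minBreakLength and maxChanges, without the "spellcheck." prefix). The enclosing handler element is omitted because the thread does not show it.

```xml
<!-- Request-time spellcheck parameters only (sketch based on the original post) -->
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.maxCollations">15</str>
<str name="spellcheck.maxCollationTries">100</str>
<str name="spellcheck.alternativeTermCount">4</str>
<str name="spellcheck.collateParam.mm">100%</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<!-- spellcheck.MinBreakWordLength and spellcheck.maxChanges no longer appear here -->
```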

On 06/28/2012 10:22 AM, Dyer, James wrote:
> Carrie,
>
> Try taking the "wordbreak" parameters out of the request handler configuration and putting them in the spellchecker configuration instead.  You also need to remove the "spellcheck." prefix.  Also, the correct spelling for this parameter is "minBreakLength".  Here's an example.
>
> <lst name="spellchecker">
>   <str name="name">wordbreak</str>
>   <str name="classname">solr.WordBreakSolrSpellChecker</str>
>   <str name="field">{your field name here}</str>
>   <str name="combineWords">true</str>
>   <str name="breakWords">true</str>
>   <int name="maxChanges">3</int>
>   <int name="minBreakLength">3</int>
> </lst>
>
> All of the parameters defined in the following source file go in the spellchecker configuration, as in the example above:
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java
>
> Descriptions of each of these parameters can be found in this source file:
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java
>
> Let me know if this works out for you.  Any more feedback you can provide on the newer spellcheck features you're using is appreciated.  Thanks.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311

RE: WordBreakSolrSpellChecker ignores MinBreakWordLength?

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Carrie,

Try taking the "wordbreak" parameters out of the request handler configuration and putting them in the spellchecker configuration instead.  You also need to remove the "spellcheck." prefix.  Also, the correct spelling for this parameter is "minBreakLength".  Here's an example.

<lst name="spellchecker">
 <str name="name">wordbreak</str>
 <str name="classname">solr.WordBreakSolrSpellChecker</str>      
 <str name="field">{your field name here}</str>
 <str name="combineWords">true</str>
 <str name="breakWords">true</str>
 <int name="maxChanges">3</int>
 <int name="minBreakLength">3</int>
</lst>
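
For context, a spellchecker definition like the one above normally sits inside a SpellCheckComponent declaration in solrconfig.xml. The following is a minimal sketch, not a drop-in configuration: the component name "spellcheck" and the field name "spell" are placeholders assumed for illustration, and the definition of the "default" dictionary is left out.

```xml
<!-- Sketch only: "spellcheck" (component name) and "spell" (field name)
     are assumed placeholders, not values taken from this thread. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <!-- The "default" spellchecker definition would also go in this component. -->

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <!-- No "spellcheck." prefix on parameters set at this level -->
    <int name="maxChanges">3</int>
    <int name="minBreakLength">3</int>
  </lst>
</searchComponent>
```

The name given in the "name" element ("wordbreak") is what a request refers to through spellcheck.dictionary.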

All of the parameters defined in the following source file go in the spellchecker configuration, as in the example above:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/spelling/WordBreakSolrSpellChecker.java

Descriptions of each of these parameters can be found in this source file:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/lucene/suggest/src/java/org/apache/lucene/search/spell/WordBreakSpellChecker.java

Let me know if this works out for you.  Any more feedback you can provide on the newer spellcheck features you're using is appreciated.  Thanks.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

