Posted to solr-user@lucene.apache.org by Snubbel <So...@spamgourmet.com> on 2013/08/06 16:17:09 UTC

Spellchecker suggests Tokens

Hello,
I have a problem getting started with DirectSolrSpellChecker.
I use NGramFilterFactory at index and query time so that I can search for
substrings of length greater than 3. So if I index the word
"aQuiteLongWord", I can search for "long" and get the result.

Now I have added DirectSolrSpellChecker, and when I search for the misspelled
"aQuitLongWord" I get suggestions for uit, itL, ngw and so on, which obviously
doesn't make any sense ;)

How can I keep the spellchecker from suggesting corrections for the individual
n-gram tokens of the search string instead of the whole string?

Here is the schema snippet for the fieldType I use:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-GermanFoldToASCII.txt"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="" maxGramSize="30"/>
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-GermanFoldToASCII.txt"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="30"/>
    </analyzer>
</fieldType>

And the relevant part of solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_general</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">text</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.8</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">100</int>
        <int name="minQueryLength">5</int>
        <float name="maxQueryFrequency">0.5</float>
        <float name="thresholdTokenFrequency">.0001</float>
    </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="omitHeader">true</str>
        <str name="df">text</str>
        <str name="q.op">AND</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">100</str>
        <str name="spellcheck.maxResultsForSuggest">100</str>
        <str name="spellcheck.collate">false</str>
    </lst>
    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>



--
View this message in context: http://lucene.472066.n3.nabble.com/Spellchecker-suggests-Tokens-tp4082821.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Spellchecker suggests Tokens

Posted by "tamanjit.bindra@yahoo.co.in" <ta...@yahoo.co.in>.

I think the issue lies in the analysis of the field you use for
spellchecking: it also applies NGramFilterFactory. So copy your data to
another field whose fieldType does not do NGramFilterFactory analysis, use
that field for spellchecking, and then try again.
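A minimal sketch of that suggestion, under stated assumptions: the names text_spell and spell are hypothetical, and "title" is only a placeholder for your real source field(s). Copy from the original source fields rather than from text, because copyField does not chain (a field that is itself a copyField destination is not copied further). The sketch also points queryAnalyzerFieldType at the new type since, as far as I can tell, the spellcheck component analyzes q with that type's query analyzer, so leaving it at text_general would still split the query into n-grams.

```xml
<!-- schema.xml (sketch): hypothetical spellcheck type, same analysis as
     text_general but WITHOUT NGramFilterFactory, so whole words reach the dictionary.
     A single analyzer without a type attribute is used at index and query time. -->
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-GermanFoldToASCII.txt"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- hypothetical dictionary field; only read by the spellchecker, so not stored -->
<field name="spell" type="text_spell" indexed="true" stored="false" multiValued="true"/>

<!-- "title" is a placeholder: use your original source field(s),
     not another copyField destination such as "text" -->
<copyField source="title" dest="spell"/>

<!-- solrconfig.xml (sketch): point the spellchecker at the new field and type;
     all other parameters are unchanged from the original config -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <!-- analyzes q for spellchecking; text_general would n-gram it again -->
    <str name="queryAnalyzerFieldType">text_spell</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.8</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">100</int>
        <int name="minQueryLength">5</int>
        <float name="maxQueryFrequency">0.5</float>
        <float name="thresholdTokenFrequency">.0001</float>
    </lst>
</searchComponent>
```

Since copyField is applied at index time, the documents have to be reindexed before the new field is populated. DirectSolrSpellChecker reads its terms straight from the index, so no separate dictionary build step should be needed after that.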


