Posted to solr-user@lucene.apache.org by Uwe Reh <re...@hebis.uni-frankfurt.de> on 2012/08/06 23:14:10 UTC

Two questions on spellchecking

Hi,

Even though I have read a lot, none of my spellchecker configurations works
really well, and I have reached a dead end. Maybe someone could help me
solve these two problems.

- How can I get case-preserving suggestions (original capitalization),
regardless of the case used in the query?

- How do I configure 'did you mean' spellchecking, as discussed in
https://issues.apache.org/jira/browse/SOLR-2585 (Context-Sensitive
Spelling Suggestions & Collations)?


I'm using the following environment:
- Solr 4.0-alpha (downloaded on 25 June)
- Java 7
- schema.xml
>      <fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
>          <analyzer>
>             <tokenizer class="solr.KeywordTokenizerFactory" />
>             <filter class="solr.LowerCaseFilterFactory" />
>          </analyzer>
>       </fieldType>
> ...
>       <field name="suggest" type="textSuggest" indexed="true"  stored="true" required="false" multiValued="true"  />
- solrconfig.xml (suggester)
>    <requestHandler name="/hint" class="org.apache.solr.handler.component.SearchHandler">
>       <lst name="defaults">
>          <str name="echoParams">all</str>
>          <str name="spellcheck">true</str>
>          <str name="spellcheck.dictionary">suggester</str>
>          <str name="spellcheck.extendedResults">true</str>
>          <str name="spellcheck.onlyMorePopular">false</str>
>          <str name="spellcheck.count">20</str>
>       </lst>
>       <arr name="components">
>          <str>suggester</str>
>       </arr>
>    </requestHandler>
>    <searchComponent name="suggester" class="solr.SpellCheckComponent">
>       <lst name="spellchecker">
>          <str name="name">suggester</str>
>          <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>          <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
>          <str name="field">suggest</str>
>       </lst>
>    </searchComponent>
- solrconfig.xml (spellcheck)
>   <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
>       <lst name="defaults">
>          <str name="echoParams">all</str>
>          <int name="rows">10</int>
>          <str name="df">allfields</str>
>          <str name="spellcheck.extendedResults">true</str>
>          <str name="spellcheck.onlyMorePopular">false</str>
>          <str name="spellcheck.count">20</str>
>       </lst>
>       <arr name="last-components">
>          <str>spellcheck</str>
>       </arr>
>    </requestHandler>
>    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>       <str name="queryAnalyzerFieldType">textSpell</str>
>       <lst name="spellchecker">
>          <str name="name">default</str>
>          <str name="field">suggest</str>
>          <str name="classname">solr.DirectSolrSpellChecker</str>
>          <str name="distanceMeasure">internal</str>
>          <float name="accuracy">0.1</float>
>          <int name="maxEdits">2</int>
>          <int name="minPrefix">1</int>
>          <int name="maxInspections">5</int>
>          <int name="minQueryLength">1</int>
>          <float name="maxQueryFrequency">0.1</float>
>          <float name="thresholdTokenFrequency">0.001</float>
>       </lst>
>    </searchComponent>

*Suggester problem*
With this configuration the suggester matches case-insensitively, but
the hints are all lower case.
Example: .../hint?q=da&wt=xml&spellcheck=true&spellcheck.build=true
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">173</int>
>   <lst name="params">
>     <str name="spellcheck">true</str>
>     <str name="echoParams">all</str>
>     <str name="spellcheck.extendedResults">true</str>
>     <str name="spellcheck.dictionary">suggester</str>
>     <str name="spellcheck.count">20</str>
>     <str name="spellcheck.onlyMorePopular">false</str>
>     <str name="spellcheck">true</str>
>     <str name="q">da</str>
>     <str name="wt">xml</str>
>     <str name="spellcheck.build">true</str>
>   </lst>
> </lst>
> <str name="command">build</str>
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="da">
>       <int name="numFound">20</int>
>       <int name="startOffset">0</int>
>       <int name="endOffset">2</int>
>       <arr name="suggestion">
>         <str>dat-marktspiegel spezial</str>
>         <str>data structures with c++ using stl</str>
>         <str>data warehouse</str>
>         <str>datan, ingeborg</str>
>         <str>datenbanken mit delphi</str>
>         <str>datenverschlüsselung</str>
>         <str>dauner, gabriele</str>
>         <str>dautermann, margit</str>
>         <str>david copperfield</str>
>         <str>david, horst</str>
>         <str>david, leo</str>
>         <str>david, nicholas</str>
>         <str>davis, charles t.</str>
>         <str>davis, edward l</str>
>         <str>davis, leslie dorfman</str>
>         <str>davis, stanley m.</str>
>         <str>davor kommt noch</str>
>         <str>davydova, irina n.</str>
>         <str>dawidowski, bernd</str>
>         <str>dayan, daniel</str>
>       </arr>
>     </lst>
>     <bool name="correctlySpelled">false</bool>
>   </lst>
> </lst>
> </response>
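To check the casing of the returned hints programmatically rather than by eye, the suggestions can be pulled out of a response like the one above. This is a minimal sketch using only Python's standard library, assuming the XML layout shown above (`lst name="spellcheck"`, then `lst name="suggestions"`, then one `lst` per query token holding an `arr name="suggestion"`). The function names are hypothetical, and `base_url` is a placeholder for the Solr core URL, which the example request above elides.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def hint_url(base_url: str, prefix: str) -> str:
    """Build a request URL for the /hint handler (base_url is a placeholder)."""
    params = {"q": prefix, "wt": "xml"}
    return f"{base_url}/hint?{urlencode(params)}"


def extract_suggestions(response_xml: str) -> dict[str, list[str]]:
    """Map each query token to its suggestions in a Solr XML response."""
    root = ET.fromstring(response_xml)
    result: dict[str, list[str]] = {}
    # <lst name="spellcheck"><lst name="suggestions"> holds one <lst> per token.
    suggestions = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    if suggestions is None:
        return result
    for token in suggestions.findall("lst"):
        # Each token's hints are the <str> children of <arr name="suggestion">.
        words = [s.text for s in token.findall("./arr[@name='suggestion']/str")]
        result[token.get("name")] = words
    return result
```

The `<bool name="correctlySpelled">` element sits next to the per-token lists and is skipped because only `lst` children are visited.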
When I use plain solr.StrField as the field type, the suggestions keep
their original capitalization, but I get no suggestions if the query
starts with a lower-case character.
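The two behaviours described above can be modelled outside Solr in a few lines. This is plain Python, not Solr's Suggester or TSTLookup, and every name in it is hypothetical: `lowercase_only` mimics a dictionary built from lowercased values (matching works, hints come back in lower case), `exact_case` mimics an unanalysed StrField (case is kept, but a lower-case prefix misses capitalized entries), and `CasePreservingPrefixLookup` sketches the behaviour asked for in the first question by matching on a case-folded key while returning the stored original. It only pins down the desired semantics; it does not claim Solr 4.0-alpha can be configured this way.

```python
import bisect


def lowercase_only(entries, prefix):
    """Mimics a dictionary of lowercased values: matching works, hints are lower case."""
    folded = prefix.lower()
    return sorted(e.lower() for e in entries if e.lower().startswith(folded))


def exact_case(entries, prefix):
    """Mimics an unanalysed string field: case is kept, matching is case-sensitive."""
    return sorted(e for e in entries if e.startswith(prefix))


class CasePreservingPrefixLookup:
    """Hypothetical lookup: match on a case-folded key, return the original text."""

    def __init__(self, entries):
        # Keep (folded key, original) pairs sorted by the folded key.
        self._pairs = sorted((e.casefold(), e) for e in set(entries))
        self._keys = [key for key, _ in self._pairs]

    def lookup(self, prefix, count=20):
        folded = prefix.casefold()
        # Binary search for the first key that could start with the prefix.
        start = bisect.bisect_left(self._keys, folded)
        hits = []
        for key, original in self._pairs[start:]:
            if not key.startswith(folded) or len(hits) >= count:
                break
            hits.append(original)
        return hits
```

The point of the third variant is keeping the folded match key separate from the returned surface form; the lowercase analysis chain above collapses the two into one string, and StrField keeps only the original.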

*Spelling problem*
One of the indexed entries in the field 'suggest' is "David Copperfield",
and I want this string offered as an alternative suggestion for the query
"David opperfield".
Example: .../select?q="david+opperfield"&rows=0&wt=xml&spellcheck=true
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">15</int>
>   <lst name="params">
>     <str name="df">allfields</str>
>     <str name="echoParams">all</str>
>     <str name="spellcheck.extendedResults">true</str>
>     <str name="spellcheck.count">20</str>
>     <str name="spellcheck.onlyMorePopular">false</str>
>     <str name="rows">0</str>
>     <str name="spellcheck">true</str>
>     <str name="q">"david opperfield"</str>
>     <str name="wt">xml</str>
>     <str name="rows">0</str>
>   </lst>
> </lst>
> <result name="response" numFound="0" start="0"></result>
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <bool name="correctlySpelled">false</bool>
>   </lst>
> </lst>
> </response>
Without the quotes:
.../select?q=david+opperfield&rows=0&wt=xml&spellcheck=true
--> <bool name="correctlySpelled">true</bool>
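With KeywordTokenizerFactory and LowerCaseFilterFactory, each value of 'suggest' is indexed as a single lowercase term, so "David Copperfield" becomes the one term "david copperfield". The sketch below compares candidate inputs against that term using plain Levenshtein distance and the maxEdits=2 from the spellchecker configuration above. It is illustrative Python only: DirectSolrSpellChecker's "internal" distance measure may differ in detail, and whether the spellchecker receives the whole query or individual words is an assumption here, not something the responses above show.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


MAX_EDITS = 2  # maxEdits from the spellchecker configuration above
# One term per value, as produced by KeywordTokenizerFactory + LowerCaseFilterFactory.
INDEXED_TERM = "david copperfield"

for candidate in ["david opperfield", "opperfield", "david"]:
    d = levenshtein(candidate, INDEXED_TERM)
    print(f"{candidate!r}: distance {d}, within maxEdits: {d <= MAX_EDITS}")
```

This prints distances 1, 7 and 12: the whole phrase is one edit away from the indexed term, while the single word "opperfield" is seven edits away, far outside maxEdits=2. So with this field definition, a suggestion can only appear if the query is checked as one string rather than word by word.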

=?8-)
Uwe

Btw: is there a DirectSolrSuggester, analogous to DirectSolrSpellChecker?