Posted to solr-user@lucene.apache.org by Mark Swinson <Ma...@bbc.co.uk> on 2011/10/03 18:09:26 UTC

Debugging misbehaving spellchecker search....

Hi,

I'm trying to configure Solr to perform a 'Did you mean this?' style
search using the SpellCheckComponent and the standard search handler.
Unfortunately I am having problems getting results from my test
search: when I search using a misspelling of a word I know to be in
the source index, I get no results.

I have built a standard index from a MySQL table using the 'dataimport'
plugin. This works, and standard text queries against the index return
the expected results. I then rebuilt the spellchecker index using the
following URI:

/select?spellcheck=true&spellcheck.build=true&q=*

(If I omit the q parameter, which I don't think is necessary in this
particular situation, I get a null pointer exception.)

Does anyone know of a way to confirm that the spellchecker index has
been written correctly? I want to isolate whether my query or my
spellchecker configuration is at fault.



For reference, below are the key aspects of my Solr configuration
relating to this issue.


Thanks


Mark



schema.xml:
	<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" stored="false" multiValued="true">
		<analyzer type="index">
			<tokenizer class="solr.StandardTokenizerFactory"/>
			<filter class="solr.LowerCaseFilterFactory"/>
			<!-- filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" / -->
			<!-- filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/ -->
			<filter class="solr.StandardFilterFactory"/>
			<filter class="RemovingDuplicatesTokenFilterFactory"/>
		</analyzer>
		<analyzer type="query">
			<tokenizer class="solr.StandardTokenizerFactory"/>
			<filter class="solr.LowerCaseFilterFactory"/>
			<!-- filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/ -->
			<filter class="solr.StandardFilterFactory"/>
			<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
		</analyzer>
	</fieldType>

	<field name="recipeId" type="string" indexed="true" stored="true" required="true"/>

	<field name="spell" type="textSpell" indexed="true" required="true"/>

	<copyField source="recipeId" dest="spell"/>
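
The spellchecker dictionary is built from whatever is indexed into the
`spell` field, and the only copyField shown above feeds it from
`recipeId`. A minimal sketch of feeding it from a free-text field as
well, assuming a hypothetical `title` field that does not appear in the
configuration above (the full schema may already contain something
equivalent):

```xml
<!-- Hypothetical: "title" is an assumed free-text field, not taken from the schema above -->
<field name="title" type="textSpell" indexed="true" stored="true"/>

<!-- Terms from the assumed "title" field are copied into the spellcheck source field -->
<copyField source="title" dest="spell"/>
```

If the dictionary only ever receives recipe IDs, a misspelled word from
another field would have nothing to be matched against.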


solrconfig.xml:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <str name="queryAnalyzerFieldType">textSpell</str>

    <lst name="spellchecker">
      <str name="name">test</str>
      <str name="field">spell</str>
      <str name="buildOnOptimize">true</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>

  </searchComponent>
  <!-- requestHandler plugins... incoming queries will be dispatched to the
     correct handler based on the path or the qt (query type) param.
     Names starting with a '/' are accessed with a path equal to the
     registered name.  Names without a leading '/' are accessed with:
      http://host/app/select?qt=name
     If no qt is defined, the requestHandler that declares default="true"
     will be used.
  -->
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!--
      <int name="rows">10</int>
      <str name="fl">*</str>
      <str name="version">2.1</str>
      -->
      <str name="spellcheck.onlyMorePopular">false</str>
      <!-- exr = Extended Results -->
      <str name="spellcheck.extendedResults">false</str>
      <!-- The number of suggestions to return -->
      <str name="spellcheck.count">1</str>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </lst>
  </requestHandler>
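
For comparison, a minimal sketch of the same handler laid out the way
the stock Solr example configuration does it, with `last-components` as
a direct child of the request handler rather than nested inside the
`defaults` list. Setting `spellcheck` and `spellcheck.dictionary` as
defaults is an assumption made here for illustration; the dictionary
name `test` comes from the spellchecker definition above, and a request
that names no dictionary looks for one called `default`. Whether either
point explains the missing suggestions is not confirmed.

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- Assumption: enable spellchecking by default and select the dictionary named "test" -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">test</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <!-- The component list sits beside "defaults", not inside it -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same two parameters can instead be passed on the request URL
(spellcheck=true&spellcheck.dictionary=test) without changing
solrconfig.xml.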



