Posted to solr-user@lucene.apache.org by ji...@ece.ubc.ca on 2014/07/16 01:40:05 UTC

questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Hello everyone :)

I have a product called "xbox" indexed, and when a user searches for
either "x-box" or "x box", I want the "xbox" product to be
returned. I'm new to Solr, and from reading online, I thought I needed
to use WordDelimiterFilterFactory for the "x-box" case and
WordBreakSolrSpellChecker for the "x box" case. Is this correct?

(1) In my schema file, this is what I changed:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>

But I don't see the xbox product returned when the search term is
"x-box", so I must have missed something....

(2) I tried to use WordBreakSolrSpellChecker together with
DirectSolrSpellChecker as shown below, but WordBreakSolrSpellChecker
never got used:

<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">wc_textSpell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellCheck</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.3</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">0.004</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellCheck</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="df">SpellCheck</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.build"> true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
  </lst>
  <arr name="components">
    <str>wc_spellcheck</str>
  </arr>
</requestHandler>

I tried to build the dictionary this way:
http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true
but the response returned is this:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="spellcheck.build">true</str>
<str name="spellcheck">true</str>
</lst>
</lst>
<str name="command">build</str>
<result name="response" numFound="0" start="0"/>
</response>

What's the correct way to build the dictionary?
Even though my requestHandler's name is "/spellcheck", I wasn't able to use
http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
Is there something wrong with my definition above?

(3) I also tried to use WordBreakSolrSpellChecker without the
DirectSolrSpellChecker as shown below:
<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">wc_textSpell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellCheck</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="df">SpellCheck</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!--<str name="spellcheck.dictionary">wordbreak</str> -->
    <str name="spellcheck.build"> true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
  </lst>
  <arr name="components">
    <str>wc_spellcheck</str>
  </arr>
</requestHandler>

I am still unable to see WordBreakSolrSpellChecker being called anywhere.

Would someone kindly help me?

Many thanks,
Jia

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Posted by benjelloun <an...@gmail.com>.
hello,

for WordDelimiterFilterFactory:

Here is an example for schema.xml to follow:

<field name="spell" type="textSpell" multiValued="true" indexed="true"
       required="false" stored="false"/>

<fieldType name="textSpell" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.LengthFilterFactory" min="3" max="20"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
    <filter class="solr.LengthFilterFactory" min="3" max="20"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
  </analyzer>
</fieldType>

And for WordBreakSolrSpellChecker, follow this in solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="qf">your field</str>
    <str name="qt">spellchecker</str>
    <str name="cmd">rebuild</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxCollations">5</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.build">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

regards,
Anass BENJELLOUN


Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Posted by Erick Erickson <er...@gmail.com>.
Zeroth, take a look at the admin/analysis page with that input and see if
your field is analyzing x-box and xbox the way you expect.

First, try adding &debug=all to the URL; that'll show you exactly what the
parsed query was. It may surprise you.

Second, examine what's actually _in_ the index with the admin/schema-browser,
TermsComponent, or Luke to see if _that's_ what you expect.

My bet is it'll be pretty obvious in one of those three steps... but I've
lost bets before.


On Thu, Jul 17, 2014 at 5:42 AM, <ji...@ece.ubc.ca> wrote:

> Hi Ahmet,
>
> using <arr name="last-components"> or <arr name="components"> didn't
> make any difference. Still running into the same issues aforementioned :(
>
> Thanks,
> Jia
>
> On 7/16/2014, "Ahmet Arslan" <io...@yahoo.com> wrote:
>
> >Hi Jia,
> >
> >What happens when you use
> >
> > <arr name="last-components">
> >
> >instead of
> >
> > <arr name="components">
> >
> >Ahmet

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Posted by Diego Fernandez <di...@redhat.com>.
Which tokenizer are you using?  StandardTokenizer will split "x-box" into "x" and "box", same as "x box".

If there aren't too many of these, you could also use PatternReplaceCharFilterFactory to map "x box" and "x-box" to "xbox" before the tokenizer.
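
A minimal sketch of that char-filter approach, assuming a hypothetical field type named text_product and a hand-written pattern for this one product (neither comes from the thread). Because a charFilter runs before the tokenizer, both spellings are rewritten to "xbox" before StandardTokenizer has a chance to split them:

```xml
<!-- Hypothetical field type; the name and the pattern are illustrative assumptions. -->
<fieldType name="text_product" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Rewrites "x box" and "x-box" (any case) to "xbox" before tokenizing. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)\bx[\s-]+box\b" replacement="xbox"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One caveat: depending on the Solr version and query parser, an unquoted query such as x box may be split on whitespace before analysis. In that case the char filter would only see the two words together at index time or inside a quoted query, so it is worth checking the parsed query with debug output.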

Diego Fernandez - 爱国
Software Engineer
US GSS Supportability - Diagnostics


----- Original Message -----
> Jia,
> 
> I agree that for the spellcheckers to work, you need  <arr
> name="last-components"> instead of <arr name="components">.
> 
> But the "x-box" => "xbox" example ought to be solved by analyzing using
> WordDelimiterFilterFactory and "catenateWords=1" at query-time.  Did you
> re-index after changing your analysis chain (you need to)?  Perhaps you can
> show your full analyzer configuration, and someone here can help you find
> the problem. Also, the Analysis page on the solr Admin UI is invaluable for
> debugging text-field analyzer problems.
> 
> Getting "x box" to analyze to "xbox" is trickier (but possible).  The
> WordBreakSpellChecker is probably your best option if you have cases like
> this in your data & users' queries.
> 
> Of course, if you have a finite number of products that have spelling
> variants like this, SynonymFilterFactory might be all you need.  I would
> recommend using index-time synonyms for your case rather than query-time
> synonyms.
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> 
> -----Original Message-----
> From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID]
> Sent: Wednesday, July 16, 2014 7:42 AM
> To: solr-user@lucene.apache.org; jiag@ece.ubc.ca
> Subject: Re: questions on Solr WordBreakSolrSpellChecker and
> WordDelimiterFilterFactory
> 
> Hi Jia,
> 
> What happens when you use
> 
>  <arr name="last-components">
> 
> instead of
> 
>  <arr name="components">
> 
> Ahmet

RE: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Jia,

I agree that for the spellcheckers to work, you need  <arr name="last-components"> instead of <arr name="components">.
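
As a rough sketch, the handler from the original post with that change might look like the following. It assumes the spellcheck field is named spellCheck, as in the spellchecker definitions (the posted df value "SpellCheck" differs in case, and field names are case-sensitive). With <arr name="components">, the listed components replace the default chain, so no query component runs; <arr name="last-components"> appends the spellchecker after the defaults instead.

```xml
<requestHandler name="/spellcheck" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Assumption: matches the "field" used by the spellcheckers. -->
    <str name="df">spellCheck</str>
    <str name="spellcheck">true</str>
    <!-- Both dictionaries are consulted; suggestions are merged. -->
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <!-- Appended after the default components (query, facet, ...). -->
  <arr name="last-components">
    <str>wc_spellcheck</str>
  </arr>
</requestHandler>
```

spellcheck.build is left out here on the understanding that DirectSolrSpellChecker and WordBreakSolrSpellChecker both read from the main index and have no separate dictionary to build.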

But the "x-box" => "xbox" example ought to be solved by analyzing with WordDelimiterFilterFactory and catenateWords="1" at query time.  Did you re-index after changing your analysis chain (you need to)?  Perhaps you can show your full analyzer configuration, and someone here can help you find the problem. Also, the Analysis page in the Solr Admin UI is invaluable for debugging text-field analyzer problems.
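
A minimal sketch of such an analysis chain, assuming a hypothetical field type named text_wdf. The filter attributes are the ones from the original post; the tokenizer choice is an assumption made so that the hyphen survives long enough for WordDelimiterFilterFactory to see it (StandardTokenizerFactory would already have split "x-box" into "x" and "box"):

```xml
<!-- Hypothetical field type name; the same chain is used at index and query time. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Whitespace tokenizer keeps "x-box" as one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenateWords="1" adds the joined form "xbox" alongside "x" and "box". -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After a change like this the documents need to be re-indexed, and the Analysis page can confirm whether "x-box" actually produces an "xbox" token.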

Getting "x box" to analyze to "xbox" is trickier (but possible).  The WordBreakSolrSpellChecker is probably your best option if you have cases like this in your data and users' queries.

Of course, if you have a finite number of products that have spelling variants like this, SynonymFilterFactory might be all you need.  I would recommend using index-time synonyms for your case rather than query-time synonyms.
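
A minimal sketch of index-time synonyms, assuming a hypothetical field type named text_syn and a synonyms.txt entry written for this one product. Expanding at index time means a document containing "xbox" is also indexed under "x" and "box", so the multi-word variant does not have to survive query parsing intact:

```xml
<!-- synonyms.txt (assumed content, one line):  xbox, x box -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- expand="true" indexes every listed variant for a matching term. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No synonym filter here; "x-box" and "x box" both tokenize to "x", "box". -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off is that any change to synonyms.txt requires re-indexing the affected documents.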

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID] 
Sent: Wednesday, July 16, 2014 7:42 AM
To: solr-user@lucene.apache.org; jiag@ece.ubc.ca
Subject: Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Hi Jia,

What happens when you use 

 <arr name="last-components">

instead of 

 <arr name="components">

Ahmet





Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Posted by ji...@ece.ubc.ca.
Hi Ahmet,

Using <arr name="last-components"> or <arr name="components"> didn't
make any difference. I'm still running into the same issues mentioned above :(

Thanks,
Jia

On 7/16/2014, "Ahmet Arslan" <io...@yahoo.com> wrote:

>Hi Jia,
>
>What happens when you use 
>
> <arr name="last-components">
>
>instead of 
>
> <arr name="components">
>
>Ahmet
>
>
>On Wednesday, July 16, 2014 3:07 AM, "jiag@ece.ubc.ca" <ji...@ece.ubc.ca> wrote:
>
>
>
>Hello everyone :)
>
>I have a product called "xbox" indexed, and when the user search for
>either "x-box" or "x box" i want the "xbox" product to be
>returned.  I'm new to Solr, and from reading online, I thought I need
>to use WordDelimiterFilterFactory for "x-box" case, and
>WordBreakSolrSpellChecker for "x box" case. Is this correct?
>
>(1) In my schema file, this is what I changed:
><filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
>
>But I don't see the xbox product returned when the search term is
>"x-box", so I must have missed something....
>
>(2) I tried to use  WordBreakSolrSpellChecker together with
>DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
>never got used:
>
><searchComponent name="wc_spellcheck"
>class="solr.SpellCheckComponent">
>    <str name="queryAnalyzerFieldType">wc_textSpell</str>
>
>    <lst name="spellchecker">
>      <str name="name">default</str>
>      <str name="field">spellCheck</str>
>      <str name="classname">solr.DirectSolrSpellChecker</str>
>      <str name="distanceMeasure">internal</str>
>          <float name="accuracy">0.3</float>
>            <int name="maxEdits">2</int>
>            <int name="minPrefix">1</int>
>            <int name="maxInspections">5</int>
>            <int name="minQueryLength">3</int>
>            <float name="maxQueryFrequency">0.01</float>
>            <float name="thresholdTokenFrequency">0.004</float>
>    </lst>
><lst name="spellchecker">
>    <str name="name">wordbreak</str>
>    <str name="classname">solr.WordBreakSolrSpellChecker</str>
>    <str name="field">spellCheck</str>
>    <str name="combineWords">true</str>
>    <str name="breakWords">true</str>
>    <int name="maxChanges">10</int>
>  </lst>
>  </searchComponent>
>
>  <requestHandler name="/spellcheck"
>class="org.apache.solr.handler.component.SearchHandler">
>    <lst name="defaults">
>        <str name="df">SpellCheck</str>
>        <str name="spellcheck">true</str>
>           <str name="spellcheck.dictionary">default</str>
>        <str name="spellcheck.dictionary">wordbreak</str>
>        <str name="spellcheck.build"> true</str>
>           <str name="spellcheck.onlyMorePopular">false</str>
>           <str name="spellcheck.count">10</str>
>           <str name="spellcheck.collate">true</str>
>           <str name="spellcheck.collateExtendedResults">false</str>
>    </lst>
>    <arr name="components">
>      <str>wc_spellcheck</str>
>    </arr>
>  </requestHandler>
>
>I tried to build the dictionary this way:
>http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true,
>but the response returned is this:
><response>
><lst name="responseHeader">
><int name="status">0</int>
><int name="QTime">0</int>
><lst name="params">
><str name="spellcheck.build">true</str>
><str name="spellcheck">true</str>
></lst>
></lst>
><str name="command">build</str>
><result name="response" numFound="0" start="0"/>
></response>
>
>What's the correct way to build the dictionary?
>Even though my requestHandler's name="/spellcheck", i wasn't able to
>use
>http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
>.. is there something wrong with my definition above?
>
>(3) I also tried to use WordBreakSolrSpellChecker without the
>DirectSolrSpellChecker as shown below:
><searchComponent name="wc_spellcheck"
>class="solr.SpellCheckComponent">
>
>  <str name="queryAnalyzerFieldType">wc_textSpell</str>
>    <lst name="spellchecker">
>    <str name="name">default</str>
>    <str name="classname">solr.WordBreakSolrSpellChecker</str>
>    <str name="field">spellCheck</str>
>    <str name="combineWords">true</str>
>    <str name="breakWords">true</str>
>    <int name="maxChanges">10</int>
>  </lst>
>   </searchComponent>
>
>   <requestHandler name="/spellcheck"
>class="org.apache.solr.handler.component.SearchHandler">
>    <lst name="defaults">
>        <str name="df">SpellCheck</str>
>        <str name="spellcheck">true</str>
>           <str name="spellcheck.dictionary">default</str>
>        <!--<str name="spellcheck.dictionary">wordbreak</str> -->
>        <str name="spellcheck.build"> true</str>
>           <str name="spellcheck.onlyMorePopular">false</str>
>           <str name="spellcheck.count">10</str>
>           <str name="spellcheck.collate">true</str>
>           <str name="spellcheck.collateExtendedResults">false</str>
>    </lst>
>    <arr name="components">
>      <str>wc_spellcheck</str>
>    </arr>
>  </requestHandler>
>
>I am still unable to see WordBreakSolrSpellChecker being called anywhere.
>
>Would someone kindly help me?
>
>Many thanks,
>Jia
>

Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Jia,

What happens when you use

 <arr name="last-components">

instead of

 <arr name="components">

in your /spellcheck requestHandler?
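
As a rough sketch only, assuming the rest of your posted /spellcheck
definition stays exactly as it is, the handler would then look something
like the snippet below. As far as I know, "components" replaces the whole
default component chain (so the query component no longer runs), whereas
"last-components" appends your component after the default ones. The
only line changed from your post is the <arr> element; the comments are
mine.

  <requestHandler name="/spellcheck"
                  class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <!-- kept as posted; note your spellchecker field is "spellCheck" -->
      <str name="df">SpellCheck</str>
      <str name="spellcheck">true</str>
      <!-- both dictionaries defined in wc_spellcheck -->
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck.build">true</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">false</str>
    </lst>
    <!-- last-components: run wc_spellcheck after the default search
         components instead of replacing them -->
    <arr name="last-components">
      <str>wc_spellcheck</str>
    </arr>
  </requestHandler>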

Ahmet

