Posted to solr-user@lucene.apache.org by Nathaniel Rudavsky-Brody <na...@gmail.com> on 2014/09/22 15:07:45 UTC

fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hello,

I'm trying to find the best way to "fake" the terms component for fuzzy 
queries. That is, I need the full set of index terms for each of the 
two queries "quidam~1" and "quidam~2".

I tried defining two suggesters with FuzzyLookupFactory, with 
maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
suffixes like "quodammodo", which makes sense for a suggester but isn't 
what I want here.

Now I'm trying with the spell-checker. As far as I can see, 
IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
to distinguish between my two queries. DirectSolrSpellChecker seems 
like it should work, i.e.:

  <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">fuzzy1</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">1</int>
      ...
    </lst>
    <lst name="spellchecker">
      <str name="name">fuzzy2</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">2</int>
      ...
    </lst>
  </searchComponent>

However, the parameter spellcheck.alternativeTermCount has no effect: 
the query "spellcheck.q=quidam" gives no results, while 
"spellcheck.q=quiam" (which doesn't exist in the index) gives the 
expected terms.

Am I missing something? Or is there a better way to do this?

Many thanks for any help and ideas,

Nathaniel

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
You cannot use 100% because, as you say, 1 is interpreted as "1 document". But you can use something like 99.99999% (i.e. 0.9999999).
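For illustration, a minimal sketch of the fuzzyterms component from the original question with maxQueryFrequency set just under 1 on both dictionaries. The <float> element type follows the example solrconfig.xml shipped with Solr 4.x; the other settings (elided as "..." in the original config) are assumed unchanged:

  <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">fuzzy1</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">1</int>
      <!-- just under 100%; a value of 1 or more is read as an absolute document count -->
      <float name="maxQueryFrequency">0.9999999</float>
      <!-- remaining settings as in the original config -->
    </lst>
    <lst name="spellchecker">
      <str name="name">fuzzy2</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">2</int>
      <float name="maxQueryFrequency">0.9999999</float>
      <!-- remaining settings as in the original config -->
    </lst>
  </searchComponent>

With this in place, spellcheck.q=quidam should return neighbouring terms even though quidam itself is in the index, provided spellcheck.alternativeTermCount is also set.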

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudavsky@gmail.com] 
Sent: Monday, September 22, 2014 11:39 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Thank you, that works!

I'd already tried several values for maxQueryFrequency, but apparently 
without properly understanding it. I was confused by the line "A lower 
threshold is better for small indexes" when in fact I need a high value 
like 0.99, so every term returns suggestions. (Is it possible to set it 
to 100%? Because 1 gets interpreted as an absolute value.)

Nathaniel

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Posted by Nathaniel Rudavsky-Brody <na...@gmail.com>.
Thank you, that works!

I'd already tried several values for maxQueryFrequency, but apparently 
without properly understanding it. I was confused by the line "A lower 
threshold is better for small indexes" when in fact I need a high value 
like 0.99, so every term returns suggestions. (Is it possible to set it 
to 100%? Because 1 gets interpreted as an absolute value.)

Nathaniel

On Mon, Sep 22, 2014 at 6:17 , Dyer, James 
<Ja...@ingramcontent.com> wrote:
> DirectSpellChecker defaults to not suggesting anything for terms that 
> occur in 1% or more of the total documents in the index. You can set 
> this higher in solrconfig.xml, either as a fraction of the documents 
> (a value below 1) or as a whole-number absolute count of documents.
> 
> See 
> http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMaxQueryFrequency%28float%29 
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
DirectSpellChecker defaults to not suggesting anything for terms that occur in 1% or more of the total documents in the index. You can set this higher in solrconfig.xml, either as a fraction of the documents (a value below 1) or as a whole-number absolute count of documents.

See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMaxQueryFrequency%28float%29 
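A sketch of the two forms the setting accepts, per the Lucene javadoc linked above: a value below 1 is a fraction of the documents in the index, and a whole number of 1 or more is an absolute document frequency (fractional absolute values such as 1.5 are not allowed). The numbers here are placeholders, not recommendations:

  <lst name="spellchecker">
    <str name="name">fuzzy1</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- relative form: a fraction of all documents (here 50%) -->
    <float name="maxQueryFrequency">0.5</float>
    <!-- absolute form, as an alternative: a whole-number document count
    <float name="maxQueryFrequency">1000</float>
    -->
  </lst>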

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudavsky@gmail.com] 
Sent: Monday, September 22, 2014 9:41 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Yep, I tried it both as a default param in the request handler (as in 
the config I sent), and in the request, but with no effect... That's 
what surprised me, since it seems it should work.

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Posted by Nathaniel Rudavsky-Brody <na...@gmail.com>.
Yep, I tried it both as a default param in the request handler (as in 
the config I sent), and in the request, but with no effect... That's 
what surprised me, since it seems it should work.

On Mon, Sep 22, 2014 at 4:38 , Dyer, James 
<Ja...@ingramcontent.com> wrote:
> Did you try "spellcheck.alternativeTermCount" with 
> DirectSolrSpellChecker? You can set it to however many suggestions 
> you actually want returned (perhaps 20 at most?).
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Did you try "spellcheck.alternativeTermCount" with DirectSolrSpellChecker? You can set it to however many suggestions you actually want returned (perhaps 20 at most?).

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudavsky@gmail.com] 
Sent: Monday, September 22, 2014 9:36 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hi James,

The request 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
returns

quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
quis, quae, quas, quem, quid, quin, qui, qua

Replacing quiam (not in the index) by quidam (in the index) returns 
nothing at all, but I want it to return

quidam, quam, quia, quidem, quadam, quodam, quedam, ...

When I was using the same parameters with IndexBasedSpellChecker, by 
setting a high alternativeTermCount, I got results for both. But as I 
said, then I can't differentiate the different maxEdits.

The request handler is:

  <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck.dictionary">fuzzy1</str>
      <str name="spellcheck.count">20</str>
      <int name="spellcheck.alternativeTermCount">1000000</int>
    </lst>
    <arr name="last-components">
      <str>fuzzyterms</str>
    </arr>
  </requestHandler>

Thanks!

Nathaniel

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Posted by Nathaniel Rudavsky-Brody <na...@gmail.com>.
Hi James,

The request 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
returns

quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
quis, quae, quas, quem, quid, quin, qui, qua

Replacing quiam (not in the index) by quidam (in the index) returns 
nothing at all, but I want it to return

quidam, quam, quia, quidem, quadam, quodam, quedam, ...

When I was using the same parameters with IndexBasedSpellChecker, by 
setting a high alternativeTermCount, I got results for both. But as I 
said, then I can't differentiate the different maxEdits.

The request handler is:

  <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck.dictionary">fuzzy1</str>
      <str name="spellcheck.count">20</str>
      <int name="spellcheck.alternativeTermCount">1000000</int>
    </lst>
    <arr name="last-components">
      <str>fuzzyterms</str>
    </arr>
  </requestHandler>

Thanks!

Nathaniel

On Mon, Sep 22, 2014 at 4:08 , Dyer, James 
<Ja...@ingramcontent.com> wrote:
> Nathaniel,
> 
> Can you show us all of the parameters you are sending to the 
> spellchecker?  When you specify "alternativeTermCount" with 
> "spellcheck.q=quidam", what are the terms you expect to get back?  
> Also, are you getting any query results back?  If you are using a "q" 
> that returns results, or more results than you specify for 
> "spellcheck.maxResultsForSuggest", spellcheck won't give you anything 
> regardless of what you put for "spellcheck.q".
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311

RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Nathaniel,

Can you show us all of the parameters you are sending to the spellchecker?  When you specify "alternativeTermCount" with "spellcheck.q=quidam", what are the terms you expect to get back?  Also, are you getting any query results back?  If you are using a "q" that returns results, or more results than you specify for "spellcheck.maxResultsForSuggest", spellcheck won't give you anything regardless of what you put for "spellcheck.q".

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudavsky@gmail.com] 
Sent: Monday, September 22, 2014 8:08 AM
To: solr-user@lucene.apache.org
Subject: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hello,

I'm trying to find the best way to "fake" the terms component for fuzzy 
queries. That is, I need the full set of index terms for each of the 
two queries "quidam~1" and "quidam~2".

I tried defining two suggesters with FuzzyLookupFactory, with 
maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
suffixes like "quodammodo", which makes sense for a suggester but isn't 
what I want here.

Now I'm trying with the spell-checker. As far as I can see, 
IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
to distinguish between my two queries. DirectSolrSpellChecker seems 
like it should work, i.e.:

  <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">fuzzy1</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">1</int>
      ...
    </lst>
    <lst name="spellchecker">
      <str name="name">fuzzy2</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">2</int>
      ...
    </lst>
  </searchComponent>

However, the parameter spellcheck.alternativeTermCount has no effect: 
the query "spellcheck.q=quidam" gives no results, while 
"spellcheck.q=quiam" (which doesn't exist in the index) gives the 
expected terms.

Am I missing something? Or is there a better way to do this?

Many thanks for any help and ideas,

Nathaniel