Posted to solr-user@lucene.apache.org by Thomas Michael Engelke <th...@posteo.de> on 2014/09/10 12:00:21 UTC

Solr Spellcheck suggestions only return from /select handler when returning search results

Hi,

I'm experimenting with the Spellcheck component and have therefore
used the example configuration for spell checking to try things out. My
solrconfig.xml looks like this:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">spell</str>
    <!-- Multiple "Spell Checkers" can be declared and used by this
         component -->
    <!-- a spellchecker built from a field of the main index -->
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- the spellcheck distance measure used, the default is the
           internal levenshtein -->
      <str name="distanceMeasure">internal</str>
      <!-- uncomment this to require suggestions to occur in 1% of the
           documents
      <float name="thresholdTokenFrequency">.01</float>
      -->
    </lst>
    <!-- a spellchecker that can break or combine words. See "/spell"
         handler below for usage -->
    <lst name="spellchecker">
      <str name="name">wordbreak</str>
      <str name="classname">solr.WordBreakSolrSpellChecker</str>
      <str name="field">spell</str>
      <str name="combineWords">true</str>
      <str name="breakWords">true</str>
      <int name="maxChanges">10</int>
    </lst>
  </searchComponent>

And I've added the spellcheck component to my /select request handler:

  <requestHandler name="/select" class="solr.SearchHandler">
    ...
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

I have built up the spellchecker source in the schema.xml from the name
field:

  <field name="spell" type="spell" indexed="true" stored="true"
         required="false" multiValued="false"/>
  <copyField source="name" dest="spell" maxChars="30000" />
  ...
  <fieldType name="spell" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
  </fieldType>

Since I'm querying the /select request handler, I should get spellcheck
suggestions along with my results. However, I rarely get a suggestion.
Examples:

query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
query: ichtscheiben, no spellcheck suggestions

As far as I can tell, I only get suggestions when the query also returns
real search results. I get results for the first 2 examples because the
German StemFilterFactory reduces "Sichtscheibe" and "Sichtscheiben" to
"Sichtscheib", so matches are found. However, the third query should
also result in a suggestion, as its Levenshtein distance to
"Sichtscheiben" is smaller than in the second example (one edit instead
of two).

Suggestions, improvements, corrections?

 

RE: Solr Spellcheck suggestions only return from /select handler when returning search results

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Thomas,

Yes, you are right that the problem is with the beginning of the word needing correction. If you are using DirectSolrSpellChecker, you need to set the "minPrefix" parameter to 0. Otherwise the default (1) requires the first character of a suggestion to match the query term, so a term with a wrong or missing first character is never corrected.

See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
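
For illustration, a minimal sketch of what this could look like in the "default" spellchecker from the original message, assuming the rest of that configuration stays as posted. Only the "minPrefix" entry is new; the comment is mine:

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="distanceMeasure">internal</str>
  <!-- allow corrections that change the first character;
       the default of 1 requires the first character to match -->
  <int name="minPrefix">0</int>
</lst>
```

The core most likely has to be reloaded for the change to take effect. Since DirectSolrSpellChecker works directly off the main index, no separate spellcheck index should need rebuilding.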

James Dyer
Ingram Content Group
(615) 213-4311



RE: Solr Spellcheck suggestions only return from /select handler when returning search results

Posted by Thomas Michael Engelke <th...@posteo.de>.
Hi James, hi list,

I can confirm the existence of data that's within 1 Levenshtein step
from "ichtscheiben":

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "fl": "name,spell",
      "indent": "true",
      "q": "name:Sichtscheiben",
      "_": "1410423419758",
      "wt": "json",
      "rows": "50"
    }
  },
  "response": {
    "numFound": 6,
    "start": 0,
    "docs": [
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" }
    ]
  }
}

Multiple records exist that should match.

The note about alternativeTermCount is appreciated.

I've tried another term: "Transport". I get suggestions when I use
"Transpor" and "Transpo", even "Transpotr", but "ransport" doesn't yield
any suggestions. Maybe it's a question of the beginning of the word and
doesn't really have anything to do with stemming.


RE: Solr Spellcheck suggestions only return from /select handler when returning search results

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Thomas,

It looks like you've set things up correctly in that while the user is searching against a stemmed field ("name"), spellcheck is checking against a lightly-analyzed copy of it ("spell"). This is the right way to do it, as spellchecking against stemmed forms is usually undesirable.

But as you've experienced, you will sometimes get results (due to stemming) and also suggestions (because the spellchecker is looking at unstemmed forms). If you do not want spellcheck to return anything when you get results, you can set "spellcheck.maxResultsForSuggest=0".

Now, keeping in mind we're comparing unstemmed forms, can you verify you indeed have something in your index that is within 2 edits of "ichtscheiben"? My guess is you probably don't, which would be why you do not get spelling results in that case.

Also, even if you do have something within 2 edits, if "ichtscheiben" occurs in your index, by default it won't try to correct it at all (even if the query returns nothing, maybe because of filters or other required terms on the query). In this case you need to set "spellcheck.alternativeTermCount" to a non-zero value (try maybe 5).

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount and following sections.
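
As a rough sketch, both parameters could also be set as defaults on the /select handler from the original message instead of being passed on every request. This assumes spellchecking is already enabled for the handler, as in the setup described. The values are only the illustrative ones mentioned above, and the comment standing in for the elided part of the handler is a placeholder, not real configuration:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- also look for alternatives to query terms that exist in the
         index; 5 is only the starting value suggested above -->
    <str name="spellcheck.alternativeTermCount">5</str>
    <!-- optional: only return suggestions when the query has no hits -->
    <str name="spellcheck.maxResultsForSuggest">0</str>
  </lst>
  <!-- remaining handler configuration unchanged (elided in the
       original message) -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Defaults set this way can still be overridden per request, which may be convenient while experimenting with different values.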

James Dyer
Ingram Content Group
(615) 213-4311

