Posted to solr-user@lucene.apache.org by Oliver Beattie <ol...@obeattie.com> on 2011/10/11 17:13:07 UTC

Search suggestion with misspellings

Hi,

I'm sure this is something that's been covered before, and I
shouldn't need to ask. But anyway. I'm trying to build an autosuggest
with org.apache.solr.spelling.suggest.Suggester.

The content being searched is music artist names, so I need to be able
to deal with suggesting things like "Katy Perry" if the user types
"Katy Pe" (sorry, couldn't think of a more tasteful example off the
cuff). I've tried a few things, but so far none give satisfactory
results. Here's my current configuration:

<fieldType name="autosuggestString" class="solr.TextField" omitNorms="true">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
    </analyzer>
</fieldType>

…for which I have a copyField called suggestionArtist. In my
solrconfig.xml I have:

<searchComponent name="autosuggester" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
        <str name="name">autosuggester</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
        <str name="field">suggestionArtist</str>
        <float name="threshold">0.0005</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">autosuggester</str>
        <str name="spellcheck.onlyMorePopular">false</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>autosuggester</str>
    </arr>
</requestHandler>

If anyone could give me any pointers, I'd be really grateful.

—Oliver

Re: Search suggestion with misspellings

Posted by Doug McKenzie <do...@firebox.com>.
Here's a basic query:
q=wat&start=0&rows=5&sort=total%20desc

And the example data returned:

<doc>
  <str name="id">9</str>
  <int name="matched">180</int>
  <str name="query">watch</str>
  <arr name="suggest_ngrams">
    <str>watch</str>
  </arr>
  <str name="suggest_query">watch</str>
  <int name="total">5433</int>
</doc>
<doc>
  <str name="id">52</str>
  <int name="matched">180</int>
  <str name="query">water</str>
  <arr name="suggest_ngrams">
    <str>water</str>
  </arr>
  <str name="suggest_query">water</str>
  <int name="total">1201</int>
</doc>

So to clarify...
I input 3 values into Solr:
query - a previously seen search query
matched - how many docs matched that query
total - how many times the query has been asked

The query field is copied (via copyField) to suggest_ngrams, where it is
split into ngrams. That field is set as the default search field in the
schema (you can also set it in the query if you want to) - for us it
made sense to do it in the schema.

Note that I'm sorting on how many times each query has previously been
asked. It's a pretty basic solution and we're looking at refining it,
but it's a good starting point.
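
For anyone wiring this up from a script, the basic query above can be assembled as in the following minimal Python sketch. The host, port and /select path are assumptions (the thread does not say which handler or URL was used), and build_suggest_url is a hypothetical helper name.

```python
from urllib.parse import urlencode, quote

# Assumed endpoint: the thread does not state the real host or handler path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_suggest_url(typed, rows=5, base_url=SOLR_SELECT_URL):
    """Build a URL that asks for the most frequently asked matching queries."""
    params = {
        "q": typed,            # text the user has typed so far
        "start": 0,
        "rows": rows,          # number of suggestions to return
        "sort": "total desc",  # most frequently asked queries first
    }
    # quote_via=quote encodes the space in "total desc" as %20 rather than "+"
    return base_url + "?" + urlencode(params, quote_via=quote)


print(build_suggest_url("wat"))
# http://localhost:8983/solr/select?q=wat&start=0&rows=5&sort=total%20desc
```

Because the ngram field is the default search field in the schema, the q parameter can carry the bare typed text without naming a field.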






On 12/10/2011 10:37, Oliver Beattie wrote:
> Hi Doug,
>
> Brilliant, thanks so much for sharing. One more question… how is your
> request handler setup to query this?
>
> Sorry to be a bit dense haha.
>
> —Oliver

Re: Search suggestion with misspellings

Posted by Oliver Beattie <ol...@obeattie.com>.
Hi Doug,

Brilliant, thanks so much for sharing. One more question… how is your
request handler set up to query this?

Sorry to be a bit dense haha.

—Oliver



On 12 October 2011 09:48, Doug McKenzie <do...@firebox.com> wrote:
> Sure, this is the schema I used...
>
> <fieldType name="text_ngram" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_en.txt" enablePositionIncrement="true"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
> side="front"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_en.txt" enablePositionIncrement="true"/>
> </analyzer>
> </fieldType>
>
> Input was CopyFielded into this field. So using your example...
>
> Katy Perry gets split into ...
>
> Ka, Kat, Katy, Katy 'space', Katy P etc
>
> So when a user is initially inputting their search, you can use an ajax call
> to search for each part of the query and return matches. This system would
> return Katy Perry if a user has just typed in 'Ka' whereas other schemas
> wouldnt.
>
> Note that theres some assumptions here... for instance the entire phrase is
> Tokenised so a search for Perry wouldnt return a match. Still found it more
> effective for autosuggest than either the Suggestor options in Solr 3.4 or
> using Spellchecker.
>
> It is worth using Spellcheck on a failed search however.
>
> Hope that helps

Re: Search suggestion with misspellings

Posted by Doug McKenzie <do...@firebox.com>.
Sure, this is the schema I used...

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Keep the whole input as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
    <!-- Emit prefixes of 2 to 15 characters, starting from the front -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <!-- No ngrams at query time: the typed text is matched as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

Input was copied into this field with a copyField. So using your example...

Katy Perry gets split into ...

Ka, Kat, Katy, Katy 'space', Katy P etc (lowercased in the index)

So while a user is typing their search, you can use an ajax call to
search for what they have typed so far and return matches. This system
would return Katy Perry when a user has just typed in 'Ka', whereas
other schemas wouldn't.

Note that there are some assumptions here... for instance the entire
phrase is kept as a single token, so a search for Perry wouldn't return
a match. Still, I found it more effective for autosuggest than either
the Suggester options in Solr 3.4 or using Spellchecker.
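
To make the gram expansion concrete, here is a rough Python sketch of what the index analyzer in the schema above does to a single value, and why 'Ka' matches while 'Perry' does not. It is only a simulation under assumptions: the stop filter is ignored, and edge_ngrams and matches are hypothetical helper names, not Solr APIs.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Front edge n-grams of the whole lowercased input, kept as one token."""
    token = text.lower()
    longest = min(max_gram, len(token))
    # Prefixes of length min_gram up to longest, always starting at the front
    return [token[:n] for n in range(min_gram, longest + 1)]


def matches(typed, indexed_value):
    """True if the lowercased typed text equals one of the indexed grams."""
    return typed.lower() in edge_ngrams(indexed_value)


print(edge_ngrams("Katy Perry"))
# ['ka', 'kat', 'katy', 'katy ', 'katy p', 'katy pe', 'katy per',
#  'katy perr', 'katy perry']

print(matches("Ka", "Katy Perry"))     # True
print(matches("Perry", "Katy Perry"))  # False
```

Since grams are only ever taken from the front of the whole phrase, nothing beginning with the second word is indexed, which is the limitation noted above.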

It is worth using Spellcheck on a failed search however.
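
The fallback idea can be sketched as follows. This is not Solr's spellchecker: Python's difflib.get_close_matches is swapped in as a stand-in for it, plain prefix matching stands in for the ngram field, and the artist list and suggest_with_fallback are made up for illustration.

```python
import difflib

# Hypothetical sample data; only "Katy Perry" appears in the thread.
ARTISTS = ["Katy Perry", "Kate Bush", "Keane"]


def suggest_with_fallback(typed, names, limit=5):
    """Try prefix suggestions first, then fall back to fuzzy matching."""
    typed_norm = typed.strip().lower()
    by_norm = {name.lower(): name for name in names}

    # Step 1: prefix match, standing in for the edge ngram field
    prefix_hits = [name for norm, name in by_norm.items()
                   if norm.startswith(typed_norm)]
    if prefix_hits:
        return prefix_hits[:limit]

    # Step 2: the search failed, so try a spelling-tolerant lookup instead
    close = difflib.get_close_matches(typed_norm, list(by_norm),
                                      n=limit, cutoff=0.6)
    return [by_norm[norm] for norm in close]


print(suggest_with_fallback("Ka", ARTISTS))
print(suggest_with_fallback("Katie Perry", ARTISTS))
```

The first call is answered by the prefix step alone; the second has no prefix match, so only the fuzzy step can recover the misspelled name.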

Hope that helps

On 12/10/2011 09:34, Oliver Beattie wrote:
> Hi Doug,
>
> Sounds very interesting; would you mind sharing some details of how
> exactly you did this? What request handler did you use etc?
>
> Many thanks,
> Oliver

Re: Search suggestion with misspellings

Posted by Oliver Beattie <ol...@obeattie.com>.
Hi Doug,

Sounds very interesting; would you mind sharing some details of how
exactly you did this? What request handler did you use etc?

Many thanks,
Oliver



On 11 October 2011 17:37, Doug McKenzie <do...@firebox.com> wrote:
> I've just done something similar and rather than using the Spellchecker went
> for NEdgeGramFilters instead for the suggestions. Worth looking into imo

Re: Search suggestion with misspellings

Posted by Doug McKenzie <do...@firebox.com>.
I've just done something similar and, rather than using the Spellchecker,
went for EdgeNGramFilters for the suggestions instead. Worth looking
into imo


On 11/10/2011 16:13, Oliver Beattie wrote:
> Hi,
>
> I'm sure this is something that's probably been covered before, and I
> shouldn't need to ask. But anyway. I'm trying to build an autosuggest
> with org.apache.solr.spelling.suggest.Suggester
>
> The content being searched is music artist names, so I need to be able
> to deal with suggesting things like "Katy Perry" if the user types
> "Katy Pe" (sorry, couldn't think of a more tasteful example off the
> cuff). I've tried a few things, but so far none give satisfactory
> results. Here's my current configuration:
>
> <fieldType name="autosuggestString" class="solr.TextField" omitNorms="true">
>      <analyzer type="index">
>          <tokenizer class="solr.KeywordTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.TrimFilterFactory"/>
>      </analyzer>
> </fieldType>
>
> …for which I have a copyField called suggestionArtist. In my
> solrconfig.xml I have:
>
> <searchComponent name="autosuggester" class="solr.SpellCheckComponent">
>      <lst name="spellchecker">
>          <str name="name">autosuggester</str>
>          <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>          <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>          <str name="field">suggestionArtist</str>
>          <float name="threshold">0.0005</float>
>          <str name="buildOnCommit">true</str>
>      </lst>
> </searchComponent>
> <requestHandler name="/suggest" class="solr.SearchHandler">
>      <lst name="defaults">
>          <str name="spellcheck">true</str>
>          <str name="spellcheck.dictionary">autosuggester</str>
>          <str name="spellcheck.onlyMorePopular">false</str>
>          <str name="spellcheck.count">5</str>
>          <str name="spellcheck.collate">true</str>
>      </lst>
>      <arr name="components">
>          <str>autosuggester</str>
>      </arr>
> </requestHandler>
>
> If anyone could give me any pointers, I'd be really grateful.
>
> —Oliver

Re: Search suggestion with misspellings

Posted by Oliver Beattie <ol...@obeattie.com>.
Just realised that I said "Katy Pe" as the example when I actually
meant "Katie Pe", apologies.

—Oliver



On 11 October 2011 16:13, Oliver Beattie <ol...@obeattie.com> wrote:
> Hi,
>
> I'm sure this is something that's probably been covered before, and I
> shouldn't need to ask. But anyway. I'm trying to build an autosuggest
> with org.apache.solr.spelling.suggest.Suggester
>
> The content being searched is music artist names, so I need to be able
> to deal with suggesting things like "Katy Perry" if the user types
> "Katy Pe" (sorry, couldn't think of a more tasteful example off the
> cuff). I've tried a few things, but so far none give satisfactory
> results. Here's my current configuration:
>
> <fieldType name="autosuggestString" class="solr.TextField" omitNorms="true">
>    <analyzer type="index">
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.TrimFilterFactory"/>
>    </analyzer>
> </fieldType>
>
> …for which I have a copyField called suggestionArtist. In my
> solrconfig.xml I have:
>
> <searchComponent name="autosuggester" class="solr.SpellCheckComponent">
>    <lst name="spellchecker">
>        <str name="name">autosuggester</str>
>        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>        <str name="field">suggestionArtist</str>
>        <float name="threshold">0.0005</float>
>        <str name="buildOnCommit">true</str>
>    </lst>
> </searchComponent>
> <requestHandler name="/suggest" class="solr.SearchHandler">
>    <lst name="defaults">
>        <str name="spellcheck">true</str>
>        <str name="spellcheck.dictionary">autosuggester</str>
>        <str name="spellcheck.onlyMorePopular">false</str>
>        <str name="spellcheck.count">5</str>
>        <str name="spellcheck.collate">true</str>
>    </lst>
>    <arr name="components">
>        <str>autosuggester</str>
>    </arr>
> </requestHandler>
>
> If anyone could give me any pointers, I'd be really grateful.
>
> —Oliver
>