Posted to solr-user@lucene.apache.org by Oliver Beattie <ol...@obeattie.com> on 2011/10/11 17:13:07 UTC
Search suggestion with misspellings
Hi,
I'm sure this is something that's probably been covered before, and I
shouldn't need to ask. But anyway. I'm trying to build an autosuggest
with org.apache.solr.spelling.suggest.Suggester.
The content being searched is music artist names, so I need to be able
to deal with suggesting things like "Katy Perry" if the user types
"Katy Pe" (sorry, couldn't think of a more tasteful example off the
cuff). I've tried a few things, but so far none give satisfactory
results. Here's my current configuration:
<fieldType name="autosuggestString" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
…for which I have a copyField called suggestionArtist. In my
solrconfig.xml I have:
<searchComponent name="autosuggester" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">autosuggester</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">suggestionArtist</str>
    <float name="threshold">0.0005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">autosuggester</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>autosuggester</str>
  </arr>
</requestHandler>
If anyone could give me any pointers, I'd be really grateful.
—Oliver
Re: Search suggestion with misspellings
Posted by Doug McKenzie <do...@firebox.com>.
Here's a basic query:
q=wat&start=0&rows=5&sort=total%20desc
And example data returned:
<doc>
  <str name="id">9</str>
  <int name="matched">180</int>
  <str name="query">watch</str>
  <arr name="suggest_ngrams">
    <str>watch</str>
  </arr>
  <str name="suggest_query">watch</str>
  <int name="total">5433</int>
</doc>
<doc>
  <str name="id">52</str>
  <int name="matched">180</int>
  <str name="query">water</str>
  <arr name="suggest_ngrams">
    <str>water</str>
  </arr>
  <str name="suggest_query">water</str>
  <int name="total">1201</int>
</doc>
So to clarify...
I input 3 values into Solr:
query - a previously seen search query
matched - how many docs matched that query
total - how many times the query has been asked
The query field is copied (via copyField) into suggest_ngrams, where it
is split into edge n-grams. That field is set as the default search
field in the schema (you can also set that in the query if you want to) -
for us it made sense to do it in the schema.
Note that I'm sorting on how many times that query has been previously
asked - it's a pretty basic solution and we're looking at refining it,
but it's a good starting point.
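As a rough sketch of how a client might build that request, here is a small Python helper. The base URL is an assumption (the host, core and handler path aren't given in this thread), and build_suggest_url is a made-up name for illustration only.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL: the thread does not state the host, core or handler path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_suggest_url(typed, rows=5, base_url=SOLR_SELECT_URL):
    """Build the suggestion query for the text typed so far.

    Results are sorted on the `total` field so that the most frequently
    asked queries come back first.
    """
    params = {
        "q": typed,
        "start": 0,
        "rows": rows,
        "sort": "total desc",
    }
    # quote_via=quote encodes the space as %20 rather than "+".
    return base_url + "?" + urlencode(params, quote_via=quote)


print(build_suggest_url("wat"))
```

The default search field does the work here: because suggest_ngrams is the default field, a bare q=wat is matched against the indexed n-grams without naming the field.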
On 12/10/2011 10:37, Oliver Beattie wrote:
> Hi Doug,
>
> Brilliant, thanks so much for sharing. One more question… how is your
> request handler set up to query this?
>
> Sorry to be a bit dense haha.
>
> —Oliver
Re: Search suggestion with misspellings
Posted by Oliver Beattie <ol...@obeattie.com>.
Hi Doug,
Brilliant, thanks so much for sharing. One more question… how is your
request handler set up to query this?
Sorry to be a bit dense haha.
—Oliver
On 12 October 2011 09:48, Doug McKenzie <do...@firebox.com> wrote:
> Sure, this is the schema I used...
>
> <fieldType name="text_ngram" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_en.txt" enablePositionIncrements="true"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
> side="front"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_en.txt" enablePositionIncrements="true"/>
> </analyzer>
> </fieldType>
>
> Input was CopyFielded into this field. So using your example...
>
> Katy Perry gets split into ...
>
> Ka, Kat, Katy, Katy 'space', Katy P etc
>
> So when a user is initially inputting their search, you can use an ajax call
> to search for each part of the query and return matches. This system would
> return Katy Perry if a user has just typed in 'Ka' whereas other schemas
> wouldn't.
>
> Note that there are some assumptions here... for instance the entire phrase is
> tokenised as a single token, so a search for Perry wouldn't return a match.
> Still found it more effective for autosuggest than either the Suggester
> options in Solr 3.4 or using Spellchecker.
>
> It is worth using Spellcheck on a failed search however.
>
> Hope that helps
Re: Search suggestion with misspellings
Posted by Doug McKenzie <do...@firebox.com>.
Sure, this is the schema I used...
<fieldType name="text_ngram" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_en.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
Input was CopyFielded into this field. So using your example...
Katy Perry gets split into ...
Ka, Kat, Katy, Katy 'space', Katy P etc
So when a user is initially inputting their search, you can use an ajax
call to search for each part of the query and return matches. This
system would return Katy Perry if a user has just typed in 'Ka' whereas
other schemas wouldn't.
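To make the gram generation concrete, here is a rough approximation of the index-time chain in plain Python. It is only a sketch, not Solr's code: it assumes the whole input stays as one token (KeywordTokenizer), is lower-cased, and is then cut into front edge n-grams of 2 to 15 characters. The stop filter is left out, and edge_ngrams and would_match are made-up names.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Approximate KeywordTokenizer + LowerCaseFilter + front EdgeNGramFilter."""
    token = text.lower()  # the whole input is kept as a single lower-cased token
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def would_match(typed, indexed_name):
    """Query side: lower-case only, then look for an exact gram match."""
    return typed.lower() in edge_ngrams(indexed_name)


print(edge_ngrams("Katy Perry"))
print(would_match("Ka", "Katy Perry"))     # prefix of the whole name
print(would_match("Perry", "Katy Perry"))  # not a prefix of the whole name
```

Since every gram starts at the first character of the full name, 'Ka' finds "Katy Perry" but 'Perry' on its own does not. Anything typed beyond 15 characters would also stop matching, because no gram that long is indexed.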
Note that there are some assumptions here... for instance the entire
phrase is tokenised as a single token, so a search for Perry wouldn't
return a match. Still found it more effective for autosuggest than
either the Suggester options in Solr 3.4 or using Spellchecker.
It is worth using Spellcheck on a failed search however.
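One way to wire that up on the client side, sketched in Python. The two callables are hypothetical stand-ins for whatever actually talks to Solr (the n-gram search and a spellcheck request), and the sample data is invented purely to show the flow.

```python
def suggest_with_fallback(typed, search, spellcheck):
    """Return n-gram matches, or spellcheck suggestions if the search found none.

    `search` and `spellcheck` are hypothetical callables supplied by the caller;
    each takes the typed text and returns a list of suggestion strings.
    """
    matches = search(typed)
    if matches:
        return matches
    return spellcheck(typed)


# Stand-in implementations for illustration only; a real client would query Solr.
def fake_search(typed):
    names = ["katy perry", "kate bush"]
    return [name for name in names if name.startswith(typed.lower())]


def fake_spellcheck(typed):
    return ["katy perry"] if typed.lower().startswith("katie") else []


print(suggest_with_fallback("Kat", fake_search, fake_spellcheck))
print(suggest_with_fallback("Katie Pe", fake_search, fake_spellcheck))
```

The spellcheck call only happens when the prefix search comes back empty, so the common case stays a single request.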
Hope that helps
On 12/10/2011 09:34, Oliver Beattie wrote:
> Hi Doug,
>
> Sounds very interesting; would you mind sharing some details of how
> exactly you did this? What request handler did you use etc?
>
> Many thanks,
> Oliver
Re: Search suggestion with misspellings
Posted by Oliver Beattie <ol...@obeattie.com>.
Hi Doug,
Sounds very interesting; would you mind sharing some details of how
exactly you did this? What request handler did you use etc?
Many thanks,
Oliver
On 11 October 2011 17:37, Doug McKenzie <do...@firebox.com> wrote:
> I've just done something similar and, rather than using the Spellchecker, went
> for EdgeNGramFilters instead for the suggestions. Worth looking into imo
Re: Search suggestion with misspellings
Posted by Doug McKenzie <do...@firebox.com>.
I've just done something similar and, rather than using the Spellchecker,
went for EdgeNGramFilters instead for the suggestions. Worth looking
into imo
Re: Search suggestion with misspellings
Posted by Oliver Beattie <ol...@obeattie.com>.
Just realised that I said "Katy Pe" as the example when I actually
meant "Katie Pe", apologies
—Oliver