Posted to solr-user@lucene.apache.org by Nicholas Ding <ni...@gmail.com> on 2013/05/10 15:47:29 UTC
Looking for Best Practice of Spellchecker
Hi guys,
I'm working on a local search project, and I want to integrate a spellchecker
into the search.
So basically, my search engine is used to search local businesses. For
example, a user could search for "wall mart", which contains a typo, and I
want the spellchecker to give me a collation for "walmart".
My problems are:
1. I use DirectSolrSpellChecker on my BusinessNameField and pass "wall
mart" as a phrase search, but I can't get a collation from the spellchecker.
2. Instead of a phrase search, I tried passing q=Wall AND Mart to force a
100% match, but the spellchecker doesn't give me a collation either.
I read the spellchecker documentation on the Solr wiki, but it's very brief.
I'm wondering whether there is a best practice for the spellchecker; I
believe it's widely used in search, right?
I also have another idea, though I don't know whether it's valid. I want to
run the spellchecker on every query before doing the search, so that I can
rely on the spellchecker to tell me whether my search will return results.
Thanks
Nicholas
RE: Looking for Best Practice of Spellchecker
Posted by "Dyer, James" <Ja...@ingramcontent.com>.
The Word Break spellchecker will incorporate the broken and combined words into the collations. It's designed to work seamlessly in conjunction with a "regular" spellchecker (IndexBased- or Direct-).
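As a rough illustration of what "broken and combined words" means, here is a toy model in Python. It is not Solr's implementation: the `vocab` set stands in for indexed terms, the function names are hypothetical, and it tries only a single split or a single join per rewrite, whereas the real checker can make several changes.

```python
# Toy model (not Solr's code) of the two moves a word-break spellchecker makes:
# splitting one query token into two indexed words, and joining two adjacent
# query tokens into one indexed word.

def break_word(token, vocab):
    """Return every (left, right) split of token where both halves are in vocab."""
    return [(token[:i], token[i:]) for i in range(1, len(token))
            if token[:i] in vocab and token[i:] in vocab]


def combine_words(tokens, vocab):
    """Return (index, joined) for adjacent token pairs whose concatenation is in vocab."""
    return [(i, tokens[i] + tokens[i + 1]) for i in range(len(tokens) - 1)
            if tokens[i] + tokens[i + 1] in vocab]


def word_break_rewrites(tokens, vocab):
    """Build whole-query rewrites (collation-like strings) from one break or one join."""
    rewrites = []
    for i, tok in enumerate(tokens):
        for left, right in break_word(tok, vocab):
            rewrites.append(" ".join(tokens[:i] + [left, right] + tokens[i + 1:]))
    for i, joined in combine_words(tokens, vocab):
        rewrites.append(" ".join(tokens[:i] + [joined] + tokens[i + 2:]))
    return rewrites


vocab = {"home", "depot", "walmart"}               # hypothetical indexed terms
print(word_break_rewrites(["homedepot"], vocab))   # ['home depot']
print(word_break_rewrites(["wal", "mart"], vocab)) # ['walmart']
```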
James Dyer
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Nicholas Ding [mailto:nicholasdsj@gmail.com]
Sent: Monday, May 13, 2013 12:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Looking for Best Practice of Spellchecker
Re: Looking for Best Practice of Spellchecker
Posted by Nicholas Ding <ni...@gmail.com>.
Thank you for your help, guys. I agree, "wall mart" should be a synonym;
it's not a good example.
I did an experiment using KeywordTokenizer + DirectSolrSpellChecker, and I
can get a suggestion even from "wall mart" to "walmart". But I don't know
whether it's a good practice or not; it feels like a workaround to me. As
for WordBreakSolrSpellChecker, I haven't tried it yet. Does this spellchecker
break and combine words and then give me collations?
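Why the KeywordTokenizer experiment can work comes down to edit distance. The following is a minimal sketch under stated assumptions: the business name is indexed as the single term "walmart", the query reaches the spellchecker as the single token "wall mart", and maxEdits is 2 (the value used in the 4.x example config, and the largest value Lucene's DirectSpellChecker supports). Other settings, such as minPrefix, also affect whether a suggestion is returned; this only checks the edit count.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # assumed maxEdits setting; Lucene's DirectSpellChecker accepts only 1 or 2

# Assumption: with KeywordTokenizer the whole business name is one indexed term,
# and the whole query string is compared against it as a single token.
distance = levenshtein("wall mart", "walmart")
print(distance, distance <= MAX_EDITS)  # 2 True
```

Dropping one "l" and the space is exactly two edits, so the correction just fits under the limit; a name that needed a third edit would not be suggested this way.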
Thanks
On Fri, May 10, 2013 at 11:34 AM, Dyer, James <Ja...@ingramcontent.com> wrote:
RE: Looking for Best Practice of Spellchecker
Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Good point, Jason. In fact, even if you use WordBreakSolrSpellChecker, "wall mart" will not correct to "walmart". The reason is that the spellchecker cannot both correct a token's spelling *and* fix a word-break issue involving that same token. So in this case a synonym is the way to go.
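The limitation described here can be seen with a small calculation. This Python sketch is a toy model, not Solr code: `indexed_terms` is a hypothetical one-word index, and `levenshtein` is a plain edit-distance helper used as a stand-in for the spelling checker's notion of closeness.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insert, delete or substitute one character, cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


indexed_terms = {"walmart"}      # hypothetical: the only indexed business-name term
query_tokens = ["wall", "mart"]  # what a whitespace-tokenized query produces

# Word break alone: joining the two tokens gives a term that is not indexed.
joined = "".join(query_tokens)
print(joined, joined in indexed_terms)                     # wallmart False

# Spelling alone: each token is more than 2 edits away from "walmart".
print([levenshtein(t, "walmart") for t in query_tokens])   # [4, 3]

# Only join-then-correct reaches the indexed term: one more edit on the joined token.
print(levenshtein(joined, "walmart"))                      # 1
```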
James Dyer
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Jason Hellman [mailto:jhellman@innoventsolutions.com]
Sent: Friday, May 10, 2013 9:55 AM
To: solr-user@lucene.apache.org
Subject: Re: Looking for Best Practice of Spellchecker
Re: Looking for Best Practice of Spellchecker
Posted by Jason Hellman <jh...@innoventsolutions.com>.
Nicholas,
Also consider that some misspellings are better handled through synonyms (or injected metadata).
You can garner a great deal of value out of the spell checker by following the great advice James is giving here... but you'll find that a well-placed "helper" synonym or metavalue can often save a lot of headaches and time.
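In Solr, a helper synonym would normally be a rule such as `wall mart => walmart` in a synonyms file read by SynonymFilterFactory. The following Python sketch shows the same idea as a query-side rewrite done in application code instead. The mapping table and the `apply_helper_synonyms` function are hypothetical, and this is a stand-in for, not a copy of, Solr's synonym handling.

```python
import re

# Hypothetical "helper" mappings, in the spirit of a Solr synonym rule like
#   wall mart => walmart
# but applied in application code before the query is sent.
HELPER_SYNONYMS = {
    "wall mart": "walmart",
    "wal mart": "walmart",
}


def apply_helper_synonyms(query: str) -> str:
    """Lower-case, collapse whitespace, then replace any known multi-word variant."""
    normalized = " ".join(query.lower().split())
    for variant, canonical in HELPER_SYNONYMS.items():
        # \b keeps "drywall mart" from being rewritten.
        normalized = re.sub(r"\b" + re.escape(variant) + r"\b", canonical, normalized)
    return normalized


print(apply_helper_synonyms("Wall  Mart near me"))  # walmart near me
```

Handling these few known variants explicitly avoids asking the spellchecker to do a join and a correction on the same token.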
Jason
On May 10, 2013, at 7:32 AM, "Dyer, James" <Ja...@ingramcontent.com> wrote:
RE: Looking for Best Practice of Spellchecker
Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Nicholas,
It sounds like you might want to use WordBreakSolrSpellChecker, which gets only an obscure mention in the wiki. Read through this section: http://wiki.apache.org/solr/SpellCheckComponent#Configuration and you will find some information.
Also, the Solr Example shows how to configure this. See http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/example/solr/collection1/conf/solrconfig.xml
Look for...
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    ...
  </lst>
...and...
  <requestHandler name="/spell" ...>
    ...
  </requestHandler>
Also, I'd recommend you look at each parameter in the "/spell" request handler and read its section on the SpellCheckComponent wiki page. You will probably want to set many of these parameters as well.
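One way to try such a "/spell" setup from application code is to send the spellcheck parameters explicitly. The following is a minimal Python sketch. It assumes dictionaries named "default" (a DirectSolrSpellChecker) and "wordbreak", as in the branch_4x example solrconfig.xml, and a core at http://localhost:8983/solr/collection1. Both are assumptions, so adjust them to your setup. The sketch only builds the request URL, and the values are illustrative rather than recommendations.

```python
from urllib.parse import urlencode

# Assumed dictionary names: "default" and "wordbreak", as in the 4.x example config.
params = [
    ("q", "wall mart"),
    ("rows", "10"),
    ("wt", "json"),
    ("spellcheck", "true"),
    ("spellcheck.dictionary", "default"),    # repeated key: suggestions from both
    ("spellcheck.dictionary", "wordbreak"),  # dictionaries feed the collations
    ("spellcheck.collate", "true"),
    ("spellcheck.maxCollationTries", "10"),  # verify collations against the index
    ("spellcheck.maxCollations", "5"),
    ("spellcheck.collateExtendedResults", "true"),
]

# A list of pairs (rather than a dict literal) keeps the repeated parameter.
query_string = urlencode(params)
print("http://localhost:8983/solr/collection1/spell?" + query_string)
```

A dict literal would silently keep only the last `spellcheck.dictionary` entry, which is why the parameters are a list of pairs here.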
You can get a query to return only spellcheck results simply by specifying "rows=0". However, it saves a query to just have it return the results as well. If there are no results, your application can check for collations and re-issue the search using a collation. If both results and collations are returned, you can give the user the results along with "did-you-mean" suggestions.
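The single-request flow suggested here can be sketched as a small decision function. `plan_response` is a hypothetical helper, not part of any Solr client library. It assumes the collation strings have already been pulled out of the spellcheck section of the response. The exact shape of that section depends on the response writer and on options such as spellcheck.collateExtendedResults, so that step is not shown.

```python
from typing import List, Optional, Tuple


def plan_response(num_found: int, collations: List[str]) -> Tuple[str, Optional[str]]:
    """Decide what to do with one combined search + spellcheck response.

    num_found:  hit count for the user's original query
    collations: collation query strings already extracted from the
                spellcheck section, best first (extraction not shown)
    """
    if num_found == 0 and collations:
        # Nothing matched: re-issue the search using the top collation.
        return ("requery", collations[0])
    if num_found > 0 and collations:
        # Show the results, plus a "did you mean" suggestion.
        return ("did_you_mean", collations[0])
    # Either plain results, or no results and nothing to suggest.
    return ("results" if num_found > 0 else "no_results", None)


print(plan_response(0, ["walmart"]))   # ('requery', 'walmart')
print(plan_response(12, ["walmart"]))  # ('did_you_mean', 'walmart')
print(plan_response(12, []))           # ('results', None)
```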
James Dyer
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: Nicholas Ding [mailto:nicholasdsj@gmail.com]
Sent: Friday, May 10, 2013 8:47 AM
To: solr-user@lucene.apache.org
Subject: Looking for Best Practice of Spellchecker