Posted to solr-user@lucene.apache.org by Nicholas Ding <ni...@gmail.com> on 2013/05/10 15:47:29 UTC

Looking for Best Practice of Spellchecker

Hi guys,

I'm working on a local search project and want to integrate a spellchecker
into the search.

Basically, my search engine is used to search local businesses. For
example, a user could search for "wall mart", which is a typo, and I want
the spellchecker to give me a collation for "walmart".

My problems are:
1. I use DirectSolrSpellChecker on my BusinessNameField and pass "wall
mart" as a phrase search, but I can't get a collation from the spellchecker.
2. I tried dropping the phrase search and passing q=Wall AND Mart to force a
100% match, but the spellchecker doesn't give me a collation either.

I read the documentation about the spellchecker on the Solr wiki, but it's
very brief. I'm wondering whether there is any best practice for the
spellchecker. I believe it's widely used in search, right?

I also have another idea, though I don't know whether it's valid. I want to
run the spellchecker on every query before doing the search, so that I can
rely on the spellchecker to tell me whether my search will return results.

Thanks
Nicholas

RE: Looking for Best Practice of Spellchecker

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
The Word Break spellchecker will incorporate the broken & combined words in the collations.  It's designed to work seamlessly in conjunction with a "regular" spellchecker (IndexBased- or Direct-).
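
As a rough sketch of what "in conjunction" can look like from the client side, the request below names two dictionaries in one query by repeating the `spellcheck.dictionary` parameter. The dictionary name `default` for the Direct spellchecker is an assumption, `wordbreak` is the name used in the Solr example config mentioned in this thread, and `build_spell_params` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_spell_params(user_query, rows=10):
    """Build a query string that consults two spellcheck dictionaries at once."""
    params = [
        ("q", user_query),
        ("rows", str(rows)),
        ("spellcheck", "true"),
        # Repeating spellcheck.dictionary asks for suggestions from both
        # spellcheckers to be considered when collations are built.
        ("spellcheck.dictionary", "default"),    # assumed name of the Direct spellchecker
        ("spellcheck.dictionary", "wordbreak"),  # name from the Solr example config
        ("spellcheck.collate", "true"),
    ]
    return urlencode(params)


print(build_spell_params("wall mart"))
```

The resulting string would be appended to whichever handler has the spellcheck component attached, for example the "/spell" handler from the example config.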

James Dyer
Ingram Content Group
(615) 213-4311




Re: Looking for Best Practice of Spellchecker

Posted by Nicholas Ding <ni...@gmail.com>.
Thank you for your help, guys. I agree, "wall mart" should be handled as a
synonym; it's not a good example.

I did an experiment using KeywordTokenizer + DirectSolrSpellChecker, and I
can get a suggestion even for "wall mart" to "walmart". But I don't know
whether it's good practice. It feels like a workaround to me. As for
WordBreakSolrSpellChecker, I haven't tried it yet. Does this spellchecker
break words apart and concatenate them, then give me collations?
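
One plausible reason the KeywordTokenizer experiment produces a suggestion: with that tokenizer the whole business name is a single term, so "wall mart" versus "walmart" becomes an ordinary edit-distance comparison. The sketch below is plain Python, not Solr code, and `MAX_EDITS = 2` is an assumption about the spellchecker's configured edit limit rather than something stated in this thread.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # assumed edit limit, for illustration only

# Treating the whole name as one term: drop one "l" and the space.
distance = edit_distance("wall mart", "walmart")
print(distance, distance <= MAX_EDITS)
```

Under that assumption the whole-name term is close enough to match, which also suggests why the approach would stop helping for longer names with more than a couple of differences.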

Thanks



RE: Looking for Best Practice of Spellchecker

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Good point, Jason.  In fact, even if you use WordBreakSolrSpellChecker, "wall mart" will not correct to "walmart".  The reason is that the spellchecker cannot both correct a token's spelling *and* fix the wordbreak issue involving that same token.  So in this case a synonym is the way to go.
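
A toy model of the limitation, for illustration only: it joins adjacent tokens and keeps a candidate only when the joined form is an indexed term. This is not how Solr implements word-break handling, and the term set is made up.

```python
def combine_adjacent(tokens, dictionary):
    """Return token lists in which one adjacent pair was joined into a known term."""
    candidates = []
    for i in range(len(tokens) - 1):
        joined = tokens[i] + tokens[i + 1]
        if joined in dictionary:
            candidates.append(tokens[:i] + [joined] + tokens[i + 2:])
    return candidates


# Hypothetical index terms, for illustration only.
index_terms = {"walmart", "wall", "mart", "homedepot"}

# Joining alone is enough here: "home" + "depot" is an indexed term.
print(combine_adjacent(["home", "depot"], index_terms))

# Joining gives "wallmart", which still needs a spelling fix, so nothing is found.
print(combine_adjacent(["wall", "mart"], index_terms))
```

In this model the second query would need a join followed by a spelling correction on the joined token, which is the combination described above as unsupported.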

James Dyer
Ingram Content Group
(615) 213-4311






Re: Looking for Best Practice of Spellchecker

Posted by Jason Hellman <jh...@innoventsolutions.com>.
Nicholas,

Also consider that some misspellings are better handled through Synonyms (or injected metadata).  

You can garner a great deal of value out of the spell checker by following the great advice James is giving here...but you'll find a well-placed "helper" synonym or metavalue can often save a lot of headache and time.

Jason



RE: Looking for Best Practice of Spellchecker

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Nicholas,

It sounds like you might want to use WordBreakSolrSpellChecker, which gets only an obscure mention in the wiki.  Read through this section: http://wiki.apache.org/solr/SpellCheckComponent#Configuration and you will see some information.

Also, the Solr Example shows how to configure this.  See http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/example/solr/collection1/conf/solrconfig.xml

Look for...

<lst name="spellchecker">
  <str name="name">wordbreak</str>
  ...
</lst>

...and...

<requestHandler name="/spell" ...>
...
</requestHandler>

Also, I'd recommend you take a look at each parameter in the "/spell" request handler and read its section on the "spellcheckcomponent" wiki page.  You probably will want to set many of these parameters as well.

You can get a query to return only spell results simply by specifying "rows=0".  However, it's one less query to just have it return the results also.  If there are no results, your application can check for collations and re-issue a collation query.  If there are both results and collations returned, you can give the user results with "did-you-mean" suggestions.
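
The application-side flow described above could be sketched roughly as follows. It assumes the hit count and the collation strings have already been extracted from the Solr response (the response layout differs between versions, so parsing is left out), and `choose_action` and its return labels are hypothetical names.

```python
def choose_action(num_found, collations):
    """Decide what to do with one response.

    num_found  -- hit count for the user's original query
    collations -- collation query strings already pulled out of the response
    """
    if num_found == 0 and collations:
        # Nothing matched: re-issue the search using the first collation.
        return ("requery", collations[0])
    if num_found > 0 and collations:
        # Show the hits, plus a "did you mean" link for each collation.
        return ("results_with_suggestions", collations)
    if num_found > 0:
        return ("results", None)
    return ("no_results", None)


print(choose_action(0, ["walmart"]))
print(choose_action(12, ["walmart"]))
```

Because the results and the collations arrive in the same response, the extra round trip only happens in the zero-hit case.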

James Dyer
Ingram Content Group
(615) 213-4311

