Posted to solr-user@lucene.apache.org by Tanner Postert <ta...@gmail.com> on 2011/05/31 17:31:07 UTC

Better Spellcheck

I've tried to use a spellcheck dictionary built from my own content, but my
content ends up having a lot of misspelled words so the spellcheck ends up
being less than effective. I could use a standard dictionary, but it may
have problems with proper nouns. It also misses phrases. When someone
searches for "Untied States" I would hope the spellcheck would suggest
"United States" but it just recognizes that "untied" is a valid word and
doesn't suggest anything.

Is there any way around this? Are there any third party modules or
spellcheck systems that I could implement to get these types of features?

Re: Better Spellcheck

Posted by Alexey Serba <as...@gmail.com>.
> I've tried to use a spellcheck dictionary built from my own content, but my
> content ends up having a lot of misspelled words so the spellcheck ends up
> being less than effective.
You can try using the sp.dictionary.threshold parameter to solve this problem:
* http://wiki.apache.org/solr/SpellCheckerRequestHandler#sp.dictionary.threshold
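
To illustrate the idea behind such a threshold, here is a rough Python sketch. It is not Solr's implementation; build_dictionary and the sample documents are made up for illustration, and it assumes the threshold means "fraction of documents a term must appear in". Terms below that fraction (typically one-off misspellings) are left out of the dictionary.

```python
from collections import Counter

def build_dictionary(docs, threshold):
    """Keep terms whose document frequency / number of docs >= threshold."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    total = len(docs)
    return {term for term, df in doc_freq.items() if df / total >= threshold}

# Hypothetical content; the misspelling "untied" appears in only one document.
docs = [
    "united states news",
    "united states weather",
    "untied states news",
    "united nations news",
]
dictionary = build_dictionary(docs, threshold=0.5)
```

With this sample data, "untied" falls below the threshold and is excluded, while "united" is kept. A real threshold would be much lower (for example 0.01), tuned to how often misspellings occur in the content.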

> It also misses phrases. When someone
> searches for "Untied States" I would hope the spellcheck would suggest
> "United States" but it just recognizes that "untied" is a valid word and
> doesn't suggest any thing.
So you are talking about an auto-suggest component rather than spellcheck,
right? These are two different use cases.

If you want auto suggest and you have some search logs for your system
then you can probably use the following solution:
* http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
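
The general idea, as I understand it, can be sketched in a few lines of Python. This is only an illustration of edge n-grams over logged queries, not the code from that post; the function names and the query counts are hypothetical. Each logged query is indexed under all of its prefixes, and suggestions are ranked by how often the query was searched.

```python
from collections import defaultdict

def edge_ngrams(text, min_len=1):
    """Return all prefixes of text with at least min_len characters."""
    return [text[:i] for i in range(min_len, len(text) + 1)]

def build_suggest_index(query_counts):
    """Map each prefix to the logged queries that start with it."""
    index = defaultdict(list)
    for query, count in query_counts.items():
        for gram in edge_ngrams(query.lower()):
            index[gram].append((count, query))
    return index

def suggest(index, prefix, limit=5):
    """Return the most popular logged queries starting with prefix."""
    matches = sorted(index.get(prefix.lower(), []), key=lambda m: (-m[0], m[1]))
    return [query for _, query in matches[:limit]]

# Hypothetical search-log counts.
query_counts = {"united states": 120, "united nations": 45, "untied shoes": 2}
index = build_suggest_index(query_counts)
print(suggest(index, "uni"))
```

Because popular queries rank first, a rare or mistyped query in the logs is unlikely to be suggested ahead of the common one.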

If you don't have a significant search log history and want to populate
your auto-suggest dictionary from the index or from a text file, you
should check:
* http://wiki.apache.org/solr/Suggester

Re: Better Spellcheck

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Tanner,

We have something we call DYM ReSearcher that helps in situations like these, 
esp. with multi-word queries that Lucene/Solr spellcheckers have trouble with.

See http://sematext.com/products/dym-researcher/index.html

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


