Posted to solr-user@lucene.apache.org by Andrey Klochkov <ak...@griddynamics.com> on 2009/11/27 12:09:08 UTC

restore space between words by spell checker

Hi

If a user issues a misspelled query, forgetting to put a space between
words, is it possible to fix it with a spell checker or by some other
mechanism?

For example, if we get the query "tommyhitfiger" and have the terms "tommy"
and "hitfiger" in the index, how can we fix the query?

-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics

Re: restore space between words by spell checker

Posted by Andrey Klochkov <ak...@griddynamics.com>.
>
>
>>> For example, if we get query "tommyhitfiger" and have terms "tommy" and
>>> "hitfiger" in the index, how to fix the query?
>>>
>>
> The usual approach to solving this is to index compound words, i.e. when
> producing a spellchecker dictionary add a record "tommyhitfiger" with a
> field that points to "tommy hitfiger". Details vary depending on what
> spellchecking impl. you use.
>

I'm using Solr's default spell checker, which uses an n-gram index and
Levenshtein distance. Can it be customized to include compound words? What
alternative spell checkers exist for Lucene/Solr?

I experimented with the Lucene spell checker and noticed that, when
configured with a low accuracy, it can find the words "tommy" and "hilfiger"
that together form the whole query. So I was able to write some logic that
post-processes the spell checker results and finds the correct query
"tommy hilfiger". It just iterates over all possible combinations of the
terms suggested by the spell checker and compares each resulting query to
the original using Double Metaphone. I'm not sure this is the best solution,
though; it is probably not fast enough.
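
For illustration, here is a minimal sketch of that kind of post-processing,
written in plain Python rather than against the Lucene API. It assumes the
spell checker has already returned a list of candidate terms (the
`suggestions` list below is made up), and it swaps in
`difflib.SequenceMatcher` from the standard library for Double Metaphone,
because Python ships no Double Metaphone implementation. The names
`best_split`, `max_words` and `min_ratio` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def best_split(query, suggestions, max_words=2, min_ratio=0.8):
    """Return the spaced combination of suggested terms closest to the query.

    Similarity is difflib's ratio, used here as a stand-in for the
    phonetic (Double Metaphone) comparison described in the message.
    Returns None if no combination scores above min_ratio.
    """
    best, best_score = None, min_ratio
    for n in range(2, max_words + 1):
        # Try every ordered combination of n distinct suggestions.
        for combo in permutations(suggestions, n):
            joined = "".join(combo)
            score = SequenceMatcher(None, query, joined).ratio()
            if score > best_score:
                best, best_score = " ".join(combo), score
    return best


# Hypothetical terms a low-accuracy spell checker might suggest.
suggestions = ["tommy", "hilfiger", "tiger", "tom"]
print(best_split("tommyhitfiger", suggestions))
```

The number of ordered combinations grows quickly with the number of
suggestions, which is consistent with the speed concern above; capping the
suggestion count or `max_words` keeps it bounded.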

-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics

Re: restore space between words by spell checker

Posted by Andrzej Bialecki <ab...@getopt.org>.
Otis Gospodnetic wrote:
> I'm not sure that can be done easily (other than going char by char and
> testing), because nothing indicates where the space might be, not even an
> uppercase letter. I'd be curious to know if you find a better solution.
> 
>> Hi
>>
>> If a user issued a misspelled query, forgetting to place space between
>> words, is it possible to fix it with a spell checker or by some other
>> mechanism?
>>
>> For example, if we get query "tommyhitfiger" and have terms "tommy" and
>> "hitfiger" in the index, how to fix the query?

The usual approach to solving this is to index compound words, i.e. when
producing the spellchecker dictionary, add a record "tommyhitfiger" with a
field that points to "tommy hitfiger". Details vary depending on which
spellchecking implementation you use.
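
A rough sketch of the idea, in plain Python rather than any particular
spellchecker's API: the dictionary maps each run-together pair of adjacent
words to its spaced form, and a fuzzy lookup (here `difflib.get_close_matches`
from the standard library, as a stand-in for the spellchecker's own matching)
finds the nearest compound. The function names and the sample phrases are
hypothetical.

```python
from difflib import get_close_matches


def build_compound_dictionary(phrases):
    """Map each concatenated pair of adjacent words to its spaced form."""
    compounds = {}
    for phrase in phrases:
        words = phrase.lower().split()
        for first, second in zip(words, words[1:]):
            compounds[first + second] = first + " " + second
    return compounds


def correct(query, compounds, cutoff=0.8):
    """Return the spaced form of the closest compound, or None."""
    matches = get_close_matches(query.lower(), list(compounds), n=1, cutoff=cutoff)
    return compounds[matches[0]] if matches else None


# Hypothetical phrases taken from the indexed documents.
compounds = build_compound_dictionary(["Tommy Hilfiger jeans", "Calvin Klein"])
print(correct("tommyhitfiger", compounds))
```

The extra work happens at dictionary-build time, so the lookup at query time
stays a single fuzzy match against the compound entries.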



-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: restore space between words by spell checker

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I'm not sure that can be done easily (other than going char by char and
testing), because nothing indicates where the space might be, not even an
uppercase letter. I'd be curious to know if you find a better solution.
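
The char-by-char test could look roughly like the sketch below. It is plain
Python, not Solr code; `split_candidates` is a hypothetical name, and the
`terms` set stands in for a lookup against the index. It only covers the
case where both halves appear in the index exactly as typed.

```python
def split_candidates(query, terms):
    """Return every (left, right) split where both parts are known terms."""
    return [
        (query[:i], query[i:])
        for i in range(1, len(query))  # try each position between characters
        if query[:i] in terms and query[i:] in terms
    ]


# Stand-in for the index terms from the example in this thread.
terms = {"tommy", "hitfiger", "tom"}
print(split_candidates("tommyhitfiger", terms))
```

This needs one pair of term lookups per character position, so the cost is
linear in the query length for a two-word split.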

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR