Posted to solr-dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2008/04/25 12:03:55 UTC

[jira] Commented: (SOLR-507) Spell Checking Improvements

    [ https://issues.apache.org/jira/browse/SOLR-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592328#action_12592328 ] 

Shalin Shekhar Mangar commented on SOLR-507:
--------------------------------------------

I have just finished implementing a SpellCheck library (using Lucene) for a project which was not already using Solr. I implemented a few ideas there which can be added to Solr.

 - Given a user query consisting of many words, return just one suggestion for the whole query, e.g. a search for "hybrd sedn" gives "hybrid sedan" as the suggestion
 - Give suggestions on a per-field basis
 - Never repeat a word within a suggestion, e.g. if the index contains "Mercedes-Benz" and the user searches for "mercedec bens", the suggestion should not be "Mercedes-Benz Mercedes-Benz"
 - Don't try to give a suggestion for tokens shorter than a given length (my implementation used 3). For a query like "mercedes e class", this avoids a suggestion like "mercedes e-class c-class"
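
A rough sketch of how the whole-query, no-duplicates and minimum-length ideas above could fit together. It is not the implementation described in this comment: the class name WholeQuerySuggester and the per-token lookup function are hypothetical stand-ins, and in Solr the lookup would presumably be backed by Lucene's SpellChecker rather than the plain map used in main.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, for illustration only (not part of Solr or Lucene).
class WholeQuerySuggester {
    // Assumed per-token lookup: returns the best correction, or null if there is none.
    private final Function<String, String> lookup;
    // Tokens shorter than this are passed through without a lookup.
    private final int minTokenLength;

    WholeQuerySuggester(Function<String, String> lookup, int minTokenLength) {
        this.lookup = lookup;
        this.minTokenLength = minTokenLength;
    }

    // Builds a single suggestion string for the whole query.
    String suggest(String query) {
        Set<String> seen = new HashSet<>();   // lower-cased words already emitted
        List<String> words = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;                      // happens only for an empty query
            }
            String word = token;
            if (token.length() >= minTokenLength) {
                String corrected = lookup.apply(token);
                if (corrected != null) {
                    word = corrected;
                }
            }
            // Drop a word that is already in the suggestion (case-insensitive).
            if (seen.add(word.toLowerCase(Locale.ROOT))) {
                words.add(word);
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // Toy dictionary standing in for a real spell check index.
        Map<String, String> corrections = new HashMap<>();
        corrections.put("hybrd", "hybrid");
        corrections.put("sedn", "sedan");
        corrections.put("mercedec", "Mercedes-Benz");
        corrections.put("bens", "Mercedes-Benz");
        corrections.put("e", "e-class");

        WholeQuerySuggester suggester = new WholeQuerySuggester(corrections::get, 3);
        System.out.println(suggester.suggest("hybrd sedn"));       // hybrid sedan
        System.out.println(suggester.suggest("mercedec bens"));    // Mercedes-Benz
        System.out.println(suggester.suggest("mercedes e class")); // mercedes e class
    }
}
```

One consequence of de-duplicating this way is that a word the user deliberately typed twice is also collapsed, so a real implementation might restrict the check to corrected tokens only.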

I understand that these tweaks are often very specific to the use case, but we can at least provide the features for people to use as they see fit. To implement the multiple-field support, we can change SpellCheckerRequestHandler to create a HighFrequencyDictionary for each configured field and add them all to the spell check index. We can then use the overloaded suggestSimilar method (which accepts a field) to query. If this sounds fine, I can submit a patch to add these features.

> Spell Checking Improvements
> ---------------------------
>
>                 Key: SOLR-507
>                 URL: https://issues.apache.org/jira/browse/SOLR-507
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Jayson Minard
>
> Creating a placeholder issue to track Spell Checking Improvements.  Individual issues can later be created and linked for each area of separable concern when they are determined.  
> Areas to discuss include:
> # spell suggestions from within the current query (minus terms being corrected) and filter so that suggestions are always valid
> ** need approaches to merging the spelling list with the current mask of valid records.  Also, is this a better change to Lucene first, or something that belongs in Solr?
> ** need to add spell checking as query component and make available to various query handlers
> ** spell checking to be field specific to support responding correctly with dismax queries
> # spell suggestions from a distributed search (SOLR-303) by augmenting the response, or alternatively just provide a federating of Spell Checker requests on their own and let the application decide when to use each.
> # spell suggestions as a search component to augment other queries
> What are other typical areas of concern, or suggestions for improvements for spell checking that can be tracked?  
> I am willing to look at driving a patch for this area, especially for spell checking working within the current result set, and across  distributed search.  
