Posted to dev@lucene.apache.org by "James Dyer (Updated) (JIRA)" <ji...@apache.org> on 2011/12/29 18:59:30 UTC

[jira] [Updated] (SOLR-2993) Integrate WordBreakSpellChecker with Solr

     [ https://issues.apache.org/jira/browse/SOLR-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2993:
-----------------------------

    Attachment: SOLR-2993.patch

The patch adds the features described in this issue.  Users can define a word-break dictionary in solrconfig.xml like this:

{code:xml}
<lst name="spellchecker">
 <str name="name">wordbreak</str>
 <str name="classname">solr.WordBreakSolrSpellChecker</str>      
 <str name="field">lowerfilt</str>
 <str name="combineWords">true</str>
 <str name="breakWords">true</str>
 <int name="maxChanges">10</int>
</lst>
{code}

Users can also specify multiple "spellcheck.dictionary" parameters.  All specified dictionaries are consulted and their results are interleaved (this is handled by the new ConjunctionSolrSpellChecker).  Collations are built from combinations of corrections from the different spellcheckers, with care taken that multiple overlapping corrections do not occur in the same collation.

{code:xml}
<requestHandler name="spellCheckWithWordbreak" class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck.dictionary">default</str>
  <str name="spellcheck.dictionary">wordbreak</str>
  <str name="spellcheck.count">20</str>
 </lst>
 <arr name="last-components">
  <str>spellcheck</str>
 </arr>
</requestHandler>
{code}
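
The handler above refers to two dictionaries by name, "default" and "wordbreak", both of which must be defined in the SpellCheckComponent it uses.  As a rough sketch only (not taken from the patch): the choice of DirectSolrSpellChecker for "default" and the "textSpell" field type are assumptions made for illustration, and any of the existing single-word spell checkers could take that role.

{code:xml}
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <!-- Assumed field type used to analyze the query; substitute your own. -->
 <str name="queryAnalyzerFieldType">textSpell</str>

 <!-- Single-word dictionary (assumed implementation for this sketch). -->
 <lst name="spellchecker">
  <str name="name">default</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="field">lowerfilt</str>
 </lst>

 <!-- Word-break dictionary, as configured earlier in this comment. -->
 <lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">lowerfilt</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">10</int>
 </lst>
</searchComponent>
{code}

Both dictionaries read from the same field here so that word-break and single-word suggestions come from the same terms.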

A future enhancement (outside the scope of this issue) would be to extend ConjunctionSolrSpellChecker to allow arbitrary dictionary combinations, for instance if a user wanted to query two fields and have separate dictionaries consulted for each field.  With this patch, however, ConjunctionSolrSpellChecker is intended to be used to combine word-break suggestions with single-word suggestions.
                
> Integrate WordBreakSpellChecker with Solr
> -----------------------------------------
>
>                 Key: SOLR-2993
>                 URL: https://issues.apache.org/jira/browse/SOLR-2993
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud, spellchecker
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-2993.patch
>
>
> A SpellCheckComponent enhancement, leveraging the WordBreakSpellChecker from LUCENE-3523:
> - Detect spelling errors resulting from misplaced whitespace without the use of shingle-based dictionaries.  
> - Seamlessly integrate word-break suggestions with single-word spelling corrections from the existing FileBased-, IndexBased- or Direct- spell checkers.  
> - Provide collation support for word-break errors including cases where the user has a mix of single-word spelling errors and word-break errors in the same query.  
> - Provide shard support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
