Posted to solr-dev@lucene.apache.org by "Stefan Oestreicher (JIRA)" <ji...@apache.org> on 2008/08/14 12:39:45 UTC

[jira] Updated: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

     [ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Oestreicher updated SOLR-606:
------------------------------------

    Attachment: handler.component.SpellCheckComponent-collate-patch.txt

I recently ran into this exact issue and found the cause.
The collation is built by replacing each misspelled token in the original query with its suggestion, using a StringBuilder:

{noformat}
for (Iterator<Map.Entry<Token, String>> bestIter = best.entrySet().iterator(); bestIter.hasNext();) {
    Map.Entry<Token, String> entry = bestIter.next();
    Token tok = entry.getKey();
    collation.replace(tok.startOffset(), tok.endOffset(), entry.getValue());
}
{noformat}

As you can see, it simply replaces the relevant tokens in the original query, using their original offsets. However, if a suggestion's length differs from the length of the token it replaces, every offset used after that replacement is no longer valid, which yields incorrect results. In the reported example, replacing "redbull" with "redbelly" shifts the rest of the string by one character, so replacing "show" at its original offsets 12-16 overwrites " sho" instead and produces "redbelly airshotw".
I fixed this by keeping track of the accumulated length difference and adding it to the token offsets. For this to work I had to change the HashMap to a LinkedHashMap, since the fix depends on the tokens being iterated in the order in which they occur in the string.
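
The idea can be sketched roughly as follows. This is only a minimal illustration, not the attached patch: the Tok class is a hypothetical stand-in for Lucene's Token (only the start and end offsets matter here), and the class and method names are made up for the example. It assumes the map iterates its tokens in order of occurrence, which is why a LinkedHashMap is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CollationSketch {
    /** Hypothetical stand-in for Lucene's Token: only the offsets are needed. */
    static final class Tok {
        final int start;
        final int end;
        Tok(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Replaces each token with its suggestion. Assumes 'best' iterates in
     * order of occurrence in the original string (e.g. a LinkedHashMap
     * filled left to right).
     */
    static String collate(String original, Map<Tok, String> best) {
        StringBuilder collation = new StringBuilder(original);
        int offset = 0; // accumulated length difference of earlier replacements
        for (Map.Entry<Tok, String> entry : best.entrySet()) {
            Tok tok = entry.getKey();
            String suggestion = entry.getValue();
            // Shift the original offsets by everything replaced so far.
            collation.replace(tok.start + offset, tok.end + offset, suggestion);
            offset += suggestion.length() - (tok.end - tok.start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        Map<Tok, String> best = new LinkedHashMap<>();
        best.put(new Tok(0, 7), "redbelly"); // "redbull"
        best.put(new Tok(12, 16), "shot");   // "show"
        System.out.println(collate("redbull air show", best)); // prints "redbelly air shot"
    }
}
```

With the offset correction, the query from the report collates to "redbelly air shot" rather than "redbelly airshotw".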

> spellcheck.colate doesn't handle multiple tokens properly
> ---------------------------------------------------------
>
>                 Key: SOLR-606
>                 URL: https://issues.apache.org/jira/browse/SOLR-606
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.3
>         Environment: tomcat
>            Reporter: Geoffrey Young
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: handler.component.SpellCheckComponent-collate-patch.txt, SOLR-606.patch
>
>
> originally posted as part of SOLR-572:
>   https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487
> the new spellcheck.collate feature seems to exhibit some strange behaviors when handed a query with multiple tokens.
> {noformat}
> {
>  "responseHeader":{
>   "params":{
> 	"q":"redbull air show"}},
>   "spellcheck":{
>    "suggestions":[
> 	"redbull",[
> 	 "suggestion",["redbelly"]],
> 	"show",[
> 	 "suggestion",["shot"]],
> 	"collation","redbelly airshotw"]}}
> {noformat}
> in this case, note that the suggestions are incorrectly concatenated (no space between tokens, and a leftover 'w' from the input string)
> {noformat}
> {
>  "responseHeader":{
>   "params":{
> 	"q":"redbull air show",
> 	"spellcheck.q":"redbull air show"}},
>  "spellcheck":{
>   "suggestions":[
> 	"redbull air show",[
> 	 "suggestion",["redbull singers"]],
> 	"collation","redbull singersredbull air show"]}}
> {noformat}
> this is slightly different: the suggestion is still concatenated without a space, but the collation is way off, since the original query is appended to it.
> --Geoff
