Posted to solr-dev@lucene.apache.org by "Stefan Oestreicher (JIRA)" <ji...@apache.org> on 2008/08/14 12:39:45 UTC
[jira] Updated: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Oestreicher updated SOLR-606:
------------------------------------
Attachment: handler.component.SpellCheckComponent-collate-patch.txt
I recently ran into this exact issue and I found the problem.
The collation is created by replacing the misspelled tokens with the suggestions using a StringBuilder:
{noformat}
for (Iterator<Map.Entry<Token, String>> bestIter = best.entrySet().iterator(); bestIter.hasNext();) {
  Map.Entry<Token, String> entry = bestIter.next();
  Token tok = entry.getKey();
  collation.replace(tok.startOffset(), tok.endOffset(), entry.getValue());
}
{noformat}
As you can see, it simply replaces the relevant tokens in the original query. However, if the length of a suggestion differs from the length of the original token, every offset used after that replacement is stale, so later replacements land in the wrong place and the collation comes out wrong.
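To make the failure concrete, here is a minimal sketch that reproduces the first collation from the issue description. It is not the SpellCheckComponent code: the class name, the method name, and the use of plain int pairs in place of Lucene Tokens are assumptions made for illustration, and the offsets are simply the positions of "redbull" and "show" in the query "redbull air show".

```java
public class CollateOffsetBug {
    // Applies every replacement at its ORIGINAL offsets, like the loop quoted above.
    // spans[i] = {startOffset, endOffset} of the i-th misspelled token (hypothetical stand-in for Token).
    public static String naiveCollate(String query, int[][] spans, String[] suggestions) {
        StringBuilder collation = new StringBuilder(query);
        for (int i = 0; i < spans.length; i++) {
            collation.replace(spans[i][0], spans[i][1], suggestions[i]);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        String query = "redbull air show";
        int[][] spans = { {0, 7}, {12, 16} };          // "redbull", "show"
        String[] suggestions = { "redbelly", "shot" };
        // "redbelly" is one character longer than "redbull", so the second
        // replacement starts one character too early: it swallows the space
        // before "show" and leaves the trailing 'w' behind.
        System.out.println(naiveCollate(query, spans, suggestions)); // redbelly airshotw
    }
}
```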
I fixed that by keeping track of the accumulated length difference and adding it to the offsets of each subsequent token. For this to work I had to change the HashMap to a LinkedHashMap, since the fix depends on the Tokens being iterated in the order in which they occur in the string.
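The attached patch is not reproduced in this message, so the following is only a sketch of the approach just described, not the patch itself. The class CollateOffsetFix, the Span type (a hypothetical stand-in for Lucene's Token that carries only the offsets), and the collate method are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CollateOffsetFix {
    // Hypothetical stand-in for Token: only the start and end offsets matter here.
    public static final class Span {
        public final int start;
        public final int end;
        public Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Expects 'best' to iterate in order of occurrence in the query,
    // which is why a LinkedHashMap (insertion order) is used by the caller.
    public static String collate(String query, Map<Span, String> best) {
        StringBuilder collation = new StringBuilder(query);
        int offset = 0; // accumulated length difference of the replacements so far
        for (Map.Entry<Span, String> entry : best.entrySet()) {
            Span tok = entry.getKey();
            String suggestion = entry.getValue();
            collation.replace(tok.start + offset, tok.end + offset, suggestion);
            offset += suggestion.length() - (tok.end - tok.start);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        Map<Span, String> best = new LinkedHashMap<>();
        best.put(new Span(0, 7), "redbelly");  // "redbull"
        best.put(new Span(12, 16), "shot");    // "show"
        System.out.println(collate("redbull air show", best)); // redbelly air shot
    }
}
```

An alternative that avoids tracking the difference would be to apply the replacements from the last token to the first, since a replacement never shifts text that precedes it; that still requires a known token order.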
> spellcheck.colate doesn't handle multiple tokens properly
> ---------------------------------------------------------
>
> Key: SOLR-606
> URL: https://issues.apache.org/jira/browse/SOLR-606
> Project: Solr
> Issue Type: Bug
> Components: spellchecker
> Affects Versions: 1.3
> Environment: tomcat
> Reporter: Geoffrey Young
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: handler.component.SpellCheckComponent-collate-patch.txt, SOLR-606.patch
>
>
> originally posted as part of SOLR-572:
> https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487
> the new spellcheck.collate feature seems to exhibit some strange behaviors when handed a query with multiple tokens.
> {noformat}
> {
>   "responseHeader":{
>     "params":{
>       "q":"redbull air show"}},
>   "spellcheck":{
>     "suggestions":[
>       "redbull",[
>         "suggestion",["redbelly"]],
>       "show",[
>         "suggestion",["shot"]],
>       "collation","redbelly airshotw"]}}
> {noformat}
> in this case, note the fields are incorrectly concatenated (no space between tokens, left over 'w' from input string)
> {noformat}
> {
>   "responseHeader":{
>     "params":{
>       "q":"redbull air show",
>       "spellcheck.q":"redbull air show"}},
>   "spellcheck":{
>     "suggestions":[
>       "redbull air show",[
>         "suggestion",["redbull singers"]],
>       "collation","redbull singersredbull air show"]}}
> {noformat}
> this is slightly different - the suggestions are still concatenated without a space, but the collation is way off.
> --Geoff
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.