Posted to solr-dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/01/31 15:37:34 UTC

[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug

    [ https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806833#action_12806833 ] 

Robert Muir commented on SOLR-1670:
-----------------------------------

Steven, I don't have a problem with your patch (I do not wish to stand in the way of anyone trying to work on SynonymFilter).

But I want to explain some of where I was coming from.

The main reason I got myself into this mess was trying to add WordNet support to Solr. However, this is currently not possible without duplicating a lot of code.
We need to be really careful about allowing any order; it does matter in some situations.
For example, Lucene's SynonymFilter (with WordNet support) has an option to limit the number of expansions (so it's like a top-N synonym expansion).
Solr doesn't currently have this, so it's N/A for now, but it's an example of where the order suddenly becomes important.
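
For illustration, here is a minimal sketch (not Solr code; the class and method names here are hypothetical) of why a top-N expansion limit makes entry order significant: truncating to the first N synonyms means any reordering of the stored entries silently changes which synonyms survive.

```java
import java.util.ArrayList;
import java.util.List;

public class TopNExpansion {
    // Keep only the first maxExpansions synonyms, in their stored order.
    // If the map does not preserve insertion order, the "top N" picked
    // here would change even though the set of synonyms is identical.
    public static List<String> expandTopN(List<String> synonyms, int maxExpansions) {
        int n = Math.min(maxExpansions, synonyms.size());
        return new ArrayList<>(synonyms.subList(0, n));
    }
}
```

With entries ranked ["large", "big", "huge"] and a limit of 2, only "large" and "big" are emitted; reorder the entries and a different pair comes out.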

Only slightly related: we recently added some improvements to this assertion in Lucene and found a lot of bugs (better checking for clearAttribute() and end()).
At some point I would like to port these test improvements over to Solr, too.


> synonymfilter/map repeat bug
> ----------------------------
>
>                 Key: SOLR-1670
>                 URL: https://issues.apache.org/jira/browse/SOLR-1670
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Robert Muir
>            Assignee: Yonik Seeley
>         Attachments: SOLR-1670.patch, SOLR-1670.patch, SOLR-1670_test.patch
>
>
> As part of converting tests for SOLR-1657, I ran into a problem with SynonymFilter.
> The test for 'repeats' has a flaw: it uses this assertTokEqual construct, which does not really validate that two lists of tokens are equal; it just stops at the shorter one.
> {code}
>     // repeats
>     map.add(strings("a b"), tokens("ab"), orig, merge);
>     map.add(strings("a b"), tokens("ab"), orig, merge);
>     assertTokEqual(getTokList(map,"a b",false), tokens("ab"));
>     /* in reality the result from getTokList is ab ab ab!!!!! */
> {code}
> When converted to assertTokenStreamContents, this problem surfaced. Attached is an additional assertion to the existing test case.
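
The flaw described in the quoted report can be sketched as follows (the helper names are hypothetical, not the actual Solr test code): a comparison that iterates only up to the shorter list's length reports two token lists as equal even when one has trailing extra tokens, which is exactly how "ab ab ab" passed a check against just "ab".

```java
import java.util.List;

public class TokenListCheck {
    // Flawed check, mirroring the described assertTokEqual behavior:
    // it compares element-by-element but stops at the shorter list,
    // so ["ab", "ab", "ab"] "equals" ["ab"].
    public static boolean lenientEquals(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Strict check: lengths must match too, which catches the
    // repeated "ab ab ab" output described in the issue.
    public static boolean strictEquals(List<String> a, List<String> b) {
        return a.equals(b);
    }
}
```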

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.