Posted to dev@lucene.apache.org by "Robert Muir (Commented) (JIRA)" <ji...@apache.org> on 2012/04/11 16:19:16 UTC

[jira] [Commented] (LUCENE-3971) MappingCharFilter rarely has wrong correctOffset (for finalOffset)

    [ https://issues.apache.org/jira/browse/LUCENE-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251614#comment-13251614 ] 

Robert Muir commented on LUCENE-3971:
-------------------------------------

I agree, Dawid. What do you think about the difficulty of LUCENE-3830?

I feel like with an FST the logic would probably be simpler and
the filter would probably be faster. We also have pretty good tests:
in general this thing works, and this is just a corner case.

On the other hand, if there is a simple way to fix the bug in the
existing code, that would be nice, e.g. for a future 3.6.1 or something
like that.

But I'll take any solutions anyone has :)
                
> MappingCharFilter rarely has wrong correctOffset (for finalOffset) 
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3971
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3971
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-3971_test.patch
>
>
> Found this bug over on LUCENE-3969, but I'm currently tracking a ton of bugs, so
> I figured I would open an issue and see if this one is obvious to anyone.
> Consider this input string: "gzw f quaxot" (length = 12) with a WhitespaceTokenizer.
> If I have mapping rules like this, then it works:
> {noformat}
> "t" => ""
> {noformat}
> But if I have mapping rules like this:
> {noformat}
> "t" => ""
> "tmakdbl" => "c"
> {noformat}
> Then it computes the final offset incorrectly:
> {noformat}
>     [junit] junit.framework.AssertionFailedError: finalOffset  expected:<12> but was:<11>
> {noformat}
> Looks like some logic/recursion bug in the correctOffset method? The second rule is not even "used" for this string;
> it just happens to also start with 't'.
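
For reference, here is a minimal, self-contained sketch of the offset arithmetic the test expects. It is not Lucene's MappingCharFilter code: the class OffsetSketch and its filter and correctOffset helpers are hypothetical names made up for illustration, and the sketch assumes greedy longest-match rules whose replacements are no longer than the matched text (as in the rules above). It only aims to show that the extra "tmakdbl" rule should not change the corrected final offset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT Lucene's implementation.
final class OffsetSketch {

    // Applies the rules with greedy longest match and records offset corrections.
    // Each correction is {offsetInFilteredText, cumulativeDiff}: from that filtered
    // offset onward, add cumulativeDiff to get back to an offset in the original text.
    static String filter(String input, Map<String, String> rules, List<int[]> corrections) {
        StringBuilder out = new StringBuilder();
        int cumulativeDiff = 0;
        int pos = 0;
        while (pos < input.length()) {
            // Pick the longest non-empty rule key that matches at the current position.
            String best = null;
            for (String key : rules.keySet()) {
                if (!key.isEmpty() && input.startsWith(key, pos)
                        && (best == null || key.length() > best.length())) {
                    best = key;
                }
            }
            if (best == null) {
                out.append(input.charAt(pos++)); // no rule applies: copy the char through
                continue;
            }
            String replacement = rules.get(best);
            out.append(replacement);
            pos += best.length();
            int diff = best.length() - replacement.length();
            if (diff != 0) {
                cumulativeDiff += diff;
                corrections.add(new int[] {out.length(), cumulativeDiff});
            }
        }
        return out.toString();
    }

    // Maps an offset in the filtered text back to an offset in the original text
    // using the last correction recorded at or before that filtered offset.
    static int correctOffset(int filteredOffset, List<int[]> corrections) {
        int diff = 0;
        for (int[] c : corrections) {
            if (c[0] > filteredOffset) {
                break;
            }
            diff = c[1];
        }
        return filteredOffset + diff;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("t", "");
        rules.put("tmakdbl", "c"); // shares the prefix 't' but never matches this input

        List<int[]> corrections = new ArrayList<>();
        String filtered = filter("gzw f quaxot", rules, corrections);

        // The filtered text is "gzw f quaxo" (length 11); its end maps back to 12.
        System.out.println(filtered + " -> finalOffset "
                + correctOffset(filtered.length(), corrections));
    }
}
```

With either rule set from the description, the filtered text has length 11 and the corrected final offset comes out as 12, which is the value the failing assertion expects.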
