Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2009/06/11 07:21:08 UTC

[jira] Commented: (LUCENE-1545) Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTER E

    [ https://issues.apache.org/jira/browse/LUCENE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718300#action_12718300 ] 

Robert Muir commented on LUCENE-1545:
-------------------------------------

If you are looking for a more short-term solution (since I think LUCENE-1488 will take quite a bit more time), it would be possible to make StandardAnalyzer more 'unicode-friendly'.

It's not possible to make it 'correct', and adding more Unicode friendliness would make backwards compatibility a much more complex issue (different Unicode versions across JVM versions, etc.).

But if you want, I'm willing to come up with some minor grammar changes for StandardAnalyzer that could help with things like this.


> Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTER E
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1545
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1545
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.4
>         Environment: Linux x86_64, Sun Java 1.6
>            Reporter: Andreas Hauser
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: AnalyzerTest.java
>
>
> Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTER E.
> The word "moͤchte" is incorrectly tokenized into "mo" and "chte"; the combining character is lost.
> The expected result is a single token, "moͤchte".
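The reported behaviour can be reproduced in miniature without Lucene. The sketch below is not StandardTokenizer's actual grammar (that is generated from a JFlex specification); it is a hypothetical hand-written tokenizer, with the invented names CombiningMarkDemo, isCombiningMark and tokenize, that only illustrates why a rule accepting letters and digits alone splits the word at U+0364, and how also accepting combining marks after a token character keeps it whole. Which code points count as marks depends on the Unicode version the running JVM supports.

```java
import java.util.ArrayList;
import java.util.List;

public class CombiningMarkDemo {

    // True for the Unicode general categories Mn, Mc and Me (combining marks).
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Hypothetical tokenizer: a token is a run of letters/digits.
    // If keepMarks is true, a combining mark that follows a token
    // character is treated as part of that token instead of a break.
    static List<String> tokenize(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean partOfToken = Character.isLetterOrDigit(cp)
                || (keepMarks && current.length() > 0 && isCombiningMark(cp));
            if (partOfToken) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // Anything else ends the current token and is dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "mo\u0364chte"; // "m", "o", U+0364, "c", "h", "t", "e"
        System.out.println(tokenize(word, false)); // letters/digits only
        System.out.println(tokenize(word, true));  // combining marks kept
    }
}
```

With keepMarks set to false the mark acts as a token break, which matches the "mo" / "chte" split described in the report; with keepMarks set to true the word comes back as one token.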
