Posted to dev@tika.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/09/15 18:44:09 UTC

[jira] [Resolved] (TIKA-683) RTF Parser issues with non european characters

     [ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved TIKA-683.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0

I'll open a follow-on issue for the mismatched XHTML events from some parsers.

> RTF Parser issues with non european characters
> ----------------------------------------------
>
>                 Key: TIKA-683
>                 URL: https://issues.apache.org/jira/browse/TIKA-683
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Michael McCandless
>             Fix For: 1.0
>
>         Attachments: TIKA-683-unicode-testcase.patch, TIKA-683.patch, TIKA-683.patch, TIKA-683.patch, TIKA-683.patch, testRTFJapanese.rtf, testUnicodeUCNControlWordCharacterDoubling.rtf, testWORD_bold_character_runs.docx, testWORD_bold_character_runs2.docx
>
>
> As reported on user@ in "non-West European languages support":
>   http://mail-archives.apache.org/mod_mbox/tika-user/201107.mbox/%3COF0C0A3275.DA7810E9-ONC22578CC.0051EEDE-C22578CC.0052548B@il.ibm.com%3E
> The RTF Parser seems to be doubling up some non-European characters.
