Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/07/15 17:36:00 UTC
[jira] [Updated] (TIKA-683) RTF Parser issues with non european characters
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Burch updated TIKA-683:
----------------------------
Attachment: testRTFJapanese.rtf
Add test file, based on Jp_euc-jp_rtf1.rtf from http://mail-archives.apache.org/mod_mbox/tika-user/201106.mbox/%3COF03CF5CF6.40C9789F-ONC22578BC.0035A24F-C22578BC.0036C220@il.ibm.com%3E but with the images removed to keep the file size reasonable.
> RTF Parser issues with non european characters
> ----------------------------------------------
>
> Key: TIKA-683
> URL: https://issues.apache.org/jira/browse/TIKA-683
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9
> Reporter: Nick Burch
> Attachments: testRTFJapanese.rtf
>
>
> As reported on user@ in "non-West European languages support":
> http://mail-archives.apache.org/mod_mbox/tika-user/201107.mbox/%3COF0C0A3275.DA7810E9-ONC22578CC.0051EEDE-C22578CC.0052548B@il.ibm.com%3E
> The RTF Parser seems to be doubling up some non-European characters.
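The report does not identify a cause. One plausible mechanism, offered here as an assumption rather than a diagnosis: RTF writers typically store a non-ASCII character twice, once as a \uN Unicode escape and once as a fallback in the document's code page (often \'hh bytes), with \ucN giving the number of fallback units a reader should skip. A reader that emits the Unicode character and also decodes the fallback bytes outputs each such character twice. The sketch below illustrates this without Tika; the class RtfUnicodeSketch, its decode method, and the skipFallback flag are hypothetical, it handles only a small subset of RTF, and the example assumes a Shift_JIS (code page 932) fallback.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

/** Hypothetical sketch, not Tika code: decodes only a tiny subset of RTF. */
public class RtfUnicodeSketch {

    /**
     * Handles plain text, hex-escaped code-page bytes, the "u" and "uc" control
     * words, and escaped backslash/braces. Braces are dropped and do NOT scope
     * "uc" as real RTF requires; all other control words are ignored.
     */
    public static String decode(String rtf, Charset codePage, boolean skipFallback) {
        StringBuilder out = new StringBuilder();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int uc = 1;      // RTF default: one fallback unit follows each Unicode escape
        int toSkip = 0;  // fallback units still to be skipped
        int i = 0;
        while (i < rtf.length()) {
            char c = rtf.charAt(i);
            if (c == '\\' && i + 3 < rtf.length() && rtf.charAt(i + 1) == '\'') {
                // Hex escape: one byte in the document code page (one fallback unit)
                int b = Integer.parseInt(rtf.substring(i + 2, i + 4), 16);
                i += 4;
                if (toSkip > 0) { toSkip--; } else { bytes.write(b); }
                continue;
            }
            // Any other token ends a run of code-page bytes, so decode that run now
            flush(bytes, codePage, out);
            if (c == '\\' && i + 1 < rtf.length() && Character.isLetter(rtf.charAt(i + 1))) {
                // Control word: letters, optional signed number, optional space delimiter
                int j = i + 1;
                while (j < rtf.length() && Character.isLetter(rtf.charAt(j))) j++;
                String word = rtf.substring(i + 1, j);
                int k = j;
                if (k < rtf.length() && rtf.charAt(k) == '-') k++;
                int digitsStart = k;
                while (k < rtf.length() && Character.isDigit(rtf.charAt(k))) k++;
                boolean hasNum = k > digitsStart;
                String num = hasNum ? rtf.substring(j, k) : "";
                if (!hasNum) k = j;
                if (k < rtf.length() && rtf.charAt(k) == ' ') k++;
                i = k;
                if (word.equals("uc") && hasNum) {
                    uc = Integer.parseInt(num);
                } else if (word.equals("u") && hasNum) {
                    int code = Integer.parseInt(num);
                    if (code < 0) code += 65536;  // values above 32767 are written as negative
                    out.append((char) code);
                    toSkip = skipFallback ? uc : 0;
                }
                continue;
            }
            if (c == '\\' && i + 1 < rtf.length()) {
                // Control symbol: keep an escaped backslash or brace, drop the rest
                char sym = rtf.charAt(i + 1);
                i += 2;
                if (sym == '\\' || sym == '{' || sym == '}') out.append(sym);
                continue;
            }
            i++;
            if (c == '{' || c == '}' || c == '\r' || c == '\n') continue;
            if (toSkip > 0) { toSkip--; } else { out.append(c); }
        }
        flush(bytes, codePage, out);
        return out.toString();
    }

    private static void flush(ByteArrayOutputStream bytes, Charset cs, StringBuilder out) {
        if (bytes.size() > 0) {
            out.append(new String(bytes.toByteArray(), cs));
            bytes.reset();
        }
    }

    public static void main(String[] args) {
        // Hypothetical fragment: U+65E5 followed by its two Shift_JIS fallback bytes
        String rtf = "{\\uc2\\u26085\\'93\\'fa}";
        Charset sjis = Charset.forName("Shift_JIS");
        System.out.println(decode(rtf, sjis, true));   // one character
        System.out.println(decode(rtf, sjis, false));  // the same character twice
    }
}
```

With skipFallback set to false, the Shift_JIS bytes 93 FA that follow the Unicode escape for 日 decode to 日 a second time, which resembles the doubling in the report. This is illustrative only and has not been checked against testRTFJapanese.rtf or the Tika RTF Parser source.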
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira