Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/05/12 17:34:41 UTC

[jira] Resolved: (TIKA-417) Unable to parse the content for UCS2 Little Endian encoded file

     [ https://issues.apache.org/jira/browse/TIKA-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-417.
--------------------------------

         Assignee: Jukka Zitting
    Fix Version/s: 0.8
       Resolution: Fixed

This problem was caused by a rare MP3 byte pattern that happened to also match the UCS2 LE byte order mark. I've fixed this in revision 943554. Thanks for the problem report!
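
To illustrate how such a collision can arise, here is a minimal sketch. It is not Tika's detector and not the change made in revision 943554. It assumes the MP3 pattern in question is the MPEG audio frame sync (the first 11 bits set), which the message above does not confirm. The helper names are hypothetical.

```python
import codecs

# UTF-16/UCS-2 little-endian byte order mark: bytes FF FE
UTF16_LE_BOM = codecs.BOM_UTF16_LE


def looks_like_mpeg_frame_sync(data: bytes) -> bool:
    """Hypothetical check: True if the first 11 bits are all set,
    which is the MPEG audio frame sync (assumed to be the pattern involved)."""
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def starts_with_utf16_le_bom(data: bytes) -> bool:
    """True if the data begins with the UTF-16 LE byte order mark."""
    return data.startswith(UTF16_LE_BOM)


# A small text payload saved the way an editor would write "UCS-2 Little Endian"
sample = UTF16_LE_BOM + "hello".encode("utf-16-le")

# Both checks accept the same leading bytes, so a detector that tests
# the MP3 pattern first could misclassify the text file as audio.
print(starts_with_utf16_le_bom(sample))
print(looks_like_mpeg_frame_sync(sample))
```

Both calls report a match for the same two leading bytes, because 0xFE has its top three bits set. A big-endian BOM (FE FF) does not pass the sync check in this sketch.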

> Unable to parse the content for UCS2 Little Endian encoded file
> ---------------------------------------------------------------
>
>                 Key: TIKA-417
>                 URL: https://issues.apache.org/jira/browse/TIKA-417
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.7
>         Environment: Windows
>            Reporter: Rajiv Kumar
>            Assignee: Jukka Zitting
>             Fix For: 0.8
>
>         Attachments: TXT_UCS2_LE2.txt
>
>
> I have a text file that I encoded in UCS2 Little Endian format using Notepad++. Tika is unable to parse the content, and it does not throw any exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.