Posted to dev@tika.apache.org by "Chaitra Rajappa (Jira)" <ji...@apache.org> on 2021/08/06 17:44:00 UTC
[jira] [Updated] (TIKA-3516) Unexpected charset IBM424_rtl detected for utf_8 file by CharsetDetector
[ https://issues.apache.org/jira/browse/TIKA-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chaitra Rajappa updated TIKA-3516:
----------------------------------
Description:
Hi,
The CharsetDetector detects the wrong charset, IBM424_rtl, for a UTF-8 file,
which results in the following exception:
*_java.nio.charset.UnsupportedCharsetException: IBM424_rtl 17 at java.nio.charset.Charset.forName(Charset.java:531)_*
I see there is also an existing ticket for the same issue that has not been fixed:
https://issues.apache.org/jira/browse/TIKA-2396
Please suggest the changes to fix this.
Versions being used:
apache-core - 1.20
apache-parsers - 1.20
Thanks
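Until the detection itself is corrected, one possible caller-side guard is to check whether the JVM supports the detected charset name before passing it to Charset.forName, and to fall back to a default otherwise. The following is only a minimal sketch using the standard java.nio.charset API. The class name SafeCharsets and the method resolveOrFallback are hypothetical names chosen for illustration and are not part of Tika, and the choice of fallback charset is an assumption the caller has to make.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Tika): turns a detected charset name
// into a Charset without throwing when the JVM does not know the name.
final class SafeCharsets {

    private SafeCharsets() {
    }

    // Returns the named charset if this JVM supports it, otherwise the
    // caller-supplied fallback. The fallback is an assumption of this sketch.
    static Charset resolveOrFallback(String detectedName, Charset fallback) {
        if (detectedName == null) {
            return fallback;
        }
        try {
            // isSupported returns false for unknown names, so forName is
            // only called when it cannot throw UnsupportedCharsetException.
            if (Charset.isSupported(detectedName)) {
                return Charset.forName(detectedName);
            }
        } catch (IllegalCharsetNameException e) {
            // The name is syntactically invalid; use the fallback below.
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The name reported in this ticket, with UTF-8 as the assumed fallback.
        System.out.println(resolveOrFallback("IBM424_rtl", StandardCharsets.UTF_8));
        // A name every JVM is required to support.
        System.out.println(resolveOrFallback("UTF-8", StandardCharsets.ISO_8859_1));
    }
}
```

This only avoids the exception. It does not address why a UTF-8 file is detected as IBM424_rtl in the first place, and the fallback may still decode the file incorrectly if it does not match the real encoding.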
was:
Hi,
The CharsetDetector detects the wrong charset, IBM424_rtl, for a UTF-8 file,
which results in the following exception:
*_java.nio.charset.UnsupportedCharsetException: IBM424_rtl 17 at java.nio.charset.Charset.forName(Charset.java:531)_*
I see there is also an existing ticket for the same issue that has not been fixed:
https://issues.apache.org/jira/browse/TIKA-2396
Please suggest the changes to fix this.
Thanks
> Unexpected charset IBM424_rtl detected for utf_8 file by CharsetDetector
> --------------------------------------------------------------------------
>
> Key: TIKA-3516
> URL: https://issues.apache.org/jira/browse/TIKA-3516
> Project: Tika
> Issue Type: Bug
> Components: detector, parser
> Reporter: Chaitra Rajappa
> Priority: Major