Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/08/09 14:58:00 UTC
[jira] [Updated] (TIKA-3834) Tika-Server can not get the text of a document encoding in GB18030.
[ https://issues.apache.org/jira/browse/TIKA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-3834:
------------------------------
Priority: Trivial (was: Critical)
> Tika-Server can not get the text of a document encoding in GB18030.
> -------------------------------------------------------------------
>
> Key: TIKA-3834
> URL: https://issues.apache.org/jira/browse/TIKA-3834
> Project: Tika
> Issue Type: Bug
> Components: tika-server
> Affects Versions: 2.3.0
> Environment: Linux
> Reporter: Di Dongke
> Priority: Trivial
> Labels: tika-server
> Attachments: 111.csv, 112.csv
>
>
> There are 2 files:
> 111.csv (Content-Encoding: UTF-8)
> 112.csv (Content-Encoding: GB18030)
>
> Tika-app can get the text of the two files.
> java -jar tika-app-1.24.1.jar -t 111.csv
> java -jar tika-app-1.24.1.jar -t 112.csv
>
> Tika-server can get the text of 111.csv.
> curl -T 111.csv http://127.0.0.1:12000/tika --header "Accept: text/plain"
>
> But Tika-server cannot get the text of 112.csv.
> curl -T 112.csv http://127.0.0.1:12000/tika --header "Accept: text/plain"
>
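As background for the report above, here is a minimal Python sketch of why the two files behave differently when a consumer assumes the wrong charset. The sample text is hypothetical (the attachments' contents are not shown in this message), and the snippet does not reproduce Tika's charset detection. It only shows that the same text yields different bytes in UTF-8 and GB18030, and that GB18030 bytes generally do not decode as UTF-8.

```python
# Hypothetical sample text; the real contents of 111.csv and 112.csv
# are not included in this message.
sample = "名称,数量\n苹果,3\n"

# The same text, serialized in the two encodings named in the report.
utf8_bytes = sample.encode("utf-8")
gb18030_bytes = sample.encode("gb18030")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with the matching charset round-trips the text.
print(try_decode(utf8_bytes, "utf-8") == sample)
print(try_decode(gb18030_bytes, "gb18030") == sample)

# Decoding the GB18030 bytes as UTF-8 fails for this sample.
print(try_decode(gb18030_bytes, "utf-8"))
```

A text extractor therefore has to detect or be told the charset before it can return readable text for a file like 112.csv.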
--
This message was sent by Atlassian Jira
(v8.20.10#820010)