Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2017/11/17 10:25:00 UTC

[jira] [Commented] (TIKA-2505) Tika server output encoding problems

    [ https://issues.apache.org/jira/browse/TIKA-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256798#comment-16256798 ] 

Nick Burch commented on TIKA-2505:
----------------------------------

Are you using 1.6 or 1.16? (They're very different!)

When you made your request to the server, what content encoding headers (if any) did you send?
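For reference, one way to make the request headers explicit and to decode the reply with whatever charset the server declares is sketched below. It assumes a tika-server listening on the default local port (9998) and a PUT to the /tika endpoint with "Accept: text/plain". The helper names (build_request, charset_of, fetch_text) are hypothetical and only for illustration, and the UTF-8 fallback is an assumption rather than documented server behaviour.

```python
from email.message import Message
from urllib.request import Request, urlopen

# Assumption: tika-server is running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_request(pdf_bytes, url=TIKA_URL):
    """Build a PUT request that states both the input type and the wanted output type."""
    return Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
    )


def charset_of(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value, or the default."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def fetch_text(pdf_bytes, url=TIKA_URL):
    """Send the PDF and decode the body using the charset the server declares."""
    with urlopen(build_request(pdf_bytes, url)) as resp:
        body = resp.read()
        declared = resp.headers.get("Content-Type")
    # errors="replace" keeps undecodable bytes visible instead of raising.
    return body.decode(charset_of(declared), errors="replace")
```

Calling fetch_text(open("original.pdf", "rb").read()) against a running server and comparing the result with the raw bytes should show whether the problem is in the declared charset or in the extracted text itself.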

That said, I think there might be a problem with the PDF itself, or with PDFBox. When I try your file with the Tika App, I get warnings like these:
{code}
WARN  No Unicode mapping for 0 (0) in font null
WARN  No Unicode mapping for 1 (1) in font null
WARN  No Unicode mapping for 2 (2) in font null
WARN  No Unicode mapping for .notdef (3) in font null
WARN  No Unicode mapping for 4 (4) in font null
WARN  No Unicode mapping for 5 (5) in font null
WARN  No Unicode mapping for 6 (6) in font null
WARN  No Unicode mapping for 7 (7) in font null
WARN  No Unicode mapping for 8 (8) in font null
WARN  No Unicode mapping for 9 (9) in font null
WARN  No Unicode mapping for 10 (10) in font null
WARN  No Unicode mapping for 11 (11) in font null
{code}

> Tika server output encoding problems
> ------------------------------------
>
>                 Key: TIKA-2505
>                 URL: https://issues.apache.org/jira/browse/TIKA-2505
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.16
>            Reporter: Fanni Kovacs
>         Attachments: original.pdf, response.txt
>
>
> Hello,
> We noticed during the conversion of a large number of files that there are some issues when we get a non-UTF-8 response from Tika server 1.6.


