Posted to dev@tika.apache.org by "Priya (Jira)" <ji...@apache.org> on 2021/09/08 06:32:00 UTC

[jira] [Created] (TIKA-3545) TIKA PDF parsing issues

Priya created TIKA-3545:
---------------------------

             Summary: TIKA PDF parsing issues
                 Key: TIKA-3545
                 URL: https://issues.apache.org/jira/browse/TIKA-3545
             Project: Tika
          Issue Type: Bug
          Components: parser, tika-server
    Affects Versions: 1.21
         Environment: Tested on DEV env
            Reporter: Priya
         Attachments: 365.jpg

I am using the tika-core 1.21 and tika-parsers 1.21 jar files as Tika dependencies in ManifoldCF 2.14 to crawl some files. Some of the PDF files are not being parsed correctly.
Strange characters appear in the parsed output of these *PDF* files. I also tried switching the Tika jar files to versions 1.24 and 1.27 (with 1.27, the files were not even extracted correctly).

I also checked the content of the documents themselves, and it appears to be fine.
Can anybody help me with this?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)