Posted to dev@tika.apache.org by "Marcin Gil (JIRA)" <ji...@apache.org> on 2016/02/21 19:56:18 UTC

[jira] [Created] (TIKA-1863) --text-main content missing in output file

Marcin Gil created TIKA-1863:
--------------------------------

             Summary: --text-main content missing in output file
                 Key: TIKA-1863
                 URL: https://issues.apache.org/jira/browse/TIKA-1863
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.12
         Environment: Windows 10 64
            Reporter: Marcin Gil


When converting PDF and DOC files to text with the following command:
java -jar tika.jar --text-main --encoding=UTF-8 input.pdf > output.txt

the output file is missing an unpredictable number of lines from the beginning and the end of the input file's text.

Example file:
https://dl.dropboxusercontent.com/u/11435743/tika-issue-1.pdf
The text starting from "15 Akt oskarżenia" (at the bottom of the file) is missing from the output.
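
One way to pin down exactly which lines are dropped is to run the same input through plain --text as well and compare the two outputs. The following is only a minimal sketch, not part of Tika: the file names full.txt and main.txt and the helper dropped_lines are hypothetical, and it assumes both outputs were saved as UTF-8.

```python
# Sketch: list the lines that appear in the --text output but not in the
# --text-main output. Assumed preparation (file names are hypothetical):
#   java -jar tika.jar --text      --encoding=UTF-8 input.pdf > full.txt
#   java -jar tika.jar --text-main --encoding=UTF-8 input.pdf > main.txt
import difflib
import sys


def dropped_lines(full_lines, main_lines):
    """Return (index, line) pairs from full_lines that main_lines lacks."""
    matcher = difflib.SequenceMatcher(a=full_lines, b=main_lines, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean lines of full_lines have no match.
        if tag in ("delete", "replace"):
            dropped.extend((i, full_lines[i]) for i in range(i1, i2))
    return dropped


def read_lines(path):
    """Read a UTF-8 text file into a list of lines without line endings."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\r\n") for line in handle]


if __name__ == "__main__" and len(sys.argv) == 3:
    full = read_lines(sys.argv[1])
    main = read_lines(sys.argv[2])
    for index, line in dropped_lines(full, main):
        print(f"{index + 1}: {line}")
```

Saved under a hypothetical name such as compare.py, it could be run as "python compare.py full.txt main.txt"; each printed line is prefixed with its 1-based line number in full.txt, which makes it easy to see whether the dropped text clusters at the start and end of the document.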


