Posted to dev@tika.apache.org by "Tomas Safarik (JIRA)" <ji...@apache.org> on 2013/11/12 10:56:19 UTC

[jira] [Created] (TIKA-1194) Missing text from MS Word (DOC) file

Tomas Safarik created TIKA-1194:
-----------------------------------

             Summary: Missing text from MS Word (DOC) file
                 Key: TIKA-1194
                 URL: https://issues.apache.org/jira/browse/TIKA-1194
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.4
            Reporter: Tomas Safarik
            Priority: Critical


Hello,

We noticed that the filtered text from some MS Word DOC files is missing one line (inside a table) that is present in the original document.

- If you add or remove one character anywhere before the problematic line, the filtered text is correct. If you change the text back to the original, the filtering problem returns.
- If the file is resaved as DOCX, filtering works fine.
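
Since the DOCX resave filters correctly, one way to pinpoint the missing line could be to diff the two filtered outputs. The following is only a minimal sketch under assumptions: the class name MissingLineCheck and its methods are hypothetical, and it expects the two texts to have been extracted already and saved as UTF-8 text files (for example with the Tika facade's parseToString); it does not call Tika itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Tika: lists lines that appear in a
// reference extraction (e.g. from the DOCX resave) but not in a candidate
// extraction (e.g. from the original DOC).
public class MissingLineCheck {

    // Collapse whitespace runs so layout differences (tabs vs. spaces in
    // table cells) are not reported as mismatches.
    static String normalize(String line) {
        return line.replaceAll("\\s+", " ").trim();
    }

    // Returns the non-empty normalized lines of `reference` that do not
    // occur anywhere in `candidate`, in reference order.
    static List<String> missingLines(String reference, String candidate) {
        Set<String> present = new HashSet<>();
        for (String line : candidate.split("\\R")) {
            present.add(normalize(line));
        }
        List<String> missing = new ArrayList<>();
        for (String line : reference.split("\\R")) {
            String n = normalize(line);
            if (!n.isEmpty() && !present.contains(n)) {
                missing.add(n);
            }
        }
        return missing;
    }

    // Assumed usage: java MissingLineCheck <docx-text-file> <doc-text-file>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MissingLineCheck <reference.txt> <candidate.txt>");
            return;
        }
        String reference = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String candidate = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        for (String line : missingLines(reference, candidate)) {
            System.out.println("missing: " + line);
        }
    }
}
```

The comparison is line-based and ignores ordering, so it only reports lines that are absent entirely, which matches the symptom described here (one table line dropped).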

I will provide a sample document. Please let me know if more information is needed.

Regards,

Tomas



--
This message was sent by Atlassian JIRA
(v6.1#6144)