Posted to dev@tika.apache.org by "Tomas Safarik (JIRA)" <ji...@apache.org> on 2013/11/12 10:56:19 UTC
[jira] [Created] (TIKA-1194) Missing text from MS Word (DOC) file
Tomas Safarik created TIKA-1194:
-----------------------------------
Summary: Missing text from MS Word (DOC) file
Key: TIKA-1194
URL: https://issues.apache.org/jira/browse/TIKA-1194
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Tomas Safarik
Priority: Critical
Hello,
we noticed that the filtered (extracted) text from some MS Word DOC files is missing one line, located in a table, that is present in the original document.
- If you add or remove one character anywhere before the problematic line, the filtered text is correct. If you change the text back to the original, the filtering problem returns.
- If the file is resaved as DOCX, filtering works fine.
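
Since the DOCX resave filters correctly, one way to pinpoint the dropped line is to diff the text extracted from the DOC against the text extracted from the DOCX. The following is only a minimal sketch using the Python standard library. It assumes both texts have already been extracted (for example with Tika) and are available as strings; the function name `missing_lines` is hypothetical and not part of Tika.

```python
import difflib


def missing_lines(reference_text, candidate_text):
    """Return non-empty lines of reference_text that do not appear,
    in order, in candidate_text (hypothetical helper, not a Tika API)."""
    # Normalize: strip whitespace and ignore blank lines in both texts.
    ref = [ln.strip() for ln in reference_text.splitlines() if ln.strip()]
    cand = [ln.strip() for ln in candidate_text.splitlines() if ln.strip()]

    missing = []
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean reference lines were not matched.
        if tag in ("delete", "replace"):
            missing.extend(ref[i1:i2])
    return missing


# Made-up sample data standing in for the two extraction results.
docx_text = "Header\nrow 1\nrow 2\nrow 3\n"
doc_text = "Header\nrow 1\nrow 3\n"
print(missing_lines(docx_text, doc_text))
```

Running the same comparison before and after the one-character change described above would show whether the line only disappears for the original DOC.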
I will provide a sample document. Please let me know if more information is needed.
Regards,
Tomas
--
This message was sent by Atlassian JIRA
(v6.1#6144)