Posted to dev@tika.apache.org by "Tomas Safarik (JIRA)" <ji...@apache.org> on 2015/03/16 11:57:39 UTC

[jira] [Comment Edited] (TIKA-1194) Missing text from MS Word (DOC) file

    [ https://issues.apache.org/jira/browse/TIKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363062#comment-14363062 ] 

Tomas Safarik edited comment on TIKA-1194 at 3/16/15 10:57 AM:
---------------------------------------------------------------

I was finally able to prepare a version of the document that does not contain any confidential information.

The problem is a missing line that should be filtered from the last cell:
"mluvil s Patouškem: poslat nabídku"

We tracked the problem to POI, where for some reason the cell is discarded because POI thinks the cell is after the end of the table.


was (Author: tssk):
I was finally able to prepare version of document that is does not contain any confidential information.

Problem is missing line that should be filtered from last cell:
"mluvil s Patouškem: poslat nabídku"

We tracked the problem to POI where for some reason the cell is discarded because POI thinks the cell is after end of table.

> Missing text from MS Word (DOC) file
> ------------------------------------
>
>                 Key: TIKA-1194
>                 URL: https://issues.apache.org/jira/browse/TIKA-1194
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Tomas Safarik
>            Priority: Critical
>         Attachments: OP-06-015.doc
>
>
> Hello,
> we noticed that the filtered text from some MS Word DOC files is missing one line (in a table cell) that is present in the original document.
> - If you add or remove one character anywhere before the problematic line/cell, the filtered text is correct. If you restore the original text, the filtering problem returns.
> - If the file is resaved as DOCX, filtering works fine.
> I will provide a sample document. Please let me know if more information is needed.
> Regards,
> Tomas


