Posted to dev@tika.apache.org by "Konstantin Gribov (JIRA)" <ji...@apache.org> on 2015/03/11 19:15:38 UTC

[jira] [Created] (TIKA-1574) Frames in header/footer in doc files aren't extracted

Konstantin Gribov created TIKA-1574:
---------------------------------------

             Summary: Frames in header/footer in doc files aren't extracted
                 Key: TIKA-1574
                 URL: https://issues.apache.org/jira/browse/TIKA-1574
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.7
         Environment: linux, openjdk7/openjdk8
            Reporter: Konstantin Gribov
            Assignee: Konstantin Gribov


Text in frames inside a header or footer is omitted by WordParser. Text in frames in the document body is extracted fine.
The same document converted to docx is extracted fully.

This may be an upstream bug. I'll dig into it and file a ticket in the POI bug tracker if that's the case.


