Posted to dev@tika.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2017/01/11 16:54:49 UTC

[jira] [Commented] (TIKA-2192) Extract embedded files from headers, footers, footnotes, etc from docx/m

    [ https://issues.apache.org/jira/browse/TIKA-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818813#comment-15818813 ] 

Hudson commented on TIKA-2192:
------------------------------

SUCCESS: Integrated in Jenkins build tika-2.x #194 (See [https://builds.apache.org/job/tika-2.x/194/])
TIKA-2192 (tallison: rev e02084cc64c5a825dae6e16853c5dac3cbb55f46)
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XWPFWordExtractorDecorator.java


> Extract embedded files from headers, footers, footnotes, etc from docx/m
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2192
>                 URL: https://issues.apache.org/jira/browse/TIKA-2192
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>             Fix For: 2.0, 1.15
>
>
> While working on an alternate SAX parser for docx/docm, I found that we're not currently extracting embedded documents from headers, footers, footnotes, endnotes or comments.  We should fix this in our classic DOM parser.


