Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/05/15 23:18:47 UTC

[jira] [Resolved] (TIKA-646) tika command line can't extract metadata for OOXML files

     [ https://issues.apache.org/jira/browse/TIKA-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-646.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.0

I've added an EndDocumentShieldingContentHandler to Tika core, and in r1103546 I've used it to ensure that the OOXML and ODF parsers hold off their endDocument() call until after the metadata parsing. This seems to have fixed the Tika app.
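
As a rough illustration of the idea (not the actual code from r1103546), a shielding handler can be sketched with plain SAX classes from the JDK. The class and method names below (DeferredEndDocumentDemo, DeferredEndDocumentHandler, wasEndDocumentSeen, releaseEndDocument) are made up for this example, and the "metadata parsed" step just stands in for whatever a parser does after its content pass:

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

import java.util.ArrayList;
import java.util.List;

class DeferredEndDocumentDemo {

    /**
     * Hypothetical decorator: XMLFilterImpl forwards every SAX event to the
     * wrapped handler, and we override endDocument() so it is only recorded.
     */
    static class DeferredEndDocumentHandler extends XMLFilterImpl {
        private boolean endDocumentSeen = false;

        DeferredEndDocumentHandler(ContentHandler downstream) {
            setContentHandler(downstream);
        }

        @Override
        public void endDocument() {
            // Swallow the event for now; just remember that it happened.
            endDocumentSeen = true;
        }

        boolean wasEndDocumentSeen() {
            return endDocumentSeen;
        }

        /** Forward the held-back endDocument() to the wrapped handler. */
        void releaseEndDocument() throws SAXException {
            super.endDocument();
        }
    }

    /** Simulates a parser that fills in metadata after its content pass. */
    static List<String> run() {
        List<String> log = new ArrayList<>();

        // Stand-in for a caller that prints metadata on endDocument().
        ContentHandler caller = new DefaultHandler() {
            @Override
            public void endDocument() {
                log.add("endDocument: print metadata");
            }
        };

        DeferredEndDocumentHandler shield = new DeferredEndDocumentHandler(caller);
        try {
            // Content pass: the inner parser ends the document too early.
            shield.startDocument();
            shield.endDocument();

            // Metadata is only populated after the content pass (assumed step).
            log.add("metadata parsed");

            // Now it is safe to let the caller see endDocument().
            if (shield.wasEndDocumentSeen()) {
                shield.releaseEndDocument();
            }
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the caller's endDocument() is logged after "metadata parsed", which is the ordering the fix is after: the output handler only sees the end of the document once the metadata is complete.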

> tika command line can't extract metadata for OOXML files
> --------------------------------------------------------
>
>                 Key: TIKA-646
>                 URL: https://issues.apache.org/jira/browse/TIKA-646
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Maxim Valyanskiy
>             Fix For: 1.0
>
>
> The Tika CLI application displays metadata on the endDocument() event. Some parsers (OOXML, for example) fill in metadata after text extraction (after endDocument()), so that data is missing from the output.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira