Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/05/03 07:17:03 UTC

[jira] [Created] (TIKA-652) Custom metadata from more formats

Custom metadata from more formats
---------------------------------

                 Key: TIKA-652
                 URL: https://issues.apache.org/jira/browse/TIKA-652
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.9
            Reporter: Nick Burch
            Assignee: Nick Burch


Currently, Tika handles custom metadata from Open Document files. Any custom metadata is returned with a custom: prefix (see OpenOfficeParserTest#testOO2Metadata for an example).

The parsers for the Microsoft file formats don't include custom metadata in their output, and neither does the PDF parser.

Assuming we're happy to include custom metadata from documents in the parsing step, with the custom: prefix, I'll go ahead and add it to the Microsoft (OLE2 and OOXML) and PDF parsers.
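The following is a minimal sketch of the custom: prefix convention, and it rests on some assumptions. A plain java.util.Map stands in for Tika's Metadata class. The class and method names (CustomPrefix, addCustomProperties) are hypothetical and are not Tika's actual parser code. The sketch shows a parser copying a document's user-defined properties into the metadata under prefixed names, so that they cannot clash with standard keys such as "title".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for Tika's Metadata class.
class CustomPrefix {
    // Prefix applied to every user-defined (custom) property name.
    static final String PREFIX = "custom:";

    // Copies each custom property into the metadata map, prefixing its name
    // so it cannot collide with standard keys such as "title" or "author".
    static void addCustomProperties(Map<String, String> customProps,
                                    Map<String, String> metadata) {
        for (Map.Entry<String, String> e : customProps.entrySet()) {
            metadata.put(PREFIX + e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Quarterly report");

        // Hypothetical custom properties, as a user might set them in Word or Writer.
        Map<String, String> custom = new LinkedHashMap<>();
        custom.put("Client", "ACME");
        custom.put("Reviewed", "true");

        addCustomProperties(custom, metadata);
        System.out.println(metadata);
        // prints {title=Quarterly report, custom:Client=ACME, custom:Reviewed=true}
    }
}
```

Because the prefix is a fixed string, the same helper can serve any format. Each parser only needs to supply the name/value pairs it finds.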

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (TIKA-652) Custom metadata from more formats

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-652.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.0

Fixed in r1100100. Both the Microsoft Office and Open Document parsers now handle custom metadata in the same way, with the custom: prefix on the entries.
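Every one of these parsers uses the same prefix, so a consumer can pick out custom properties without knowing the source format. This sketch again uses a plain Map as a stand-in for Tika's Metadata class. The CustomReader and extractCustom names are hypothetical. In Tika itself, the same loop would iterate over Metadata.names() and look up each value with Metadata.get(name).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for Tika's Metadata class.
class CustomReader {
    static final String PREFIX = "custom:";

    // Returns only the custom entries, with the "custom:" prefix removed,
    // in the same order they appear in the input map.
    static Map<String, String> extractCustom(Map<String, String> metadata) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            String name = e.getKey();
            if (name.startsWith(PREFIX)) {
                result.put(name.substring(PREFIX.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical metadata, as a Word or Writer document might yield it.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Quarterly report");
        metadata.put("custom:Client", "ACME");
        metadata.put("custom:Reviewed", "true");

        System.out.println(extractCustom(metadata));
        // prints {Client=ACME, Reviewed=true}
    }
}
```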
