Posted to dev@tika.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2022/06/09 15:09:00 UTC

[jira] [Commented] (TIKA-3789) Allow parsers to pass embedded metadata to container file's metadata

    [ https://issues.apache.org/jira/browse/TIKA-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552262#comment-17552262 ] 

Hudson commented on TIKA-3789:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #635 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/635/])
TIKA-3789: Allow custom embedded parsers and EmbeddedDocumentHandlers to add metadata to the container file's metadata (tallison: [https://github.com/apache/tika/commit/3778ecb131a379a8445b5cf5ce5cc9d37069f7f2])
* (edit) tika-core/src/test/java/org/apache/tika/parser/mock/MockParser.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/ParseRecord.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
* (edit) CHANGES.txt
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/test-documents/mock/embedded_to_parent_metadata.xml.gz
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/AutoDetectParserTest.java


> Allow parsers to pass embedded metadata to container file's metadata
> --------------------------------------------------------------------
>
>                 Key: TIKA-3789
>                 URL: https://issues.apache.org/jira/browse/TIKA-3789
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
> There are some use cases where custom parsers might want to pass metadata from embedded files to the parent's metadata in the /tika (json) output or programmatically.
> We can follow the pattern in TIKA-3788.
> As with TIKA-3788, this metadata is written after the parse completes, so it will not appear in the standard XHTML output (e.g. /tika (html/xhtml), or programmatically via the XHTMLContentHandler).  It will, however, appear in the json output option from /tika and, programmatically, in the Metadata object.
> As with TIKA-3788, we encourage using the /rmeta endpoint, -J in tika-app, or the RecursiveParserWrapper instead of this option.  However, for those who need to work with a flattened view of a document, this option can be invaluable.
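
For readers unfamiliar with the "flattened view" described above, here is a minimal, self-contained sketch of the idea. It deliberately does not use Tika's API: the Record class, mergeIntoContainer, and the "existing container keys win" merge policy are all hypothetical stand-ins chosen for illustration, not what ParseRecord or CompositeParser actually do. The point it shows is only the ordering: embedded metadata is collected during the parse and copied into the container's metadata afterwards, which is why it can reach the final metadata but not content that was already streamed out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenedMetadataSketch {

    /** Hypothetical stand-in for a per-parse record that collects embedded metadata. */
    static final class Record {
        private final List<Map<String, String>> embedded = new ArrayList<>();

        /** Called (in this sketch) by an embedded parser or handler during the parse. */
        void add(Map<String, String> metadata) {
            embedded.add(metadata);
        }

        List<Map<String, String>> all() {
            return embedded;
        }
    }

    /**
     * Copies the collected embedded entries into the container's metadata.
     * Assumption for this sketch only: keys already set on the container are kept.
     */
    static void mergeIntoContainer(Record record, Map<String, String> container) {
        for (Map<String, String> child : record.all()) {
            for (Map.Entry<String, String> e : child.entrySet()) {
                container.putIfAbsent(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> container = new LinkedHashMap<>();
        container.put("Content-Type", "application/zip");

        Record record = new Record();
        Map<String, String> child = new LinkedHashMap<>();
        child.put("embedded-author", "example"); // hypothetical key and value
        record.add(child);

        // Any streamed content output would already be finished at this point;
        // the merge happens only after the parse of the container is done.
        mergeIntoContainer(record, container);
        System.out.println(container);
    }
}
```

In real use, the /rmeta endpoint or the RecursiveParserWrapper recommended above keeps one metadata object per embedded file instead of merging them, which avoids having to choose a merge policy at all.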



--
This message was sent by Atlassian Jira
(v8.20.7#820007)