Posted to dev@tika.apache.org by "Ray Gauss II (JIRA)" <ji...@apache.org> on 2012/10/24 20:06:12 UTC

[jira] [Updated] (TIKA-775) Embed Capabilities

     [ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Gauss II updated TIKA-775:
------------------------------

    Attachment: embed.diff

Attached is a newer patch which:

- Adds an Embedder interface, similar to Parser, which defines the getSupportedEmbedTypes and embed methods
- Adds a base ExternalEmbedder implementation of the Embedder interface, similar to ExternalParser, which can call a command-line executable (sed by default) to perform the embedding
- Adds a base ExternalEmbedderTest which 'embeds' lines in a text file and then uses a TXTParser to verify that the expected embedded metadata is present
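The first two items can be illustrated with a minimal, self-contained Java sketch: an Embedder contract plus an ExternalEmbedder that pipes the original stream through sed and writes sed's output to the caller's stream. This is not the code in embed.diff. Plain Map<String, String> and Set<String> stand in for Tika's Metadata and MediaType, and the sed "$a\" append expression and the "key: value" line format are assumptions chosen for illustration. Like the real default, it requires sed on the PATH.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical stand-in for the Embedder contract described above. */
interface Embedder {
    /** Media types this embedder can write metadata into. */
    Set<String> getSupportedEmbedTypes();

    /** Reads the original content from in and writes it, with metadata embedded, to out. */
    void embed(Map<String, String> metadata, InputStream in, OutputStream out) throws IOException;
}

/** Sketch of an embedder that delegates the actual work to sed. */
class ExternalEmbedder implements Embedder {
    @Override
    public Set<String> getSupportedEmbedTypes() {
        return Collections.singleton("text/plain");
    }

    @Override
    public void embed(Map<String, String> metadata, InputStream in, OutputStream out)
            throws IOException {
        if (metadata.isEmpty()) {
            // Nothing to embed: sed with no script would fail, so copy through unchanged.
            in.transferTo(out);
            return;
        }
        // One "$a\" + text pair per entry: sed appends "key: value" after the last line.
        // Values are not escaped; a trailing backslash or a newline would alter the script.
        List<String> command = new ArrayList<>();
        command.add("sed");
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            command.add("-e");
            command.add("$a\\");
            command.add("-e");
            command.add(entry.getKey() + ": " + entry.getValue());
        }
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();

        // Feed sed's stdin on a separate thread so a full stdout pipe cannot deadlock us.
        IOException[] feedError = new IOException[1];
        Thread feeder = new Thread(() -> {
            try (OutputStream toSed = process.getOutputStream()) {
                in.transferTo(toSed);
            } catch (IOException e) {
                feedError[0] = e;
            }
        });
        feeder.start();

        // Copy sed's stdout (original content plus appended lines) to the caller's stream.
        try (InputStream fromSed = process.getInputStream()) {
            fromSed.transferTo(out);
        }

        try {
            feeder.join();
            int exitCode = process.waitFor();
            if (feedError[0] != null) {
                throw feedError[0];
            }
            if (exitCode != 0) {
                throw new IOException("sed exited with status " + exitCode);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for sed", e);
        }
    }
}
```

For example, embedding {Description=embedded by sed} into the text "hello\nworld\n" should produce the two original lines followed by a "Description: embedded by sed" line. Because the metadata values are handed to sed unescaped, a production version would need to quote them before building the script.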

The embed methods have been refactored to take an output stream argument to write to, as suggested earlier in this issue.

Unless anyone sees an issue with the concepts or approach, I'll commit in a few days.
                
> Embed Capabilities
> ------------------
>
>                 Key: TIKA-775
>                 URL: https://issues.apache.org/jira/browse/TIKA-775
>             Project: Tika
>          Issue Type: Improvement
>          Components: general, metadata
>    Affects Versions: 1.0
>         Environment: The default ExternalEmbedder requires that sed be installed.
>            Reporter: Ray Gauss II
>              Labels: embed, patch
>             Fix For: 1.3
>
>         Attachments: embed.diff, tika-core-embed-patch.txt, tika-parsers-embed-patch.txt
>
>
> This patch defines and implements the concept of embedding Tika metadata into a file stream, the reverse of extraction.
> In the tika-core project, an Embedder interface and a generic sed-based ExternalEmbedder implementation, meant to be extended or configured, are added. These classes essentially reverse the flow of the existing Parser and ExternalParser classes.
> In the tika-parsers project, an ExternalEmbedderTest unit test is added which uses the default ExternalEmbedder (which calls sed) to embed a value placed in Metadata.DESCRIPTION and then verifies the operation by parsing the resulting stream.
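The round trip described above (embed a value, then parse the resulting stream and check that the value is there) can be sketched without sed or Tika. The class and method names below are hypothetical, a "Description: value" line stands in for whatever format the real embedder writes, and a simple line scan plays the part of the text parser. It illustrates the verification pattern only; it is not the actual ExternalEmbedderTest.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical embed-then-parse round trip in the spirit of ExternalEmbedderTest (no sed). */
class EmbedRoundTrip {
    /** Illustrative key; the test described here places its value in Metadata.DESCRIPTION. */
    static final String DESCRIPTION = "Description";

    /** "Embeds" by copying the original text and appending a "Description: value" line. */
    static void embedDescription(String value, InputStream in, OutputStream out)
            throws IOException {
        byte[] original = in.readAllBytes();
        out.write(original);
        // Start the metadata on its own line even if the original lacks a trailing newline.
        if (original.length > 0 && original[original.length - 1] != '\n') {
            out.write('\n');
        }
        out.write((DESCRIPTION + ": " + value + "\n").getBytes(StandardCharsets.UTF_8));
    }

    /** Stands in for the text parser: returns the last embedded description, or null. */
    static String extractDescription(InputStream in) throws IOException {
        String prefix = DESCRIPTION + ": ";
        String found = null;
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(prefix)) {
                    found = line.substring(prefix.length());
                }
            }
        }
        return found;
    }

    /** Embeds the value, parses the resulting bytes, and checks that the value survived. */
    static boolean roundTrip(String originalText, String description) throws IOException {
        ByteArrayOutputStream embedded = new ByteArrayOutputStream();
        embedDescription(description,
                new ByteArrayInputStream(originalText.getBytes(StandardCharsets.UTF_8)),
                embedded);
        String parsed = extractDescription(new ByteArrayInputStream(embedded.toByteArray()));
        return description.equals(parsed);
    }
}
```

Keeping embedding and parsing as separate stream-to-stream steps is what lets the check run entirely in memory: the embedder writes into a ByteArrayOutputStream, and those bytes become the parser's input.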

--
This message is automatically generated by JIRA.