Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/03/18 14:00:00 UTC

[jira] [Created] (TIKA-3703) Consider adding a frictionless data package output format

Tim Allison created TIKA-3703:
---------------------------------

             Summary: Consider adding a frictionless data package output format
                 Key: TIKA-3703
                 URL: https://issues.apache.org/jira/browse/TIKA-3703
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


For those who want more than just text and metadata (e.g. bytes for thumbnails, embedded images, embedded files, or rendered pages), it would be great to return that data in a standard format.  Our current /unpack endpoint returns a zip file, but with a layout of our own rather than an established standard.

I was thinking about heading down the pure JSON route by including these byte streams as base64-encoded metadata values in our current metadata object.  I'm not sure which is the better way to go.
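
To make the pure JSON option concrete, here is a minimal sketch in Python using only the standard library. It is not Tika code: the helper names (add_bytes, read_bytes) and the metadata key "thumbnail:base64" are made up for illustration, and the metadata object is assumed to be a flat JSON object.

```python
import base64
import json


def add_bytes(metadata, key, data):
    """Return a copy of metadata with `data` stored as a base64 string under `key`."""
    out = dict(metadata)
    out[key] = base64.b64encode(data).decode("ascii")
    return out


def read_bytes(metadata, key):
    """Decode the base64 string stored under `key` back into raw bytes."""
    return base64.b64decode(metadata[key])


# Hypothetical metadata object and thumbnail bytes, for illustration only.
metadata = {"Content-Type": "application/pdf"}
thumbnail = b"\x89PNG\r\n\x1a\n" + bytes(range(16))

enriched = add_bytes(metadata, "thumbnail:base64", thumbnail)
payload = json.dumps(enriched)  # a single JSON document carrying the bytes inline
```

One trade-off to weigh: base64 inflates the payload by roughly a third, and a client has to parse the whole JSON document before it can get at any single byte stream.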

I'm opening this issue to discuss options.

https://frictionlessdata.io/standards/#standards-toolkit
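
For comparison, a rough sketch of what a data package style output might look like: a zip file holding a datapackage.json descriptor whose "resources" array points at each extracted file by "path". This is only an illustration under assumptions, not a proposal for the actual layout. The function names, package name, and file paths are hypothetical, and the resource-name rule in the comment reflects my reading of the Data Package spec.

```python
import io
import json
import re
import zipfile


def resource_name(path):
    """Derive a resource name from a path.

    Assumption: names should use only lowercase alphanumerics plus '.', '-', '_'.
    """
    return re.sub(r"[^a-z0-9._-]", "-", path.lower())


def build_data_package(name, files):
    """Build a zip (as bytes) with a datapackage.json plus one entry per file.

    `files` maps a relative path inside the zip to that file's bytes.
    """
    descriptor = {
        "name": name,
        "resources": [
            {"name": resource_name(path), "path": path}
            for path in sorted(files)
        ],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("datapackage.json", json.dumps(descriptor, indent=2))
        for path, data in files.items():
            zf.writestr(path, data)
    return buf.getvalue()


# Hypothetical extraction results, for illustration only.
files = {
    "text/content.txt": b"hello",
    "embedded/Image 0.png": b"\x89PNG",
}
package_bytes = build_data_package("example-extract", files)
```

Unlike the base64 approach, the byte streams stay as raw zip entries, so a client can read the descriptor first and pull out only the resources it needs.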



--
This message was sent by Atlassian Jira
(v8.20.1#820001)