Posted to dev@tika.apache.org by "Manolo Caracuel (JIRA)" <ji...@apache.org> on 2018/01/06 02:05:00 UTC

[jira] [Issue Comment Deleted] (TIKA-2542) Support in tika-server for getting plain text and metadata at the same time

     [ https://issues.apache.org/jira/browse/TIKA-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manolo Caracuel updated TIKA-2542:
----------------------------------
    Comment: was deleted

(was: Pull request:

https://github.com/apache/tika/pull/216)

> Support in tika-server for getting plain text and metadata at the same time
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2542
>                 URL: https://issues.apache.org/jira/browse/TIKA-2542
>             Project: Tika
>          Issue Type: Improvement
>          Components: core, server
>    Affects Versions: 1.17
>            Reporter: Manolo Caracuel
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.18
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be good to have a way to get a file's extracted plain text together with its detected metadata in a single request. Currently the metadata is returned only if the request has an Accept header of text/xml or text/html, but then the body is not plain text, because it also contains HTML elements.
> I propose that when /tika/plain is requested with an Accept header of text/xml, an XHTML document is returned with the metadata in the head's meta elements and the plain text in the body.
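
To illustrate the proposal above, here is a minimal sketch of how a client might split such a response into metadata and plain text. It does not call tika-server. The sample document, the meta names in it, and the helper name split_metadata_and_text are assumptions made for illustration only, and the actual output of the pull request may differ.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical response body, shaped as the issue proposes: metadata in
# the head's meta elements, plain text in the body. The meta names and
# values are made up for this example.
SAMPLE_RESPONSE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Type" content="application/pdf"/>
<meta name="Author" content="Jane Doe"/>
<title>Example</title>
</head>
<body>First line of extracted text.
Second line of extracted text.</body>
</html>"""


def split_metadata_and_text(xhtml):
    """Return (metadata, text) parsed from an XHTML string."""
    root = ET.fromstring(xhtml)
    ns = {"x": XHTML_NS}

    # A metadata key may occur more than once, so collect values in lists.
    metadata = {}
    for meta in root.findall("x:head/x:meta", ns):
        name = meta.get("name")
        if name is not None:
            metadata.setdefault(name, []).append(meta.get("content", ""))

    # itertext() also picks up text nested in child elements, if any.
    body = root.find("x:body", ns)
    text = "".join(body.itertext()) if body is not None else ""
    return metadata, text


metadata, text = split_metadata_and_text(SAMPLE_RESPONSE)
print(metadata)
print(text)
```

With a response of this shape, a single request would give the client both pieces, rather than one request for the text and another for the metadata.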



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)