Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2020/10/07 18:07:00 UTC

[jira] [Resolved] (TIKA-3044) add -C/--content cli option using WriteOutContentHandler

     [ https://issues.apache.org/jira/browse/TIKA-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3044.
-------------------------------
    Resolution: Fixed

Thank you [~alexander.klimetschek]! Let me know if I botched anything.

> add -C/--content cli option using WriteOutContentHandler
> --------------------------------------------------------
>
>                 Key: TIKA-3044
>                 URL: https://issues.apache.org/jira/browse/TIKA-3044
>             Project: Tika
>          Issue Type: New Feature
>          Components: cli
>            Reporter: Alexander Klimetschek
>            Priority: Major
>
> For text extraction, the CLI currently provides both the --text and --text-main options. For HTML files, --text returns the body, while --text-main returns only the title. No CLI option currently returns all of the text content. However, the Tika API has WriteOutContentHandler, which does.
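
The difference described in the issue, body-only text versus all text, can be illustrated with a minimal sketch built on the JDK's SAX parser alone. It does not use Tika: the class and method names (TextHandlersSketch, AllTextHandler, BodyTextHandler, allText, bodyText) are hypothetical, and the input is assumed to be well-formed XHTML. It only shows why a handler that writes out every character event also sees the title, while one restricted to <body> does not.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: plain JDK SAX, not Tika's own handler classes.
class TextHandlersSketch {

    /** Collects every character event, whatever element encloses it. */
    static class AllTextHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Collects character events only while inside a <body> element. */
    static class BodyTextHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();
        private int bodyDepth = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("body".equals(qName)) {
                bodyDepth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("body".equals(qName)) {
                bodyDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (bodyDepth > 0) {
                out.append(ch, start, length);
            }
        }
    }

    private static void parse(String xhtml, DefaultHandler handler) throws Exception {
        // The default factory is not namespace-aware, so qName is the plain tag name.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
    }

    static String allText(String xhtml) throws Exception {
        AllTextHandler handler = new AllTextHandler();
        parse(xhtml, handler);
        return handler.out.toString();
    }

    static String bodyText(String xhtml) throws Exception {
        BodyTextHandler handler = new BodyTextHandler();
        parse(xhtml, handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><head><title>Title</title></head>"
                + "<body><p>Body text</p></body></html>";
        System.out.println(bodyText(doc)); // body content only
        System.out.println(allText(doc));  // title and body content
    }
}
```

With Tika on the classpath, the same comparison would be made by passing a different content handler to the parser; the sketch above only mirrors the idea in a dependency-free form.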
