Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/10/13 13:08:00 UTC
[jira] [Created] (TIKA-3571) Add an interface for a rendering engine
Tim Allison created TIKA-3571:
---------------------------------
Summary: Add an interface for a rendering engine
Key: TIKA-3571
URL: https://issues.apache.org/jira/browse/TIKA-3571
Project: Tika
Issue Type: Wish
Reporter: Tim Allison
We've now seen a few requests for both extracting text _and_ rendering PDFs. It would also be useful to have alternative engines for rendering files (e.g. see this [Alfresco study|https://hub.alfresco.com/t5/alfresco-content-services-blog/pdf-rendering-engine-performance-and-fidelity-comparison/ba-p/287618]), including MSOffice files, or at least PPTX.
There are also cases where users don't want the rendered images themselves, but do want OCR to be run against them.
I doubt I'll have a chance to work on this for a while, but I wanted to open an issue for discussion.