Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/03/01 15:17:00 UTC
[jira] [Created] (TIKA-3306) Clean up ocr routing in 2.0.0
Tim Allison created TIKA-3306:
---------------------------------
Summary: Clean up ocr routing in 2.0.0
Key: TIKA-3306
URL: https://issues.apache.org/jira/browse/TIKA-3306
Project: Tika
Issue Type: Task
Reporter: Tim Allison
I partially cleaned up OCR routing in 2.0.0 on an earlier issue. What I didn't like about that change is that we temporarily overwrote the content-type. Let's add a "parser-override" content type, distinct from the user-override content type, and let's not overwrite the content-type when a parser requests an override.
In addition to keeping the content-type unmuddled, this fix will also prevent "ocr-" content types from being written into the XHTML metadata during OCR.
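
A minimal sketch of the proposed routing, using a plain map in place of Tika's Metadata object. The key names, the routingType helper, the "image/ocr-png" value, and the precedence order (user override, then parser override, then the detected content-type) are all assumptions made for illustration; they are not taken from the Tika code base.

```java
import java.util.HashMap;
import java.util.Map;

public class OverrideRoutingSketch {
    // Hypothetical metadata keys, for illustration only.
    public static final String CONTENT_TYPE = "Content-Type";
    public static final String USER_OVERRIDE = "content-type-user-override";
    public static final String PARSER_OVERRIDE = "content-type-parser-override";

    // Pick the type used to select a parser, without modifying the metadata.
    // Assumed precedence: user override, then parser override, then detected type.
    public static String routingType(Map<String, String> metadata) {
        String user = metadata.get(USER_OVERRIDE);
        if (user != null) {
            return user;
        }
        String parser = metadata.get(PARSER_OVERRIDE);
        if (parser != null) {
            return parser;
        }
        return metadata.get(CONTENT_TYPE);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(CONTENT_TYPE, "image/png");
        // A parser asks for OCR routing via its own key instead of
        // overwriting Content-Type.
        metadata.put(PARSER_OVERRIDE, "image/ocr-png");

        System.out.println("route to: " + routingType(metadata));
        System.out.println("reported: " + metadata.get(CONTENT_TYPE));
    }
}
```

Because the parser's request lives under its own key, the detected content-type is never overwritten, so the "ocr-" routing type has no path into the reported content-type.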
--
This message was sent by Atlassian Jira
(v8.3.4#803005)