Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/03/01 15:17:00 UTC

[jira] [Created] (TIKA-3306) Clean up ocr routing in 2.0.0

Tim Allison created TIKA-3306:
---------------------------------

             Summary: Clean up ocr routing in 2.0.0
                 Key: TIKA-3306
                 URL: https://issues.apache.org/jira/browse/TIKA-3306
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


I partially cleaned up OCR routing in 2.0.0 in an earlier issue.  What I didn't like about that change is that we temporarily overwrote the content-type.  Let's add a "parser-override" content type that is distinct from the user-override content type, and let's not overwrite the content-type when a parser override is set.

 

In addition to keeping the content-type from being muddled, this fix will also prevent "ocr-" content types from being written into the XHTML metadata during OCR.
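
To make the proposal concrete, here is a minimal sketch of routing on a separate parser-override key while leaving the detected content-type untouched. It does not use the Tika API: the class name, the three key strings, the "image/ocr-png" value in the usage note, and the precedence of parser override over user override are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch, not Tika code: metadata is modeled as a plain Map.
final class OcrRoutingSketch {
    // Assumed key names, chosen only for this example.
    static final String CONTENT_TYPE = "Content-Type";
    static final String USER_OVERRIDE = "Content-Type-Override";
    static final String PARSER_OVERRIDE = "Content-Type-Parser-Override";

    private OcrRoutingSketch() {
    }

    // Returns the type used to select a parser. It only reads the metadata,
    // so the detected Content-Type is never overwritten.
    static String routingType(Map<String, String> metadata) {
        String parserOverride = metadata.get(PARSER_OVERRIDE);
        if (parserOverride != null) {
            // Assumption: a parser-set override wins over a user override.
            return parserOverride;
        }
        String userOverride = metadata.get(USER_OVERRIDE);
        if (userOverride != null) {
            return userOverride;
        }
        return metadata.get(CONTENT_TYPE);
    }
}
```

With Content-Type set to "image/png" and the parser override set to an illustrative "image/ocr-png", routingType returns the override while Content-Type still reads "image/png", so anything that later writes Content-Type into the output never sees the OCR routing value.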



--
This message was sent by Atlassian Jira
(v8.3.4#803005)