Posted to dev@tika.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2021/05/14 22:53:00 UTC

[jira] [Commented] (TIKA-3306) Clean up ocr routing in 2.0.0

    [ https://issues.apache.org/jira/browse/TIKA-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344908#comment-17344908 ] 

Hudson commented on TIKA-3306:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #232 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/232/])
TIKA-3306 -- add timeout in PipesClient (tallison: [https://github.com/apache/tika/commit/e687ac93073e6a4897486b82f980f9dd144d2c6f])
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncConfig.java
* (edit) tika-app/src/main/resources/log4j2.properties
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesClient.java
* (edit) tika-server/tika-server-client/src/test/resources/tika-config-simple-fs-emitter.xml
* (edit) tika-app/src/main/resources/log4j2_batch_process.properties
* (edit) tika-server/tika-server-standard/src/main/resources/log4j2.properties
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesConfigBase.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncEmitter.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/async/AsyncProcessor.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesResult.java
* (edit) tika-core/src/main/java/org/apache/tika/pipes/PipesServer.java
* (edit) tika-core/src/test/java/org/apache/tika/pipes/async/AsyncProcessorTest.java


> Clean up ocr routing in 2.0.0
> -----------------------------
>
>                 Key: TIKA-3306
>                 URL: https://issues.apache.org/jira/browse/TIKA-3306
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Trivial
>
> I partially cleaned up OCR routing in 2.0.0 on an earlier issue. What I didn't like about that change is that we temporarily overwrote the content-type. Let's add a "parser-override" content type to distinguish it from a user override content type, and let's not overwrite the content-type for parser-content-type overrides.
>  
> In addition to keeping the content-type from being muddled, this fix will also prevent "ocr-" content types from being written into the XHTML metadata during OCR.
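
The proposal in the description above can be illustrated with a minimal, self-contained sketch. It is not Tika's implementation: the class name, the metadata key strings, the "image/ocr-png" value, and the precedence of a user override over a parser override are all assumptions made for illustration. The point it shows is that routing reads the override from its own key and the content-type entry is never overwritten.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are taken from the Tika code base.
class OverrideRoutingSketch {
    static final String CONTENT_TYPE = "Content-Type";
    // Assumed key names, chosen only for this example.
    static final String USER_OVERRIDE = "example:content-type-user-override";
    static final String PARSER_OVERRIDE = "example:content-type-parser-override";

    // Pick the type used to select a parser without touching CONTENT_TYPE.
    // Assumption: a user override wins over a parser override.
    static String routingType(Map<String, String> metadata) {
        String user = metadata.get(USER_OVERRIDE);
        if (user != null) {
            return user;
        }
        String parser = metadata.get(PARSER_OVERRIDE);
        if (parser != null) {
            return parser;
        }
        return metadata.get(CONTENT_TYPE);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(CONTENT_TYPE, "image/png");
        // A parser requests OCR routing via its own key.
        metadata.put(PARSER_OVERRIDE, "image/ocr-png");

        System.out.println("route by: " + routingType(metadata));
        System.out.println("content-type: " + metadata.get(CONTENT_TYPE));
    }
}
```

Because the override lives under a separate key, anything that later serializes the content-type sees the original value rather than the temporary routing type.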


