Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/08/18 16:37:00 UTC

[jira] [Resolved] (TIKA-3518) Tika 1.26 not Working with Tesseract 4.0 and Higher Version

     [ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3518.
-------------------------------
    Fix Version/s: 2.1.0
       Resolution: Fixed

> Tika 1.26 not Working with Tesseract 4.0 and Higher Version
> -----------------------------------------------------------
>
>                 Key: TIKA-3518
>                 URL: https://issues.apache.org/jira/browse/TIKA-3518
>             Project: Tika
>          Issue Type: Bug
>          Components: ocr, tika-batch, tika-dl, tika-server
>    Affects Versions: 1.26
>            Reporter: Abha
>            Priority: Major
>             Fix For: 2.1.0
>
>
> ProcessBuilder is not creating the tmp file for Tesseract 4.1 and higher versions with Tika 1.26 and JDK 1.8.
> I am working on a project that integrates Tika and Tesseract OCR (Tika 1.26, JDK 1.8). Any Tesseract version earlier than 4.0 works fine and extracts the image/PDF data correctly, but upgrading Tesseract OCR to 4.1.1 or higher results in no data being extracted. I debugged the issue and found that the ProcessBuilder is not creating the temporary .txt output file from which the Tesseract OCR result is read, which causes the empty extraction. Is this a version compatibility issue, and how can it be resolved?
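
One way to narrow down a report like this is to reproduce the call outside Tika. The following is a minimal Java sketch, not Tika's actual parser code: it runs the tesseract command line through ProcessBuilder and checks whether the expected .txt output file appears. The class name TesseractOutputCheck, its helper methods, and the timeout value are hypothetical and chosen only for illustration; the sketch assumes the standard "tesseract <image> <outputbase>" invocation, in which tesseract appends ".txt" to the output base name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic helper; not part of Tika.
class TesseractOutputCheck {

    // tesseract writes its text result to "<outputbase>.txt".
    static Path expectedOutput(Path outputBase) {
        return outputBase.resolveSibling(outputBase.getFileName() + ".txt");
    }

    // Command line of the form: tesseract <image> <outputbase>
    static List<String> buildCommand(String tesseractPath, Path image, Path outputBase) {
        return Arrays.asList(tesseractPath, image.toString(), outputBase.toString());
    }

    // Runs tesseract and reports whether a non-empty output file was created.
    static boolean runAndCheck(String tesseractPath, Path image, Path outputBase,
                               long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(tesseractPath, image, outputBase));
        // Forward the child's stdout/stderr to this process so that
        // tesseract's own error messages are visible.
        pb.inheritIO();
        Process process = pb.start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            return false;
        }
        Path out = expectedOutput(outputBase);
        return process.exitValue() == 0 && Files.exists(out) && Files.size(out) > 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: TesseractOutputCheck <tesseract-path> <image>");
            return;
        }
        Path base = Files.createTempDirectory("ocr-check").resolve("out");
        boolean ok = runAndCheck(args[0], Paths.get(args[1]), base, 120);
        System.out.println("output file created: " + ok + " (" + expectedOutput(base) + ")");
    }
}
```

If this standalone check produces the .txt file with the newer Tesseract while the Tika run does not, the difference is more likely in the arguments or environment that Tika passes than in Tesseract itself.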



--
This message was sent by Atlassian Jira
(v8.3.4#803005)