Posted to dev@tika.apache.org by "Thejan Wijesinghe (JIRA)" <ji...@apache.org> on 2017/03/18 08:37:41 UTC

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

    [ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931124#comment-15931124 ] 

Thejan Wijesinghe commented on TIKA-2293:
-----------------------------------------

# I have created Tess4JOCRParser, and it works smoothly with multiple image types, including png, jpg, jpeg, tiff, bmp, gif, jp2, jpx and ppm.

# I wrote a benchmark test to compare this parser with TesseractOCRParser; the results are below.

# TesseractOCRParser took 449 seconds to OCR 100 images, while Tess4JOCRParser took only 417 seconds. The result varies from run to run, but most of the time Tess4JOCRParser OCRs an image about 300 ms faster than TesseractOCRParser. The source files are in my repo:

https://github.com/ThejanW/tika/blob/TIKA-2293/tika-parsers/src/main/java/org/apache/tika/parser/ocr/Tess4JOCRParser.java
https://github.com/ThejanW/tika/blob/TIKA-2293/tika-parsers/src/test/java/org/apache/tika/parser/ocr/Tess4JOCRParserTest.java
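The per-image figures follow directly from the benchmark totals. A minimal Java sketch of that arithmetic, using only the numbers reported above (449 s and 417 s for 100 images); the class and method names are illustrative and are not part of Tika or the linked benchmark test:

```java
// Illustrative only: derives per-image averages from the benchmark totals
// reported in this comment. Not part of Tika or the linked benchmark test.
final class BenchmarkArithmetic {

    static final int IMAGES = 100;
    static final long TESSERACT_OCR_PARSER_TOTAL_MS = 449_000L; // 449 s
    static final long TESS4J_OCR_PARSER_TOTAL_MS = 417_000L;    // 417 s

    // Mean wall-clock time per image, in milliseconds (integer division).
    static long perImageMs(long totalMs, int images) {
        return totalMs / images;
    }

    // Average time saved per image by Tess4JOCRParser, in milliseconds.
    static long perImageSavingMs() {
        return perImageMs(TESSERACT_OCR_PARSER_TOTAL_MS, IMAGES)
                - perImageMs(TESS4J_OCR_PARSER_TOTAL_MS, IMAGES);
    }

    // Total time saved, as a percentage of the TesseractOCRParser total.
    static double savingPercent() {
        long diff = TESSERACT_OCR_PARSER_TOTAL_MS - TESS4J_OCR_PARSER_TOTAL_MS;
        return 100.0 * diff / TESSERACT_OCR_PARSER_TOTAL_MS;
    }

    public static void main(String[] args) {
        System.out.println("TesseractOCRParser: "
                + perImageMs(TESSERACT_OCR_PARSER_TOTAL_MS, IMAGES) + " ms/image");
        System.out.println("Tess4JOCRParser:    "
                + perImageMs(TESS4J_OCR_PARSER_TOTAL_MS, IMAGES) + " ms/image");
        System.out.println("Saving:             " + perImageSavingMs() + " ms/image");
    }
}
```

The averages come out to 4490 ms and 4170 ms per image, a 320 ms gap (about 7% of the total), which is consistent with the roughly 300 ms per-image difference seen in most runs.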

>  Tess4jOCRParser - A simpler Java version of TesseractOCRParser
> ---------------------------------------------------------------
>
>                 Key: TIKA-2293
>                 URL: https://issues.apache.org/jira/browse/TIKA-2293
>             Project: Tika
>          Issue Type: Improvement
>          Components: ocr
>            Reporter: Thejan Wijesinghe
>             Fix For: 1.15
>
>
> Right now, TesseractOCRParser calls tesseract and ImageMagick from the command line. The intention of this new parser, "Tess4jOCRParser", is to use the Tess4J API instead of the Runtime.exec approach of executing tesseract out of process.
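For context, the out-of-process approach amounts to spawning the `tesseract` binary for each image and reading back the text file it writes. The sketch below is a minimal, hypothetical illustration using `ProcessBuilder` and the standard `tesseract <image> <outputbase> -l <lang>` command line; it is not TesseractOCRParser's actual code, it omits the ImageMagick preprocessing step, and the class and method names are made up for illustration. With Tess4J, the same work becomes an in-process call such as `new net.sourceforge.tess4j.Tesseract().doOCR(imageFile)`, with no child process or temporary output file.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of out-of-process OCR: one tesseract child process per image.
final class OutOfProcessOcrSketch {

    // Argument list for "tesseract <image> <outputbase> -l <lang>".
    // tesseract writes its plain-text result to "<outputbase>.txt".
    static List<String> buildCommand(String tesseractExe, File image, File outputBase, String lang) {
        List<String> cmd = new ArrayList<>();
        cmd.add(tesseractExe);
        cmd.add(image.getPath());
        cmd.add(outputBase.getPath());
        cmd.add("-l");
        cmd.add(lang);
        return cmd;
    }

    // Runs tesseract as a child process, waits for it, and reads back the text it wrote.
    static String ocr(String tesseractExe, File image, String lang)
            throws IOException, InterruptedException {
        // Reserve a unique base name; tesseract appends ".txt" to it.
        File outputBase = File.createTempFile("ocr-sketch-", "");
        File textFile = new File(outputBase.getPath() + ".txt");
        try {
            Process process = new ProcessBuilder(buildCommand(tesseractExe, image, outputBase, lang))
                    .inheritIO() // pass console output through so the child cannot block on a full pipe
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }
            return new String(Files.readAllBytes(textFile.toPath()), StandardCharsets.UTF_8);
        } finally {
            // Remove both the reserved base file and tesseract's output file.
            Files.deleteIfExists(outputBase.toPath());
            Files.deleteIfExists(textFile.toPath());
        }
    }
}
```

Every call pays for starting a new process plus a temporary-file round trip; that per-image overhead is the kind of cost an in-process Tess4J call is meant to avoid.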

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)