Posted to dev@tika.apache.org by "Petr Vas (JIRA)" <ji...@apache.org> on 2014/08/12 13:47:14 UTC

[jira] [Comment Edited] (TIKA-93) OCR support

    [ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093987#comment-14093987 ] 

Petr Vas edited comment on TIKA-93 at 8/12/14 11:45 AM:
--------------------------------------------------------

[~chrismattmann], do you know when we can expect this OCR parser to appear in a released version (i.e., is there an expected release date for Tika 1.7)?
Will there be an RC / beta version that can be used?

I can see that previous versions of Tika were released every half year or so, which would put the 1.7 release date somewhere around Feb 2015. Does that sound right?


was (Author: yonyonson):
Chris, do you know when we can expect this OCR parser to appear in a released version (i.e., is there an expected release date for Tika 1.7)?
Will there be an RC / beta version that can be used?

I can see that previous versions of Tika were released every half year or so, which would put the 1.7 release date somewhere around Feb 2015. Does that sound right?

> OCR support
> -----------
>
>                 Key: TIKA-93
>                 URL: https://issues.apache.org/jira/browse/TIKA-93
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 1.7
>
>         Attachments: TIKA-93.patch, TIKA-93.patch, TIKA-93.patch, TIKA-93.patch, TesseractOCRParser.patch, TesseractOCRParser.patch, TesseractOCR_Tyler.patch, TesseractOCR_Tyler_v2.patch, testOCR.docx, testOCR.pdf, testOCR.pptx
>
>
> I don't know of any decent open source pure Java OCR libraries, but there are command line OCR tools like Tesseract (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika to extract text content (where available) from image files.


