Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/06/02 15:27:59 UTC

[jira] [Assigned] (TIKA-1994) Integrate OCR with PDFParser

     [ https://issues.apache.org/jira/browse/TIKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison reassigned TIKA-1994:
---------------------------------

    Assignee: Tim Allison

> Integrate OCR with PDFParser
> ----------------------------
>
>                 Key: TIKA-1994
>                 URL: https://issues.apache.org/jira/browse/TIKA-1994
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>
> Users can now run OCR on individual inline images embedded in PDFs, provided they configure the parser to do so.
> It might be useful to run OCR against each rendered page (instead of the component images).
> Integrating OCR is on the roadmap for PDFBox 2.1 (PDFBOX-1912). Doing this in Tika in the meantime will allow us to experiment with strategies until the cleaner integration is available in PDFBox 2.1.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)