Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/06/03 18:55:59 UTC

[jira] [Created] (TIKA-1995) Improve OCR Strategy options for the PDFParser

Tim Allison created TIKA-1995:
---------------------------------

             Summary: Improve OCR Strategy options for the PDFParser
                 Key: TIKA-1995
                 URL: https://issues.apache.org/jira/browse/TIKA-1995
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison


In TIKA-1994, we added the capability to run OCR on a full page for PDFs instead of on the inline images.  The initial patch had only three OCR strategies: no_ocr, ocr_only, ocr_and_text.  Let's add other strategies that might improve speed, accuracy, or redundancy.
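
For discussion, here is a rough sketch of how the three existing strategy names could map to per-page processing steps. This is not the PDFParser code: the class OcrStrategySketch, the Strategy enum, and the step names "extract_text" and "render_and_ocr_page" are all hypothetical, and the order of steps for ocr_and_text is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class OcrStrategySketch {

    // The three strategy names come from the issue text; the enum itself is hypothetical.
    public enum Strategy {
        NO_OCR, OCR_ONLY, OCR_AND_TEXT;

        // Accepts the lower-case names used in the issue, e.g. "ocr_only".
        public static Strategy parse(String name) {
            return valueOf(name.trim().toUpperCase(Locale.ROOT));
        }
    }

    // Returns the (hypothetical) steps applied to each page under a strategy.
    public static List<String> stepsFor(Strategy strategy) {
        List<String> steps = new ArrayList<>();
        switch (strategy) {
            case NO_OCR:
                // Assumption: only the regular text extraction runs.
                steps.add("extract_text");
                break;
            case OCR_ONLY:
                // Assumption: the page is rendered to an image and OCR'd; no text extraction.
                steps.add("render_and_ocr_page");
                break;
            case OCR_AND_TEXT:
                // Assumption: both run, so output may contain redundant text.
                steps.add("extract_text");
                steps.add("render_and_ocr_page");
                break;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + stepsFor(s));
        }
    }
}
```

A new strategy would then be one more enum constant plus one more case, which keeps the speed/accuracy/redundancy trade-off of each option visible in a single place.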



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)