Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/06/03 18:55:59 UTC
[jira] [Created] (TIKA-1995) Improve OCR Strategy options for the PDFParser
Tim Allison created TIKA-1995:
---------------------------------
Summary: Improve OCR Strategy options for the PDFParser
Key: TIKA-1995
URL: https://issues.apache.org/jira/browse/TIKA-1995
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
In TIKA-1994, we added the capability to run OCR on a full rendered page of a PDF instead of on the individual inline images. The initial patch has only three OCR strategies: no_ocr, ocr_only, ocr_and_text. Let's add other strategies that might improve performance (speed/accuracy/redundancy).
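As a rough illustration only (this is not the TIKA-1994 patch), the three strategies could be modeled as an enum. The class name OcrStrategySketch, the helper methods, and the semantics attached to each strategy (inferred from the names alone) are assumptions made for this sketch, not Tika's API.

```java
import java.util.Locale;

// Hypothetical sketch: models the three strategy names from this issue.
// Nothing here is taken from the actual PDFParser code.
public class OcrStrategySketch {

    enum Strategy {
        NO_OCR,        // assumed: extract the PDF's text layer only
        OCR_ONLY,      // assumed: render the full page and OCR it, skip the text layer
        OCR_AND_TEXT;  // assumed: do both, accepting some redundancy

        // True if this strategy reads the PDF's own text layer.
        boolean extractsText() {
            return this != OCR_ONLY;
        }

        // True if this strategy renders the page and runs OCR on it.
        boolean runsOcr() {
            return this != NO_OCR;
        }
    }

    // Maps a name such as "ocr_only" to its enum constant.
    // Throws IllegalArgumentException for an unknown name.
    static Strategy parse(String name) {
        return Strategy.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"no_ocr", "ocr_only", "ocr_and_text"}) {
            Strategy s = parse(name);
            System.out.println(name + ": text=" + s.extractsText() + ", ocr=" + s.runsOcr());
        }
    }
}
```

Under these assumptions, a new strategy would be one more enum constant plus whatever rule decides, per page, whether OCR is worth running.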
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)