Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/01/06 15:44:00 UTC

[jira] [Created] (TIKA-3264) Improve the per page OCR heuristics for AUTO mode

Tim Allison created TIKA-3264:
---------------------------------

             Summary: Improve the per page OCR heuristics for AUTO mode
                 Key: TIKA-3264
                 URL: https://issues.apache.org/jira/browse/TIKA-3264
             Project: Tika
          Issue Type: Improvement
    Affects Versions: 2.0.0
            Reporter: Tim Allison


We're currently using the character count per page as the sole criterion for running OCR in AUTO mode on PDFs.
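
For discussion purposes, the current approach can be pictured roughly as below. This is only a minimal sketch, not the actual Tika code: the class name, the method names, the threshold of 10 characters, and the choice to ignore whitespace are all assumptions made for illustration.

```java
// Hypothetical sketch of a character-count-per-page OCR heuristic.
// Not Tika's implementation; names and threshold are illustrative assumptions.
final class PageOcrHeuristic {

    // Assumed threshold: pages with fewer non-whitespace characters get OCR'd.
    static final int MIN_CHARS_PER_PAGE = 10;

    // Counts the non-whitespace characters in the text extracted from one page.
    static long countNonWhitespace(String pageText) {
        if (pageText == null) {
            return 0;
        }
        return pageText.chars()
                .filter(c -> !Character.isWhitespace(c))
                .count();
    }

    // Returns true if the page has too little extracted text and should be OCR'd.
    static boolean shouldOcr(String pageText) {
        return countNonWhitespace(pageText) < MIN_CHARS_PER_PAGE;
    }

    public static void main(String[] args) {
        String[] pages = {
            "",                                  // e.g. a scanned page with no text layer
            "This page has a real text layer."   // e.g. a born-digital page
        };
        for (int i = 0; i < pages.length; i++) {
            System.out.println("page " + (i + 1) + " ocr=" + shouldOcr(pages[i]));
        }
    }
}
```

A check like this looks only at how much text was extracted from the page and nothing else, which is why a single signal of this kind leaves room for the better options this issue is meant to collect.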

Let's use this issue to discuss better options.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)