Posted to dev@tika.apache.org by "Matthew Caruana Galizia (JIRA)" <ji...@apache.org> on 2017/01/11 11:48:58 UTC

[jira] [Created] (TIKA-2235) Use Tesseract's recommended DPI for PDF images

Matthew Caruana Galizia created TIKA-2235:
---------------------------------------------

             Summary: Use Tesseract's recommended DPI for PDF images
                 Key: TIKA-2235
                 URL: https://issues.apache.org/jira/browse/TIKA-2235
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.14
            Reporter: Matthew Caruana Galizia
            Priority: Minor


From the [Tesseract wiki|https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality]:

{quote}
Tesseract works best on images which have a DPI of at least 300 dpi....
{quote}

PDFParserConfig currently initialises ocrDPI to 200, which is below this recommended minimum.
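To show what the two values mean in practice, here is a rough sketch (not Tika code) of the image size Tesseract would receive. It assumes each PDF page is rasterised at ocrDPI and uses the PDF default of 72 user-space units (points) per inch, with no rotation or scaling. The class OcrDpiEstimate and its pointsToPixels helper are hypothetical and exist only for this illustration.

```java
// Hypothetical illustration only: estimates the pixel size of a PDF page
// rasterised at a given DPI. Assumes the PDF default of 72 points per inch
// and ignores page rotation, cropping and user-space scaling.
class OcrDpiEstimate {

    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in PDF points to pixels at the given DPI, rounded to the nearest pixel. */
    static int pointsToPixels(double points, int dpi) {
        return (int) Math.round(points / POINTS_PER_INCH * dpi);
    }

    public static void main(String[] args) {
        // US Letter page: 8.5 x 11 inches = 612 x 792 points.
        double widthPt = 612;
        double heightPt = 792;

        // Compare the current default (200) with Tesseract's recommended minimum (300).
        for (int dpi : new int[] {200, 300}) {
            int w = pointsToPixels(widthPt, dpi);
            int h = pointsToPixels(heightPt, dpi);
            long pixels = (long) w * h;
            System.out.printf("%d dpi -> %d x %d px (%d pixels)%n", dpi, w, h, pixels);
        }
    }
}
```

For a US Letter page this gives 1700 x 2200 px at 200 dpi and 2550 x 3300 px at 300 dpi, i.e. 2.25 times as many pixels for Tesseract to process, so a higher default presumably trades some OCR speed for accuracy. Until the default changes, callers should be able to request 300 per parse by calling setOcrDPI on a PDFParserConfig that is passed in through the ParseContext.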



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)