Posted to dev@tika.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/02/21 21:57:01 UTC

[jira] [Commented] (TIKA-2584) Tika should have a way to pass arbitrary Tesseract options

    [ https://issues.apache.org/jira/browse/TIKA-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372078#comment-16372078 ] 

ASF GitHub Bot commented on TIKA-2584:
--------------------------------------

ewanmellor opened a new pull request #224: Fix for TIKA-2584 contributed by ewanmellor.
URL: https://github.com/apache/tika/pull/224
 
 
   Add TesseractOCRConfig.{add,get}OtherTesseractConfig, plus parsing of
   TesseractOCRConfig.properties to extract any key-value pair whose key
   contains an underscore.
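   A minimal sketch of the filtering rule described above, assuming the
   properties have already been loaded into a java.util.Properties object.
   The class and method names (OtherTesseractConfigSketch,
   extractOtherTesseractConfig) are hypothetical and are not the code in the
   PR. The underscore test presumably works because Tika's own keys are
   camelCase, while Tesseract parameter names such as
   preserve_interword_spaces use underscores.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class OtherTesseractConfigSketch {

    // Hypothetical helper (not the PR's actual code): keep only entries whose
    // key contains an underscore and treat them as opaque Tesseract parameters.
    static Map<String, String> extractOtherTesseractConfig(Properties props) {
        Map<String, String> result = new TreeMap<>(); // sorted for stable output
        for (String key : props.stringPropertyNames()) {
            if (key.contains("_")) {
                result.put(key, props.getProperty(key).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("language", "eng");                       // camelCase Tika-style key: skipped
        props.setProperty("preserve_interword_spaces", "1");        // Tesseract parameter: kept
        props.setProperty("tessedit_char_whitelist", "0123456789"); // Tesseract parameter: kept
        System.out.println(extractOtherTesseractConfig(props));
        // prints {preserve_interword_spaces=1, tessedit_char_whitelist=0123456789}
    }
}
```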
   
   Inside TesseractOCRParser, pass these key-value pairs to Tesseract using
   its -c command line option.
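   A minimal sketch of how such pairs can be turned into Tesseract's
   "-c VAR=VALUE" arguments. The class and method names
   (TesseractCommandSketch, buildCommand) are hypothetical, and this is not
   necessarily how TesseractOCRParser assembles its command line. Tesseract's
   usage places options after the input image and output base.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TesseractCommandSketch {

    // Hypothetical helper: build a tesseract argument list, passing each
    // opaque key-value pair through Tesseract's "-c VAR=VALUE" option.
    static List<String> buildCommand(String tesseract, String image, String outputBase,
                                     Map<String, String> otherConfig) {
        List<String> cmd = new ArrayList<>();
        cmd.add(tesseract);
        cmd.add(image);
        cmd.add(outputBase);
        for (Map.Entry<String, String> e : otherConfig.entrySet()) {
            cmd.add("-c");                           // one -c flag per parameter
            cmd.add(e.getKey() + "=" + e.getValue()); // VAR=VALUE as a single argument
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> other = new LinkedHashMap<>();
        other.put("preserve_interword_spaces", "1");
        System.out.println(buildCommand("tesseract", "page.png", "out", other));
        // prints [tesseract, page.png, out, -c, preserve_interword_spaces=1]
    }
}
```

   Keeping each argument as a separate list element (for example, to hand to
   java.lang.ProcessBuilder) avoids shell quoting problems when a value
   contains spaces.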
   
   This gives a mechanism by which user code can pass arbitrary options to
   Tesseract without Tika having to understand them.
   
   This PR depends on PR 222 / TIKA-2582 because of merge conflicts.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Tika should have a way to pass arbitrary Tesseract options
> ----------------------------------------------------------
>
>                 Key: TIKA-2584
>                 URL: https://issues.apache.org/jira/browse/TIKA-2584
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Ewan Mellor
>            Priority: Minor
>
> Tesseract has a very large number of config options (use tesseract --print-parameters to see them).  There is no mechanism for TesseractOCRParser / TesseractOCRConfig to pass these to Tesseract, and so they cannot be controlled by user code.
> Tika should pass these through as opaque key-value pairs, so that user code can set them as necessary.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)