Posted to dev@tika.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/27 03:44:00 UTC

[jira] [Commented] (TIKA-2613) Tesseract 4.0 has removed -psm, so Tika must update

    [ https://issues.apache.org/jira/browse/TIKA-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414971#comment-16414971 ] 

ASF GitHub Bot commented on TIKA-2613:
--------------------------------------

ewanmellor opened a new pull request #230: Fix for TIKA-2613 contributed by ewanmellor.
URL: https://github.com/apache/tika/pull/230
 
 
   Change -psm on the Tesseract command line to --psm, with two dashes.
   Tesseract 4.0 removes the one-dash form, which has been deprecated
   since Nov 2016.
   
   The Tesseract cset is ee201e1f4.
   
   Also, move the config file (i.e. getOutputType in Tika's terms) so that it
   is the last parameter on the command line.  Tesseract logs an error
   message (though otherwise doesn't fail) if the config file is not the
   last thing on the command line.
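
   The resulting argument order can be sketched as follows. This is only a
   minimal illustration, not the code in this PR: the class name, the
   buildCommand helper and its parameters are hypothetical, and the only
   Tesseract options it assumes are -l, --psm and a trailing config file
   name such as txt or hocr.

```java
import java.util.ArrayList;
import java.util.List;

public class TesseractCommandSketch {

    // Hypothetical helper (not Tika's actual API): assembles the argument
    // list in the order described above.
    static List<String> buildCommand(String tesseractPath, String inputImage,
                                     String outputBase, String language,
                                     String pageSegMode, String configFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(tesseractPath);
        cmd.add(inputImage);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(language);
        // Two dashes: the one-dash -psm form is removed in Tesseract 4.0.
        cmd.add("--psm");
        cmd.add(pageSegMode);
        // The config file (the output type, e.g. "txt" or "hocr") goes last.
        cmd.add(configFile);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("tesseract", "in.png", "out", "eng", "1", "txt");
        System.out.println(String.join(" ", cmd));
    }
}
```

   Building the command as a list of separate arguments, rather than one
   concatenated string, keeps the ordering explicit and avoids quoting
   problems when the list is handed to something like ProcessBuilder.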
   
   This PR depends upon PR #224 / TIKA-2584 and PR #222 / TIKA-2582 because of merge conflicts.



> Tesseract 4.0 has removed -psm, so Tika must update
> ---------------------------------------------------
>
>                 Key: TIKA-2613
>                 URL: https://issues.apache.org/jira/browse/TIKA-2613
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Ewan Mellor
>            Priority: Major
>
> Tesseract 4.0 (currently in beta-1) has removed the {{\-psm}} flag, in favor of {{\-\-psm}} (with two dashes).
> The {{\-\-psm}} variant was introduced in Nov 2016, so it should be safe to simply switch Tika to use the two-dash variant, even for people still using Tesseract 3.05.
> For reference, the Tesseract cset is:
> {code}
> commit ee201e1f4fa277a4b2ecd751a45d3bf1eba6dfdb
> Author: Stefan Weil <sw...@weilnetz.de>
> Date:   Sun Mar 25 17:28:33 2018 +0200
>
>     Remove deprecated support for -psm argument (#1419)
>
>     It was replaced by --psm and deprecated in commit 92d981b93.
>
>     Signed-off-by: Stefan Weil <sw...@weilnetz.de>
> {code}


