
Posted to issues@solr.apache.org by "Cassandra Targett (Jira)" <ji...@apache.org> on 2021/08/13 22:21:00 UTC

[jira] [Resolved] (SOLR-11773) configurable language config for tesseract ocr

     [ https://issues.apache.org/jira/browse/SOLR-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cassandra Targett resolved SOLR-11773.
--------------------------------------
    Resolution: Won't Fix

I think the project is pretty unlikely to do this, for a couple of reasons. First, we're actively discussing removing Solr Cell entirely (SOLR-13973). Second, we don't recommend running Solr Cell in production, so making this configurable in Solr would run counter to that recommendation.

> configurable language config for tesseract ocr
> ----------------------------------------------
>
>                 Key: SOLR-11773
>                 URL: https://issues.apache.org/jira/browse/SOLR-11773
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 7.1
>            Reporter: Advokat
>            Priority: Minor
>
> Currently, to change the language for Tesseract, I have to modify org/apache/tika/parser/ocr/TesseractOCRConfig.properties inside tika-parsers-1.16.jar.
> There is no way to set the language in solrconfig.xml or per request to the ExtractingRequestHandler.
> If someone has documents in different languages, it's impossible to configure this. Tesseract will not work as well as it could with the correct language set.
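For illustration only, the lookup order the request implies (a per-request value, then a handler-level default, then Tika's built-in default of "eng") can be sketched in plain Java. This is a hypothetical sketch, not Solr or Tika code: the class OcrLanguageResolver and the parameter name "ocr.language" are assumptions made up for this example. The "eng+fra" value uses Tesseract's own syntax for combining several languages with "+".

```java
import java.util.Map;

/**
 * Hypothetical sketch of the per-request OCR language lookup SOLR-11773 asks for.
 * Not Solr or Tika code; the parameter name "ocr.language" is an assumption.
 */
public final class OcrLanguageResolver {
    // Tika's TesseractOCRConfig defaults to English ("eng").
    static final String TIKA_DEFAULT = "eng";
    // Hypothetical parameter name, used both per request and in handler defaults.
    static final String PARAM = "ocr.language";

    private OcrLanguageResolver() {}

    /** Returns the first non-blank value: request param, then handler default, then Tika's default. */
    static String resolve(Map<String, String> requestParams, Map<String, String> handlerDefaults) {
        String fromRequest = requestParams.get(PARAM);
        if (fromRequest != null && !fromRequest.isBlank()) {
            return fromRequest.trim();
        }
        String fromHandler = handlerDefaults.get(PARAM);
        if (fromHandler != null && !fromHandler.isBlank()) {
            return fromHandler.trim();
        }
        return TIKA_DEFAULT;
    }

    public static void main(String[] args) {
        Map<String, String> handler = Map.of(PARAM, "deu");
        // A per-request value wins over the handler default.
        System.out.println(resolve(Map.of(PARAM, "eng+fra"), handler)); // eng+fra
        // With no request value, the handler default applies.
        System.out.println(resolve(Map.of(), handler)); // deu
        // With neither set, fall back to Tika's default.
        System.out.println(resolve(Map.of(), Map.of())); // eng
    }
}
```

The resolved string would then be what gets handed to Tesseract's language setting; how Solr would pass it through to Tika is left open here, since the issue was resolved as Won't Fix.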



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
