Posted to issues@solr.apache.org by "Cassandra Targett (Jira)" <ji...@apache.org> on 2021/08/13 22:15:00 UTC
[jira] [Updated] (SOLR-11773) configurable language config for tesseract ocr
[ https://issues.apache.org/jira/browse/SOLR-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cassandra Targett updated SOLR-11773:
-------------------------------------
Component/s: contrib - Solr Cell (Tika extraction)
> configurable language config for tesseract ocr
> ----------------------------------------------
>
> Key: SOLR-11773
> URL: https://issues.apache.org/jira/browse/SOLR-11773
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 7.1
> Reporter: Advokat
> Priority: Minor
>
> Currently, to change the language for Tesseract, I have to edit \org\apache\tika\parser\ocr\TesseractOCRConfig.properties inside tika-parsers-1.16.jar.
> There is no way to set the language in solrconfig.xml or per request to the ExtractingRequestHandler.
> If someone has documents in different languages, it is impossible to configure this. Tesseract will not work as well as it could with the language set correctly.
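
For illustration, the workaround described above (editing the properties file inside the jar) could be scripted roughly as follows. This is only a sketch under stated assumptions: the entry path is taken from the issue text, while the `language` property key, the helper names `set_property` and `patch_jar_language`, and the example language code are assumptions not confirmed by the issue.

```python
import zipfile

# Entry path as given in the issue (with forward slashes, as used inside jars).
PROPS_ENTRY = "org/apache/tika/parser/ocr/TesseractOCRConfig.properties"


def set_property(text, key, value):
    """Return properties text with `key` set to `value` (appended if absent)."""
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        # Skip comments and lines without a key=value shape.
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def patch_jar_language(src_jar, dst_jar, language):
    """Copy src_jar to dst_jar, rewriting only the OCR properties entry.

    Assumption: the properties file uses a `language` key.
    """
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == PROPS_ENTRY:
                text = data.decode("iso-8859-1")
                data = set_property(text, "language", language).encode("iso-8859-1")
            # All other entries are copied through unchanged.
            dst.writestr(item, data)
```

A call such as `patch_jar_language("tika-parsers-1.16.jar", "tika-parsers-1.16-patched.jar", "deu")` would write a patched copy next to the original. The setting still applies to the whole Solr instance, which is the limitation this issue is about.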
--
This message was sent by Atlassian Jira
(v8.3.4#803005)