Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2014/11/17 18:26:44 UTC
[Tika Wiki] Update of "TikaOCR" by DaveMeikle
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.
The "TikaOCR" page has been changed by DaveMeikle:
https://wiki.apache.org/tika/TikaOCR?action=diff&rev1=4&rev2=5
Comment:
Added information for overriding default configuration
`curl -T /path/to/tiff/image.tiff http://localhost:9998/tika --header "Content-type: image/tiff"`
+ = Overriding Default Configuration =
+
+ When using the OCR parser, Tika uses the following default settings:
+ * Tesseract installation path = ""
+ * Language dictionary = "eng"
+ * Page Segmentation Mode = "1"
+ * Minimum file size (bytes) = 0
+ * Maximum file size (bytes) = 2147483647
+ * Timeout (seconds) = 120
+
+ To change these settings, you can either modify the existing TesseractOCRConfig.properties file in tika-parser/src/main/resources/org/apache/tika/parser/ocr, or override it by creating your own copy and placing it in the package org/apache/tika/parser/ocr on your classpath.
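One way to apply the override approach is a small shell sketch that creates your own TesseractOCRConfig.properties under the required org/apache/tika/parser/ocr package path. The directory name tika-ocr-config is hypothetical, and the property keys are assumed to match those in the bundled file (compare them against the copy shipped with your Tika version before relying on them). The non-default values shown are illustrative only.

```shell
#!/bin/sh
# Sketch: write an override TesseractOCRConfig.properties into a directory
# that will be placed on the classpath ahead of the Tika JAR.
set -eu

# Hypothetical root directory for the override; change as needed.
CONF_ROOT="${CONF_ROOT:-./tika-ocr-config}"

# The file must live under the same package path as the bundled copy.
PKG_DIR="$CONF_ROOT/org/apache/tika/parser/ocr"
mkdir -p "$PKG_DIR"

# Assumed key names (verify against the bundled file for your Tika version).
# tesseractPath is a hypothetical install location; timeout is raised from
# the default 120 seconds purely as an illustration.
cat > "$PKG_DIR/TesseractOCRConfig.properties" <<'EOF'
tesseractPath=/usr/local/bin/
language=eng
pageSegMode=1
minFileSizeToOcr=0
maxFileSizeToOcr=2147483647
timeout=300
EOF

echo "Wrote $PKG_DIR/TesseractOCRConfig.properties"
```

The resulting directory (here ./tika-ocr-config) is what would stand in for /path/to/your/classpath in the `java -cp` commands that follow. Because it is listed before the Tika JAR, the class loader should find its copy of the properties file before the bundled one.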
+
+ Note that if you use one of the executable JARs (tika-app or tika-server), you must run them without the ''-jar'' option, because ''-jar'' ignores any classpath you supply. Instead, put your own directory and the Tika JAR on the classpath and name the main class explicitly. For example, for the tika-app and tika-server, respectively:
+
+ `java -cp /path/to/your/classpath:/path/to/tika-app-X.X.jar org.apache.tika.cli.TikaCLI`
+
+ `java -cp /path/to/your/classpath:/path/to/tika-server-1.7-SNAPSHOT.jar org.apache.tika.server.TikaServerCli`
+