Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2014/11/17 18:26:44 UTC

[Tika Wiki] Update of "TikaOCR" by DaveMeikle

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "TikaOCR" page has been changed by DaveMeikle:
https://wiki.apache.org/tika/TikaOCR?action=diff&rev1=4&rev2=5

Comment:
Added information for overriding default configuration

  
  `curl -T /path/to/tiff/image.tiff http://localhost:9998/tika --header "Content-type: image/tiff"`
  
+ = Overriding Default Configuration =
+ 
+ When using the OCR Parser, Tika uses the following default settings:
+  * Tesseract installation path = ""
+  * Language dictionary = "eng"
+  * Page Segmentation Mode = "1"
+  * Minimum file size = 0
+  * Maximum file size = 2147483647
+  * Timeout = 120
+ 
+ To change these settings you can either modify the existing TesseractOCRConfig.properties file in tika-parser/src/main/resources/org/apache/tika/parser/ocr, or override it by creating your own copy and placing it in the package org/apache/tika/parser/ocr on your classpath.
+ 
+ Note that when you override the file while using one of the executable JARs (tika-app or tika-server), you must launch them without the ''-jar'' option, because that option makes Java ignore any other classpath entries. For example, something like the following for the tika-app or tika-server, respectively:
+ 
+ `java -cp /path/to/your/classpath:/path/to/tika-app-X.X.jar org.apache.tika.cli.TikaCLI`
+ 
+ `java -cp /path/to/your/classpath:/path/to/tika-server-1.7-SNAPSHOT.jar org.apache.tika.server.TikaServerCli`
+
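
The override described above can be sketched as a short shell script. This is only an illustration under stated assumptions: `OVERRIDE_DIR` is a hypothetical directory, and the property key names (`tesseractPath`, `language`, `pageSegMode`, `minFileSizeToOcr`, `maxFileSizeToOcr`, `timeout`) are assumed rather than taken from this page, so check them against the TesseractOCRConfig.properties file shipped with your Tika version. The sketch keeps the documented defaults except for the timeout, which it raises to 300 as an example.

```shell
#!/bin/sh
# Sketch: create a classpath directory holding an overriding
# TesseractOCRConfig.properties. OVERRIDE_DIR is a hypothetical location;
# a temporary directory is used when it is not set.
OVERRIDE_DIR="${OVERRIDE_DIR:-$(mktemp -d)}"
CONFIG_DIR="$OVERRIDE_DIR/org/apache/tika/parser/ocr"
mkdir -p "$CONFIG_DIR"

# Key names below are assumptions; copy the real ones from the bundled file.
# Values mirror the defaults listed on this page, except timeout.
cat > "$CONFIG_DIR/TesseractOCRConfig.properties" <<'EOF'
tesseractPath=
language=eng
pageSegMode=1
minFileSizeToOcr=0
maxFileSizeToOcr=2147483647
timeout=300
EOF

# Print the launch command; the override directory comes first on the
# classpath and -jar is not used.
echo "java -cp $OVERRIDE_DIR:/path/to/tika-app-X.X.jar org.apache.tika.cli.TikaCLI"
```

Placing the override directory ahead of the Tika JAR on the classpath is intended to make the custom file the one that gets loaded.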