Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2015/08/21 08:02:16 UTC

[Tika Wiki] Update of "TikaOCR" by SergeyTsalkov

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "TikaOCR" page has been changed by SergeyTsalkov:
https://wiki.apache.org/tika/TikaOCR?action=diff&rev1=6&rev2=7

  
  `java -cp /path/to/your/classpath:/path/to/tika-server-1.7-SNAPSHOT.jar org.apache.tika.server.TikaServerCli`
  
+ = Disable Tika OCR =
+ Tika's OCR runs not only on images you upload directly but also on images embedded in other files, such as office documents. Because OCR slows Tika down, you may want to disable it if you don't need the results. The simplest way is to uninstall Tesseract. If that's not an option, the following tika.xml config file disables OCR by excluding the Tesseract parser from the default parser:
+ {{{
+ <?xml version="1.0" encoding="UTF-8"?>
+ <properties>
+   <parsers>
+     <parser class="org.apache.tika.parser.DefaultParser">
+       <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
+     </parser>
+   </parsers>
+ </properties>
+ }}}
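Before pointing Tika at the file, it can help to confirm that the config is well-formed XML and that it excludes the class you expect. The following is a minimal sketch that uses only the JDK's built-in XML parser. The class name `ConfigCheck` and the method `excludedParsers` are hypothetical names chosen for illustration, and the config text is inlined so the example runs on its own. It does not load Tika or verify that Tika accepts the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: inspects a config with the JDK XML parser only.
class ConfigCheck {
    // Same content as the config shown above, inlined to keep the sketch self-contained.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<properties>\n"
        + "  <parsers>\n"
        + "    <parser class=\"org.apache.tika.parser.DefaultParser\">\n"
        + "      <parser-exclude class=\"org.apache.tika.parser.ocr.TesseractOCRParser\"/>\n"
        + "    </parser>\n"
        + "  </parsers>\n"
        + "</properties>\n";

    // Returns the class attribute of every parser-exclude element.
    // Throws an exception if the XML is not well-formed.
    static List<String> excludedParsers(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("parser-exclude");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : excludedParsers(SAMPLE)) {
            System.out.println("excluded: " + name);
        }
    }
}
```

To check a file on disk instead of the inlined text, read its contents into a string and pass that to `excludedParsers`. A typo in an element name or an unclosed tag shows up here as an empty list or a parse exception.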
+