Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2018/09/18 18:00:22 UTC

[Tika Wiki] Update of "TikaJAXRS" by TimothyAllison

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "TikaJAXRS" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/TikaJAXRS?action=diff&rev1=48&rev2=49

  
  Also, please be polite.  This feature was added as a convenience.  Please consider using a robust crawler (instead of our simple {{{TikaInputStream.get(new URL(fileUrl))}}}) that will allow for better configuration of redirects, timeouts, cookies, etc.; and a robust crawler will respect robots.txt!
  
+ = Making Tika Server Robust to OOMs, Infinite Loops and Memory Leaks =
+ As of Tika 1.19, users can make tika-server more robust by running it with the {{{-spawnChild}}} option.  This 
+ starts tika-server in a child process, and if there's an OOM, a timeout or other catastrophic problem with the child process, the 
+ parent process will kill and/or restart the child process.
+ 
+ The following options are available only with the {{{-spawnChild}}} option.
+ 
+  * {{{-maxFiles}}}: restart the child process after it has processed {{{maxFiles}}} files.  If there is a slow-building memory leak, this restart of the JVM should help.
+  * {{{-taskTimeoutMillis}}} and {{{-taskPulseMillis}}}: {{{taskTimeoutMillis}}} specifies how long a parse/detect task may run before it is treated as timed out; {{{taskPulseMillis}}} specifies how often to check whether a task has exceeded that limit.
+  * {{{-pingTimeoutMillis}}} and {{{-pingPulseMillis}}}: {{{pingPulseMillis}}} specifies how often the parent process pings the child process to check its status.  {{{pingTimeoutMillis}}} specifies how long the parent process should wait to hear back from the child process before restarting it, and/or how long the child process should wait to receive a ping from the parent process before shutting itself down.
+ 
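As a rough illustration of how the options above combine, the following sketch assembles a tika-server command line in Java. The jar name, the numeric values, and the "flag followed by value" argument syntax are assumptions made for the example, not documented defaults; only the option names come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

public class TikaServerCommand {

    // Builds the argument list for launching tika-server with -spawnChild.
    // jarPath and every numeric value are caller-supplied placeholders (assumptions).
    static List<String> build(String jarPath, int maxFiles,
                              long taskTimeoutMillis, long taskPulseMillis,
                              long pingTimeoutMillis, long pingPulseMillis) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        // The remaining options are only available together with -spawnChild.
        cmd.add("-spawnChild");
        cmd.add("-maxFiles");
        cmd.add(String.valueOf(maxFiles));
        cmd.add("-taskTimeoutMillis");
        cmd.add(String.valueOf(taskTimeoutMillis));
        cmd.add("-taskPulseMillis");
        cmd.add(String.valueOf(taskPulseMillis));
        cmd.add("-pingTimeoutMillis");
        cmd.add(String.valueOf(pingTimeoutMillis));
        cmd.add("-pingPulseMillis");
        cmd.add(String.valueOf(pingPulseMillis));
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical jar name and values, chosen only for illustration.
        List<String> cmd = build("tika-server-1.19.jar", 500, 120000L, 1000L, 60000L, 10000L);
        System.out.println(String.join(" ", cmd));
        // To actually start the server, pass the list to a ProcessBuilder:
        // new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

The pulse values are kept well below their matching timeouts in this sketch so that a timeout is noticed soon after it occurs; suitable values depend on the documents being processed.
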
+ If the child process is shutting down and receives a new request, it will return {{{503 -- Service Unavailable}}}.  If the server times out on a file, the client will receive an IOException from the closed socket.
+
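Clients may want to handle both outcomes described above. The following is a minimal sketch, not part of Tika, of a client-side retry loop that treats a 503 response and an IOException as retryable. The class name, the {{{Attempt}}} interface, the retry count and the backoff are all hypothetical, and the endpoint passed to {{{putToTika}}} is whatever URL your server listens on.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RetryingTikaClient {

    /** One attempt at a request; returns the HTTP status code. */
    interface Attempt {
        int run() throws IOException;
    }

    // Retries while the server answers 503 (child shutting down) or the
    // socket fails with an IOException (e.g. the child was killed on timeout).
    static int callWithRetry(Attempt attempt, int maxAttempts, long backoffMillis)
            throws IOException {
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                int status = attempt.run();
                if (status != 503) {
                    return status;
                }
                last = null; // the latest failure was a 503, not an exception
            } catch (IOException e) {
                last = e;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis * i); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("interrupted while waiting to retry");
                }
            }
        }
        if (last != null) {
            throw last;
        }
        return 503;
    }

    // Sends the bytes with an HTTP PUT and returns the status code.
    static int putToTika(String endpoint, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated server: first call answers 503, second call answers 200.
        int[] calls = {0};
        int status = callWithRetry(() -> calls[0]++ == 0 ? 503 : 200, 3, 10L);
        System.out.println(status); // prints 200
    }
}
```

A file that caused a timeout may well time out again on retry, so the attempt count should stay small, and it can be useful to record which files failed repeatedly.
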