Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/09/11 20:48:00 UTC
[jira] [Resolved] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks
[ https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-2725.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0, 1.19
I committed what I'd declare to be an experimental/beta version of this. The legacy tika-server behavior is untouched. To trigger the new version, add {{-spawnChild}} to the command line.
Still to be done, IMHO, before I forget...
1) Update the wiki.
2) Clean up the shutdown procedure – if there's a parse timeout, we should allow x milliseconds for the other parses to complete before killing the server (and restarting!)
3) Clean up the thread in the watchdog that checks for ping timeouts. We need to lock/synchronize so that the child process reference cannot be null at the moment the timeout handler tries to kill the child.
4) Add maxRestarts as a parameter.
5) Gracefully handle failed jvm start-ups.
6) Figure out how to add an interceptor for "not available" if the child is in the process of shutting down – get rid of {{checkIsOperating()}}.
7) Figure out why 9998 is still taken by the other unit tests.
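For item 3, here is a minimal sketch of the kind of locking I have in mind. The class and method names ({{ChildWatchdog}}, {{setChild}}, {{killChildIfPresent}}) are hypothetical and are not what is in the commit; the only point is that the null check and the kill happen under the same lock as any write to the child reference.

```java
/**
 * Hypothetical sketch, not the committed tika-server code.
 * All reads and writes of the child reference go through one lock,
 * so the ping-timeout thread can never see the reference change
 * between its null check and its kill call.
 */
final class ChildWatchdog {
    private final Object lock = new Object();
    private Process child; // guarded by lock; null when no child is running

    /** Called by the parent after it has started (or restarted) the child. */
    void setChild(Process p) {
        synchronized (lock) {
            child = p;
        }
    }

    /**
     * Called by the ping-timeout thread.
     * Returns true if a child was present and was asked to terminate.
     */
    boolean killChildIfPresent() {
        synchronized (lock) {
            if (child == null) {
                return false; // nothing to kill, e.g. a restart is in progress
            }
            child.destroyForcibly();
            child = null;
            return true;
        }
    }
}
```

A second call to {{killChildIfPresent()}} after a successful kill returns false instead of throwing, which is the behavior the timeout thread needs while a restart is pending.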
Finally, many thanks to [~jukkaz] and the ForkParser, from which I plagiarized quite a bit. :) Problems are, of course, my own.
> Make tika-server robust against ooms/infinite loops/memory leaks
> ----------------------------------------------------------------
>
> Key: TIKA-2725
> URL: https://issues.apache.org/jira/browse/TIKA-2725
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.19, 2.0.0
>
>
> Currently, tika-server is vulnerable to ooms, infinite loops and memory leaks. I see two ways of making it robust:
> 1) use the ForkParser
> 2) have tika-server spawn a child process that actually runs the server, and put a watcher thread in the child that will kill the child on oom/timeout/after x files. The parent process can then restart the child if it dies.
> I somewhat prefer 2) so that we don't have to pass the inputstream twice. I propose 2), and I propose making it optional in Tika 1.x, but then the default in Tika 2.x. We could also add a status ping from parent to child in case the child gets caught up in stop-the-world GC (h/t [~bleskes]).
> Other options/recommendations?
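
The parent side of option 2) in the description above could be sketched roughly as follows. This is an illustration under assumptions, not the committed implementation: {{ChildSupervisor}} and its {{launcher}} are hypothetical names, treating exit code 0 as a deliberate shutdown is an assumption, and the {{maxRestarts}} limit mirrors the parameter listed as still to be added.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a parent process that restarts a child when it dies.
 * The launcher is assumed to start the real server in a child JVM.
 */
final class ChildSupervisor {
    private final Supplier<Process> launcher;
    private final int maxRestarts;

    ChildSupervisor(Supplier<Process> launcher, int maxRestarts) {
        this.launcher = launcher;
        this.maxRestarts = maxRestarts;
    }

    /**
     * Starts the child and blocks until it exits. A non-zero exit triggers
     * a restart until maxRestarts is reached; exit code 0 is assumed to mean
     * a deliberate shutdown. Returns the number of restarts performed.
     */
    int run() {
        int restarts = 0;
        while (true) {
            Process child = launcher.get();
            int exit;
            try {
                exit = child.waitFor();
            } catch (InterruptedException e) {
                // Parent is shutting down: take the child with it.
                child.destroyForcibly();
                Thread.currentThread().interrupt();
                return restarts;
            }
            if (exit == 0 || restarts >= maxRestarts) {
                return restarts;
            }
            restarts++;
        }
    }
}
```

In practice the launcher would probably wrap a {{ProcessBuilder}} that re-invokes the JVM with the child's arguments. The watcher thread inside the child and the parent-to-child status ping are left out of this sketch.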