Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/09/06 12:27:00 UTC

[jira] [Created] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks

Tim Allison created TIKA-2725:
---------------------------------

             Summary: Make tika-server robust against ooms/infinite loops/memory leaks
                 Key: TIKA-2725
                 URL: https://issues.apache.org/jira/browse/TIKA-2725
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison
            Assignee: Tim Allison


Currently, tika-server is vulnerable to OOMs, infinite loops and memory leaks.  I see two ways of making it robust:

1) use the ForkParser
2) have tika-server spawn a child process that actually runs the server, and put a watcher thread in the child that will kill the child on OOM, on timeout, or after x files.  The parent process can then restart the child if it dies.
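
The watcher thread in option 2 could look roughly like the sketch below. It is only an illustration under stated assumptions, not tika-server code: the class name ChildWatchdog, its methods, and the exit code are hypothetical, and it tracks a single in-flight task for simplicity. OOM detection is not handled here; one possibility is to start the child JVM with -XX:+ExitOnOutOfMemoryError so that the JVM exits by itself.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical watcher that runs inside the child process. */
final class ChildWatchdog implements Runnable {
    // Arbitrary exit code the parent could interpret as "please restart me" (assumption).
    static final int RESTART_EXIT_CODE = 17;

    private final long taskTimeoutMillis;
    private final long maxFiles;
    private final long pollMillis;
    private final AtomicLong filesProcessed = new AtomicLong();
    // Start time of the task currently being parsed; -1 means the server is idle.
    private final AtomicLong taskStart = new AtomicLong(-1L);

    ChildWatchdog(long taskTimeoutMillis, long maxFiles, long pollMillis) {
        this.taskTimeoutMillis = taskTimeoutMillis;
        this.maxFiles = maxFiles;
        this.pollMillis = pollMillis;
    }

    /** Called by the request handler just before a parse starts. */
    void taskStarted(long nowMillis) {
        taskStart.set(nowMillis);
    }

    /** Called by the request handler when a parse ends, successfully or not. */
    void taskFinished() {
        taskStart.set(-1L);
        filesProcessed.incrementAndGet();
    }

    /** True if the current task has timed out or the file limit has been reached. */
    boolean shouldRestart(long nowMillis) {
        long start = taskStart.get();
        boolean timedOut = start >= 0 && nowMillis - start > taskTimeoutMillis;
        boolean fileLimitReached = filesProcessed.get() >= maxFiles;
        return timedOut || fileLimitReached;
    }

    @Override
    public void run() {
        while (!shouldRestart(System.currentTimeMillis())) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        // halt() skips shutdown hooks, which might themselves hang in a sick JVM.
        Runtime.getRuntime().halt(RESTART_EXIT_CODE);
    }
}
```

The child would start this on a daemon thread at startup, for example `new Thread(watchdog, "watchdog")` with `setDaemon(true)`, and the request handler would call taskStarted/taskFinished around each parse.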

I somewhat prefer 2) so that we don't have to pass the input stream twice.  I propose 2), making it optional in Tika 1.x and then the default in Tika 2.x.  We could also add a status ping from parent to child in case the child gets caught up in a stop-the-world GC (h/t [~bleskes]).
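
The parent side of option 2, including the status ping, might be sketched as follows. Everything here is an assumption for illustration: the Child interface, ParentSupervisor, and the missed-ping threshold are hypothetical names and not part of Tika. In a real parent, Child would presumably wrap a java.lang.Process started via ProcessBuilder, with ping() issuing a short-timeout status request to the child.

```java
import java.util.function.Supplier;

/** Hypothetical abstraction over the forked server process. */
interface Child {
    boolean isAlive(); // e.g. delegate to Process.isAlive()
    boolean ping();    // e.g. a status request with a short timeout
    void kill();       // e.g. delegate to Process.destroyForcibly()
}

/** Hypothetical parent-side supervisor that restarts a dead or unresponsive child. */
final class ParentSupervisor {
    private final Supplier<Child> launcher;
    private final int maxMissedPings;
    private Child child;
    private int missedPings;
    private int restarts;

    ParentSupervisor(Supplier<Child> launcher, int maxMissedPings) {
        this.launcher = launcher;
        this.maxMissedPings = maxMissedPings;
        this.child = launcher.get();
    }

    /** One supervision step; the parent would call this periodically. */
    void checkOnce() {
        if (!child.isAlive()) {
            // The child exited on its own, e.g. its watcher thread shut it down.
            restart();
            return;
        }
        if (child.ping()) {
            missedPings = 0;
        } else if (++missedPings >= maxMissedPings) {
            // Alive but unresponsive, e.g. stuck in a long GC pause: kill and restart.
            child.kill();
            restart();
        }
    }

    private void restart() {
        child = launcher.get();
        missedPings = 0;
        restarts++;
    }

    int restarts() {
        return restarts;
    }
}
```

Requiring several consecutive missed pings before killing the child is a design choice in this sketch, so that a single slow response does not trigger a restart.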

Other options/recommendations?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Created] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks

Posted by Oleg Tikhonov <ol...@apache.org>.
Hi Tim,
What if the watcher thread itself fails, gets stuck, etc.?


