Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2018/09/06 13:58:00 UTC

[jira] [Commented] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks

    [ https://issues.apache.org/jira/browse/TIKA-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605816#comment-16605816 ] 

Tim Allison commented on TIKA-2725:
-----------------------------------

From [~oleg@apache.org] on the dev list: 

bq. What if watcher thread fails/gets stuck etc?

To confirm, that's the watcher thread in the child process.  Yes, that's why I think we should also have a ping from the parent process.  WDYT?
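
The parent-side ping could be sketched roughly as follows. This is only an illustration under assumptions, not code from tika-server: the class name ParentPinger, the maxMissedPings threshold, and the idea of passing the ping in as a BooleanSupplier (in practice perhaps an HTTP request to a status endpoint with a short timeout) are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical parent-side health check (not Tika's actual implementation).
 * The parent pings the child periodically and counts consecutive failures.
 */
class ParentPinger {
    // Assumption: returns true if the child answered the ping in time.
    private final BooleanSupplier ping;
    // Assumption: how many consecutive missed pings are tolerated.
    private final int maxMissedPings;
    private int missed = 0;

    ParentPinger(BooleanSupplier ping, int maxMissedPings) {
        this.ping = ping;
        this.maxMissedPings = maxMissedPings;
    }

    /** Runs one ping; returns true if the child should be killed and restarted. */
    boolean pingOnce() {
        boolean ok;
        try {
            ok = ping.getAsBoolean();
        } catch (RuntimeException e) {
            // A ping that throws is treated the same as a missed ping.
            ok = false;
        }
        missed = ok ? 0 : missed + 1;
        return missed >= maxMissedPings;
    }
}
```

The parent would call pingOnce() on a schedule and, when it returns true, destroy the child process and start a new one. Because the check lives in the parent, it still works when the child's own watcher thread is stuck.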

> Make tika-server robust against ooms/infinite loops/memory leaks
> ----------------------------------------------------------------
>
>                 Key: TIKA-2725
>                 URL: https://issues.apache.org/jira/browse/TIKA-2725
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>
> Currently, tika-server is vulnerable to ooms, infinite loops and memory leaks.  I see two ways of making it robust:
> 1) use the ForkParser
> 2) have tika-server spawn a child process that actually runs the server, and put a watcher thread in the child that will kill the child on oom/timeout/after x files.  The parent process can then restart the child if it dies.
> I somewhat prefer 2) so that we don't have to pass the inputstream twice.  I propose 2), and I propose making it optional in Tika 1.x, but then the default in Tika 2.x.  We could also add a status ping from parent to child in case the child gets caught up in stop-the-world gc (h/t [~bleskes]).
> Other options/recommendations?
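
To make option 2) more concrete, here is a minimal sketch of the bookkeeping a watcher thread in the child might do. It is a sketch under stated assumptions, not the proposed implementation: the class ChildWatcher, its method names, and the two limits (taskTimeoutMillis, maxFiles) are hypothetical, and oom detection is left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical child-side bookkeeping for the watcher thread (illustration only).
 * Request handlers call start()/finish(); the watcher thread polls shouldShutDown().
 */
class ChildWatcher {
    private final long taskTimeoutMillis; // assumption: max time allowed per file
    private final long maxFiles;          // assumption: restart after this many files
    private final Map<Long, Long> activeTasks = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();
    private final AtomicLong filesProcessed = new AtomicLong();

    ChildWatcher(long taskTimeoutMillis, long maxFiles) {
        this.taskTimeoutMillis = taskTimeoutMillis;
        this.maxFiles = maxFiles;
    }

    /** Records that a parse started at nowMillis; returns an id for finish(). */
    long start(long nowMillis) {
        long id = nextId.incrementAndGet();
        activeTasks.put(id, nowMillis);
        return id;
    }

    /** Records that the parse with the given id completed. */
    void finish(long id) {
        activeTasks.remove(id);
        filesProcessed.incrementAndGet();
    }

    /** True if a parse has exceeded the timeout or the file limit has been reached. */
    boolean shouldShutDown(long nowMillis) {
        if (filesProcessed.get() >= maxFiles) {
            return true;
        }
        for (long startedAt : activeTasks.values()) {
            if (nowMillis - startedAt > taskTimeoutMillis) {
                return true;
            }
        }
        return false;
    }
}
```

A watcher thread in the child could poll shouldShutDown(System.currentTimeMillis()) every second or so and exit the JVM when it returns true. The parent, which would have started the child with something like ProcessBuilder, could then notice the exit and start a fresh child.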



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Commented] (TIKA-2725) Make tika-server robust against ooms/infinite loops/memory leaks

Posted by Oleg Tikhonov <ol...@apache.org>.
With this approach, that is probably the only way ...
What is the typical tika-server environment? Stand-alone, or distributed,
e.g. replicas in a cluster?
Is there a time limit for recovery? How do we know which point to
resume processing from?
Do we mark the documents that have already been processed?
For example, if tika-server ran on Docker Swarm/K8s, the orchestrator
would restart a failed replica itself ...

