Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/05/27 23:51:39 UTC

[jira] Commented: (TIKA-416) Out-of-process text extraction

    [ https://issues.apache.org/jira/browse/TIKA-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872395#action_12872395 ] 

Jukka Zitting commented on TIKA-416:
------------------------------------

See http://jukkaz.wordpress.com/2010/05/27/forking-a-jvm/ for a summary of my current approach to achieving this.

> Out-of-process text extraction
> ------------------------------
>
>                 Key: TIKA-416
>                 URL: https://issues.apache.org/jira/browse/TIKA-416
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> There's currently no easy way to guard against JVM crashes or excessive memory or CPU use caused by parsing very large, broken, or intentionally malicious input documents. To better protect against such cases, and to make Tika's resource consumption more manageable in general, it would be great to have a way to run Tika parsers in separate JVM processes. This could be handled either as a separate "Tika parser daemon" or as an explicitly managed pool of forked JVMs.
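
For illustration only, here is a minimal sketch of the "forked JVM" idea from the issue description. It is not the approach described in the linked blog post and not Tika code. It assumes Java 8 or later, and the names ForkedJvm, buildCommand, run and ParserWorker are hypothetical. The child JVM gets its own heap cap (-Xmx) and is killed if it exceeds a time limit, so a crash or runaway parse cannot take down the parent process.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ForkedJvm {

    /** Builds the command line for a child JVM with a capped heap. */
    public static List<String> buildCommand(String mainClass, int maxHeapMb, String... args) {
        // Reuse the java binary and classpath of the current (parent) JVM.
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(java);
        command.add("-Xmx" + maxHeapMb + "m");
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));
        command.add(mainClass);
        for (String arg : args) {
            command.add(arg);
        }
        return command;
    }

    /**
     * Runs the command in a separate process. Returns the child's exit code,
     * or -1 if the child was killed for exceeding the timeout.
     */
    public static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.INHERIT) // show child output in the parent
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly().waitFor();                 // runaway child: kill it
            return -1;
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: ForkedJvm <main-class> [args...]");
            return;
        }
        // args[0] is the worker's main class (for example a hypothetical
        // ParserWorker); any remaining arguments are passed to the child.
        String[] childArgs = new String[args.length - 1];
        System.arraycopy(args, 1, childArgs, 0, childArgs.length);
        int exit = run(buildCommand(args[0], 64, childArgs), 30);
        System.out.println("child exit code: " + exit);
    }
}
```

A real implementation would also need a channel for passing the document to the child and the extracted text back, which this sketch leaves out. A pool or daemon, as mentioned in the description, would keep such child processes alive across requests to avoid paying the JVM startup cost for every document.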

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.