Posted to dev@tika.apache.org by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org> on 2012/02/10 13:17:59 UTC

[jira] [Commented] (TIKA-860) Make ZIP bomb detection configurable

    [ https://issues.apache.org/jira/browse/TIKA-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205389#comment-13205389 ] 

Uwe Schindler commented on TIKA-860:
------------------------------------

Maybe that's a duplicate of TIKA-741; I'm not sure.
                
> Make ZIP bomb detection configurable
> ------------------------------------
>
>                 Key: TIKA-860
>                 URL: https://issues.apache.org/jira/browse/TIKA-860
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Uwe Schindler
>
> The detection of ZIP bombs is nice, and the original issue says it is configurable, but I found no way to change the ParseContext of the AutoDetectParser to, e.g., allow deeper nesting levels. The SecureContentHandler instantiation is hardcoded and there is no point of intervention.
> In my case, a simple ZIP of an Eclipse project (http://store.pangaea.de/Publications/AltaweelM_2011/Salinization.zip) triggered the bomb detection, but it is of course no bomb. It's just because the JAR/WAR files in this project themselves contain other JAR files and class files :-) This overflows the nesting level of 10. Maybe even the TIKA OSGi bundle triggers the bomb detection (not tested).
> I would like to raise the nesting level, but there is no way to do so. My workaround was to simply filter out JAR files by using a custom DocumentSelector in my ParseContext. They contain no metadata we are interested in for our own development; we already removed e.g. the CLASS file parsers from our TIKA config, so we have a very simple parser structure only allowing PDF, office documents, TXT files, ...
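
The filtering workaround described above can be sketched roughly as follows. This is not the reporter's actual code, only a minimal illustration of the decision logic in plain Java with no Tika dependency. The class name ArchiveSkippingFilter, the method shouldParse, and the set of skipped extensions are all hypothetical. In a real setup this check would presumably sit inside an implementation of Tika's DocumentSelector interface (its select(Metadata) method, reading the embedded entry's resource name from the metadata) that is registered on the ParseContext with context.set(DocumentSelector.class, selector).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper: decides by file extension whether an embedded
 * document should be parsed or skipped. Names and the extension list
 * are assumptions made for illustration only.
 */
public class ArchiveSkippingFilter {

    // Assumed list of nested archive types to skip; adjust as needed.
    private static final Set<String> SKIPPED_EXTENSIONS =
            new HashSet<>(Arrays.asList("jar", "war"));

    /** Returns true if the entry with the given resource name should be parsed. */
    public static boolean shouldParse(String resourceName) {
        if (resourceName == null) {
            return true; // no name available, so do not filter
        }
        // Strip any directory part so dots in folder names are ignored.
        int slash = Math.max(resourceName.lastIndexOf('/'),
                             resourceName.lastIndexOf('\\'));
        String fileName = resourceName.substring(slash + 1);
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return true; // no extension, so do not filter
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return !SKIPPED_EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        String[] names = {"docs/report.pdf", "lib/commons-io.jar", "webapp.WAR", "README"};
        for (String name : names) {
            System.out.println(name + " -> " + (shouldParse(name) ? "parse" : "skip"));
        }
    }
}
```

Skipping nested archives this way keeps the parser from ever descending into them, so they no longer count toward the nesting limit. It does not raise the limit itself, which is what this issue asks for.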
