Posted to dev@tika.apache.org by "Pascal Essiembre (JIRA)" <ji...@apache.org> on 2016/02/07 04:59:39 UTC

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

    [ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136128#comment-15136128 ] 

Pascal Essiembre commented on TIKA-741:
---------------------------------------

It looks like a maxDepth of 100 is not enough. I am using Tika 1.11 and can reproduce a false "zip bomb" exception with the following PDF: https://www.db.com/ir/en/download/DB_Interim_Report_1Q2015.pdf

With the maximum removed, currentDepth for this file reaches 120 in SecureContentHandler. Because the limit cannot be configured at a higher level, it is impossible to parse this file with the current code.

Shall I create a new ticket instead? Or maybe reopen TIKA-860 to make the limit configurable? If we can't make it configurable, maybe a much higher maxDepth (e.g. 1000) would be a good idea?
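
To illustrate what I mean by a configurable limit, here is a minimal sketch. It is not the actual SecureContentHandler code: the class DepthLimitingHandler, its constructor parameter, and the accepts() helper are hypothetical names made up for this example, and it only assumes that the check amounts to counting open elements in a SAX handler.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler for illustration; NOT Tika's SecureContentHandler. */
class DepthLimitingHandler extends DefaultHandler {

    private final int maxDepth;   // assumption: the caller chooses the limit
    private int currentDepth = 0; // number of currently open elements

    DepthLimitingHandler(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        currentDepth++;
        if (currentDepth > maxDepth) {
            throw new SAXException("Suspected zip bomb: nesting depth "
                    + currentDepth + " exceeds the limit of " + maxDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        currentDepth--;
    }

    /**
     * Simulates a document with `depth` nested elements by firing SAX
     * events directly, and reports whether the given limit accepts it.
     */
    static boolean accepts(int depth, int maxDepth) {
        DepthLimitingHandler handler = new DepthLimitingHandler(maxDepth);
        try {
            for (int i = 0; i < depth; i++) {
                handler.startElement("", "e", "e", new AttributesImpl());
            }
            for (int i = 0; i < depth; i++) {
                handler.endElement("", "e", "e");
            }
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

With this sketch, a document nested 120 levels deep (like the PDF above) is rejected by a limit of 100 but accepted by a limit of 1000, which is the behaviour I would like to be able to select from the calling code.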

FYI, this specific PDF issue was originally reported here: https://github.com/Norconex/collector-http/issues/221

> "Zip bomb" (XML nesting) detection is too strict
> ------------------------------------------------
>
>                 Key: TIKA-741
>                 URL: https://issues.apache.org/jira/browse/TIKA-741
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Erik Hetzner
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 1.0
>
>
> I get "zip bomb" errors from many HTML documents, e.g. http://www.akhbaar.org/wesima_articles/index-20100101-82736.html
> Is there a way that the element nesting level could be made configurable? 30 elements just doesn't seem to be enough.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)