Posted to dev@tika.apache.org by "Erik Hetzner (Created) (JIRA)" <ji...@apache.org> on 2011/10/04 03:16:34 UTC

[jira] [Created] (TIKA-741) Make "Zip bomb" (XML nesting) detection level configurable?

Make "Zip bomb" (XML nesting) detection level configurable?
-----------------------------------------------------------

                 Key: TIKA-741
                 URL: https://issues.apache.org/jira/browse/TIKA-741
             Project: Tika
          Issue Type: New Feature
          Components: parser
    Affects Versions: 1.0
            Reporter: Erik Hetzner
            Priority: Minor


I get "zip bomb" errors from many HTML documents, e.g. http://www.akhbaar.org/wesima_articles/index-20100101-82736.html

Is there a way that the element nesting level could be made configurable? 30 elements just doesn't seem to be enough.

Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

Posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-741.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0
         Assignee: Jukka Zitting

In revision 1179254 I increased the default permitted XML nesting level to 100 and introduced a separate limit of at most 10 nested <div class="package-entry"> elements to catch excessive nesting of package formats.

The maximum nesting limits can be set directly at the SecureContentHandler level, but are not currently configurable if you're using the Tika facade or the AutoDetectParser class. I'd like to come up with default settings that work for all practical cases before we consider adding such low-level configuration options to the higher-level APIs.
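
To make the two limits described above concrete, here is a minimal sketch of a SAX handler that enforces them. It is an illustration only and is not Tika's SecureContentHandler: the class name NestingLimitSketch, its constructor parameters, and the helper methods accepts and nest are all hypothetical, and it uses only the JDK's built-in SAX parser so that it runs without Tika on the classpath. Whether the real limits are inclusive or exclusive at the boundary is also an assumption here (the sketch allows exactly 100 levels and rejects 101).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of nesting limits; NOT Tika's SecureContentHandler.
class NestingLimitSketch extends DefaultHandler {
    private final int maxDepth;         // limit on overall element nesting
    private final int maxPackageDepth;  // limit on nested <div class="package-entry">
    private int depth = 0;
    private int packageDepth = 0;
    // For each open element, remembers whether it was a package-entry div.
    private final ArrayDeque<Boolean> packageFlags = new ArrayDeque<>();

    NestingLimitSketch(int maxDepth, int maxPackageDepth) {
        this.maxDepth = maxDepth;
        this.maxPackageDepth = maxPackageDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        boolean isPackageEntry = "div".equals(qName)
                && "package-entry".equals(atts.getValue("class"));
        depth++;
        if (isPackageEntry) {
            packageDepth++;
        }
        packageFlags.push(isPackageEntry);
        if (depth > maxDepth) {
            throw new SAXException("Suspected zip bomb: " + depth + " levels of XML nesting");
        }
        if (packageDepth > maxPackageDepth) {
            throw new SAXException("Suspected zip bomb: " + packageDepth + " nested package entries");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (packageFlags.pop()) {
            packageDepth--;
        }
    }

    // Returns true if the document parses within both limits. Malformed XML
    // also raises a SAXException and is therefore reported as false.
    static boolean accepts(String xml, int maxDepth, int maxPackageDepth) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new NestingLimitSketch(maxDepth, maxPackageDepth));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Builds a document with n copies of the open tag followed by n close tags.
    static String nest(String open, String close, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(open);
        for (int i = 0; i < n; i++) sb.append(close);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // 31 nested elements: rejected under a limit of 30, accepted under 100.
        String html = nest("<span>", "</span>", 31);
        System.out.println(accepts(html, 30, 10));
        System.out.println(accepts(html, 100, 10));
        // 11 nested package entries exceed the separate limit of 10.
        String pkg = nest("<div class=\"package-entry\">", "</div>", 11);
        System.out.println(accepts(pkg, 100, 10));
    }
}
```

Keeping the package-entry counter separate from the overall depth counter is what lets the general limit be relaxed for deeply nested HTML while archives nested inside archives are still caught early.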
                
> "Zip bomb" (XML nesting) detection is too strict
> ------------------------------------------------
>
>                 Key: TIKA-741
>                 URL: https://issues.apache.org/jira/browse/TIKA-741
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Erik Hetzner
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 1.0
>
>
> I get "zip bomb" errors from many HTML documents, e.g. http://www.akhbaar.org/wesima_articles/index-20100101-82736.html
> Is there a way that the element nesting level could be made configurable? 30 elements just doesn't seem to be enough.
> Thanks!


        

[jira] [Updated] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

Posted by "Jukka Zitting (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-741:
-------------------------------

    Affects Version/s:     (was: 1.0)
                       0.10
           Issue Type: Bug  (was: New Feature)
              Summary: "Zip bomb" (XML nesting) detection is too strict  (was: Make "Zip bomb" (XML nesting) detection level configurable?)

Updated summary since the described behaviour is arguably an error in the default setting.
                
> "Zip bomb" (XML nesting) detection is too strict
> ------------------------------------------------
>
>                 Key: TIKA-741
>                 URL: https://issues.apache.org/jira/browse/TIKA-741
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Erik Hetzner
>            Priority: Minor
>
> I get "zip bomb" errors from many HTML documents, e.g. http://www.akhbaar.org/wesima_articles/index-20100101-82736.html
> Is there a way that the element nesting level could be made configurable? 30 elements just doesn't seem to be enough.
> Thanks!


        

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

Posted by "Erik Hetzner (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121064#comment-13121064 ] 

Erik Hetzner commented on TIKA-741:
-----------------------------------

100 levels should probably do the trick. Thanks!
                
