Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/02/08 16:56:39 UTC
[jira] [Comment Edited] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137110#comment-15137110 ]
Tim Allison edited comment on TIKA-741 at 2/8/16 3:55 PM:
----------------------------------------------------------
I'd recommend adding the following to your EnhancedPDF2XHTML:
{noformat}
@Override
protected void writeParagraphStart() throws IOException {
    super.writeParagraphStart(); // added: call the parent implementation first
    // ... rest of your existing method body
}
{noformat}
and
{noformat}
@Override
protected void writeParagraphEnd() throws IOException {
    super.writeParagraphEnd(); // added: call the parent implementation first
    // ... rest of your existing method body
}
{noformat}
Finally, if your modifications of our PDFParsers are enhancements that have general applicability, please, oh, please share them with us.
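For what it's worth, here is a rough sketch of why skipping the super calls could plausibly lead to ever-deeper nesting. This is a simplified model, not the actual PDFBox or Tika code: SketchStripper, EnhancedSketch and ParagraphDemo are hypothetical stand-ins, and the assumption is that the parent class keeps an "in paragraph" flag that it uses to decide whether to close the last paragraph on a page.

```java
// Hypothetical stand-in for a stripper base class. NOT the real PDFBox/Tika API.
class SketchStripper {
    // Bookkeeping owned by the base class (assumed for this sketch).
    private boolean inParagraph = false;

    protected void writeParagraphStart() { inParagraph = true; }

    protected void writeParagraphEnd() { inParagraph = false; }

    // Drives one page: consecutive paragraphs are separated by end + start,
    // and the last paragraph is closed only if the base class thinks one is open.
    final void writePage(int paragraphs) {
        for (int i = 0; i < paragraphs; i++) {
            if (i > 0) {
                writeParagraphEnd();
            }
            writeParagraphStart();
        }
        if (inParagraph) {
            writeParagraphEnd();
        }
    }
}

// Hypothetical subclass that "emits" <p> elements and records nesting depth.
class EnhancedSketch extends SketchStripper {
    private final boolean callSuper;
    int depth = 0;     // currently open <p> elements
    int maxDepth = 0;  // deepest nesting seen so far

    EnhancedSketch(boolean callSuper) { this.callSuper = callSuper; }

    @Override
    protected void writeParagraphStart() {
        if (callSuper) {
            super.writeParagraphStart();
        }
        depth++; // stands in for startElement("p")
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    protected void writeParagraphEnd() {
        if (callSuper) {
            super.writeParagraphEnd();
        }
        depth--; // stands in for endElement("p")
    }
}

class ParagraphDemo {
    static int maxDepthAfter(boolean callSuper, int pages, int paragraphsPerPage) {
        EnhancedSketch s = new EnhancedSketch(callSuper);
        for (int p = 0; p < pages; p++) {
            s.writePage(paragraphsPerPage);
        }
        return s.maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("with super:    " + maxDepthAfter(true, 50, 3));
        System.out.println("without super: " + maxDepthAfter(false, 50, 3));
    }
}
```

In this model the overrides that call super never go deeper than one open paragraph, while the ones that skip it leave one unclosed paragraph per page, so the depth grows with the page count. Whether your EnhancedPDF2XHTML behaves the same way depends on what it actually does in those methods.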
was (Author: tallison@mitre.org):
I'd recommend adding the following to your EnhancedPDF2XHTML:
{{noformat}}
@Override
protected void writeParagraphStart() throws IOException {
+ super.writeParagraphStart();
{{noformat}}
and
{{noformat}}
@Override
protected void writeParagraphEnd() throws IOException {
+ super.writeParagraphEnd();
{{noformat}}
Finally, if your modifications of our PDFParsers are enhancements that have general applicability, please, oh, please share them with us.
> "Zip bomb" (XML nesting) detection is too strict
> ------------------------------------------------
>
> Key: TIKA-741
> URL: https://issues.apache.org/jira/browse/TIKA-741
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10
> Reporter: Erik Hetzner
> Assignee: Jukka Zitting
> Priority: Minor
> Fix For: 1.0
>
>
> I get "zip bomb" errors from many HTML documents, e.g. http://www.akhbaar.org/wesima_articles/index-20100101-82736.html
> Is there a way that the element nesting level could be made configurable? 30 elements just doesn't seem to be enough.
> Thanks!
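To make the request above concrete, here is a minimal sketch of a nesting check with a configurable limit, written against the plain JDK SAX API. It is an illustration only and not Tika's own implementation: DepthLimitHandler and DepthLimitDemo are hypothetical names, and the limits used are arbitrary.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rejects input whose element nesting exceeds maxDepth.
class DepthLimitHandler extends DefaultHandler {
    private final int maxDepth;
    private int depth = 0;

    DepthLimitHandler(int maxDepth) { this.maxDepth = maxDepth; }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        depth++;
        if (depth > maxDepth) {
            throw new SAXException(
                "Nesting depth " + depth + " exceeds limit " + maxDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }
}

class DepthLimitDemo {
    // Builds <d><d>...</d></d> with the requested number of levels.
    static String nested(int levels) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) sb.append("<d>");
        for (int i = 0; i < levels; i++) sb.append("</d>");
        return sb.toString();
    }

    // Returns true if the document parses without tripping the depth limit.
    static boolean accepts(String xml, int maxDepth) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DepthLimitHandler(maxDepth));
            return true;
        } catch (SAXException e) {
            return false; // limit exceeded (or malformed input)
        } catch (Exception e) {
            throw new IllegalStateException(e); // I/O or parser setup failure
        }
    }

    public static void main(String[] args) {
        String doc = nested(31);
        System.out.println("limit 30:  " + accepts(doc, 30));
        System.out.println("limit 100: " + accepts(doc, 100));
    }
}
```

With the limit passed in as a constructor argument, a document nested 31 levels deep is rejected at a limit of 30 but accepted at 100, which is the kind of knob the report asks for.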
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)