Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2012/06/30 19:31:06 UTC

[jira] [Commented] (TIKA-788) DWG parser infinite loop on possibly corrupt file

    [ https://issues.apache.org/jira/browse/TIKA-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404576#comment-13404576 ] 

Nick Burch commented on TIKA-788:
---------------------------------

I've had a go at this in r1355780, changing the logic to skip the header section if the apparent offset is over 10 MB. The header is normally very close to the start of the file, so this shouldn't affect real files.
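
As a rough illustration of that kind of check (a sketch under stated assumptions, not the code committed in r1355780): the class name, the method names and the MAX_HEADER_OFFSET constant below are all hypothetical, and reading "10mb" as 10 * 1024 * 1024 bytes is a guess. The skip loop also stops as soon as the stream runs out, rather than looping on a bogus offset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class HeaderSkipSketch {
    // Assumed cutoff: "10mb" read as 10 * 1024 * 1024 bytes (hypothetical constant).
    static final long MAX_HEADER_OFFSET = 10L * 1024 * 1024;

    /** Returns true if the apparent header offset is small enough to be trusted. */
    static boolean isPlausibleHeaderOffset(long offset) {
        return offset >= 0 && offset <= MAX_HEADER_OFFSET;
    }

    /**
     * Skips toSkip bytes by reading and discarding them.
     * Returns false if the stream ends before toSkip bytes were consumed.
     */
    static boolean skipFully(InputStream stream, long toSkip) throws IOException {
        byte[] buffer = new byte[0x4000];
        while (toSkip > 0) {
            int wanted = (int) Math.min(toSkip, buffer.length);
            int read = stream.read(buffer, 0, wanted);
            if (read == -1) {
                return false; // end of stream reached before the offset
            }
            toSkip -= read; // count only the bytes actually read
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        long offset = 200; // pretend this came from a corrupt file of only 100 bytes
        InputStream stream = new ByteArrayInputStream(new byte[100]);
        boolean reached = isPlausibleHeaderOffset(offset) && skipFully(stream, offset);
        System.out.println("header reached: " + reached);
    }
}
```

Subtracting the number of bytes actually read, instead of the buffer length, is what lets the loop notice a truncated stream.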

It would help to get one of these problematic files, along with the metadata that AutoCAD reports for it (ideally the same set of test values that our other sample files have). We could then hopefully work out how to distinguish the two kinds of files, and how to find the metadata in these ones.
                
> DWG parser infinite loop on possibly corrupt file
> -------------------------------------------------
>
>                 Key: TIKA-788
>                 URL: https://issues.apache.org/jira/browse/TIKA-788
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Stas Shaposhnikov
>
> When parsing some DWG files, the parser can go into an infinite loop.
> Attached is the file causing the problem.
> Here is a possible patch that will at least proceed until an error is thrown.
> {noformat}
> === modified file 'tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java'
> --- tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java        2011-11-24 11:30:33 +0000
> +++ tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java        2011-11-25 05:27:41 +0000
> @@ -274,8 +274,10 @@
>              return false;
>          }
>          while (toSkip > 0) {
> -            byte[] skip = new byte[Math.min((int) toSkip, 0x4000)];
> -            IOUtils.readFully(stream, skip);
> +            byte[] skip = new byte[(int) Math.min(toSkip, 0x4000)];
> +            if (IOUtils.readFully(stream, skip) == -1) {
> +               return false; //invalid skip
> +            }
>              toSkip -= skip.length;
>          }
>          return true;
> {noformat}
