Posted to dev@tika.apache.org by "Tim-Christian Mundt (Commented) (JIRA)" <ji...@apache.org> on 2011/12/13 01:07:30 UTC

[jira] [Commented] (TIKA-788) DWG parser infinite loop on possibly corrupt file

    [ https://issues.apache.org/jira/browse/TIKA-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167963#comment-13167963 ] 

Tim-Christian Mundt commented on TIKA-788:
------------------------------------------

We get the same problem with a few DWG files that AutoCAD reads just fine. These are the signature and header of one such file:

41 43 31 30 32 31 00 00 00 00 00 01 03 A0 0D 00 00 1B 01 20 00 00 1B 01 00 00 00 00 00 00 00 00 80 0C 00 00 20 62 00 00 80 00 00 00 00 5F 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The problem is the value "80 0C 00 00 20 62 00 00" at offset 0x20, read as a little-endian 64-bit number, which should indicate where the metadata begins. However, that number (107889578478720) is far too big, and in particular too big for the cast to int in the section Stas modified. While Stas' change fixes the infinite loop, there may be a more robust approach, since, as mentioned, AutoCAD manages to open the file. Anyway, we'd be content with Stas' patch applied.
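To make the arithmetic concrete, here is a minimal Java sketch that decodes the first 0x30 bytes of the header quoted above. It recovers the ASCII signature, the 64-bit little-endian value at 0x20, and what the two chunk-size expressions from the patch produce for it. The class and method names are illustrative, not Tika code. As a simplifying assumption, the expressions are applied to the raw offset; in the parser they operate on the remaining skip distance `toSkip`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class DwgHeaderOffsetDemo {
    // Bytes 0x00-0x2F of the quoted header; everything after 0x2F in the dump is 00.
    static final byte[] HEADER = {
        0x41, 0x43, 0x31, 0x30, 0x32, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, (byte) 0xA0, 0x0D, 0x00,
        0x00, 0x1B, 0x01, 0x20, 0x00, 0x00, 0x1B, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        (byte) 0x80, 0x0C, 0x00, 0x00, 0x20, 0x62, 0x00, 0x00, (byte) 0x80, 0x00, 0x00, 0x00, 0x00, 0x5F, 0x00, 0x00
    };

    /** The first six bytes are the ASCII version signature. */
    static String versionString(byte[] header) {
        return new String(header, 0, 6, StandardCharsets.US_ASCII);
    }

    /** Reads the 8-byte little-endian value starting at offset 0x20. */
    static long offsetAt0x20(byte[] header) {
        return ByteBuffer.wrap(header, 0x20, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    /** Original expression: narrows to int first, keeping only the low 32 bits. */
    static int originalChunk(long toSkip) {
        return Math.min((int) toSkip, 0x4000);
    }

    /** Patched expression: clamps in long arithmetic first, then narrows safely. */
    static int patchedChunk(long toSkip) {
        return (int) Math.min(toSkip, 0x4000);
    }

    public static void main(String[] args) {
        long offset = offsetAt0x20(HEADER);
        System.out.println("version        = " + versionString(HEADER)); // AC1021
        System.out.println("offset at 0x20 = " + offset);                // 107889578478720
        System.out.println("original chunk = " + originalChunk(offset)); // 3200
        System.out.println("patched chunk  = " + patchedChunk(offset));  // 16384
    }
}
```

The original expression narrows before clamping, so the huge value silently becomes 3200 (its low 32 bits, 0x0C80). The patched expression clamps first and yields a sane 16384-byte chunk. Neither form bounds the total distance to skip, which is why the patch also checks for -1 at the end of the stream.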
                
> DWG parser infinite loop on possibly corrupt file
> -------------------------------------------------
>
>                 Key: TIKA-788
>                 URL: https://issues.apache.org/jira/browse/TIKA-788
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Stas Shaposhnikov
>
> When parsing some DWG files, the parser can go into an infinite loop.
> Attached is the file causing the problem.
> Here is a possible patch that will at least proceed until an error is thrown.
> {noformat}
> === modified file 'tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java'
> --- tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java        2011-11-24 11:30:33 +0000
> +++ tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java        2011-11-25 05:27:41 +0000
> @@ -274,8 +274,10 @@
>              return false;
>          }
>          while (toSkip > 0) {
> -            byte[] skip = new byte[Math.min((int) toSkip, 0x4000)];
> -            IOUtils.readFully(stream, skip);
> +            byte[] skip = new byte[(int) Math.min(toSkip, 0x4000)];
> +            if (IOUtils.readFully(stream, skip) == -1) {
> +               return false; //invalid skip
> +            }
>              toSkip -= skip.length;
>          }
>          return true;
> {noformat}
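One possible direction for the "more robust way" mentioned above is sketched below. This is a hypothetical helper, not Tika's actual code. The names `SafeSkip`, `skipFully` and the `maxSkip` bound are invented for illustration. The helper uses only `java.io.InputStream`: it subtracts the bytes actually read rather than the buffer size, stops at end of stream, and rejects offsets that are negative or exceed a caller-supplied bound, such as the file length when the caller knows it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public final class SafeSkip {
    private static final int CHUNK = 0x4000;

    /**
     * Hypothetical helper: discards exactly toSkip bytes from the stream.
     * Returns false instead of looping if toSkip is negative, exceeds maxSkip,
     * or the stream ends before the target offset is reached.
     */
    static boolean skipFully(InputStream stream, long toSkip, long maxSkip) throws IOException {
        if (toSkip < 0 || toSkip > maxSkip) {
            return false; // implausible offset, e.g. from a corrupt or unsupported header
        }
        // Clamp in long arithmetic before narrowing, as in Stas' patch.
        byte[] buffer = new byte[(int) Math.min(toSkip, CHUNK)];
        while (toSkip > 0) {
            int n = stream.read(buffer, 0, (int) Math.min(toSkip, buffer.length));
            if (n == -1) {
                return false; // end of stream before reaching the target offset
            }
            toSkip -= n; // subtract what was actually read, never the buffer length
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] smallFile = new byte[1024];
        // Offset decoded from the header quoted in the comment: far past the end of this stream.
        long bogusOffset = 107889578478720L;
        System.out.println(skipFully(new ByteArrayInputStream(smallFile), bogusOffset, Long.MAX_VALUE)); // false
        System.out.println(skipFully(new ByteArrayInputStream(smallFile), 512, Long.MAX_VALUE));         // true
    }
}
```

With `maxSkip` set to `Long.MAX_VALUE`, the bogus offset is still rejected, but only after reading to the end of the stream. Passing the real file length, when it is available, rejects it up front without reading anything.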
