Posted to dev@pdfbox.apache.org by "Benjamin Muschko (JIRA)" <ji...@apache.org> on 2010/03/05 17:22:27 UTC

[jira] Commented: (PDFBOX-424) Stream decoding hangs up

    [ https://issues.apache.org/jira/browse/PDFBOX-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841910#action_12841910 ] 

Benjamin Muschko commented on PDFBOX-424:
-----------------------------------------

Will this issue be addressed anytime soon? It is really critical to me. When PDF text extraction runs as a centralized service, every hung decode blocks a thread, so the service eventually uses up all of its threads. Any comments would be very welcome.

> Stream decoding hangs up
> ------------------------
>
>                 Key: PDFBOX-424
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-424
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Michael Schlegel
>
> Sometimes the decoding of streams can hang.
> The reason can be found in the decode method of org.apache.pdfbox.filter.FlateFilter.
> Here you ask for the available data in the compressedData stream:
>     decompressor = new InflaterInputStream(compressedData);
>     int mayRead = compressedData.available();
>     byte[] buffer = new byte[Math.min(mayRead, BUFFER_SIZE)];
> Sometimes compressedData.available() returns 0.
> Later you iterate over the stream data:
>     while((amountRead = decompressor.read(buffer, 0, Math.min(mayRead, BUFFER_SIZE))) != -1 )
>     {
>         result.write(buffer, 0, amountRead);
>     }
> Because mayRead is 0, every iteration tries to read 0 bytes from the stream ==> amountRead is 0 on every iteration and never -1 ==> the loop never finishes.
> You can test this with the following PDF document: http://www.usu.de/d/Case_Studies/BSM/Profiles_in_Excellence_FIDUCIA_AG.pdf
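
The loop described above can be sketched without any dependency on available(). This is only a minimal illustration, not the actual PDFBox code or patch: the class FlateDecodeSketch, the helpers zeroAvailable and deflate, and the BUFFER_SIZE value of 2048 are all assumptions made for the example. The idea is to size the buffer with a fixed constant, so each read asks for at least one byte and returns -1 at the end of the compressed data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class FlateDecodeSketch {
    // Assumed value; the real constant in FlateFilter may differ.
    private static final int BUFFER_SIZE = 2048;

    // Inflates compressedData into result without consulting available().
    static void decode(InputStream compressedData, OutputStream result) throws IOException {
        InflaterInputStream decompressor = new InflaterInputStream(compressedData);
        // Fixed-size buffer: the requested length is never 0, so read()
        // either delivers data or returns -1 at the end of the stream.
        byte[] buffer = new byte[BUFFER_SIZE];
        int amountRead;
        while ((amountRead = decompressor.read(buffer, 0, buffer.length)) != -1) {
            result.write(buffer, 0, amountRead);
        }
        result.flush();
    }

    // Hypothetical helper: a stream whose available() always reports 0,
    // which is the condition described in the issue.
    static InputStream zeroAvailable(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int available() {
                return 0;
            }
        };
    }

    // Hypothetical helper: produces zlib-compressed test input.
    static byte[] deflate(byte[] plain) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(compressed)) {
            out.write(plain);
        }
        return compressed.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "PDFBOX-424 sample stream content".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        decode(zeroAvailable(deflate(plain)), result);
        System.out.println("decoded " + result.size() + " bytes");
    }
}
```

available() only estimates how many bytes can be read without blocking, and 0 is a legal answer even when more data follows, so it is not a reliable basis for a buffer size or a read length.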

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.