Posted to dev@pdfbox.apache.org by "Benjamin Muschko (JIRA)" <ji...@apache.org> on 2010/03/05 17:22:27 UTC
[jira] Commented: (PDFBOX-424) Stream decoding hangs up
[ https://issues.apache.org/jira/browse/PDFBOX-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841910#action_12841910 ]
Benjamin Muschko commented on PDFBOX-424:
-----------------------------------------
Will this issue be addressed anytime soon? It is really critical for me, especially when PDF text extraction runs as a centralized service, because the hanging decodes will use up all threads. Any comments would be very welcome.
> Stream decoding hangs up
> ------------------------
>
> Key: PDFBOX-424
> URL: https://issues.apache.org/jira/browse/PDFBOX-424
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Reporter: Michael Schlegel
>
> Sometimes the decoding of a stream hangs.
> The reason can be found in the decode method of org.apache.pdfbox.filter.FlateFilter.
> There you ask for the available data in the compressedData stream:
> decompressor = new InflaterInputStream(compressedData);
> int mayRead = compressedData.available();
> byte[] buffer = new byte[Math.min(mayRead, BUFFER_SIZE)];
> Sometimes compressedData.available() returns 0.
> Later you iterate over the stream data:
> while ((amountRead = decompressor.read(buffer, 0, Math.min(mayRead, BUFFER_SIZE))) != -1)
> {
>     result.write(buffer, 0, amountRead);
> }
> Because mayRead is 0, every iteration tries to read 0 bytes from the stream ==> amountRead is 0 on every iteration (a zero-length read returns 0, never -1) ==> the loop never finishes.
> You can test this with the following PDF document: http://www.usu.de/d/Case_Studies/BSM/Profiles_in_Excellence_FIDUCIA_AG.pdf
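The hang described in the issue can be illustrated in isolation. The following is only a minimal sketch, not the actual PDFBox code and not necessarily how the issue will be fixed: the class FlateDecodeSketch, its helper methods, and the BUFFER_SIZE value are hypothetical names chosen for illustration. It assumes one possible remedy, which is to size the buffer and the read length independently of available(), so that a stream reporting 0 available bytes can no longer cause zero-length reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-alone class for illustration; not the PDFBox FlateFilter.
class FlateDecodeSketch {
    // Assumed value; the issue does not state the real BUFFER_SIZE.
    private static final int BUFFER_SIZE = 2048;

    // Inflates the whole stream using a fixed-size buffer. The read length
    // never depends on available(), so it is never 0 and read() must
    // eventually return -1 at end of stream.
    static byte[] decode(InputStream compressedData) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        // Closing the decompressor also closes compressedData.
        try (InflaterInputStream decompressor = new InflaterInputStream(compressedData)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, buffer.length)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        }
        return result.toByteArray();
    }

    // Helper that produces deflate-compressed test data.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        }
        return out.toByteArray();
    }

    // Simulates the situation from the issue: a stream that still holds
    // data but reports 0 from available().
    static InputStream reportingZeroAvailable(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int available() {
                return 0;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "stream data".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = deflate(original);

        // A read with length 0 returns 0 rather than -1, which is why the
        // loop quoted in the issue never terminates when mayRead is 0.
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int zeroLengthRead = in.read(new byte[BUFFER_SIZE], 0, 0);
            System.out.println("zero-length read returned " + zeroLengthRead);
        }

        // The fixed-buffer loop terminates even when available() reports 0.
        byte[] decoded = decode(reportingZeroAvailable(compressed));
        System.out.println("round trip ok: " + Arrays.equals(original, decoded));
    }
}
```

The design choice here is simply to treat available() as a hint about non-blocking reads rather than as the amount of remaining data; a return value of 0 does not mean the stream is exhausted.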
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.