Posted to dev@pdfbox.apache.org by "Rainer Thiel (JIRA)" <ji...@apache.org> on 2010/11/06 22:36:23 UTC

[jira] Issue Comment Edited: (PDFBOX-847) FlateFilter.java swallows Exceptions (should rethrow)

    [ https://issues.apache.org/jira/browse/PDFBOX-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929217#action_12929217 ] 

Rainer Thiel edited comment on PDFBOX-847 at 11/6/10 5:34 PM:
--------------------------------------------------------------

It's further complicated by the fact that, although three exception types are caught, only a single message text is logged. That makes the log useless for investigating the underlying cause.

      was (Author: slowlearner):
    It's all made worse by the fact that though 3 exception types are caught, only a single msg text is logged. It's useless for investigating a possible underlying cause.
  
> FlateFilter.java swallows Exceptions (should rethrow)
> -----------------------------------------------------
>
>                 Key: PDFBOX-847
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-847
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.2.1
>            Reporter: Andreas Wollschlaeger
>
> I just rediscovered an issue in FlateFilter.java, which I mentioned quite a while ago on the mailing list, and which was agreed to be a misfeature :-)
> In FlateFilter.java, at lines 115ff, we find this piece of code:
>                     try 
>                     {
>                         // decoding not needed
>                         while ((amountRead = decompressor.read(buffer, 0, Math.min(mayRead,BUFFER_SIZE))) != -1)
>                         {
>                             result.write(buffer, 0, amountRead);
>                         }
>                     }
>                     catch (OutOfMemoryError exception) 
>                     {
>                         // if the stream is corrupt an OutOfMemoryError may occur
>                         log.error("Stop reading corrupt stream");
>                     }
>                     catch (ZipException exception) 
>                     {
>                         // if the stream is corrupt an OutOfMemoryError may occur
>                         log.error("Stop reading corrupt stream");
>                     }
>                     catch (EOFException exception) 
>                     {
>                         // if the stream is corrupt an OutOfMemoryError may occur
>                         log.error("Stop reading corrupt stream");
>                     }
> which means these Exceptions are discarded and not reported upstream to the caller. This is very unfortunate, as the caller has no means to discover that text extraction is incomplete. I discovered this while troubleshooting Alfresco DMS, which uses PDFBox for indexing PDF documents: apart from an innocuous log message, Alfresco does not know that conversion has failed.
> The proposed solution is to rethrow all 3 Exceptions and let the caller handle them.
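
For illustration only, the proposal could look roughly like the sketch below. It is not the PDFBox code or an actual patch: the class name FlateDecodeSketch, the method decode and the message texts are hypothetical, and the loop is reduced to a plain java.util.zip.InflaterInputStream read. ZipException and EOFException are wrapped in an IOException with a distinct message each and the original exception kept as the cause; OutOfMemoryError is simply not caught, so it propagates to the caller as well.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

// Hypothetical sketch, not the real FlateFilter: failures are reported to the caller.
final class FlateDecodeSketch {
    private static final int BUFFER_SIZE = 2048; // assumed value, for illustration only

    // Inflates 'compressed' into 'result'; any failure reaches the caller as an IOException.
    static void decode(InputStream compressed, OutputStream result) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        InflaterInputStream decompressor = new InflaterInputStream(compressed);
        try {
            int amountRead;
            while ((amountRead = decompressor.read(buffer, 0, BUFFER_SIZE)) != -1) {
                result.write(buffer, 0, amountRead);
            }
        } catch (ZipException e) {
            // Distinct message per failure type, original exception kept as the cause.
            throw new IOException("Corrupt Flate stream, invalid compressed data: " + e.getMessage(), e);
        } catch (EOFException e) {
            throw new IOException("Corrupt Flate stream, unexpected end of input: " + e.getMessage(), e);
        }
        // OutOfMemoryError is deliberately not caught here, so it also propagates.
    }
}
```

With this shape, a caller such as an indexer can catch IOException around decode and mark the document as failed instead of silently indexing partial text; getCause() still tells the three failure modes apart.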
