Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (Resolved JIRA)" <ji...@apache.org> on 2012/03/10 16:20:57 UTC
[jira] [Resolved] (PDFBOX-1232) FlateDecoder in stream mode

     [ https://issues.apache.org/jira/browse/PDFBOX-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-1232.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.0
         Assignee: Andreas Lehmkühler

I added the patch in revision 1299219 as proposed with some minor modifications. 

Thanks for the contribution!
                
> FlateDecoder in stream mode
> ---------------------------
>
>                 Key: PDFBOX-1232
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1232
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Dave Smith
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.7.0
>
>
> zlib (the underlying spec for Flate compression) does not require a Z_STREAM_END to terminate the compressed data. The Java InflaterInputStream really assumes that you are reading a zip or gzip file, which will always have a Z_STREAM_END (Z_STREAM_END is a constant in the zlib library, which Java calls natively). So the following chunk decodes fine using the jcraft zlib decoder, but fails using InflaterInputStream.
> 3 0 obj
> <<
> /Type /XObject
> /Subtype /Form
> /FormType 1
> /Resources << /Font 4 0 R
> /ProcSet [/PDF /ImageC /Text]>>
> /BBox [0 0 595 842]
> /Matrix [1 0 0 1 0 0]
> /Filter /FlateDecode
> /Length 5 >>
> stream
> H<89>^C^@
> endstream
> endobj
> The blob is 72, -119, 3, 0, 13 decimal. It decodes to an empty string.
> The fix is to use Inflater directly and to stop once it has consumed all of the input buffer and has nothing more to write into the output buffer.
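> // needs java.io.* plus java.util.zip.Inflater and java.util.zip.DataFormatException
> protected ByteArrayOutputStream decompress(InputStream in)
>       throws IOException, DataFormatException
>   {
>       ByteArrayOutputStream out = new ByteArrayOutputStream();
>       byte[] buf = new byte[1000];
>       byte[] res = new byte[1000];
>       Inflater inflater = new Inflater();
>       try
>       {
>               // read() returns -1 at end of stream; nothing to inflate in that case
>               int read = in.read(buf);
>               if (read <= 0)
>               {
>                       return out;
>               }
>               inflater.setInput(buf, 0, read);
>               while (true)
>               {
>                       int resRead = inflater.inflate(res);
>                       if (resRead != 0)
>                       {
>                               out.write(res, 0, resRead);
>                               continue;
>                       }
>                       // no output: stop if the stream is complete or needs a preset dictionary
>                       if (inflater.finished() || inflater.needsDictionary())
>                       {
>                               return out;
>                       }
>                       // otherwise feed more input; stop quietly when none is left,
>                       // because a missing Z_STREAM_END is tolerated
>                       read = in.read(buf);
>                       if (read <= 0)
>                       {
>                               return out;
>                       }
>                       inflater.setInput(buf, 0, read);
>               }
>       }
>       finally
>       {
>               // release the native zlib resources
>               inflater.end();
>       }
>   }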
> We then need to change FlateFilter.decode(InputStream compressedData, OutputStream result,
> COSDictionary options, int filterIndex )
> to look like ...
>   if (compressedData.available() > 0)
>   {
>       try
>       {
>           baos = decompress(compressedData);
>       }
>       if (predictor == -1 || predictor == 1)
>       {
>           result.write(baos.toByteArray());
>       }
>       else
>       {
>           // use the ByteArrayOutputStream as before ...
>       }
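
For readers who want to try the behaviour described in the report, here is a minimal, self-contained sketch. It is not the code committed in revision 1299219; the class name LenientInflateSketch and the method inflateLeniently are hypothetical, and it works on a byte array instead of an InputStream to keep it short. It only uses java.util.zip.Inflater and stops quietly when the input is exhausted, whether or not the zlib stream was properly terminated.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only; not part of PDFBox.
class LenientInflateSketch
{
    // Inflates as much data as the input allows and returns it,
    // without insisting on a complete zlib trailer.
    static byte[] inflateLeniently(byte[] compressed) throws DataFormatException
    {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] res = new byte[1024];
        try
        {
            inflater.setInput(compressed);
            while (!inflater.finished())
            {
                int n = inflater.inflate(res);
                if (n == 0)
                {
                    // No progress: input is used up or a preset dictionary
                    // would be needed. Either way, stop without failing.
                    break;
                }
                out.write(res, 0, n);
            }
        }
        finally
        {
            // Release the native zlib resources.
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException
    {
        // The five bytes quoted in the report.
        byte[] blob = {72, -119, 3, 0, 13};
        System.out.println(inflateLeniently(blob).length);
    }
}
```

Fed the five bytes from the report, this should return an empty array, which matches the reporter's statement that the blob decodes to an empty string. Genuinely corrupt data still surfaces as a DataFormatException.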

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira