Posted to dev@pdfbox.apache.org by "David Medinets (JIRA)" <ji...@apache.org> on 2016/11/26 15:35:58 UTC

[jira] [Created] (PDFBOX-3595) For a PDF - Loading from URL works. Loading from BAIS does not.

David Medinets created PDFBOX-3595:
--------------------------------------

             Summary: For a PDF - Loading from URL works. Loading from BAIS does not.
                 Key: PDFBOX-3595
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3595
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 2.0.3, 1.8.12
         Environment: Windows
            Reporter: David Medinets
            Priority: Minor


I've found several PDF files at https://www.supremecourt.gov/opinions/boundvolumes.aspx for which PDDocument.load throws an exception when given a ByteArrayInputStream, but loads without error when the same PDF is read from a URL.

Version 1.8.12 is the last version in which PDDocument.load accepts a URL object. I mention it in case that reference point of 'working' code helps diagnose this issue.
 
The complete program below shows both approaches. The first (loading from the URL) works. The second (loading from a ByteArrayInputStream) does not.

```
package com.affy.wildtuna.adrivers;

import java.io.ByteArrayInputStream;
import java.net.URL;
import org.apache.commons.io.IOUtils;
import org.apache.pdfbox.pdmodel.PDDocument;

public class ShowInvalidDistancesSetException {

    public static void main(final String[] args) throws Exception {
        String url = "https://www.supremecourt.gov/opinions/boundvolumes/545bv.pdf";
        PDDocument doc01 = PDDocument.load(new URL(url));
        doc01.close();
        System.out.println("Loading from URL works.");
        
        String contents = IOUtils.toString(new URL(url).openStream());
        try (ByteArrayInputStream bais = new ByteArrayInputStream(contents.getBytes())) {
            PDDocument doc = PDDocument.load(bais);
            doc.close();
        }
    }
}
```

Here is the program's output:

```
WARNING: Specified stream length 6845 is wrong. Fall back to reading stream until 'endstream'.
Loading from URL works.
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Exception in thread "main" java.io.IOException
	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
	at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:64)
	at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:574)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
	at com.affy.wildtuna.adrivers.ShowInvalidDistancesSetException.main(ShowInvalidDistancesSetException.java:18)
Caused by: java.util.zip.DataFormatException: invalid distances set
	at java.util.zip.Inflater.inflateBytes(Native Method)
	at java.util.zip.Inflater.inflate(Inflater.java:259)
	at java.util.zip.Inflater.inflate(Inflater.java:280)
	at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
	... 9 more
```
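
One difference between the two paths that may be worth ruling out: the second path converts the downloaded bytes to a String and back (IOUtils.toString followed by getBytes) before PDFBox sees them, while the URL path never decodes the bytes as text. A PDF contains binary data such as Flate-compressed streams, and decoding arbitrary bytes with a text charset is not guaranteed to be reversible. The sketch below uses only the JDK to show that effect in isolation. It is an assumption about the cause, not a confirmed diagnosis, and the class and helper names (BinaryRoundTripSketch, readFully, survivesStringRoundTrip) are hypothetical and not part of PDFBox or Commons IO.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTripSketch {

    // Hypothetical helper: copy a stream into a byte[] with no charset decoding.
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: true if bytes -> String -> bytes yields identical bytes.
    public static boolean survivesStringRoundTrip(byte[] data, Charset charset) {
        byte[] roundTripped = new String(data, charset).getBytes(charset);
        return Arrays.equals(data, roundTripped);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for binary PDF content: every possible byte value once.
        byte[] binary = new byte[256];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) i;
        }

        // A plain byte copy never interprets the data as text.
        byte[] copied = readFully(new ByteArrayInputStream(binary));
        System.out.println("byte copy intact: " + Arrays.equals(binary, copied));

        // UTF-8 replaces malformed sequences on decode, so the bytes change.
        System.out.println("UTF-8 round trip intact: "
                + survivesStringRoundTrip(binary, StandardCharsets.UTF_8));

        // The program in this report relies on the platform default charset.
        System.out.println("default charset (" + Charset.defaultCharset()
                + ") round trip intact: "
                + survivesStringRoundTrip(binary, Charset.defaultCharset()));
    }
}
```

If the default-charset line reports a changed result on the affected machine, a reasonable next experiment is to read the PDF with a byte-level call such as IOUtils.toByteArray(InputStream) from Commons IO and pass those bytes to the ByteArrayInputStream unchanged.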


