Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2016/11/26 15:40:58 UTC

[jira] [Comment Edited] (PDFBOX-3595) For a PDF - Loading from URL works. Loading from BAIS does not.

    [ https://issues.apache.org/jira/browse/PDFBOX-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15698101#comment-15698101 ] 

Tilman Hausherr edited comment on PDFBOX-3595 at 11/26/16 3:40 PM:
-------------------------------------------------------------------

{code}
        String contents = IOUtils.toString(new URL(url).openStream());
{code}
You're converting binary data to a Java string and back. Why are you expecting that this would work?
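
To illustrate the point, here is a minimal stdlib-only sketch of why a bytes -> String -> bytes round trip is lossy, and what a lossless copy looks like. The class and method names (BinaryRoundTrip, sampleBytes, roundTripThroughString, readFully) are made up for this example, and UTF-8 is chosen explicitly so the result is reproducible; the program in the report uses the platform default charset, which on Windows is usually a different one but is assumed to lose unmappable bytes in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of PDFBox or Commons IO.
final class BinaryRoundTrip {

    // Every possible byte value once, standing in for compressed PDF stream data.
    static byte[] sampleBytes() {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        return data;
    }

    // The lossy path: decode the bytes as text, then encode the text again.
    // Byte sequences that are not valid in the charset are replaced during
    // decoding, so the original values cannot be recovered.
    static byte[] roundTripThroughString(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // The lossless path: copy the stream into a byte array without decoding.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = sampleBytes();
        byte[] viaString = roundTripThroughString(original);
        byte[] viaCopy = readFully(new ByteArrayInputStream(original));

        System.out.println("String round trip preserved bytes: "
                + Arrays.equals(original, viaString));
        System.out.println("Byte copy preserved bytes: "
                + Arrays.equals(original, viaCopy));
    }
}
```

In the reporter's program, Commons IO's IOUtils.toByteArray(InputStream) could take the place of readFully, and the resulting byte[] could be wrapped in the ByteArrayInputStream that is passed to PDDocument.load. That substitution is a suggestion only and has not been run against the file from the report.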


was (Author: tilman):
You're converting binary data to a java string and back. Why are you expecting that this would work?

> For a PDF - Loading from URL works. Loading from BAIS does not.
> ---------------------------------------------------------------
>
>                 Key: PDFBOX-3595
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3595
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.12, 2.0.3
>         Environment: Windows
>            Reporter: David Medinets
>            Priority: Minor
>
> I've found several PDF files at https://www.supremecourt.gov/opinions/boundvolumes.aspx that throw an exception when using PDDocument.load with a ByteArrayInputStream but do not throw an exception when the same PDF is loaded using a URL.
> v1.8.12 is the last version in which the load method takes a URL object. I mention it here in case that reference point of 'working' code helps diagnose this issue.
>  
> Below is the complete program that shows the two approaches. The first works. The second does not.
> ```
> package com.affy.wildtuna.adrivers;
> import java.io.ByteArrayInputStream;
> import java.net.URL;
> import org.apache.commons.io.IOUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
> public class ShowInvalidDistancesSetException {
>     public static void main(final String[] args) throws Exception {
>         String url = "https://www.supremecourt.gov/opinions/boundvolumes/545bv.pdf";
>         PDDocument doc01 = PDDocument.load(new URL(url));
>         doc01.close();
>         System.out.println("Loading from URL works.");
>         
>         String contents = IOUtils.toString(new URL(url).openStream());
>         try (ByteArrayInputStream bais = new ByteArrayInputStream(contents.getBytes())) {
>             PDDocument doc = PDDocument.load(bais);
>             doc.close();
>         }
>     }
> }
> ```
> Here is the program's output:
> ```
> WARNING: Specified stream length 6845 is wrong. Fall back to reading stream until 'endstream'.
> Loading from URL works.
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Exception in thread "main" java.io.IOException
> 	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
> 	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
> 	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
> 	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
> 	at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:64)
> 	at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:574)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
> 	at com.affy.wildtuna.adrivers.ShowInvalidDistancesSetException.main(ShowInvalidDistancesSetException.java:18)
> Caused by: java.util.zip.DataFormatException: invalid distances set
> 	at java.util.zip.Inflater.inflateBytes(Native Method)
> 	at java.util.zip.Inflater.inflate(Inflater.java:259)
> 	at java.util.zip.Inflater.inflate(Inflater.java:280)
> 	at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
> 	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
> 	... 9 more
> ```



