Posted to dev@pdfbox.apache.org by "Martijn Brinkers (JIRA)" <ji...@apache.org> on 2010/11/01 10:09:24 UTC

[jira] Issue Comment Edited: (PDFBOX-872) ERROR org.apache.pdfbox.filter.FlateFilter - Stop reading corrupt stream

    [ https://issues.apache.org/jira/browse/PDFBOX-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926860#action_12926860 ] 

Martijn Brinkers edited comment on PDFBOX-872 at 11/1/10 5:08 AM:
------------------------------------------------------------------

According to PDFBox, the PDF is encrypted, because document.isEncrypted() returns true, but the document does not seem to be encrypted.

Acrobat Reader's security tab says: "All contents of the document are encrypted and search engines cannot access the document's meta data".

Somehow it is 'encrypted', yet all PDF viewers can open it. PDFBox, however, does not seem to support this.
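
For anyone who wants to check independently of PDFBox whether a file declares encryption, here is a minimal sketch using only the JDK. It relies on the fact that an encrypted PDF carries an /Encrypt entry in its trailer (or in the cross-reference stream dictionary). A file like that can still open without a prompt when the user password is empty, which may be what is happening here, though I have not verified that for this document. The class and method names (EncryptMarkerCheck, declaresEncryption) are made up for illustration and are not part of PDFBox. A raw byte scan is only a heuristic: it can report a false positive if the same bytes happen to occur inside stream data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: scans raw PDF bytes for an /Encrypt key.
public class EncryptMarkerCheck {

    // Returns true if the bytes contain the name "/Encrypt" as a complete token,
    // i.e. not merely as a prefix of a longer name such as "/EncryptMetadata".
    public static boolean declaresEncryption(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes safely.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        String key = "/Encrypt";
        int from = 0;
        while (true) {
            int idx = raw.indexOf(key, from);
            if (idx < 0) {
                return false;
            }
            int end = idx + key.length();
            // The name must end here: at end of input or before a non-alphanumeric char.
            if (end >= raw.length() || !Character.isLetterOrDigit(raw.charAt(end))) {
                return true;
            }
            from = end;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EncryptMarkerCheck some-file.pdf
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("declares /Encrypt: " + declaresEncryption(bytes));
    }
}
```

If this reports true for a file that every viewer opens without asking for a password, that would be consistent with isEncrypted() returning true in PDFBox.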

      was (Author: martijn_brinkers):
    According to PDFBox the PDF is encrypted because document.isEncrypted() returns true but the document does not seem to be encrypted.
  
> ERROR org.apache.pdfbox.filter.FlateFilter  - Stop reading corrupt stream
> -------------------------------------------------------------------------
>
>                 Key: PDFBOX-872
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-872
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.3.1
>         Environment: Windows XP [Version 5.1.2600]
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> Java HotSpot(TM) Client VM (build 17.1-b03, mixed mode, sharing)
>            Reporter: Vladimir
>            Priority: Critical
>
> This report: http://www2.goldmansachs.com/our-firm/press/press-releases/current/pdfs/2010-q2-earnings.pdf
> With this code:
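> import java.io.IOException;
> import java.io.InputStream;
> import org.apache.pdfbox.pdfparser.PDFParser;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.util.PDFText2HTML;
>
> // Parses the PDF from the stream and returns its text as HTML, or null on failure.
> public static String getTransformed(InputStream inputStream) {
>     PDDocument pdDocument = null;
>     String document = null;
>     try {
>         PDFParser parser = new PDFParser(inputStream);
>         parser.parse();
>         pdDocument = parser.getPDDocument();
>         // Extract the text as HTML, using UTF-8 as the output encoding.
>         PDFText2HTML pdf2html = new PDFText2HTML("UTF-8");
>         document = pdf2html.getText(pdDocument);
>     } catch (IOException e) {
>         e.printStackTrace();
>     } finally {
>         // Release the underlying document even if parsing or extraction failed.
>         if (pdDocument != null) {
>             try {
>                 pdDocument.getDocument().close();
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>     }
>     return document;
> }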
> returns:
> 17:01:15,609 [main] ERROR org.apache.pdfbox.filter.FlateFilter  - Stop reading corrupt stream
> null
> java.io.IOException: Error: Expected an integer type, actual=''
> 	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
> 	at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:81)
> 	at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:449)
> 	at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1112)
> 	at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:591)
> 	at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:246)
> 	at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:184)
> 	at com.selerityfinancial.wwwscraper.utils.PDFUtil.getTransformed(PDFUtil.java:25)
> 	at com.selerityfinancial.wwwscraper.utils.PDFUtil.main(PDFUtil.java:55)
> The file opens normally in Foxit PDF.
