Posted to dev@pdfbox.apache.org by "CP (JIRA)" <ji...@apache.org> on 2010/09/02 22:59:54 UTC

[jira] Created: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

ClassCastException: COSInteger cannot be cast to COSDictionary
--------------------------------------------------------------

                 Key: PDFBOX-813
                 URL: https://issues.apache.org/jira/browse/PDFBOX-813
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.2.1
         Environment: Windows XP

java version "1.6.0_12"
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) Client VM (build 11.2-b01, mixed mode, sharing)

            Reporter: CP
            Priority: Critical


I get the exceptions below when calling pdfDoc.getDocumentCatalog().getAllPages(). The code continues after the first exception because I called PDDocument.load("C:/CancerSummReport_34914.pdf", true), setting the load "force" parameter to true. The second exception causes the code to abort.

(I will try uploading the PDF that causes this problem)

2010-09-02 16:47:47,521 [main] WARN  (PDFParser.java:189) - Parsing Error, Skipping Object
java.io.IOException: Error: Expected an integer type, actual='bj'
	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:497)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:878)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:843)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:768)
	at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:16)
2010-09-02 16:47:47,552 [main] WARN  (BaseParser.java:215) - Invalid dictionary, found:? but expected:''
Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSDictionary
	at org.apache.pdfbox.pdmodel.PDDocument.getDocumentCatalog(PDDocument.java:414)
	at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:18)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905670#action_12905670 ] 

CP commented on PDFBOX-813:
---------------------------

The PDF was created with Apache FOP v0.95



[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914224#action_12914224 ] 

CP commented on PDFBOX-813:
---------------------------

I downloaded the latest code today from SVN, built new libraries, added them to the project and ran the same test. 

I still see the problem.




[jira] Updated: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

CP updated PDFBOX-813:
----------------------

    Description: 
I get the below exceptions when calling pdfDoc.getDocumentCatalog().getAllPages(). The code continues after the first exception because I've called PDDocument.load("C:/CancerSummReport_34914.pdf", true)  setting the load "force" param to true. The second exception causes the code to abort.

(I will try uploading the PDF that causes this problem)

2010-09-02 16:47:47,521 [main] WARN  (PDFParser.java:189) - Parsing Error, Skipping Object
java.io.IOException: Error: Expected an integer type, actual='bj'
	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:497)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:878)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:843)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:768)
	at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:16)

2010-09-02 16:47:47,552 [main] WARN  (BaseParser.java:215) - Invalid dictionary, found:? but expected:''
Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSDictionary
	at org.apache.pdfbox.pdmodel.PDDocument.getDocumentCatalog(PDDocument.java:414)
	at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:18)




[jira] Resolved: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved PDFBOX-813.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.3.0
         Assignee: Jukka Zitting

Fixed in revision 1022444 by explicitly checking the type of the Root object.

I think this is a better solution than dropping (or deprecating) the forceParsing option, as there are quite a few malformed PDFs out there that can still be processed reasonably well even with relaxed parsing rules. Sometimes this results in PDDocuments with unexpected internal structures, but it's IMHO better to try degrading gracefully in such cases.
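
The kind of check described above can be sketched roughly as follows. This is not the code from revision 1022444: CosBase, CosInteger, CosDictionary and catalogFrom are hypothetical stand-ins for the PDFBox COS types, and returning an empty dictionary is only one possible fallback. The sketch shows testing the type of the Root entry before casting, so that a malformed trailer degrades gracefully instead of raising a ClassCastException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for PDFBox's COS object model (not the real classes).
abstract class CosBase {}

final class CosInteger extends CosBase {
    final long value;
    CosInteger(long value) { this.value = value; }
}

final class CosDictionary extends CosBase {
    final Map<String, CosBase> items = new HashMap<String, CosBase>();
}

final class RootCheck {
    // Returns the Root dictionary if the trailer holds one, otherwise a fresh
    // empty dictionary, so callers never hit a ClassCastException.
    static CosDictionary catalogFrom(CosDictionary trailer) {
        CosBase root = trailer.items.get("Root");
        if (root instanceof CosDictionary) {
            return (CosDictionary) root;
        }
        // Malformed trailer (for example "/Root 2" parsed as an integer):
        // fall back to an empty catalog instead of casting blindly.
        return new CosDictionary();
    }

    public static void main(String[] args) {
        CosDictionary trailer = new CosDictionary();
        trailer.items.put("Root", new CosInteger(2));
        CosDictionary catalog = catalogFrom(trailer);
        System.out.println("catalog entries: " + catalog.items.size());
    }
}
```

With a fallback like this, a caller sees a document with no pages rather than an exception, which is the sort of "unexpected internal structure" a relaxed parse can produce.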



[jira] Updated: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

CP updated PDFBOX-813:
----------------------

    Affects Version/s: 1.3.0



[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916640#action_12916640 ] 

CP commented on PDFBOX-813:
---------------------------

Thanks for your feedback and suggestions.

I have stopped using the force option and the code now behaves better. So yes, the force parameter is probably unnecessary now.




[jira] Updated: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

CP updated PDFBOX-813:
----------------------

    Attachment: PDFBoxBug.java
                CancerSummReport_34914.pdf



[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905681#action_12905681 ] 

Adam Nichols commented on PDFBOX-813:
-------------------------------------

This PDF does not conform to the Adobe PDF specification.  If you open the PDF in a text editor and scroll down to the bottom, you'll see a stray "bj" after the %%EOF marker.  That's invalid, but PDFBox tolerates it: it just warns you about the problem, and it doesn't cause anything serious.  The second message is also merely a warning.

Having said that, I tested using latest code from SVN and this PDF loaded properly.  I tested both with PDDocument.load( inputpath); and PDDocument.load(inputpath, true);

What version of PDFBox are you using?  I'd suggest trying the latest from SVN if you can.  The logs you posted do not contain any stack traces that caused the code to abort.  There should be some other stack trace with more information.
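
For anyone who wants to check a file for this kind of stray data without opening it in an editor, here is a minimal sketch. It is not part of PDFBox; the class and method names are made up for illustration, and it only looks at the text after the last %%EOF marker rather than parsing the PDF.

```java
import java.nio.charset.StandardCharsets;

final class TrailingDataCheck {
    private static final String EOF_MARKER = "%%EOF";

    // Returns whatever non-whitespace text follows the last %%EOF marker,
    // or an empty string if there is none (or no marker at all).
    static String trailingData(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so offsets are preserved.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf(EOF_MARKER);
        if (idx < 0) {
            return "";
        }
        return text.substring(idx + EOF_MARKER.length()).trim();
    }

    public static void main(String[] args) {
        // Made-up sample tail, not the contents of the attached file.
        byte[] sample = "...\nstartxref\n1234\n%%EOF\nbj"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("trailing data: '" + trailingData(sample) + "'");
    }
}
```

To run it against a real file, the bytes could be read with java.nio.file.Files.readAllBytes and passed to trailingData.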



[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915503#action_12915503 ] 

Adam Nichols commented on PDFBOX-813:
-------------------------------------

Well, I can tell you the reason it can't be parsed is because it's not a valid PDF.  If you open it and look at the bottom, you'll find that the trailer looks like this:
trailer
<<
/Size 41
/Root 2

There's not even a newline or carriage return after that last "2".  Since this does not conform to Adobe's PDF specification, the way this should be handled is undefined, so throwing an exception is not unreasonable.
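
To make the difference concrete, here is a rough sketch of a textual check for the two things missing above: an indirect reference after /Root (object number, generation number, then "R") and the ">>" that closes the trailer dictionary. The class name, the method name and the "2 0 R" completion in the example are assumptions for illustration only; the real reference was not recoverable from the file, and this is a string check, not a PDF parser.

```java
import java.util.regex.Pattern;

final class TrailerCheck {
    // /Root should be an indirect reference: object number, generation, "R".
    private static final Pattern ROOT_REF =
            Pattern.compile("/Root\\s+\\d+\\s+\\d+\\s+R\\b");

    // Rough textual check on the trailer section; not a real PDF parser.
    static boolean looksComplete(String trailer) {
        boolean closed = trailer.contains(">>");
        boolean rootIsReference = ROOT_REF.matcher(trailer).find();
        return closed && rootIsReference;
    }

    public static void main(String[] args) {
        // The truncated trailer quoted above.
        String truncated = "trailer\n<<\n/Size 41\n/Root 2";
        // A hypothetical well-formed counterpart (the "0 R" part is assumed).
        String complete = "trailer\n<<\n/Size 41\n/Root 2 0 R\n>>";
        System.out.println("truncated looks complete? " + looksComplete(truncated));
        System.out.println("complete looks complete?  " + looksComplete(complete));
    }
}
```

A parser that reads "/Root 2" with nothing after it has only a bare integer to work with, which matches the COSInteger named in the ClassCastException.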

However, what is interesting is that if you replace PDDocument.load(inputpath, true); with PDDocument.load(inputpath); or PDDocument.load(inputpath, false); the exception is not thrown!  I find this most interesting because force is only passed into the parser object, and it's only used once in that class, where it seems to be used to prevent an exception from being thrown.

I looked into this a little further and found that if forceParsing is false, the exception your PDF throws is an IOException, and it's caught and basically ignored by code which handles invalid PDFs that have random data after the EOF marker.  If you are blindly loading a document (aka forcing the loading), and that document is corrupt, you can't expect that enough information was read to process it properly.

My suggestion would be to load documents without the force option and understand that there are some non-conforming PDFs which may not be able to be parsed and have your code handle that accordingly.  This message will hit the developers mailing list and we will discuss the possibility of deprecating the force option on the load() method.  While it may have been accurate when it was first introduced, I feel that it's misleading now that we handle so many different things which are out-of-spec.
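
As a rough illustration of "handle that accordingly", the sketch below tries each file with a strict loader and collects the ones that fail, so one bad file doesn't abort the whole run. StrictLoader is a hypothetical stand-in for a non-forced call such as PDDocument.load(path), and the other names are made up as well; with the real library you would also close each document you opened.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a strict (non-forced) load call; it is assumed
// to throw IOException when the input cannot be parsed.
interface StrictLoader {
    Object load(String path) throws IOException;
}

final class TolerantBatch {
    // Tries each path with the strict loader and returns the paths that
    // could not be parsed, instead of letting one bad file abort the run.
    static List<String> findUnparseable(List<String> paths, StrictLoader loader) {
        List<String> failed = new ArrayList<String>();
        for (String path : paths) {
            try {
                loader.load(path);
            } catch (IOException e) {
                // Non-conforming input: record it and move on.
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Fake loader for demonstration: rejects names containing "broken".
        StrictLoader fake = new StrictLoader() {
            public Object load(String path) throws IOException {
                if (path.contains("broken")) {
                    throw new IOException("cannot parse " + path);
                }
                return new Object();
            }
        };
        List<String> failed =
                findUnparseable(Arrays.asList("a.pdf", "broken.pdf"), fake);
        System.out.println("unparseable: " + failed);
    }
}
```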



[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914444#action_12914444 ] 

CP commented on PDFBOX-813:
---------------------------

Just to be clear: a WARN message is logged, but the ClassCastException is severe. It causes the pdDoc.getDocumentCatalog().getAllPages() call to abort, so my code cannot continue.

Thanks.

> ClassCastException: COSInteger cannot be cast to COSDictionary
> --------------------------------------------------------------
>
>                 Key: PDFBOX-813
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-813
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.2.1, 1.3.0
>         Environment: Windows XP
> java version "1.6.0_12"
> Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
> Java HotSpot(TM) Client VM (build 11.2-b01, mixed mode, sharing)
>            Reporter: CP
>            Priority: Critical
>         Attachments: CancerSummReport_34914.pdf, PDFBoxBug.java
>
>
> I get the below exceptions when calling pdfDoc.getDocumentCatalog().getAllPages(). The code continues after the first exception because I've called PDDocument.load("C:/CancerSummReport_34914.pdf", true)  setting the load "force" param to true. The second exception causes the code to abort.
> (I will try uploading the PDF that causes this problem)
> 2010-09-02 16:47:47,521 [main] WARN  (PDFParser.java:189) - Parsing Error, Skipping Object
> java.io.IOException: Error: Expected an integer type, actual='bj'
> 	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1310)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:497)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:878)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:843)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:768)
> 	at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:16)
> 2010-09-02 16:47:47,552 [main] WARN  (BaseParser.java:215) - Invalid dictionary, found:? but expected:''
> Exception in thread "main" java.lang.ClassCastException: org.apache.pdfbox.cos.COSInteger cannot be cast to org.apache.pdfbox.cos.COSDictionary
> 	at org.apache.pdfbox.pdmodel.PDDocument.getDocumentCatalog(PDDocument.java:414)
> 	at com.xyz.framework.functionalTests.PDFBoxBug.main(PDFBoxBug.java:18)



[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "CP (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905881#action_12905881 ] 

CP commented on PDFBOX-813:
---------------------------

Adam, thank you very much for your reply, but I think there is still a problem here.

After calling PDDocument.load(...), did you then call getDocumentCatalog().getAllPages()? When I call getAllPages(), the code throws a ClassCastException, which stops the code from running. It is not just a warning.

I'm running PDFBox v1.2.1
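
For illustration, the failure mode described above can be sketched without PDFBox on the classpath. The classes below (CosBase, CosInteger, CosDictionary) and both catalog methods are hypothetical stand-ins named after the types in the stack trace; this is not PDFBox's actual code, only an assumption about the general shape of the problem: a skipped object leaves an integer where a dictionary is expected, and an unchecked cast turns that into a ClassCastException instead of a recoverable I/O error.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CatalogCastSketch {
    // Hypothetical stand-ins for the COS types named in the stack trace.
    interface CosBase {}

    static final class CosInteger implements CosBase {
        final long value;
        CosInteger(long value) { this.value = value; }
    }

    static final class CosDictionary implements CosBase {
        final List<String> pages = new ArrayList<String>();
    }

    // Unchecked cast: throws ClassCastException when the root is not a dictionary.
    static CosDictionary catalogUnchecked(CosBase root) {
        return (CosDictionary) root;
    }

    // Checked variant: reports a malformed root as an IOException the caller can handle.
    static CosDictionary catalogChecked(CosBase root) throws IOException {
        if (!(root instanceof CosDictionary)) {
            String found = (root == null) ? "null" : root.getClass().getSimpleName();
            throw new IOException("Expected a dictionary for the document root, found " + found);
        }
        return (CosDictionary) root;
    }

    public static void main(String[] args) {
        // Assumption: a damaged file left an integer where the root dictionary belongs.
        CosBase damagedRoot = new CosInteger(0);
        try {
            catalogUnchecked(damagedRoot);
        } catch (ClassCastException e) {
            System.out.println("unchecked: ClassCastException");
        }
        try {
            catalogChecked(damagedRoot);
        } catch (IOException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

With the checked variant, a caller that already handles IOException from loading could treat a damaged root the same way, rather than aborting on a runtime exception.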




[jira] Commented: (PDFBOX-813) ClassCastException: COSInteger cannot be cast to COSDictionary

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905970#action_12905970 ] 

Adam Nichols commented on PDFBOX-813:
-------------------------------------

Ah, I missed the doc.getDocumentCatalog().getAllPages(); part.  However, when I added that in just now, I found that it still worked okay with the latest code from SVN.  I made a few different changes to the parser recently, so I'm not sure which of the changes fixed the problem.  If you can try the latest code and confirm that it works, I'll go ahead and close this issue.

I'm not sure when PDFBox 1.3.0 will be released, but the changes will be in that version.  I'll bring it up to the other developers and see if we can come up with either an ETA or a list of tasks which should be completed for the 1.3.0 release.
