Posted to dev@pdfbox.apache.org by "Yubin Zheng (JIRA)" <ji...@apache.org> on 2008/11/03 07:42:44 UTC

[jira] Created: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java

ClassCastException when call parseCOSArray in BaseParser.java 
--------------------------------------------------------------

                 Key: PDFBOX-385
                 URL: https://issues.apache.org/jira/browse/PDFBOX-385
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 0.7.3, 0.7.2, 0.7.1, 0.7.0
         Environment: Window XP professional sp2, liferay 3.3.0 Enterprise + Jboss 402
            Reporter: Yubin Zheng
         Attachments: Test9.pdf

When parsing a particular PDF document, PDFBox throws a ClassCastException, so the Lucene integration in Liferay cannot extract the text from the PDF to add it to the index.
Debugging shows that the exception is raised in the method "parseCOSArray" in BaseParser.java:
        COSArray po = new COSArray();
        COSBase pbo = null;
        skipSpaces();
        int i = 0;
        while( ((i = pdfSource.peek()) > 0) && ((char)i != ']') )
        {
            pbo = parseDirObject();
            if( pbo instanceof COSObject )
            {
                COSInteger genNumber = (COSInteger)po.remove( po.size() -1 );
                COSInteger number = (COSInteger)po.remove( po.size() -1 );
                COSObjectKey key = new COSObjectKey(number.intValue(), genNumber.intValue());
                pbo = document.getObjectFromPool(key);
            }
            if( pbo != null )
            {
                po.add( pbo );
            }
            else
            {
                //it could be a bad object in the array which is just skipped
            }
            skipSpaces();
        }
        pdfSource.read(); //read ']'
        skipSpaces();
        return po;
    }

With this particular PDF document, the statement     COSInteger number = (COSInteger)po.remove( po.size() -1 ); raises the error: the removed object is a COSObject, not a COSInteger, so the cast fails.
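
The failure described above can be reproduced and guarded against in a small stand-alone model. The following is only a sketch: ArrayRefSketch, Ref, R and collect are hypothetical names, plain Integer values stand in for COSInteger, and this is not the content of any PDFBox patch. It checks the two preceding elements before casting, and if they are not both integers it drops the stray "R" instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the array-parsing loop; none of these types are PDFBox classes. */
final class ArrayRefSketch {

    /** Marker for the "R" keyword that closes an indirect reference. */
    static final Object R = new Object();

    /** Hypothetical resolved reference: object number plus generation number. */
    static final class Ref {
        final int number;
        final int generation;

        Ref(int number, int generation) {
            this.number = number;
            this.generation = generation;
        }
    }

    /** Collects array elements, folding "number generation R" into a single Ref. */
    static List<Object> collect(List<Object> tokens) {
        List<Object> out = new ArrayList<>();
        for (Object token : tokens) {
            if (token != R) {
                out.add(token);
                continue;
            }
            int n = out.size();
            // Check before casting: an "R" needs two integers directly in front of it.
            if (n >= 2 && out.get(n - 1) instanceof Integer && out.get(n - 2) instanceof Integer) {
                int generation = (Integer) out.remove(n - 1);
                int number = (Integer) out.remove(n - 2);
                out.add(new Ref(number, generation));
            }
            // Otherwise the reference is malformed; this sketch simply drops the stray "R"
            // and leaves whatever preceded it in the array.
        }
        return out;
    }
}
```

For the token sequence 219, 0, R, 0, R the first "R" is folded into a Ref, while the second one finds a Ref where the object number should be, which is the situation in which the unchecked cast fails. Whether a stray reference should be dropped, reported, or replaced by a null object is a design decision this sketch does not settle.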



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-385:
--------------------------------------

    Component/s:     (was: FontBox)
                 Parsing
       Assignee: Andreas Lehmkühler



[jira] Commented: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651436#action_12651436 ] 

Jukka Zitting commented on PDFBOX-385:
--------------------------------------

Yes, at least it shouldn't fail with a ClassCastException on non-standard input.

Would anyone be interested in coming up with a patch for solving this issue?



[jira] Commented: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java

Posted by "Yubin Zheng (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651268#action_12651268 ] 

Yubin Zheng commented on PDFBOX-385:
------------------------------------

I don't know the PDF spec or how this PDF was generated, but it opens fine in Acrobat Reader, so I think PDFBox should handle it gracefully.



[jira] Updated: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-385:
--------------------------------------

    Attachment: BaseParser_385-Patch.diff

As already mentioned in an earlier comment, the PDF document isn't well-formed: the object reference is broken.
I made a patch to prevent an NPE in those cases.



[jira] Resolved: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-385.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.0-incubator

Fixed in revision 732009



[jira] Commented: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java

Posted by "Rainer Schwarze (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646684#action_12646684 ] 

Rainer Schwarze commented on PDFBOX-385:
----------------------------------------

The relevant part of the PDF file is shown below. The critical line is "/K [ 219 0 R 0 R ...". I don't know for sure whether the PDF spec allows that. It seems not, because trying to "save as" in Acrobat 7 results in an error.
Was this PDF file modified somehow after it was created with PDFMaker?

Relevant part of PDF file:

218 0 obj
<<
/K [ 219 0 R 0 R << /Obj 12 0 R /Pg 5 0 R /Type /OBJR >> ]
/P 217 0 R
/S /Link
>>
endobj
219 0 obj
<<
/K 220 0 R
/P 218 0 R
/S /Hyperlink
>>
endobj
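
To make the problem in the "/K" line concrete, here is a rough sketch of a check that flags every "R" keyword not preceded by two integers. RefCheckSketch and strayReferences are hypothetical names, and splitting on whitespace is a simplifying assumption that happens to work for this excerpt; a real PDF tokenizer has to handle delimiters that are not separated by spaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper; not part of PDFBox. */
final class RefCheckSketch {

    /** Returns the token indexes of "R" keywords that lack two preceding integers. */
    static List<Integer> strayReferences(String arrayBody) {
        List<Integer> stray = new ArrayList<>();
        String[] tokens = arrayBody.trim().split("\\s+");
        int pendingIntegers = 0; // integers seen since the last non-integer token
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.matches("\\d+")) {
                pendingIntegers++;
            } else {
                if (token.equals("R") && pendingIntegers < 2) {
                    stray.add(i);
                }
                pendingIntegers = 0;
            }
        }
        return stray;
    }
}
```

Applied to the body of the array above, "219 0 R 0 R << /Obj 12 0 R /Pg 5 0 R /Type /OBJR >>", the only flagged token is the second "R" (token index 4), which has just one integer in front of it.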

