Posted to dev@pdfbox.apache.org by "Yubin Zheng (JIRA)" <ji...@apache.org> on 2008/11/03 07:42:44 UTC
[jira] Created: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java
ClassCastException when call parseCOSArray in BaseParser.java
--------------------------------------------------------------
Key: PDFBOX-385
URL: https://issues.apache.org/jira/browse/PDFBOX-385
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 0.7.3, 0.7.2, 0.7.1, 0.7.0
Environment: Windows XP Professional SP2, Liferay 3.3.0 Enterprise + JBoss 402
Reporter: Yubin Zheng
Attachments: Test9.pdf
When parsing a specific PDF document, PDFBox throws a ClassCastException, so the Lucene indexing integrated in Liferay cannot extract the text from the PDF to add it to the index.
Debugging the issue shows that it is caused by the method parseCOSArray in BaseParser.java:
    // Excerpt from parseCOSArray() in BaseParser.java; the start of the method is not shown.
    COSArray po = new COSArray();
    COSBase pbo = null;
    skipSpaces();
    int i = 0;
    while( ((i = pdfSource.peek()) > 0) && ((char)i != ']') )
    {
        pbo = parseDirObject();
        if( pbo instanceof COSObject )
        {
            // An indirect reference "n g R" was read: the object number and the
            // generation number are assumed to be the last two elements in the array.
            COSInteger genNumber = (COSInteger)po.remove( po.size() -1 );
            // With Test9.pdf this cast throws the ClassCastException.
            COSInteger number = (COSInteger)po.remove( po.size() -1 );
            COSObjectKey key = new COSObjectKey(number.intValue(), genNumber.intValue());
            pbo = document.getObjectFromPool(key);
        }
        if( pbo != null )
        {
            po.add( pbo );
        }
        else
        {
            //it could be a bad object in the array which is just skipped
        }
        skipSpaces();
    }
    pdfSource.read(); //read ']'
    skipSpaces();
    return po;
}
With this particular PDF document, the statement COSInteger number = (COSInteger)po.remove( po.size() -1 ); throws an exception because the removed element is a COSObject, not a COSInteger, so the cast fails.
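The failure can be reproduced without PDFBox using a simplified model of the loop above. This is a hypothetical sketch, not PDFBox code: Integer stands in for COSInteger, the made-up class ObjRef stands in for the object returned by getObjectFromPool(), and the input is only the top-level tokens of the /K array that Rainer Schwarze quotes from Test9.pdf in this thread ("219 0 R 0 R", dictionary left out).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the quoted parseCOSArray() loop; not PDFBox code.
public class Pdfbox385Repro {

    // Stand-in for the resolved object that getObjectFromPool(key) returns.
    static final class ObjRef {
        final int number;
        final int generation;

        ObjRef(int number, int generation) {
            this.number = number;
            this.generation = generation;
        }
    }

    // Same assumption as the quoted loop: on "R", the two previous elements
    // are blindly cast to Integer (the (COSInteger) casts in BaseParser).
    static List<Object> parseArrayNaive(String[] tokens) {
        List<Object> po = new ArrayList<>();
        for (String token : tokens) {
            if (token.equals("R")) {
                Integer genNumber = (Integer) po.remove(po.size() - 1);
                Integer number = (Integer) po.remove(po.size() - 1);
                po.add(new ObjRef(number, genNumber));
            } else {
                po.add(Integer.parseInt(token));
            }
        }
        return po;
    }

    public static void main(String[] args) {
        String[] tokens = {"219", "0", "R", "0", "R"};
        try {
            parseArrayNaive(tokens);
            System.out.println("parsed without error");
        } catch (ClassCastException e) {
            // The second "R" pops 0 as the generation number, then finds the
            // ObjRef built for "219 0 R" where the object number should be.
            System.out.println("ClassCastException, as in PDFBOX-385");
        }
    }
}
```

The first "219 0 R" resolves normally; the trailing "0 R" is what triggers the bad cast.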
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-385:
--------------------------------------
Component/s: (was: FontBox)
Parsing
Assignee: Andreas Lehmkühler
[jira] Commented: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651436#action_12651436 ]
Jukka Zitting commented on PDFBOX-385:
--------------------------------------
Yes, at least it shouldn't fail with a ClassCastException on non-standard input.
Would anyone be interested in coming up with a patch for solving this issue?
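One way to avoid the ClassCastException on such non-standard input is to check the element types before casting. The sketch below is hypothetical: it uses the same simplified model (Integer for COSInteger, a made-up ObjRef for the resolved object, tokens limited to integers and "R") and is neither the attached BaseParser_385-Patch.diff nor PDFBox API. When the object number is missing, it drops the dangling generation number together with the "R", in the spirit of the existing "bad object ... just skipped" branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tolerant variant of the array loop; not the committed PDFBox fix.
public class TolerantArrayParser {

    // Stand-in for the object returned by getObjectFromPool(key).
    static final class ObjRef {
        final int number;
        final int generation;

        ObjRef(int number, int generation) {
            this.number = number;
            this.generation = generation;
        }

        @Override
        public String toString() {
            return number + " " + generation + " R";
        }
    }

    static List<Object> parseArrayTolerant(String[] tokens) {
        List<Object> po = new ArrayList<>();
        for (String token : tokens) {
            if (!token.equals("R")) {
                po.add(Integer.parseInt(token));
                continue;
            }
            // The generation number must be the last element; otherwise skip the "R".
            if (po.isEmpty() || !(po.get(po.size() - 1) instanceof Integer)) {
                continue;
            }
            Integer genNumber = (Integer) po.remove(po.size() - 1);
            // The object number must come right before it; otherwise the dangling
            // generation number and the "R" are dropped instead of throwing.
            if (po.isEmpty() || !(po.get(po.size() - 1) instanceof Integer)) {
                continue;
            }
            Integer number = (Integer) po.remove(po.size() - 1);
            po.add(new ObjRef(number, genNumber));
        }
        return po;
    }

    public static void main(String[] args) {
        String[] tokens = {"219", "0", "R", "0", "R"};
        System.out.println(parseArrayTolerant(tokens)); // prints [219 0 R]
    }
}
```

Discarding the stray 0 is a design choice of this sketch; an alternative would leave it in the array as a plain integer.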
[jira] Commented: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java
Posted by "Yubin Zheng (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651268#action_12651268 ]
Yubin Zheng commented on PDFBOX-385:
------------------------------------
I don't know the PDF specification or how this PDF was generated, but Acrobat Reader opens it without problems, so I think PDFBox should handle it gracefully.
[jira] Updated: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-385:
--------------------------------------
Attachment: BaseParser_385-Patch.diff
As already mentioned in an earlier comment, the PDF document isn't well-formed: the object reference is broken.
I made a patch to prevent an NPE in those cases.
[jira] Resolved: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-385.
---------------------------------------
Resolution: Fixed
Fix Version/s: 0.8.0-incubator
Fixed in revision 732009
[jira] Commented: (PDFBOX-385) ClassCastException when call parseCOSArray in BaseParser.java
Posted by "Rainer Schwarze (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646684#action_12646684 ]
Rainer Schwarze commented on PDFBOX-385:
----------------------------------------
The relevant part of the PDF file is shown below; the critical line is "/K [ 219 0 R 0 R ...". I don't know for sure whether the PDF spec allows that, but it seems not, because trying to "Save As" in Acrobat 7 results in an error.
Was this PDF file modified somehow after creating it with PDFMaker?
Relevant part of PDF file:
218 0 obj
<<
/K [ 219 0 R 0 R << /Obj 12 0 R /Pg 5 0 R /Type /OBJR >> ]
/P 217 0 R
/S /Link
>>
endobj
219 0 obj
<<
/K 220 0 R
/P 218 0 R
/S /Hyperlink
>>
endobj
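To locate such a broken reference in the raw syntax, a crude token scan is enough for the fragment above. This is a hypothetical helper (DanglingRefFinder is not a PDFBox class) that assumes whitespace-separated tokens, as in the quoted fragment; a real PDF lexer also has to split delimiters such as "[", "<<" and "/" that are written without surrounding spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper for whitespace-separated PDF tokens; not PDFBox code.
public class DanglingRefFinder {

    static boolean isUnsignedInt(String token) {
        return !token.isEmpty() && token.chars().allMatch(Character::isDigit);
    }

    // Returns the token positions of every "R" keyword that is not preceded
    // by two integers (object number and generation number).
    static List<Integer> findDanglingRefs(String source) {
        String[] tokens = source.trim().split("\\s+");
        List<Integer> dangling = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals("R")) {
                continue;
            }
            boolean wellFormed = i >= 2
                    && isUnsignedInt(tokens[i - 2])
                    && isUnsignedInt(tokens[i - 1]);
            if (!wellFormed) {
                dangling.add(i);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        String k = "[ 219 0 R 0 R << /Obj 12 0 R /Pg 5 0 R /Type /OBJR >> ]";
        System.out.println(findDanglingRefs(k)); // prints [5]
    }
}
```

Position 5 is the second "R" in "219 0 R 0 R", the one with only a single integer in front of it; the references inside the dictionary are well-formed.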