Posted to dev@pdfbox.apache.org by "ahmad makram (Created) (JIRA)" <ji...@apache.org> on 2011/11/17 15:38:52 UTC
[jira] [Created] (PDFBOX-1174) i have problem in
BaseParser.readInt
i have problem in BaseParser.readInt
-------------------------------------
Key: PDFBOX-1174
URL: https://issues.apache.org/jira/browse/PDFBOX-1174
Project: PDFBox
Issue Type: Bug
Components: Parsing, PDModel
Affects Versions: 1.6.0
Reporter: ahmad makram
Fix For: 1.6.0
I can't load a PDF with PDDocument.load( ).
It gives me this exception:
java.io.IOException: Error: Expected an integer type, actual='Fatal'
at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:517)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1007)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Maruan Sahyoun (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160821#comment-13160821 ]
Maruan Sahyoun commented on PDFBOX-1174:
----------------------------------------
Let me add to that:
a) The last line of the file contains only the end-of-file marker, %%EOF (PDFSpec 1.7: 3.4.4).
b) Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file (PDFSpec 1.7: Appendix H, 3.4.4).
So from that perspective the file is not conforming, and if the content after %%EOF exceeds 1024 bytes, Acrobat/Reader will also complain.
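To make rule (b) concrete, here is a minimal sketch of such a check in plain Java. It is not PDFBox code; the class name EofMarkerCheck and its methods are hypothetical, and the whole file is assumed to fit in a byte array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: checks the "%%EOF within the
// last N bytes" rule quoted from Appendix H of the PDF 1.7 specification.
final class EofMarkerCheck {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of the last occurrence of marker in data, or -1 if absent. */
    static int lastIndexOf(byte[] data, byte[] marker) {
        for (int i = data.length - marker.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** True if the last %%EOF marker starts within the final `window` bytes of data. */
    static boolean eofWithinTail(byte[] data, int window) {
        int pos = lastIndexOf(data, EOF_MARKER);
        return pos >= 0 && pos >= data.length - window;
    }
}
```

Calling EofMarkerCheck.eofWithinTail(bytes, 1024) on a file with a short HTML tail would still return true, which matches the observation that Acrobat only complains once the trailing content grows past 1024 bytes.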
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160894#comment-13160894 ]
fasihs commented on PDFBOX-1174:
--------------------------------
@Thomas
Our customer pays a lot of money for those files, so I wouldn't exactly qualify them as spam :-)
Like Timo, we are downloading the file from a web form. I wonder whether that is a problem with the programming of the form or with the server mixing up the data? (I checked our own generated PDF files - no HTML content at the end.)
Anyway thx everybody for helping. Timo's hint about forceParsing really saved my day...
[jira] [Updated] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "ahmad makram (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ahmad makram updated PDFBOX-1174:
---------------------------------
Attachment: bk20104w680-t.pdf
This is the sample PDF which gives me this issue.
Thanks for your help.
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160793#comment-13160793 ]
fasihs commented on PDFBOX-1174:
--------------------------------
After taking a closer look at the file I found the problem.
This is what I found at the end of the file (There are several appearances of endstream but they don't seem to affect the parser) :
trailer
<</Size 2784>>
startxref
116
%%EOF
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<title>GetFile</title>
<meta name="CODE_LANGUAGE" content="Visual Basic .NET 7.1">
<meta name="vs_defaultClientScript" content="JavaScript">
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
</HEAD>
<body MS_POSITIONING="GridLayout">
<form name="Form1" method="post" action="getfile.aspx?Path=c%3a%5cDownloads%5c_Download%5cGPPR%5creports%5cGPPR164.pdf" id="Form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTUxMTcwNzgxMGRkX3jBvL2JNxEHFBCuyf6nrcb2XD0=" />
</div>
</form>
</body>
</HTML>
After removing the lines after EOF everything works fine...
[jira] [Updated] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Andreas Lehmkühler (Updated JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-1174:
---------------------------------------
Fix Version/s: (was: 1.6.0)
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160746#comment-13160746 ]
fasihs commented on PDFBOX-1174:
--------------------------------
No attachments, no comments, no outlines. More Properties:
Tagged PDF:Yes
Fast Web View: No
SecurityMethod: No Security
I'll try to remove the content without repairing the file..
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160735#comment-13160735 ]
Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------
It's too bad, but thanks for providing some information.
Maybe you can tell me if the document has attachments (like PDF, DOC or some other)?
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160798#comment-13160798 ]
Timo Boehme commented on PDFBOX-1174:
-------------------------------------
We have many PDF files with trailing HTML content after the %%EOF like in your case. A conforming parser will look for the last %%EOF and simply skip the content after it. The current sequential parser however does not know if the %%EOF is really the last one and therefore has to parse further.
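As a rough illustration of "look for the last %%EOF and skip the content after it", the following plain-Java sketch cuts a file's bytes off right after the last marker. It is not how PDFBox handles this; TrailingContentStripper and stripAfterLastEof are hypothetical names, and the sketch assumes the trailing junk does not itself contain the string %%EOF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical pre-processing helper, not part of PDFBox.
final class TrailingContentStripper {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the offset of the last occurrence of marker in data, or -1 if absent. */
    static int lastIndexOf(byte[] data, byte[] marker) {
        for (int i = data.length - marker.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns a copy of data that ends immediately after the last %%EOF marker.
     * If no marker is found, the input array is returned unchanged.
     */
    static byte[] stripAfterLastEof(byte[] data) {
        int pos = lastIndexOf(data, EOF_MARKER);
        if (pos < 0) {
            return data;
        }
        return Arrays.copyOf(data, pos + EOF_MARKER.length);
    }
}
```

The cleaned bytes could then be wrapped in a ByteArrayInputStream and handed to the parser, which is the programmatic equivalent of removing the lines after EOF by hand.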
[jira] [Resolved] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-1174.
----------------------------------------
Resolution: Fixed
Fix Version/s: 1.7.0
Assignee: Andreas Lehmkühler
I can't reproduce the exception using the current trunk (revision 1339254). I just used the PDFReader without any further option like force or nonSeq.
Set this to resolved.
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160822#comment-13160822 ]
Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------
Thx for providing some parts of the document. It really looks like garbage. I will try to create some test files with the product from verypdf.com.
The domain has a touch of sarcasm :-)
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159299#comment-13159299 ]
Timo Boehme commented on PDFBOX-1174:
-------------------------------------
This is a 'normal' problem with the current serial PDF parser. If an object is parsed it expects the start of another one (reading the object number). However there are a large number of PDFs in the wild containing some garbage in between. For a conforming parser using the XREF table this is not a problem since it only parses the content the XREF table refers to.
The current short term solution is to specify 'forceParsing=true' in PDDocument.load( FILENAME, forceParsing ). This will try to find the next object start if such an error like the reported one occurs.
The long term solution is a conforming parser (PDFBOX-1000) or a nearly conforming parser (PDFBOX-1104). I have reworked the first code of PDFBOX-1104 so that it is now a valid replacement of the current parser. I will post this to PDFBOX-1104 shortly.
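The idea behind "try to find the next object start" can be sketched in plain Java as a scan for the next "<object number> <generation> obj" header. This is only an illustration of the recovery strategy, not the actual PDFParser#skipToNextObj code; ObjectStartScanner and nextObjectStart are hypothetical names, and the input is assumed to have been decoded as ISO-8859-1 so that character indexes equal byte offsets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of skipping garbage between objects; not PDFBox code.
final class ObjectStartScanner {
    // An indirect object starts with "<number> <generation> obj", e.g. "7 0 obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("\\b(\\d+)\\s+(\\d+)\\s+obj\\b");

    /**
     * Returns the index of the next object header at or after `from`,
     * or -1 if none is found. `from` must lie within 0..text.length().
     */
    static int nextObjectStart(String text, int from) {
        Matcher m = OBJ_HEADER.matcher(text);
        return m.find(from) ? m.start() : -1;
    }
}
```

A sequential parser that hits unexpected input such as 'Fatal' where it wanted an object number could call nextObjectStart with its current position and resume from the returned offset. A bare "endobj" or "endstream" does not match, because the pattern requires two integers before "obj".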
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159281#comment-13159281 ]
fasihs commented on PDFBOX-1174:
--------------------------------
I'm having the same problem. The same file worked with pdfbox 1.2.0.
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160790#comment-13160790 ]
Timo Boehme commented on PDFBOX-1174:
-------------------------------------
@Maruan
While we have a large collection of PDF files for processing (with all kinds of problems) we unfortunately are not allowed to give them away (files of our customers). The only thing I can do is test PDFBOX against the collection and report problems (at code level).
I'm very interested in your work on PDFBOX-1000. As I wrote above, I've enhanced the code from PDFBOX-1104 and use this as a nearly conforming parser. This is now able to parse all the documents I had problems with because of the sequential parser. Once I've cleaned up some bits I will provide it in a new Jira issue.
Kind regards
Timo
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178402#comment-13178402 ]
Timo Boehme commented on PDFBOX-1174:
-------------------------------------
@ahmad
If a current build of PDFBOX with forceParsing=true does not work for you, you can get a snapshot from SVN and patch it according to PDFBOX-1199.
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160828#comment-13160828 ]
Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------
@ fasihs
I saw such a file some time ago in my spam folder: a normal PDF with HTML and a lot of JavaScript inside it.
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160766#comment-13160766 ]
Timo Boehme commented on PDFBOX-1174:
-------------------------------------
Only for illustration here is a snippet from a PDF with garbage (remains from old objects; therefore problematic to find next correct start; I've deactivated looking for 'stream' in PDFParser#skipToNextObj since it stopped on each 'endstream'):
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="r"?>
endstream
endobj
<</Filter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
...K....qz'..w...
3P.
p..K....0Q".ՋN;..A&.....
...#).9m.ƿL,/.>.....@..|..B..
ո..
..V...5.=.J.......g{.Oq.
endstream
endobj
<0C60053F04D4C65448AD9638BA1EB781591C15E992BA7C448D75> >stream
{.#H .....p...c.T.....b.F}g˫.Z.3)1.n..&.:...A.>..@..G...jʻMW@....g.8ew.)..!]..
..
.T.^o....u^
....\....^..
endstream
endobj
/Filter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
5..3..~.M^j .;sX9..i(...y.......m.!SL.R.'.W.:H!;G..c...(O.W^./...@m..z.Γ{..i.8.XK..ư............eK.Q....a.
endstream
endobj
Filter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
A...ck...ޘ....;.Ÿa.. GwN...w.NCӛ..+.n~......Y., X.Q.@-. .E..S**.....*6#..d]...#.ȷ..&+;0...С.m...*....
endstream
endobj
lter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
..iN......xj?{0<&m....W.#...Z].d!....!...b..>.o.N..Gq28.K.W.k......y f...9 ..u{....i..xV ..I....(=6M..W.
endstream
endobj
The first endstream/endobj is regular end of an object. The PDF was produced by verypdf.com.
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Maruan Sahyoun (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160778#comment-13160778 ]
Maruan Sahyoun commented on PDFBOX-1174:
----------------------------------------
@Timo
I'm currently working on PDFBOX-1000 so I'm very interested in files for testing. If you could attach the file you are referencing above I would like to include it in my set of tests to see if the issues are resolved with the ConformingParser.
With kind regards
Maruan
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160849#comment-13160849 ]
Timo Boehme commented on PDFBOX-1174:
-------------------------------------
@Maruan
While the PDF spec can be used to validate a PDF, we have to cope with content after %%EOF if PDFBox is to process all PDF documents that standard PDF readers can open (this is what most users/customers expect). Quite a large number of documents in our collection have this extra content, in most cases articles from publishers (it seems that the HTML download modules often add this content).
@Thomas
It seems to me that the garbage is created while updating the document. Thus, to generate such a document, one first has to create a PDF and then change some content afterward. However, I'm not sure, since we do not use this software but only have to cope with the results it produces :-(
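As a rough illustration of coping with content after %%EOF, a caller could strip everything following the last marker before handing the bytes to the parser. This is only a sketch using the Java standard library; the class and method names (EofTrimmer, lastEofIndex, trimAfterLastEof) are hypothetical and are not part of PDFBox, and it assumes the trailing garbage does not itself contain the marker.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: drops bytes after the last %%EOF.
final class EofTrimmer {
    private static final byte[] EOF_MARKER =
            "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the last "%%EOF" marker in data, or -1 if absent. */
    static int lastEofIndex(byte[] data) {
        // Scan backwards so that incrementally updated files, which may
        // contain several markers, resolve to the final one.
        for (int i = data.length - EOF_MARKER.length; i >= 0; i--) {
            boolean match = true;
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (data[i + j] != EOF_MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Returns a copy ending right after the last marker; input unchanged if none. */
    static byte[] trimAfterLastEof(byte[] data) {
        int idx = lastEofIndex(data);
        if (idx < 0) {
            return data;
        }
        return Arrays.copyOf(data, idx + EOF_MARKER.length);
    }
}
```

The trimmed bytes could then be wrapped in a ByteArrayInputStream and passed to PDDocument.load. Whether that is enough for a given file depends on how the extra content was appended.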
[jira] [Issue Comment Edited] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "ahmad makram (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162300#comment-13162300 ]
ahmad makram edited comment on PDFBOX-1174 at 12/4/11 7:05 AM:
---------------------------------------------------------------
I attached a sample PDF which gives me this issue.
Could you send a new jar file for PDFBox which solves this problem?
Thanks for your help.
was (Author: ahmad_makram):
this sample PDF which give me this issue
thanks for ur help
> i have problem in BaseParser.readInt
> -------------------------------------
>
> Key: PDFBOX-1174
> URL: https://issues.apache.org/jira/browse/PDFBOX-1174
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing, PDModel
> Affects Versions: 1.6.0
> Reporter: ahmad makram
> Attachments: bk20104w680-t.pdf
>
>
> i can't load PDF to PDDocument.load( )
> it give me this exception
> java.io.IOException: Error: Expected an integer type, actual='Fatal'
> at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
> at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:517)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1007)
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160721#comment-13160721 ]
Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------
Is it possible to upload such a PDF? I found that some files which were assumed to contain garbage use streams with FlateDecode. With the forceParsing option the document would be incomplete or broken as well.
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "Maruan Sahyoun (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160875#comment-13160875 ]
Maruan Sahyoun commented on PDFBOX-1174:
----------------------------------------
@Timo
The information was only meant, since you wrote "... a conforming parser ...", to point out that the PDF is not conforming and would need some kind of relaxed parsing. I can put some starting points for that into PDFBOX-1000, but will first try to ensure that conforming files are parsed successfully, based on Adam's work. Maybe we should discuss at PDFBOX-1000 how flexible the parser should be.
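To make the difference between conforming and relaxed handling concrete, the two %%EOF rules quoted earlier in this thread (marker alone on the last line versus marker somewhere in the last 1024 bytes) could be checked roughly as follows. This is a sketch only; EofCheck and its methods are hypothetical names, not PDFBox API, and the window size is passed in by the caller.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: strict vs. relaxed %%EOF checks.
final class EofCheck {
    private static final String MARKER = "%%EOF";

    private static boolean isEol(byte b) {
        return b == '\n' || b == '\r';
    }

    /** Strict: the last line holds only the marker (trailing EOL bytes allowed). */
    static boolean endsWithEof(byte[] data) {
        int end = data.length;
        while (end > 0 && isEol(data[end - 1])) {
            end--;
        }
        int start = end - MARKER.length();
        if (start < 0) {
            return false;
        }
        // The marker must begin its line: start of file or preceded by an EOL.
        if (start > 0 && !isEol(data[start - 1])) {
            return false;
        }
        String tail = new String(data, start, MARKER.length(),
                StandardCharsets.US_ASCII);
        return tail.equals(MARKER);
    }

    /** Relaxed: the marker appears anywhere within the last `window` bytes. */
    static boolean eofWithinTail(byte[] data, int window) {
        int start = Math.max(0, data.length - window);
        // ISO-8859-1 maps each byte to one char, so binary data is safe here.
        String tail = new String(data, start, data.length - start,
                StandardCharsets.ISO_8859_1);
        return tail.contains(MARKER);
    }
}
```

A file with extra content after the marker fails the strict check but passes the relaxed one with a window of 1024, which is the kind of file a relaxed parsing mode would have to accept.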
[jira] [Commented] (PDFBOX-1174) i have problem in
BaseParser.readInt
Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160728#comment-13160728 ]
fasihs commented on PDFBOX-1174:
--------------------------------
The option forceParsing seems to work for me. Unfortunately the file causing the problem contains confidential data. We receive the file from a third party, so I don't know how it is created. I found the following information in the properties:
Creator: Adobe InDesign CS3 (5.0.4)
Producer: Adobe PDF Library 8.0
PDF Version: PDF-1.4