Posted to dev@pdfbox.apache.org by "ahmad makram (Created) (JIRA)" <ji...@apache.org> on 2011/11/17 15:38:52 UTC

[jira] [Created] (PDFBOX-1174) i have problem in BaseParser.readInt

i have problem in  BaseParser.readInt
-------------------------------------

                 Key: PDFBOX-1174
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1174
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing, PDModel
    Affects Versions: 1.6.0
            Reporter: ahmad makram
             Fix For: 1.6.0


I can't load a PDF with PDDocument.load( ).
It gives me this exception:

java.io.IOException: Error: Expected an integer type, actual='Fatal'
	at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:517)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1007)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Maruan Sahyoun (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160821#comment-13160821 ] 

Maruan Sahyoun commented on PDFBOX-1174:
----------------------------------------

Let me add to that:

a) The last line of the file contains only the end-of-file marker, %%EOF (PDFSpec 1.7: 3.4.4).
b) Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file (PDFSpec 1.7: Appendix H, 3.4.4).

So from that perspective the file is not conforming, and if the content after %%EOF exceeds 1024 bytes, Acrobat/Reader will also complain.
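
The 1024-byte rule quoted above can be checked without any PDF library. The following is a minimal sketch, assuming the whole file fits in memory; the class and method names are made up for illustration and are not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;

final class EofMarkerCheck {
    // Window size taken from the Acrobat rule quoted above.
    static final int TAIL_WINDOW = 1024;

    /** Returns true if a complete %%EOF marker lies within the last 1024 bytes. */
    static boolean hasEofInTail(byte[] pdf) {
        int from = Math.max(0, pdf.length - TAIL_WINDOW);
        // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost.
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }
}
```

A file for which this returns false has more trailing content after %%EOF than Acrobat is said to tolerate.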
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160894#comment-13160894 ] 

fasihs commented on PDFBOX-1174:
--------------------------------

@Thomas
Our customer pays a lot of money for those files, so I wouldn't exactly qualify them as spam :-)

Like Timo, we are downloading the file from a web form. I wonder whether that is a problem with the programming of the form or the server mixing up the data. (I checked our own generated PDF files: no HTML content at the end.)

Anyway thx everybody for helping. Timo's hint about forceParsing really saved my day...

                

        

[jira] [Updated] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "ahmad makram (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ahmad makram updated PDFBOX-1174:
---------------------------------

    Attachment: bk20104w680-t.pdf

This is the sample PDF that gives me this issue.

Thanks for your help.
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160793#comment-13160793 ] 

fasihs commented on PDFBOX-1174:
--------------------------------

After taking a closer look at the file I found the problem.
This is what I found at the end of the file (there are several occurrences of endstream, but they don't seem to affect the parser):

trailer
<</Size 2784>>
startxref
116
%%EOF

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
	<HEAD>
		<title>GetFile</title>
		<meta name="CODE_LANGUAGE" content="Visual Basic .NET 7.1">
		<meta name="vs_defaultClientScript" content="JavaScript">
		<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
	</HEAD>
	<body MS_POSITIONING="GridLayout">
		<form name="Form1" method="post" action="getfile.aspx?Path=c%3a%5cDownloads%5c_Download%5cGPPR%5creports%5cGPPR164.pdf" id="Form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTUxMTcwNzgxMGRkX3jBvL2JNxEHFBCuyf6nrcb2XD0=" />
</div>

		</form>
	</body>
</HTML>


After removing the lines after EOF everything works fine...
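
The manual cleanup described above can also be scripted. This is a minimal sketch, assuming the file fits in memory and that the trailing garbage does not itself contain the text %%EOF; the class name and command-line handling are illustrative only and have nothing to do with PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class TrailingGarbageTrimmer {
    private static final String EOF_MARKER = "%%EOF";

    /**
     * Returns a copy of the input cut just after the last %%EOF marker,
     * keeping one following line break if present. Input without a
     * marker is returned as an unchanged copy.
     */
    static byte[] trimAfterLastEof(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int pos = text.lastIndexOf(EOF_MARKER);
        if (pos < 0) {
            return pdf.clone();
        }
        int end = pos + EOF_MARKER.length();
        // Keep an optional CR, LF or CRLF after the marker.
        if (end < pdf.length && pdf[end] == '\r') end++;
        if (end < pdf.length && pdf[end] == '\n') end++;
        return Arrays.copyOf(pdf, end);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TrailingGarbageTrimmer <input.pdf> <output.pdf>");
            return;
        }
        byte[] cleaned = trimAfterLastEof(Files.readAllBytes(Paths.get(args[0])));
        Files.write(Paths.get(args[1]), cleaned);
    }
}
```

Writing to a separate output file keeps the original download intact in case the cut lands in the wrong place.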
                

        

[jira] [Updated] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Andreas Lehmkühler (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-1174:
---------------------------------------

    Fix Version/s:     (was: 1.6.0)
    

       

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160746#comment-13160746 ] 

fasihs commented on PDFBOX-1174:
--------------------------------

No attachments, no comments, no outlines. More Properties:
Tagged PDF:Yes
Fast Web View: No
SecurityMethod: No Security
I'll try to remove the content without repairing the file..
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160735#comment-13160735 ] 

Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------

That's too bad, but thanks for providing some information.
Maybe you can tell me whether the document has attachments (PDF, DOC or some other type)?
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160798#comment-13160798 ] 

Timo Boehme commented on PDFBOX-1174:
-------------------------------------

We have many PDF files with trailing HTML content after the %%EOF like in your case. A conforming parser will look for the last %%EOF and simply skip the content after it. The current sequential parser however does not know if the %%EOF is really the last one and therefore has to parse further.
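
To illustrate the "look for the last %%EOF" idea: a minimal sketch that searches backwards from the last %%EOF for the startxref keyword and reads the offset that follows it, ignoring anything after the marker. It is not the PDFBox implementation; the class and method names are hypothetical, and it assumes the file fits in memory.

```java
import java.nio.charset.StandardCharsets;

final class StartXrefLocator {
    private static final String KEYWORD = "startxref";

    /**
     * Returns the offset written after the last "startxref" that precedes
     * the last %%EOF marker, or -1 if either token is missing or the value
     * between them is not a number.
     */
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 keeps a one-to-one mapping between bytes and chars.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int eof = text.lastIndexOf("%%EOF");
        if (eof < 0) {
            return -1;
        }
        // Search backwards from the marker; content after %%EOF is never inspected.
        int kw = text.lastIndexOf(KEYWORD, eof);
        if (kw < 0) {
            return -1;
        }
        String between = text.substring(kw + KEYWORD.length(), eof).trim();
        try {
            return Long.parseLong(between);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

For the file tail quoted elsewhere in this thread (startxref, 116, %%EOF, then HTML), this approach yields 116 regardless of how much HTML follows the marker.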
                

        

[jira] [Resolved] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-1174.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.0
         Assignee: Andreas Lehmkühler

I can't reproduce the exception using the current trunk (revision 1339254). I just used the PDFReader without any further option like force or nonSeq.

Set this to resolved.
                

       

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160822#comment-13160822 ] 

Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------

Thx for providing some parts of the document. It really looks like garbage. I will try to create some test files with the product from verypdf.com.

The domain has a touch of sarcasm :-)
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159299#comment-13159299 ] 

Timo Boehme commented on PDFBOX-1174:
-------------------------------------

This is a 'normal' problem with the current serial PDF parser. Once an object has been parsed, the parser expects the start of another one (reading the object number). However, there are a large number of PDFs in the wild containing some garbage in between. For a conforming parser using the XREF table this is not a problem, since it only parses the content the XREF table refers to.
The current short-term solution is to specify 'forceParsing=true' in PDDocument.load( FILENAME, forceParsing ). This will try to find the next object start if an error like the reported one occurs.
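
The recovery idea behind forceParsing (skip ahead to the next object start after an error) can be sketched independently of PDFBox. The following is only an illustration of the technique, not PDFBox's actual code: it assumes object headers of the form "<number> <generation> obj" begin at the start of a line, and the class and method names are invented.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NextObjectScanner {
    // "<object number> <generation number> obj" at the start of a line.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?m)^(\\d+)\\s+(\\d+)\\s+obj\\b");

    /**
     * Returns the byte offset of the first object header at or after
     * 'from', or -1 if there is none.
     */
    static int findNextObjectStart(byte[] pdf, int from) {
        // ISO-8859-1 keeps string indexes equal to byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_HEADER.matcher(text);
        return m.find(from) ? m.start() : -1;
    }
}
```

A real parser would resume reading at the returned offset. Binary stream data can contain byte sequences that look like an object header, so a scan like this can produce false positives.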

The long-term solution is a conforming parser (PDFBOX-1000) or a nearly conforming parser (PDFBOX-1104). I have reworked the first code of PDFBOX-1104 so that it is now a valid replacement for the current parser. I will post this to PDFBOX-1104 shortly.
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159281#comment-13159281 ] 

fasihs commented on PDFBOX-1174:
--------------------------------

I'm having the same problem. The same file worked with pdfbox 1.2.0.
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160790#comment-13160790 ] 

Timo Boehme commented on PDFBOX-1174:
-------------------------------------

@Maruan

While we have a large collection of PDF files for processing (with all kinds of problems) we unfortunately are not allowed to give them away (files of our customers). The only thing I can do is test PDFBOX against the collection and report problems (at code level).

I'm very interested in your work on PDFBOX-1000. As I wrote above, I've enhanced the code from PDFBOX-1104 and use it as a nearly conforming parser. It is now able to parse all the documents I had problems with because of the sequential parser. Once I've cleaned up some bits I will provide it in a new Jira issue.

Kind regards

Timo
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178402#comment-13178402 ] 

Timo Boehme commented on PDFBOX-1174:
-------------------------------------

@ahmad

If a current build of PDFBox with forceParsing=true does not work for you, you can get a snapshot from SVN and patch it according to PDFBOX-1199.
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160828#comment-13160828 ] 

Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------

@ fasihs
I saw such a file some time ago in my spam folder: a normal PDF with HTML and a great deal of JavaScript inside it.
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160766#comment-13160766 ] 

Timo Boehme commented on PDFBOX-1174:
-------------------------------------

Only for illustration, here is a snippet from a PDF with garbage (remnants of old objects, which make it problematic to find the next correct object start; I've deactivated looking for 'stream' in PDFParser#skipToNextObj since it stopped on each 'endstream'):

      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
<?xpacket end="r"?>
endstream
endobj
<</Filter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
...K....qz'..w...
3P.
p..K....0Q".ՋN;..A&.....
...#).9m.ƿL,/.>.....@..|..B..
ո..
..V...5.=.J.......g{.Oq.
endstream
endobj
<0C60053F04D4C65448AD9638BA1EB781591C15E992BA7C448D75>   >stream
{.#H    .....p...c.T.....b.F}g˫.Z.3)1.n..&.:...A.>..@..G...jʻMW@....g.8ew.)..!]..
..
.T.^o....u^
....\....^..
endstream
endobj
/Filter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
5..3..~.M^j     .;sX9..i(...y.......m.!SL.R.'.W.:H!;G..c...(O.W^./...@m..z.Γ{..i.8.XK..ư............eK.Q....a.
endstream
endobj
Filter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
A...ck...ޘ....;.Ÿa.. GwN...w.NCӛ..+.n~......Y., X.Q.@-. .E..S**.....*6#..d]...#.ȷ..&+;0...С.m...*....
endstream
endobj
lter/FlateDecode/First 13/Length 108/N 2/Type/ObjStm>>stream
..iN......xj?{0<&m....W.#...Z].d!....!...b..>.o.N..Gq28.K.W.k......y    f...9 ..u{....i..xV ..I....(=6M..W.
endstream
endobj

The first endstream/endobj pair is the regular end of an object. The PDF was produced by verypdf.com.
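
To make the recovery problem above concrete, here is a minimal sketch of one way to resynchronise after garbage: scan forward for the next "<number> <generation> obj" header instead of stopping on 'stream' or 'endstream'. This is not the code in PDFParser#skipToNextObj; the class and method names are hypothetical, and binary stream data can still contain bytes that look like a header, so a hit is only a candidate.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class ObjectStartScanner {

    // "<object number> <generation> obj", preceded by whitespace or the start
    // of the data and not followed by further alphanumerics.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<!\\S)(\\d+)\\s+(\\d+)\\s+obj(?![A-Za-z0-9])");

    /** Returns the offset of the next candidate object header at or after 'from', or -1. */
    static int findNextObjectStart(byte[] data, int from) {
        // ISO-8859-1 maps each byte to exactly one char, so string offsets equal byte offsets.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_HEADER.matcher(text);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        byte[] sample = "lter/FlateDecode>>stream\n...\nendstream\nendobj\n7 0 obj\n<<>>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("next object header at offset " + findNextObjectStart(sample, 0));
    }
}
```

Because 'endstream' and 'endobj' never match the header pattern, the truncated fragments in the snippet are skipped as a whole rather than one keyword at a time.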
                

       

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Maruan Sahyoun (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160778#comment-13160778 ] 

Maruan Sahyoun commented on PDFBOX-1174:
----------------------------------------

@Timo

I'm currently working on PDFBOX-1000 so I'm very interested in files for testing. If you could attach the file you are referencing above I would like to include it in my set of tests to see if the issues are resolved with the ConformingParser.

With kind regards

Maruan
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Timo Boehme (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160849#comment-13160849 ] 

Timo Boehme commented on PDFBOX-1174:
-------------------------------------

@Maruan
While the PDF spec can be used to validate a PDF, we have to cope with content after %%EOF if PDFBox is to process all PDF documents that standard PDF readers can open (this is what most users/customers expect). Quite a large number of documents in our collection have this extra content, in most cases articles from publishers (it seems that the HTML download modules often add this content).

@Thomas
It seems to me that the garbage is created when the document is updated. Thus, to generate such a document, one first has to create a PDF and then change some content afterward. However, I'm not sure, since we do not use this software ourselves and only have to cope with the results it produces :-(
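
As a rough illustration of the %%EOF point above, the following sketch locates the last %%EOF marker in a file's bytes, reports how much content follows it, and checks whether the marker falls within a trailing window such as the 1024 bytes quoted earlier in this thread. The class and method names are hypothetical and this is not how PDFBox handles the marker; it only shows how such documents could be detected.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class EofMarkerCheck {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Offset of the last "%%EOF" in data, or -1 if the marker is absent. */
    static int lastEofOffset(byte[] data) {
        for (int i = data.length - EOF_MARKER.length; i >= 0; i--) {
            if (matchesAt(data, i)) {
                return i;
            }
        }
        return -1;
    }

    /** Number of bytes after the last "%%EOF", or -1 if the marker is absent. */
    static int trailingBytes(byte[] data) {
        int offset = lastEofOffset(data);
        return offset < 0 ? -1 : data.length - (offset + EOF_MARKER.length);
    }

    /** True if the last "%%EOF" starts within the final 'window' bytes of data. */
    static boolean eofWithinTail(byte[] data, int window) {
        int offset = lastEofOffset(data);
        return offset >= 0 && offset >= data.length - window;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (data[pos + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n...\n%%EOF\n<html>appended by a download module</html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("bytes after %%EOF: " + trailingBytes(sample));
        System.out.println("marker within last 1024 bytes: " + eofWithinTail(sample, 1024));
    }
}
```

A lenient parser could use a check like this to ignore the trailing content, while a strict one could use it to reject the file.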
                

        

[jira] [Issue Comment Edited] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "ahmad makram (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162300#comment-13162300 ] 

ahmad makram edited comment on PDFBOX-1174 at 12/4/11 7:05 AM:
---------------------------------------------------------------

I have attached a sample PDF that triggers this issue.
Could you send a new PDFBox jar file that solves this problem?
Thanks for your help.
                
      was (Author: ahmad_makram):
    this sample PDF which give me this issue

thanks for ur help
                  
>         Attachments: bk20104w680-t.pdf

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Thomas Chojecki (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160721#comment-13160721 ] 

Thomas Chojecki commented on PDFBOX-1174:
-----------------------------------------

Is it possible to upload such a PDF? I found that some files assumed to contain garbage use streams with FlateDecode. With the forceParsing option the document would be incomplete or broken as well.
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "Maruan Sahyoun (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160875#comment-13160875 ] 

Maruan Sahyoun commented on PDFBOX-1174:
----------------------------------------

@Timo

My earlier note was only a response to your phrase "... a conforming parser ...": it was meant to point out that this PDF is not conforming and would need some kind of relaxed parsing. I can put some starting points for that into PDFBOX-1000, but I will first try to ensure that conforming files are parsed successfully, based on Adam's work. Maybe we should discuss at PDFBOX-1000 how flexible the parser should be.
                

        

[jira] [Commented] (PDFBOX-1174) i have problem in BaseParser.readInt

Posted by "fasihs (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160728#comment-13160728 ] 

fasihs commented on PDFBOX-1174:
--------------------------------

The option forceParsing seems to work for me. Unfortunately the file causing the problem contains confidential data. We receive the file from a third party, so I don't know how it is created. I found the following information in the properties:
Creator: Adobe InDesign CS3 (5.0.4)
Producer: Adobe PDF Library 8.0
PDF Version: PDF-1.4
                