Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/05/04 00:41:15 UTC
[jira] [Commented] (PDFBOX-1845) PDDocument.load() give Error: Expected a long type at offset 1633

    [ https://issues.apache.org/jira/browse/PDFBOX-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988831#comment-13988831 ] 

Tilman Hausherr commented on PDFBOX-1845:
-----------------------------------------

I uncompressed the first PDF with qpdf, and now PDFBox can process it. If [~david.keller] wants to render this file, he won't like the result: the images are compressed with JPEG2000, and there's a bug in the JPEG2000 plugin.

> PDDocument.load() give Error: Expected a long type at offset 1633
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-1845
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1845
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.0, 2.0.0
>         Environment: Windows 8.1
>            Reporter: David KELLER
>            Priority: Blocker
>         Attachments: 14 01 2014-2.pdf, 14 01 2014.pdf
>
>
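> I ran this simple program on the attached file (a scanned OCR document produced by Nuance OmniPage 18):
> 	import java.io.FileInputStream;
> 	import java.util.List;
> 	import org.apache.pdfbox.pdmodel.PDDocument;
> 	import org.apache.pdfbox.pdmodel.PDPage;
>
> 	public class SplitFileTest {
> 		public static void main(String[] args) throws Exception {
> 			System.out.println("Start SplitFileTest...");
> 			String path = "D:\\test\\batch\\scan_manual\\courrier\\david.keller\\";
> 			String pdfFile = path + "14 01 2014.pdf";
>
> 			FileInputStream pdfInputStream = new FileInputStream(pdfFile);
> 			try {
> 				// The exceptions quoted below are thrown from inside PDDocument.load()
> 				PDDocument pdDocument = PDDocument.load(pdfInputStream);
> 				try {
> 					List<PDPage> pages = pdDocument.getDocumentCatalog().getAllPages();
> 				} finally {
> 					pdDocument.close(); // release the parsed document
> 				}
> 			} finally {
> 				pdfInputStream.close(); // closed even if load() fails
> 			}
> 		}
> 	}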
> With version 1.8.0 I get this error:
> java.io.IOException: Error: Expected an integer type, actual='12977[373'
>         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1622)
>         at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
>         at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:604)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1187)
> I have also just built 2.0.0 from the latest source code, and I get this error:
> java.io.IOException: Error: Expected a long type at offset 1633
> 	at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1682)
> 	at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:100)
> 	at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:663)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1101)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
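
Both traces fail inside PDFObjectStreamParser.parse() while it reads integers. According to the PDF specification, an object stream begins (before the byte offset given by its /First entry) with /N pairs of integers: an object number followed by that object's offset. The sketch below is a hypothetical, stdlib-only illustration, not PDFBox's code. The class ObjStmHeader and its methods are invented here. It splits such a header on PDF whitespace and shows why a token like '12977[373' from the 1.8.0 trace cannot be read as an integer. The sample header values in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration, not PDFBox code: reads the header of a PDF
 * object stream, which the PDF specification defines as /N pairs of
 * integers (object number, byte offset relative to /First).
 */
public class ObjStmHeader {

    // PDF whitespace: space, tab, CR, LF, form feed and NUL.
    // Delimiters such as '[' are deliberately NOT treated as separators
    // in this sketch, so a token like "12977[373" stays in one piece.
    private static final String PDF_WHITESPACE = "[ \\t\\r\\n\\f\\x00]+";

    /** Returns n entries, each as {objectNumber, offset}. */
    public static List<long[]> parse(String header, int n) {
        String trimmed = header.trim();
        String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split(PDF_WHITESPACE);
        List<long[]> entries = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long objectNumber = readLong(tokens, 2 * i);
            long offset = readLong(tokens, 2 * i + 1);
            entries.add(new long[] {objectNumber, offset});
        }
        return entries;
    }

    // Parses one whitespace-separated token as an integer, failing loudly otherwise.
    private static long readLong(String[] tokens, int index) {
        if (index >= tokens.length) {
            throw new IllegalArgumentException(
                    "Header ended after " + tokens.length + " tokens");
        }
        String token = tokens[index];
        try {
            return Long.parseLong(token);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Expected an integer, actual='" + token + "'", e);
        }
    }
}
```

For example, parse("12977 0 12978 373", 2) returns the pairs {12977, 0} and {12978, 373}, while parse("12977[373 12978 400", 2) throws an IllegalArgumentException whose message contains '12977[373', because the '[' sits where whitespace should separate the two numbers.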


