Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2016/07/29 16:03:22 UTC

[jira] [Updated] (PDFBOX-2920) IndexOutOfBounds Exception when loading large PDF

     [ https://issues.apache.org/jira/browse/PDFBOX-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2920:
------------------------------------
    Description: 
I'm getting exceptions when loading large PDFs (~6-8 GB each). I've tried both PDDocument.load() and PDDocument.loadNonSeq(). I can't attach a sample PDF because of the 10 MB attachment size limit. If there is another way to get a file to someone, I can work that out. Here is my code:

	
{code}
	public static void main(String[] args) {
		
		LOGGER.info("Test Large PDF Load " + TEST_PDF);
		try {
			LOGGER.info("Create Steam");
			InputStream is = new FileInputStream(TEST_PDF);
			LOGGER.info("Start Load");
			PDDocument doc = PDDocument.load(is);
//			PDDocument doc = PDDocument.loadNonSeq(is, null);
			LOGGER.info("Finished Load");
			doc.close();
			is.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
{code}
The first error occurs when using PDDocument.load():

Aug 06, 2015 1:31:14 PM hp.pdfbox.test.Main main
INFO: Test Large PDF Load D:\workspace_trunk_luna\test_pdfbox\pdfs\ELOISA ARTOLA CD17433_Indigo.pdf
Aug 06, 2015 1:31:14 PM hp.pdfbox.test.Main main
INFO: Create Steam
Aug 06, 2015 1:32:44 PM hp.pdfbox.test.Main main
INFO: Start Load
org.apache.pdfbox.exceptions.WrappedIOException
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:278)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1186)
at hp.pdfbox.test.Main.main(Main.java:22)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1041, Size: 1041
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.FilterOutputStream.close(Unknown Source)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:616)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:650)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
... 3 more


The second error occurs when using PDDocument.loadNonSeq():

INFO: Create Steam
Aug 06, 2015 1:51:47 PM hp.pdfbox.test.Main main
INFO: Start Load
Aug 06, 2015 1:53:39 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 8552119825
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 509, Size: 509
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.FilterOutputStream.close(Unknown Source)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1847)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1448)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1374)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1348)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:429)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:915)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1305)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1288)
at hp.pdfbox.test.Main.main(Main.java:22) 

  was:
I'm getting exceptions when loading large PDFs (~6-8 GB each). I've tried both PDDocument.load() and PDDocument.loadNonSeq(). I can't attach a sample PDF because of the 10 MB attachment size limit. If there is another way to get a file to someone, I can work that out. Here is my code:

	
	public static void main(String[] args) {
		
		LOGGER.info("Test Large PDF Load " + TEST_PDF);
		try {
			LOGGER.info("Create Steam");
			InputStream is = new FileInputStream(TEST_PDF);
			LOGGER.info("Start Load");
			PDDocument doc = PDDocument.load(is);
//			PDDocument doc = PDDocument.loadNonSeq(is, null);
			LOGGER.info("Finished Load");
			doc.close();
			is.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

The first error occurs when using PDDocument.load():

Aug 06, 2015 1:31:14 PM hp.pdfbox.test.Main main
INFO: Test Large PDF Load D:\workspace_trunk_luna\test_pdfbox\pdfs\ELOISA ARTOLA CD17433_Indigo.pdf
Aug 06, 2015 1:31:14 PM hp.pdfbox.test.Main main
INFO: Create Steam
Aug 06, 2015 1:32:44 PM hp.pdfbox.test.Main main
INFO: Start Load
org.apache.pdfbox.exceptions.WrappedIOException
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:278)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1186)
at hp.pdfbox.test.Main.main(Main.java:22)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1041, Size: 1041
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.FilterOutputStream.close(Unknown Source)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:616)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:650)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
... 3 more


The second error occurs when using PDDocument.loadNonSeq():

INFO: Create Steam
Aug 06, 2015 1:51:47 PM hp.pdfbox.test.Main main
INFO: Start Load
Aug 06, 2015 1:53:39 PM org.apache.pdfbox.pdfparser.XrefTrailerResolver setStartxref
WARNING: Did not found XRef object at specified startxref position 8552119825
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 509, Size: 509
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.apache.pdfbox.io.RandomAccessBuffer.seek(RandomAccessBuffer.java:110)
at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:106)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.FilterOutputStream.close(Unknown Source)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1847)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1448)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1374)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1348)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:429)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:915)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1305)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1288)
at hp.pdfbox.test.Main.main(Main.java:22) 
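
Both traces in the description fail at the same point: ArrayList.get() called from RandomAccessBuffer.seek(), with the requested index equal to the list size (1041 of 1041, 509 of 509). That pattern is what a seek to the position just past the last allocated chunk looks like when the chunk list is not grown first. The following is a minimal stand-alone sketch of that failure mode, not PDFBox's code: the class ChunkedBufferSketch, its methods, and the 1024-byte chunk size are all hypothetical, and whether this is the actual cause of PDFBOX-2920 is not established here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunked in-memory buffer; NOT PDFBox's RandomAccessBuffer. */
final class ChunkedBufferSketch {
    // Assumed chunk size, chosen for illustration only.
    static final int CHUNK_SIZE = 1024;

    private final List<byte[]> chunks = new ArrayList<>();

    ChunkedBufferSketch(int initialChunks) {
        for (int i = 0; i < initialChunks; i++) {
            chunks.add(new byte[CHUNK_SIZE]);
        }
    }

    /** Naive seek: looks up the chunk without checking that it exists. */
    byte[] seekUnchecked(long position) {
        int chunkIndex = (int) (position / CHUNK_SIZE);
        // Throws IndexOutOfBoundsException when chunkIndex == chunks.size().
        return chunks.get(chunkIndex);
    }

    /** Defensive seek: grows the chunk list before the lookup. */
    byte[] seekGrowing(long position) {
        int chunkIndex = (int) (position / CHUNK_SIZE);
        while (chunkIndex >= chunks.size()) {
            chunks.add(new byte[CHUNK_SIZE]);
        }
        return chunks.get(chunkIndex);
    }

    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        ChunkedBufferSketch buffer = new ChunkedBufferSketch(3);
        long endOfData = 3L * CHUNK_SIZE; // first byte past the last chunk
        try {
            buffer.seekUnchecked(endOfData);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked seek failed: " + e.getMessage());
        }
        buffer.seekGrowing(endOfData);
        System.out.println("chunks after growing seek: " + buffer.chunkCount());
    }
}
```

In this sketch seekUnchecked() throws for the first position past the last chunk, while seekGrowing() allocates the missing chunk before the lookup.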


> IndexOutOfBounds Exception when loading large PDF
> -------------------------------------------------
>
>                 Key: PDFBOX-2920
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2920
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.8, 1.8.9, 1.8.10
>         Environment: Software
>            Reporter: Brad Baker
>              Labels: parser
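
The loadNonSeq() log reports a startxref position of 8552119825, which is larger than Integer.MAX_VALUE (2147483647), so any code that handles offsets in a file of this size has to keep them in a long. The sketch below only illustrates that arithmetic in plain Java; OffsetRangeSketch and its methods are hypothetical names, and nothing in this report shows that PDFBox narrows offsets in this way.

```java
/** Hypothetical helper showing why multi-gigabyte file offsets need long arithmetic. */
final class OffsetRangeSketch {
    /** The startxref position quoted in the warning from the report. */
    static final long REPORTED_OFFSET = 8552119825L;

    /** True when the offset can be held in a 32-bit signed int. */
    static boolean fitsInInt(long offset) {
        return offset >= Integer.MIN_VALUE && offset <= Integer.MAX_VALUE;
    }

    /** Narrowing that fails loudly instead of wrapping silently. */
    static int toIntChecked(long offset) {
        // Math.toIntExact throws ArithmeticException on overflow.
        return Math.toIntExact(offset);
    }

    public static void main(String[] args) {
        System.out.println("fits in int: " + fitsInInt(REPORTED_OFFSET));
        // A plain cast keeps only the low 32 bits and changes the value.
        System.out.println("silent cast: " + (int) REPORTED_OFFSET);
        try {
            toIntChecked(REPORTED_OFFSET);
        } catch (ArithmeticException e) {
            System.out.println("checked cast rejected the offset");
        }
    }
}
```

A plain (int) cast wraps silently, whereas Math.toIntExact() turns the same narrowing into an immediate, visible failure.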



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
