Posted to dev@pdfbox.apache.org by "Antoni Mylka (Commented) (JIRA)" <ji...@apache.org> on 2012/02/10 11:17:59 UTC
[jira] [Commented] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME

    [ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205331#comment-13205331 ] 

Antoni Mylka commented on PDFBOX-1227:
--------------------------------------

The PDF is obviously from PDFBOX-706, not PDFBOX-708. My test (as uploaded, with default maven-surefire-plugin settings, i.e. -Xmx) now results in:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
	at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
	at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:117)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:229)
	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
	at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:214)
	at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:468)
	at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:143)
	at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:39)
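
Since the result depends on the heap limit of the JVM that runs the test, it can help to print that limit from inside the test before comparing runs. The following is only a minimal sketch: the class name HeapLimit and the method maxHeapMiB are made up for illustration and are not part of the attached test or of PDFBox. It uses only the standard Runtime.maxMemory() call.

```java
public class HeapLimit {

    /** Returns the maximum heap the running JVM will try to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Hypothetical usage: call this at the start of the test to see
        // which heap limit the test JVM was actually started with.
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The printed value is what the JVM reports, which may differ slightly from the number passed via -Xmx.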

When I increase the heap size (found by binary search; the test needs at least -Xmx296m), pictures 51, 52 and 53 are extracted properly. They aren't black any more, so there must have been some improvement in this respect. But now more pictures are found. I commented out those 'ifs' in the test code, so that all pictures are expected to be non-black. The test then extracts 53 pictures and dies at picture 54 with:

java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
	at sun.awt.image.ByteInterleavedRaster.getDataElements(ByteInterleavedRaster.java:301)
	at java.awt.image.BufferedImage.getRGB(BufferedImage.java:871)
	at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:45)
	
So now it seems there are two problems: increased memory consumption and an ArrayIndexOutOfBoundsException. Three pictures are extracted properly now, though.
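
For reference, a non-black check that cannot step outside the image by construction could look like the sketch below. It is an assumption about what such a check might look like, not the code from the attached TestPdfbox706FlexEnableBeta1.java; the class name NonBlackCheck and the method hasNonBlackPixel are hypothetical. The loop bounds come from the BufferedImage itself rather than from any fixed coordinates, which may help to tell a problem in the test loop apart from a problem in the extracted image.

```java
import java.awt.image.BufferedImage;

public class NonBlackCheck {

    /** Returns true if at least one pixel has a non-zero R, G or B component. */
    static boolean hasNonBlackPixel(BufferedImage image) {
        // Take the bounds from the image itself so getRGB is never
        // called with a coordinate outside the image.
        int width = image.getWidth();
        int height = image.getHeight();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // getRGB returns ARGB; mask off the alpha byte before comparing.
                if ((image.getRGB(x, y) & 0x00FFFFFF) != 0) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A freshly created TYPE_INT_RGB image is entirely black.
        BufferedImage black = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        BufferedImage dotted = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        dotted.setRGB(3, 2, 0x00FF00); // one green pixel in the last row and column
        System.out.println(hasNonBlackPixel(black));  // prints "false"
        System.out.println(hasNonBlackPixel(dotted)); // prints "true"
    }
}
```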
                
> File submitted to PFDBOX-708 throws OOME
> ----------------------------------------
>
>                 Key: PDFBOX-1227
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1227
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.7.0
>         Environment: Windows 7 64bit
>            Reporter: Antoni Mylka
>         Attachments: TestPdfbox706FlexEnableBeta1.java, pdfbox-706-flex-enable-beta1.pdf
>
>
> I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-708. It used to work, but now it throws an OOME.
