Posted to dev@pdfbox.apache.org by "Antoni Mylka (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 23:36:57 UTC
[jira] [Created] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME
File submitted to PFDBOX-708 throws OOME
----------------------------------------
Key: PDFBOX-1227
URL: https://issues.apache.org/jira/browse/PDFBOX-1227
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.7.0
Environment: Windows 7 64bit
Reporter: Antoni Mylka
I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-708. It used to work, but now it throws an OOME.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME
Posted by "Antoni Mylka (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205331#comment-13205331 ]
Antoni Mylka commented on PDFBOX-1227:
--------------------------------------
The PDF is obviously from PDFBOX-706, not PDFBOX-708. My test (as uploaded, with default maven-surefire-plugin settings i.e. -Xmx now results in:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
    at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
    at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:117)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:229)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:214)
    at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:468)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:143)
    at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:39)
When I increase the heap size (the minimum, found by binary search, is -Xmx296m), pictures 51, 52 and 53 are extracted properly. They aren't black any more, so there must have been some improvement in this respect. BUT now more pictures are found. I commented out those 'ifs' in the test code, so that all pictures are expected to be non-black. My test then extracts 53 pictures and dies at picture 54 with:
java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
    at sun.awt.image.ByteInterleavedRaster.getDataElements(ByteInterleavedRaster.java:301)
    at java.awt.image.BufferedImage.getRGB(BufferedImage.java:871)
    at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:45)
So now it seems there are two problems: increased memory consumption and an array index exception. Three pictures are extracted properly now though.
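For readers who want to reproduce the heap-size measurement: the comment above says the minimum -Xmx value was found by binary search. The following is only a minimal sketch of that idea, not the procedure actually used. The class name, the method name and the stand-in predicate are hypothetical; in practice the predicate would mean "the test passes when the JVM is started with this many megabytes of heap", and it is assumed to be monotone (once the test passes at some size, it passes at every larger size).

```java
import java.util.function.IntPredicate;

class MinHeapSearch {
    /**
     * Returns the smallest value in [lo, hi] for which passes.test(value) is true.
     * Assumes the predicate is monotone (false..false, true..true) and that
     * passes.test(hi) is true.
     */
    static int smallestPassing(int lo, int hi, IntPredicate passes) {
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (passes.test(mid)) {
                hi = mid;      // mid is sufficient, so look for something smaller
            } else {
                lo = mid + 1;  // mid is too small, so the answer lies above it
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Stand-in predicate (an assumption for illustration): pretend the
        // test run succeeds from 296 MB of heap upwards.
        IntPredicate passes = megabytes -> megabytes >= 296;
        System.out.println("-Xmx" + smallestPassing(64, 512, passes) + "m");
    }
}
```

With the stand-in predicate this prints -Xmx296m. Each probe of a real predicate would be a full test run in a fresh JVM, so the logarithmic number of probes is what makes the search practical.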
> File submitted to PFDBOX-708 throws OOME
> ----------------------------------------
>
> Key: PDFBOX-1227
> URL: https://issues.apache.org/jira/browse/PDFBOX-1227
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.0
> Environment: Windows 7 64bit
> Reporter: Antoni Mylka
> Attachments: TestPdfbox706FlexEnableBeta1.java, pdfbox-706-flex-enable-beta1.pdf
>
>
> I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-708. It used to work, but now it throws an OOME.
[jira] [Updated] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME
Posted by "Antoni Mylka (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoni Mylka updated PDFBOX-1227:
---------------------------------
Attachment: TestPdfbox706FlexEnableBeta1.java
pdfbox-706-flex-enable-beta1.pdf
pdfbox-706-flex-enable-beta1.pdf is to be put in the "antoni" package. TestPdfbox706FlexEnableBeta1.java is to be put in org.apache.pdfbox.antoni. Both are taken directly from my own private regression test suite. The copyright for the PDF belongs to somebody else, so I marked the "not intended for inclusion" radio button.
This test used to pass. It started failing after I updated my working copy to revision 1242038.
[jira] [Updated] (PDFBOX-1227) File submitted to PFDBOX-706 throws OOME
Posted by "Antoni Mylka (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoni Mylka updated PDFBOX-1227:
---------------------------------
Description: I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-706. It used to work, but now it throws an OOME. (was: I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-708. It used to work, but now it throws an OOME.)
Summary: File submitted to PFDBOX-706 throws OOME (was: File submitted to PFDBOX-708 throws OOME)
Edited the issue description. Corrected the number of the older issue where the PDF in question comes from.
> File submitted to PFDBOX-706 throws OOME
> ----------------------------------------
>
> Key: PDFBOX-1227
> URL: https://issues.apache.org/jira/browse/PDFBOX-1227
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.0
> Environment: Windows 7 64bit
> Reporter: Antoni Mylka
> Attachments: TestPdfbox706FlexEnableBeta1.java, pdfbox-706-flex-enable-beta1.pdf
>
>
> I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-706. It used to work, but now it throws an OOME.
[jira] [Closed] (PDFBOX-1227) File submitted to PFDBOX-706 throws OOME
Posted by "Antoni Mylka (Closed) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoni Mylka closed PDFBOX-1227.
--------------------------------
Resolution: Invalid
I am closing the issue as invalid. Sorry to bother you guys. The ArrayIndexOutOfBoundsException was a bug in my own test code. I can live with the increased memory consumption, so I increased the -Xmx for my unit tests to 512m.
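The faulty test code is not quoted in this thread, so the following is only a hypothetical sketch of how a "Coordinate out of bounds!" failure can arise from BufferedImage.getRGB, and of a bounds-safe way to check whether an extracted picture is entirely black. The class and method names are invented for illustration; only the java.awt.image.BufferedImage calls are real JDK API.

```java
import java.awt.image.BufferedImage;

class PixelBoundsCheck {
    /** True if (x, y) is a valid pixel coordinate for the given image. */
    static boolean inBounds(BufferedImage image, int x, int y) {
        return x >= 0 && y >= 0 && x < image.getWidth() && y < image.getHeight();
    }

    /** True if every pixel has an RGB value of 0; the alpha channel is ignored. */
    static boolean isAllBlack(BufferedImage image) {
        // Loop limits come from the image itself, never from fixed constants,
        // so pictures of any size can be checked safely.
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if ((image.getRGB(x, y) & 0x00FFFFFF) != 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A small byte-interleaved image, the raster type named in the stack trace.
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_3BYTE_BGR);
        System.out.println("all black: " + isAllBlack(image));
        System.out.println("(4, 0) in bounds: " + inBounds(image, 4, 0));
        try {
            image.getRGB(4, 0); // x == width is one past the last column
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("getRGB failed: " + e.getMessage());
        }
    }
}
```

A test that samples pixels at hard-coded coordinates can fail this way as soon as a smaller picture is extracted, which is one plausible reading of why newly found pictures triggered the exception.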
[jira] [Issue Comment Edited] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME
Posted by "Antoni Mylka (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205331#comment-13205331 ]
Antoni Mylka edited comment on PDFBOX-1227 at 2/10/12 10:16 AM:
----------------------------------------------------------------
The PDF is obviously from PDFBOX-706, not PDFBOX-708. My test (as uploaded, with default maven-surefire-plugin settings) now results in:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
    at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
    at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:117)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:229)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:214)
    at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:468)
    at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:143)
    at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:39)
When I increase the heap size (the minimum, found by binary search, is -Xmx296m), pictures 51, 52 and 53 are extracted properly. They aren't black any more, so there must have been some improvement in this respect. BUT now more pictures are found. I commented out those 'ifs' in the test code, so that all pictures are expected to be non-black. My test then extracts 53 pictures and dies at picture 54 with:
java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
    at sun.awt.image.ByteInterleavedRaster.getDataElements(ByteInterleavedRaster.java:301)
    at java.awt.image.BufferedImage.getRGB(BufferedImage.java:871)
    at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:45)
So now it seems there are two problems: increased memory consumption and an array index exception. Three pictures are extracted properly now though.