
Posted to dev@pdfbox.apache.org by "Antoni Mylka (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 23:36:57 UTC

[jira] [Created] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME

File submitted to PFDBOX-708 throws OOME
----------------------------------------

                 Key: PDFBOX-1227
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1227
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.7.0
         Environment: Windows 7 64bit
            Reporter: Antoni Mylka


I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-708. It used to work, but now it throws an OOME.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME

Posted by "Antoni Mylka (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205331#comment-13205331 ] 

Antoni Mylka commented on PDFBOX-1227:
--------------------------------------

The PDF is obviously from PDFBOX-706, not PDFBOX-708. My test (as uploaded, with default maven-surefire-plugin settings) now results in:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
	at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
	at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:117)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:229)
	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
	at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:214)
	at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:468)
	at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:143)
	at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:39)

When I increase the heap size (found by binary search: it needs at least -Xmx296m), pictures 51, 52 and 53 are extracted properly. They are no longer black, so there must have been some improvement in this respect. BUT now more pictures are found. I commented out those 'ifs' in the test code, so that all pictures are expected to be non-black. My test then extracts 53 pictures and dies at picture 54 with:

java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
	at sun.awt.image.ByteInterleavedRaster.getDataElements(ByteInterleavedRaster.java:301)
	at java.awt.image.BufferedImage.getRGB(BufferedImage.java:871)
	at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:45)
	
So now there seem to be two problems: increased memory consumption and an ArrayIndexOutOfBoundsException. On the plus side, three pictures are now extracted properly.
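The minimum of -Xmx296m mentioned above was found by binary search over heap sizes. Below is a minimal Java sketch of that search, assuming the pass/fail outcome is monotone (if a heap size works, every larger one does too). The class `HeapSearch` and the predicate are hypothetical; in practice the predicate would run the test in a fresh JVM with the given -Xmx value and report whether it finished without an OutOfMemoryError.

```java
import java.util.function.LongPredicate;

/** Hypothetical helper: binary search for the smallest heap size (MB) at which a test passes. */
final class HeapSearch {
    private HeapSearch() {}

    /**
     * Returns the smallest value in [lo, hi] for which passes.test(value) is true,
     * assuming the predicate is monotone (false ... false, true ... true).
     * Returns -1 if even hi fails.
     */
    static long minimalPassing(long lo, long hi, LongPredicate passes) {
        if (!passes.test(hi)) {
            return -1; // no heap size in the range is enough
        }
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (passes.test(mid)) {
                hi = mid;     // mid works, so the answer is mid or smaller
            } else {
                lo = mid + 1; // mid fails, so the answer is larger than mid
            }
        }
        return lo;
    }
}
```

For example, `HeapSearch.minimalPassing(64, 1024, mb -> runTestWithHeap(mb))`, with `runTestWithHeap` a hypothetical helper, narrows a 64 to 1024 MB range down to a single megabyte in about ten test runs.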
                

[jira] [Updated] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME

Posted by "Antoni Mylka (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoni Mylka updated PDFBOX-1227:
---------------------------------

    Attachment: TestPdfbox706FlexEnableBeta1.java
                pdfbox-706-flex-enable-beta1.pdf

pdfbox-706-flex-enable-beta1.pdf belongs in the "antoni" package, and TestPdfbox706FlexEnableBeta1.java belongs in org.apache.pdfbox.antoni. Both are taken directly from my own private regression test suite. The copyright for the PDF belongs to somebody else, so I selected the "not intended for inclusion" option.

This test used to pass. It started failing after I updated my working copy to revision 1242038.
                

[jira] [Updated] (PDFBOX-1227) File submitted to PFDBOX-706 throws OOME

Posted by "Antoni Mylka (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoni Mylka updated PDFBOX-1227:
---------------------------------

    Description: I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-706. It used to work, but now it throws an OOME.  (was: I want to extract pictures from FLEX Enable Beta1 Feb13.pdf originally submitted to PDFBOX-708. It used to work, but now it throws an OOME.)
        Summary: File submitted to PFDBOX-706 throws OOME  (was: File submitted to PFDBOX-708 throws OOME)

Edited the issue description to correct the number of the older issue that the PDF comes from.
                

[jira] [Closed] (PDFBOX-1227) File submitted to PFDBOX-706 throws OOME

Posted by "Antoni Mylka (Closed) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoni Mylka closed PDFBOX-1227.
--------------------------------

    Resolution: Invalid

I am closing the issue as invalid. Sorry to bother you guys. The ArrayIndexOutOfBoundsException was caused by a bug in my own test code, and I can live with the increased memory consumption: I have increased -Xmx for my unit tests to 512m.
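`BufferedImage.getRGB(x, y)` can throw an ArrayIndexOutOfBoundsException ("Coordinate out of bounds!", as in the stack trace in the earlier comment) when x or y lies outside the image, so any pixel scan in test code has to stay within 0 <= x < getWidth() and 0 <= y < getHeight(). The uploaded test is not reproduced here; the following is a hypothetical sketch of a bounds-safe "is this picture all black?" check, similar in spirit to the black-picture checks the test performs. The names `BlackCheck` and `isAllBlack`, and the decision to ignore the alpha channel, are assumptions.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: checks whether every pixel of an image is black. */
final class BlackCheck {
    private BlackCheck() {}

    static boolean isAllBlack(BufferedImage image) {
        // Iterate strictly inside the image: getRGB(x, y) may throw
        // ArrayIndexOutOfBoundsException for x >= width or y >= height.
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // getRGB returns an ARGB int in the default sRGB model;
                // mask off the alpha byte and compare only R, G and B.
                if ((image.getRGB(x, y) & 0x00FFFFFF) != 0) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Taking the loop bounds from the `BufferedImage` itself, rather than from dimensions read elsewhere, keeps the scan inside the raster regardless of how the image was produced.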
                

[jira] [Issue Comment Edited] (PDFBOX-1227) File submitted to PFDBOX-708 throws OOME

Posted by "Antoni Mylka (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205331#comment-13205331 ] 

Antoni Mylka edited comment on PDFBOX-1227 at 2/10/12 10:16 AM:
----------------------------------------------------------------

The PDF is obviously from PDFBOX-706, not PDFBOX-708. My test (as uploaded, with default maven-surefire-plugin settings) now results in:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
	at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:131)
	at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
	at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:117)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:279)
	at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:229)
	at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
	at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:214)
	at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:468)
	at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:143)
	at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:39)

When I increase the heap size (found by binary search: it needs at least -Xmx296m), pictures 51, 52 and 53 are extracted properly. They are no longer black, so there must have been some improvement in this respect. BUT now more pictures are found. I commented out those 'ifs' in the test code, so that all pictures are expected to be non-black. My test then extracts 53 pictures and dies at picture 54 with:

java.lang.ArrayIndexOutOfBoundsException: Coordinate out of bounds!
	at sun.awt.image.ByteInterleavedRaster.getDataElements(ByteInterleavedRaster.java:301)
	at java.awt.image.BufferedImage.getRGB(BufferedImage.java:871)
	at org.apache.pdfbox.antoni.pub.TestPdfbox706FlexEnableBeta1.testFile2(TestPdfbox706FlexEnableBeta1.java:45)
	
So now there seem to be two problems: increased memory consumption and an ArrayIndexOutOfBoundsException. On the plus side, three pictures are now extracted properly.
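The reporter eventually dealt with the memory side by giving the unit tests more heap (-Xmx512m). By default, maven-surefire-plugin runs tests in a forked JVM whose options come from the plugin's `argLine` parameter, so it can be worth checking what heap ceiling the test JVM actually received. A minimal sketch using `Runtime.maxMemory()`; the class `HeapCheck` is hypothetical, and the reported value is often somewhat below the -Xmx setting.

```java
/** Hypothetical helper: reports the heap ceiling of the running JVM. */
final class HeapCheck {
    private HeapCheck() {}

    /** Approximate max heap in MB; often a little below the -Xmx value. */
    static long maxHeapMb() {
        // maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** True if the JVM may grow its heap to at least requiredMb megabytes. */
    static boolean hasAtLeastHeapMb(long requiredMb) {
        return maxHeapMb() >= requiredMb;
    }

    public static void main(String[] args) {
        // Print the ceiling so it shows up in the surefire test output.
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```

Calling `HeapCheck.maxHeapMb()` from inside a test, rather than from the Maven process, is what reveals the forked JVM's setting.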
                