Posted to dev@pdfbox.apache.org by "Pontus Hulin (JIRA)" <ji...@apache.org> on 2011/09/15 08:28:09 UTC

[jira] [Created] (PDFBOX-1118) All images in pdf document is not listed/extracted

All images in pdf document is not listed/extracted
--------------------------------------------------

                 Key: PDFBOX-1118
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1118
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel, Utilities
    Affects Versions: 1.6.0
         Environment: Mac OS X, 10.6.8
            Reporter: Pontus Hulin


I'm using PDFBox to extract all images in a PDF document, but I have found some documents where not all images are exported. I'm using the code from org.apache.pdfbox.ExtractImages
to extract the images.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112582#comment-13112582 ] 

Andreas Lehmkühler commented on PDFBOX-1118:
--------------------------------------------

All my changes were made in org.apache.pdfbox.ExtractImages. AFAIU your implementation is based on that code, so you have to apply these changes [1] to your own copy too.

[1] http://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/ExtractImages.java?r1=1083894&r2=1173259


[jira] [Resolved] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-1118.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.7.0
         Assignee: Andreas Lehmkühler

I dug deeper into it and found the second image embedded in an XObjectForm. I improved ExtractImages in revision 1173259 to export those images too.
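
To illustrate the idea (not the actual change in revision 1173259): a form XObject carries its own XObject dictionary, so an extractor that only looks at the page-level dictionary misses images nested inside forms. The sketch below uses small stand-in types (XObject, Image, Form) rather than PDFBox classes, and the names Im0, Im1 and Fm0..Fm3 are made up. It models a page like the one in this issue, with 4 forms and 1 top-level image, and assumes the second image sits in the first form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedImages {
    /** Stand-in for a PDF XObject; this is NOT a PDFBox type. */
    public interface XObject {}

    /** Stand-in for an image XObject. */
    public static final class Image implements XObject {}

    /** Stand-in for a form XObject, which has its own XObject dictionary. */
    public static final class Form implements XObject {
        public final Map<String, XObject> xobjects = new LinkedHashMap<>();
    }

    /** Returns the paths of all images, descending into nested forms. */
    public static List<String> collectImages(Map<String, XObject> xobjects) {
        List<String> paths = new ArrayList<>();
        collect(xobjects, "", paths);
        return paths;
    }

    private static void collect(Map<String, XObject> xobjects, String prefix, List<String> paths) {
        for (Map.Entry<String, XObject> entry : xobjects.entrySet()) {
            String path = prefix + entry.getKey();
            XObject value = entry.getValue();
            if (value instanceof Image) {
                paths.add(path);
            } else if (value instanceof Form) {
                // The step a page-level-only extractor skips: recurse into the form.
                collect(((Form) value).xobjects, path + "/", paths);
            }
        }
    }

    /** Hypothetical page: 1 top-level image, 4 forms, 1 image nested in the first form. */
    public static Map<String, XObject> samplePage() {
        Map<String, XObject> page = new LinkedHashMap<>();
        page.put("Im0", new Image());
        Form fm0 = new Form();
        fm0.xobjects.put("Im1", new Image());
        page.put("Fm0", fm0);
        page.put("Fm1", new Form());
        page.put("Fm2", new Form());
        page.put("Fm3", new Form());
        return page;
    }

    public static void main(String[] args) {
        System.out.println(collectImages(samplePage())); // prints [Im0, Fm0/Im1]
    }
}
```

A real extractor would probably also need to guard against forms that reference each other, since this sketch recurses without tracking visited objects.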


[jira] [Commented] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Pontus Hulin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112637#comment-13112637 ] 

Pontus Hulin commented on PDFBOX-1118:
--------------------------------------

Sorry, my bad. I have now got it running. Thanks for taking the time to help me. I owe you a beer!
Best regards
/ Pontus


[jira] [Updated] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Pontus Hulin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pontus Hulin updated PDFBOX-1118:
---------------------------------

    Attachment: SDT1138-021.pdf

In this file only one image is exported; there should be two.


[jira] [Commented] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Pontus Hulin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112573#comment-13112573 ] 

Pontus Hulin commented on PDFBOX-1118:
--------------------------------------

I downloaded the source code for revision 1173259, but I still only get one image. Do I need to rewrite my image extraction code somehow?



[jira] [Commented] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Pontus Hulin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107766#comment-13107766 ] 

Pontus Hulin commented on PDFBOX-1118:
--------------------------------------

Hello Andreas.
If I export all images from the document in Acrobat, I get two images.

I have also tried ICEpdf, and there I'm also able to export two images.

The original document is an InDesign spread, and this page is on the right-hand side of the spread. Is that the reason the second image is not included?


[jira] [Updated] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Pontus Hulin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pontus Hulin updated PDFBOX-1118:
---------------------------------

    Attachment: SDT1138-021_Page_1_Image_0002.jpg
                SDT1138-021_Page_1_Image_0001.jpg

These files were exported by Acrobat 9 on Mac OS X.


[jira] [Updated] (PDFBOX-1118) All images in pdf document is not listed/extracted

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-1118:
---------------------------------------

    Attachment: PDFBOX1118-debugger.png

Everything works fine; there is only one image. The attached screenshot shows the content of the XObject dictionary, which contains 4 XObjectForm objects and 1 XObjectImage object.
