Posted to dev@pdfbox.apache.org by "JOSE FREITAS (JIRA)" <ji...@apache.org> on 2010/12/21 13:40:19 UTC

[jira] Created: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Problem on writing some kind of images to a File in filesystem
--------------------------------------------------------------

                 Key: PDFBOX-927
                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.4.0, 1.3.1, 1.2.1
         Environment: JDK5 / 6

            Reporter: JOSE FREITAS
            Priority: Minor


I have an image object that is an instance of PDXObjectImage.

If it has a PDIndexed color space, i.e.
 
"image.getColorSpace() instanceof PDIndexed"

then the image is rendered incorrectly.

Is there any known issue with this color space?

I think the problem could be in:
image.write2file(...) or 
image.write2OutputStream(...);


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "JOSE FREITAS (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JOSE FREITAS updated PDFBOX-927:
--------------------------------

    Attachment: 1___Im0-1.jpg

Example of a corrupted image extracted from the attached PDF.

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: 1___Im0-1.jpg, ExtractImages.java, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002185#comment-13002185 ] 

Adam Nichols commented on PDFBOX-927:
-------------------------------------

I don't know much about image processing, but could this be related to PDFBOX-942?  Does the newly released PDFBox 1.5.0 have the same issue if you try the sample code from that issue?  Based on your comparison screenshot, it looks like it may just be using too much compression, and I noticed that 1___Im0-1.jpg is a JPEG.

Also, I'm looking at the API for PDXObjectImage and it looks like there is a getRGBImage() method.  Have you tried seeing whether you can get a cleaner image from that?  Also, are there any patterns between the image quality and getBitsPerComponent()/getColorSpace()/getSuffix()?  For example, does it only happen with JPGs?  All JPGs, or just some of them?

I've only dealt with one other image problem in PDFBox and that was related to encryption, so I'm no expert, but hopefully this at least points you in the right direction.  Also, feel free to look into the PDFBox code to see how it works; it's not as scary as you might imagine. :-)
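The compression hypothesis above can be checked without PDFBox. The following is a minimal JDK-only sketch (the class and method names are hypothetical) that encodes an image with ImageIO as PNG and as JPEG, decodes it again and compares pixels. A lossless format such as PNG should reproduce every pixel, while JPEG generally will not. The assumption is that if the BufferedImage obtained from getRGBImage() is saved as PNG and still looks wrong, recompression is probably not the cause.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class LosslessWriteCheck {

    /** Encodes img with the given ImageIO format name ("png", "jpg", ...) and decodes it again. */
    static BufferedImage roundTrip(BufferedImage img, String format) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(img, format, out)) {
                throw new IllegalArgumentException("No ImageIO writer for " + format);
            }
            return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if both images have the same size and identical RGB values (alpha ignored). */
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if ((a.getRGB(x, y) & 0xFFFFFF) != (b.getRGB(x, y) & 0xFFFFFF)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Red/blue one-pixel checkerboard: sharp colour edges that JPEG tends to smear. */
    static BufferedImage checkerboard(int size) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                img.setRGB(x, y, ((x + y) % 2 == 0) ? 0xFF0000 : 0x0000FF);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = checkerboard(16);
        System.out.println("PNG identical:  " + samePixels(img, roundTrip(img, "png")));
        System.out.println("JPEG identical: " + samePixels(img, roundTrip(img, "jpg")));
    }
}
```

A synthetic test pattern is used so the check does not depend on the attached PDF; in practice you would pass the image you extracted instead.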

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: 1___Im0-1.jpg, ExtractImages.java, comparison.png, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Updated: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "JOSE FREITAS (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JOSE FREITAS updated PDFBOX-927:
--------------------------------

    Attachment:     (was: ExtractImages.java)

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: ExtractImages.java, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Commented: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "JOSE FREITAS (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002131#comment-13002131 ] 

JOSE FREITAS commented on PDFBOX-927:
-------------------------------------

I don't have the original image from before it was placed in the document. I receive the document and then extract the image.
I'll send you an example.

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: ExtractImages.java, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Updated: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "JOSE FREITAS (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JOSE FREITAS updated PDFBOX-927:
--------------------------------

    Attachment:     (was: test with pdindexed.pdf)

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: ExtractImages.java
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Commented: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002537#comment-13002537 ] 

Andreas Lehmkühler commented on PDFBOX-927:
-------------------------------------------

I extracted the images using another tool and got the same results.

IMO that leads to the conclusion that

- the images embedded in the given PDF are of poor quality
- other readers like Adobe or xpdf must have some built-in algorithms to improve the rendering automatically
- PDFBox works fine
- we should also add an image enhancer ...


> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: 1___Im0-1.jpg, ExtractImages.java, comparison.png, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Updated: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "JOSE FREITAS (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JOSE FREITAS updated PDFBOX-927:
--------------------------------

    Attachment:     (was: extractImages.java)

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Commented: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002518#comment-13002518 ] 

Andreas Lehmkühler commented on PDFBOX-927:
-------------------------------------------

This issue is about extracting images, while PDFBOX-942 is about adding images to a PDF, so I guess they are not related.

I had a look at the code. It just takes the embedded stream and writes the data to the given output stream; it doesn't use the given color space. Maybe that is the reason for the rendering issue. I tried to use the getRGBImage method and to write the resulting BufferedImage to a file using ImageIO, but it didn't work either.

A possible solution could be to decode the stream with our own implementation (which doesn't exist yet) instead of ImageIO, and to apply the given color space.
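In a PDF /Indexed color space, each image sample is an index into a lookup table of hival + 1 entries expressed in a base color space. The JDK-only sketch below illustrates "applying the given color space" to already-decompressed samples. It is not PDFBox code: the helper name is hypothetical, and it assumes a DeviceRGB base (3 lookup bytes per entry), 1, 2, 4 or 8 bits per component, and rows padded to whole bytes. How to obtain the raw samples, hival and lookup bytes from PDIndexed is left open here.

```java
import java.awt.image.BufferedImage;

public class IndexedDecodeSketch {

    /**
     * Hypothetical helper (not a PDFBox method): maps /Indexed sample values
     * through an RGB lookup table. Assumes the samples are already decompressed,
     * each row starts on a byte boundary, bitsPerComponent is 1, 2, 4 or 8,
     * and the base colour space is DeviceRGB (3 lookup bytes per entry).
     */
    static BufferedImage decodeIndexed(byte[] samples, int width, int height,
                                       int bitsPerComponent, int hival, byte[] lookup) {
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int rowBytes = (width * bitsPerComponent + 7) / 8; // rows are padded to whole bytes
        int mask = (1 << bitsPerComponent) - 1;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int bitPos = x * bitsPerComponent;
                int packed = samples[y * rowBytes + bitPos / 8] & 0xFF;
                int shift = 8 - bitsPerComponent - (bitPos % 8); // most significant bits first
                int index = Math.min((packed >> shift) & mask, hival); // clamp to the table size
                int r = lookup[3 * index] & 0xFF;
                int g = lookup[3 * index + 1] & 0xFF;
                int b = lookup[3 * index + 2] & 0xFF;
                out.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two-entry palette: index 0 = red, index 1 = blue (hival = 1).
        byte[] lookup = {(byte) 255, 0, 0, 0, 0, (byte) 255};
        // 3x1 image at 1 bit per sample: indices 0, 1, 0 packed into one byte.
        byte[] samples = {(byte) 0b0100_0000};
        BufferedImage img = decodeIndexed(samples, 3, 1, 1, 1, lookup);
        for (int x = 0; x < 3; x++) {
            System.out.printf("pixel %d = #%06X%n", x, img.getRGB(x, 0) & 0xFFFFFF);
        }
    }
}
```

The result is a plain RGB image, so it can be written losslessly with ImageIO as PNG regardless of how the samples were stored in the PDF.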

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: 1___Im0-1.jpg, ExtractImages.java, comparison.png, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Updated: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "JOSE FREITAS (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JOSE FREITAS updated PDFBOX-927:
--------------------------------

    Attachment: comparison.png

Comparison between the image inside the PDF and the extracted image.

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: 1___Im0-1.jpg, ExtractImages.java, comparison.png, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Commented: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001250#comment-13001250 ] 

Adam Nichols commented on PDFBOX-927:
-------------------------------------

What do you mean when you say it is "wrongly rendered"?  Can you post a screenshot of the image before and after corruption?  Also, if you have the image from before it was included, compare it to the one that was extracted and see what is changing.  This could help us track down what the issue is and whether it's in PDFBox or in one of its dependencies.
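The comparison suggested above can be made concrete with a small pixel diff. The following is a JDK-only sketch (the ImageDiff class is hypothetical) that assumes both images have already been saved as files ImageIO can read and have the same dimensions. It reports the largest per-channel difference and the number of differing pixels. As a rough heuristic, not a rule, small scattered differences hint at lossy recompression, while large differences across many pixels hint at a colour-mapping problem such as an ignored color space.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class ImageDiff {

    /** Largest absolute difference in any R, G or B channel over all pixels. */
    static int maxChannelDiff(BufferedImage a, BufferedImage b) {
        requireSameSize(a, b);
        int max = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                for (int shift = 0; shift <= 16; shift += 8) { // blue, green, red
                    int d = Math.abs(((p >> shift) & 0xFF) - ((q >> shift) & 0xFF));
                    max = Math.max(max, d);
                }
            }
        }
        return max;
    }

    /** Number of pixels whose RGB value differs at all (alpha ignored). */
    static int differingPixels(BufferedImage a, BufferedImage b) {
        requireSameSize(a, b);
        int count = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if ((a.getRGB(x, y) & 0xFFFFFF) != (b.getRGB(x, y) & 0xFFFFFF)) {
                    count++;
                }
            }
        }
        return count;
    }

    private static void requireSameSize(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ImageDiff original.png extracted.png");
            return;
        }
        BufferedImage original = ImageIO.read(new File(args[0]));
        BufferedImage extracted = ImageIO.read(new File(args[1]));
        if (original == null || extracted == null) {
            System.err.println("ImageIO could not decode one of the files");
            return;
        }
        System.out.println("max channel difference: " + maxChannelDiff(original, extracted));
        System.out.println("differing pixels: " + differingPixels(original, extracted));
    }
}
```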

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: ExtractImages.java, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);


[jira] Commented: (PDFBOX-927) Problem on writing some kind of images to a File in filesystem

Posted by "JOSE FREITAS (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002136#comment-13002136 ] 

JOSE FREITAS commented on PDFBOX-927:
-------------------------------------

I've just realized that this problem is not related to PDIndexed.

> Problem on writing some kind of images to a File in filesystem
> --------------------------------------------------------------
>
>                 Key: PDFBOX-927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.2.1, 1.3.1, 1.4.0
>         Environment: JDK5 / 6
>            Reporter: JOSE FREITAS
>            Priority: Minor
>         Attachments: 1___Im0-1.jpg, ExtractImages.java, comparison.png, test with pdindexed.pdf
>
>
> I have an image object which is an instance of PDXObjectImage.
> If it has PDIndexed as colorspace.
>  
> "image.getColorSpace() instanceof PDIndexed"
> the image is wrongly rendered.
> Is there any known issue with this colorSpace?
> I think the problem could be at:
> image.write2file(...) or 
> image.write2OutputStream(...);
