Posted to dev@pdfbox.apache.org by "Emmeran Seehuber (Jira)" <ji...@apache.org> on 2020/05/26 21:32:00 UTC

[jira] [Comment Edited] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

    [ https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117050#comment-17117050 ] 

Emmeran Seehuber edited comment on PDFBOX-4847 at 5/26/20, 9:31 PM:
--------------------------------------------------------------------

The bug in PNGConverter is that it did not write the ICC profile correctly. It had an off-by-one error: it did not skip the 0-byte terminator after the profile name (the iCCP chunk starts with a profile name of 1 to 79 bytes, followed by a 0 byte). It also did not mark the stream as FLATE_DECODE.

Because of this, PDFBox (and likely every other PDF reader) simply ignored the ICC profile (an exception occurs while decoding it). As a result the colors were not correct, because the alternate color space DeviceRGB was used instead of the embedded profile.
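
To make the chunk layout concrete, here is a minimal standalone sketch of how the start of the compressed profile can be located in an iCCP chunk payload: the profile name, its 0-byte terminator, one compression method byte (0 means zlib/deflate), then the compressed profile. The class IccpChunkSketch and its methods are hypothetical names for illustration only and are not part of PDFBox. Since the remaining bytes are a zlib stream, they can presumably be stored in the PDF stream unchanged once the stream is marked as FlateDecode, which is what the patch below relies on.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical helper for illustration; not part of PDFBox. */
final class IccpChunkSketch
{
    /** Returns the offset of the zlib-compressed profile inside an iCCP chunk payload. */
    static int compressedProfileOffset(byte[] chunk)
    {
        int pos = 0;
        // The profile name is 1 to 79 bytes long and ends at the first 0 byte.
        while (pos < 80 && pos < chunk.length && chunk[pos] != 0)
        {
            pos++;
        }
        if (pos == 0 || pos >= 80 || pos >= chunk.length)
        {
            throw new IllegalArgumentException("Invalid iCCP chunk: bad profile name");
        }
        pos++; // skip the 0-byte terminator (the step that was missing)
        if (pos >= chunk.length)
        {
            throw new IllegalArgumentException("Invalid iCCP chunk: too few bytes");
        }
        if (chunk[pos] != 0)
        {
            throw new IllegalArgumentException("Unsupported compression method " + chunk[pos]);
        }
        pos++; // skip the compression method byte
        return pos;
    }

    /** Inflates the embedded profile, e.g. to verify that it decodes at all. */
    static byte[] inflateProfile(byte[] chunk)
    {
        int start = compressedProfileOffset(chunk);
        Inflater inflater = new Inflater();
        inflater.setInput(chunk, start, chunk.length - start);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try
        {
            while (!inflater.finished())
            {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                {
                    throw new IllegalArgumentException("Truncated or unsupported zlib stream");
                }
                out.write(buffer, 0, n);
            }
        }
        catch (DataFormatException e)
        {
            throw new IllegalArgumentException("Profile data is not a valid zlib stream", e);
        }
        finally
        {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

If the terminator is not skipped, the offset points one byte too early, so the compression method byte is read from the wrong position and the data handed to the decoder no longer starts at the zlib header.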

The minimal patch would be:
{code:java}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
index f17cdd7cd..866cfbfba 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
@@ -400,11 +400,15 @@ final class PNGConverter
         if (state.iCCP != null || state.sRGB != null)
         {
             // We have got a color profile, which we must attach
             cosStream.setInt(COSName.N, colorSpace.getNumberOfComponents());
             cosStream.setItem(COSName.ALTERNATE, colorSpace.getNumberOfComponents()
                     == 1 ? COSName.DEVICEGRAY : COSName.DEVICERGB);
             if (state.iCCP != null)
             {
+                cosStream.setItem(COSName.FILTER, COSName.FLATE_DECODE);
                 // We need to skip over the name
@@ -415,6 +419,7 @@ final class PNGConverter
                         break;
                     iccProfileDataStart++;
                 }
+                iccProfileDataStart++;
                 if (iccProfileDataStart >= state.iCCP.length)
                 {
                     LOG.error("Invalid iCCP chunk, to few bytes");
{code}
But this will cause test failures in PNGConverterTest, because the image now has the right colors, but
 - the JDK does not respect the embedded color profile in PNG images. Without the fix for this in PNGConverterTest, the colors will be "miles" off from the PNG that is read with ImageIO for comparison.
 - comparing sRGB images does not work, even after applying the fix for the ICC profile, because there are some color rounding differences (off by 1 on the first pixel, for whatever reason, likely some different color conversion paths somewhere). There is a massive difference between converting single pixel values between color spaces and converting a whole image at once (using ColorConvertOp). The latter may choose slightly different colors depending on the rendering intent and the colors in use in the image. The image from PDImage.getImage() would have been converted with ColorConvertOp, but in checkIdent(), which uses getRGB(), the image read with ImageIO would be color converted "pixel by pixel". One could fix this by first converting the expected image to sRGB using ColorConvertOp if it is not yet in sRGB.
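
The whole-image conversion mentioned in the last point could look roughly like the following. This is only a sketch using the JDK's java.awt.image.ColorConvertOp; the class SrgbNormalizeSketch and the method toSRGB are hypothetical names and not part of PNGConverterTest or of the patch.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

/** Hypothetical helper for illustration; not part of PDFBox or its tests. */
final class SrgbNormalizeSketch
{
    /**
     * Returns the image itself if it is already in sRGB, otherwise a copy that
     * was converted to sRGB as a whole (not pixel by pixel via getRGB()).
     */
    static BufferedImage toSRGB(BufferedImage image)
    {
        ColorSpace source = image.getColorModel().getColorSpace();
        if (source.isCS_sRGB())
        {
            return image;
        }
        // TYPE_INT_RGB / TYPE_INT_ARGB images use the default sRGB color space.
        int type = image.getColorModel().hasAlpha()
                ? BufferedImage.TYPE_INT_ARGB : BufferedImage.TYPE_INT_RGB;
        BufferedImage target = new BufferedImage(image.getWidth(), image.getHeight(), type);
        // With this constructor the conversion goes from the color space of the
        // source image to the color space of the (non-null) destination image.
        new ColorConvertOp(null).filter(image, target);
        return target;
    }
}
```

A comparison such as checkIdent() could then call toSRGB() on the expected image before comparing getRGB() values, so that both images went through the same kind of conversion. Whether this removes every off-by-1 difference is not something this sketch can guarantee.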

If you want to apply this fix alone, you would need to temporarily disable the test
{code:java}
PNGConverterTest.testImageConversionRGB16BitICC(){code}
The others should still work. Alternatively, you could extend checkIdent() to correctly convert non-sRGB BufferedImages to sRGB first. I can also provide a patch for that if you like.



> [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter
> ------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4847
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4847
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel, Writing
>    Affects Versions: 2.0.19
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>              Labels: feature, patch
>         Attachments: color_difference.png, pdfbox-rawimages.patch
>
>
> This patch was primarily intended to add access to raw image data (i.e. without any kind of color conversion/reduction). While implementing and testing it I also found a bug with ICC profile embedding in the PNGConverter.
> This patch does the following:
>  - add a method getRawRaster() to PDImage. This allows reading the original raster data in 8 or 16 bit without any kind of color interpretation. Users must know themselves what they want to do with it (e.g. access the raw data of DeviceN images).
>  - add a method getRawImage(). It tries to return the raster obtained by getRawRaster() as a BufferedImage. This is only successful if there is a matching Java ColorSpace for the color space of the image, i.e. only for ICCBased images. In theory this should also work for PDIndexed sRGB images, but I have to find a PDF with such an image first to test it.
>  - add a -noColorConversion switch to the ExtractImage utility to extract images in their original color space. For CMYK images this only works when a TIFF encoder (e.g. from TwelveMonkeys) is on the classpath.
>  - add support to export PNGs with ICC profile data in ImageIOUtil.
>  - fix a bug in PNGConverter, which does not correctly embed the ICC profile from the PNG file.
>  - the PNGConverterTest tests the raw images. While reading PNG files for comparison, it also ensures that the embedded ICC profile is correctly respected. The default PNG reader, at least up to JDK 11, does *not* respect the embedded ICC profile, i.e. the colors are wrong. But there is a workaround for this in the PNGConverterTest (which I have had in production for years now). See the screenshot for the correct color display of the png_rgb_romm_16.png test file (left side; macOS Preview app) and the wrong display (right side; Java; inside IDEA).
>  
> Besides finding bugs like the one in the PNGConverter, access to the raw image also allows all kinds of fun color things. E.g. a future patch could allow using the raw images to print PDFs. If the PDF you want to print has images with a gamut > sRGB (i.e. all modern cameras) and the target printer also has a gamut > sRGB (i.e. some ink photo printer), you will for sure see a difference in the resulting print. Such a mode would be rather slow, as the current sRGB image handling is optimized for speed and using the original raw images would need on-demand color conversions in the printer driver. But you get „high quality“ out of it (at least with respect to colors).
> I don’t think this is in time for the 2.0.20 release.
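
The issue description above says a raw raster can only be returned as a BufferedImage when a matching Java ColorSpace exists. As a rough illustration of that idea (not the code from the attached patch), a raster can be wrapped without touching its samples by pairing it with a ComponentColorModel for the given color space. RawRasterSketch and wrap() are hypothetical names.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.WritableRaster;

/** Hypothetical helper for illustration; not the implementation from the patch. */
final class RawRasterSketch
{
    /**
     * Wraps an 8 or 16 bit raster in a BufferedImage that interprets the samples
     * in the given color space. No sample value is converted or copied.
     */
    static BufferedImage wrap(WritableRaster raster, ColorSpace colorSpace)
    {
        if (raster.getNumBands() != colorSpace.getNumComponents())
        {
            throw new IllegalArgumentException("Raster bands do not match the color space");
        }
        // No alpha channel is assumed here; the transfer type follows the raster.
        ColorModel colorModel = new ComponentColorModel(colorSpace, false, false,
                Transparency.OPAQUE, raster.getTransferType());
        return new BufferedImage(colorModel, raster, false, null);
    }
}
```

For an ICCBased image the color space would typically be an ICC_ColorSpace built from the embedded profile. For color spaces without a Java counterpart (e.g. DeviceN) there is nothing to pass in, which matches the limitation described for getRawImage().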


