Posted to dev@pdfbox.apache.org by "Emmeran Seehuber (JIRA)" <ji...@apache.org> on 2018/05/10 19:17:00 UTC

[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

    [ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470976#comment-16470976 ] 

Emmeran Seehuber commented on PDFBOX-4184:
------------------------------------------

As the topic of image encoding has come up again in OpenHTMLToPDF (see [https://github.com/danfickle/openhtmltopdf/issues/212]), I reworked the 16 bit predictor based encoding I had lying around and extended it to support most BufferedImage formats and CMYK images. I originally wrote it for use with iText some time ago. See [^lossless_predictor_based_imageencoding.patch]

It implements image encoding with a PNG predictor. Depending on the image being encoded, this results in massive space savings compared to simple image encoding without a predictor. Images with extended color profiles also work.
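
To illustrate what PNG predictor encoding means here, the following is a minimal sketch and not the code from the patch. It applies the five filter types from the PNG specification to one row and picks the one with the smallest sum of absolute residuals, which is a common heuristic. The class name PngPredictorSketch and its method names are hypothetical; bpp is the number of bytes per complete pixel (6 for 16 bit RGB).

```java
final class PngPredictorSketch {
    // Filter type bytes as defined by the PNG specification.
    static final int NONE = 0, SUB = 1, UP = 2, AVERAGE = 3, PAETH = 4;

    // Paeth predictor: picks whichever of left (a), above (b) or
    // upper-left (c) is closest to a + b - c.
    static int paeth(int a, int b, int c) {
        int p = a + b - c;
        int pa = Math.abs(p - a), pb = Math.abs(p - b), pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) {
            return a;
        }
        return pb <= pc ? b : c;
    }

    // Filters one row. The result is one byte longer than the row,
    // because the filter type byte is written in front of the data.
    static byte[] filterRow(int type, byte[] row, byte[] prev, int bpp) {
        byte[] out = new byte[row.length + 1];
        out[0] = (byte) type;
        for (int i = 0; i < row.length; i++) {
            int a = i >= bpp ? row[i - bpp] & 0xFF : 0;   // left neighbour
            int b = prev[i] & 0xFF;                       // neighbour above
            int c = i >= bpp ? prev[i - bpp] & 0xFF : 0;  // upper-left neighbour
            int predicted;
            switch (type) {
                case SUB: predicted = a; break;
                case UP: predicted = b; break;
                case AVERAGE: predicted = (a + b) / 2; break;
                case PAETH: predicted = paeth(a, b, c); break;
                default: predicted = 0;
            }
            // The residual is stored modulo 256.
            out[i + 1] = (byte) ((row[i] & 0xFF) - predicted);
        }
        return out;
    }

    // Heuristic cost: sum of absolute residuals, read as signed bytes.
    static long cost(byte[] filtered) {
        long sum = 0;
        for (int i = 1; i < filtered.length; i++) {
            sum += Math.abs(filtered[i]);
        }
        return sum;
    }

    // Tries all five filters and keeps the cheapest one for this row.
    static byte[] bestFilteredRow(byte[] row, byte[] prev, int bpp) {
        byte[] best = null;
        long bestCost = Long.MAX_VALUE;
        for (int type = NONE; type <= PAETH; type++) {
            byte[] candidate = filterRow(type, row, prev, bpp);
            long candidateCost = cost(candidate);
            if (candidateCost < bestCost) {
                best = candidate;
                bestCost = candidateCost;
            }
        }
        return best;
    }
}
```

For the first row of an image, prev would be an all-zero array of the same length. The residuals of smooth image content cluster around zero, which is why the deflate step afterwards compresses them much better than the raw samples.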

To test the CMYK support I need a CMYK profile. Any profile would do. For a quick test I used one from here: [http://download.adobe.com/pub/adobe/iccprofiles/win/AdobeICCProfiles.zip]

I have no idea if we are allowed to include this profile in the test resources. It is missing from the patch, so you must copy it from the download archive. I think we might also be allowed to use a profile from [http://www.eci.org/en/downloads], but they did not publish any license information :(

I have not done any performance tests yet, but the predictor encoding should be faster than the existing encoding, as it tries to be more cache friendly (e.g. by writing each row directly into a zip stream).
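
The row-wise idea can be sketched roughly as follows. This is an assumption about the general approach, not the patch itself: each row is filtered into a small reusable buffer and written straight into a DeflaterOutputStream, so no full-size intermediate image buffer is needed. RowStreamingSketch and RowSource are hypothetical names, and for brevity only the Sub filter is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class RowStreamingSketch {
    // Hypothetical row supplier; stands in for reading one row from a raster.
    public interface RowSource {
        void readRow(int y, byte[] target);
    }

    // Filters every row with the PNG Sub filter and deflates it immediately.
    public static byte[] encode(int height, int rowBytes, int bpp, RowSource source) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        try (OutputStream zip = new DeflaterOutputStream(sink, deflater)) {
            byte[] current = new byte[rowBytes];       // reused for every row
            byte[] filtered = new byte[rowBytes + 1];  // filter type byte + data
            for (int y = 0; y < height; y++) {
                source.readRow(y, current);
                filtered[0] = 1; // PNG filter type Sub
                for (int i = 0; i < rowBytes; i++) {
                    int left = i >= bpp ? current[i - bpp] & 0xFF : 0;
                    filtered[i + 1] = (byte) ((current[i] & 0xFF) - left);
                }
                zip.write(filtered, 0, filtered.length);
            }
        } catch (IOException e) {
            // Not expected for an in-memory sink; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // release the native zlib resources
        }
        return sink.toByteArray();
    }

    // Helper for checking the result: inflates back to the filtered rows.
    public static byte[] inflate(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Only two row-sized buffers are touched per iteration, which is the cache friendliness referred to above. Whether this is actually faster in practice would still have to be measured.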

Please review this patch. Do I need to sign a CLA?

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-4184
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4184
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.9
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>             Fix For: 2.0.10, 3.0.0 PDFBox
>
>         Attachments: lossless_predictor_based_imageencoding.patch, pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images correctly. I've integrated a test for this here: [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as the images are currently not encoded efficiently. For example, you could use PNG encoding to get better compression (by adding a COSName.DECODE_PARMS with a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is something for a later patch. It would also need another API, as there is a tradeoff between speed and compression ratio.
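
As a rough illustration of the 16 bit case described in the issue, the sketch below detects a TYPE_CUSTOM image backed by USHORT data and serializes its samples with the high-order byte first, which is the byte order PDF expects for 16 bit samples. It uses only java.awt.image classes; SixteenBitSamplesSketch and its method names are hypothetical and not taken from the patch.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;

final class SixteenBitSamplesSketch {
    // True for the image kind the issue describes: TYPE_CUSTOM with USHORT data.
    public static boolean isCustomUShort(BufferedImage image) {
        return image.getType() == BufferedImage.TYPE_CUSTOM
                && image.getRaster().getDataBuffer().getDataType() == DataBuffer.TYPE_USHORT;
    }

    // Writes every sample as two bytes, high-order byte first.
    public static byte[] toBigEndianSamples(BufferedImage image) {
        Raster raster = image.getRaster();
        // Samples come back pixel by pixel, row by row, one int per sample.
        int[] samples = raster.getPixels(raster.getMinX(), raster.getMinY(),
                raster.getWidth(), raster.getHeight(), (int[]) null);
        byte[] out = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            out[2 * i] = (byte) (samples[i] >>> 8); // high byte
            out[2 * i + 1] = (byte) samples[i];     // low byte
        }
        return out;
    }
}
```

Reading all samples at once keeps the sketch short; a real encoder would more likely work row by row to avoid the large int array.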



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
