Posted to users@pdfbox.apache.org by JJ Blodgett <jj...@silvervinesoftware.com> on 2023/08/01 18:49:00 UTC

Border / Box around images and form elements with backgrounds

We're working on converting large batches of text-based PDF documents into images and then back to PDF (partly to avoid font issues with certain print processes down the line). But we've come across an issue that's preventing us from moving forward.

Both with version 2.0.29 and 3.0.0, we can generate clean images with "PDFRenderer" and renderImageWithDPI() or similar methods. With RGB output, we get solid images but the size is larger than we'd like. So we try to use ARGB which creates a smaller / transparent background image except for 2 items we've found. Any form field with a transparent background and any embedded image have a non-transparent background. The images look clean and presumably are exactly what we need out of the render process.

But as soon as we try to convert the images back into a PDF by drawing the image to a blank document page, we end up with a border around all images and form fields that are non-transparent. I've included examples of both the raw images and the resulting PDF (as well as the source PDF). We've tried all kinds of things from render settings to draw settings and can't find a combination that changes this at all. We could address all of the form fields by removing backgrounds in our templates. However, we can't actually do anything to get rid of company logos or other images that need to appear in the documents.

Because we can't figure out how to get around this issue, we're unable to use ARGB, and the file sizes are too large to work with. If we can get ARGB to write to documents without the border, I think we can move forward. Any ideas on how or why this happens, and whether there is a workaround? If it matters, we're using Adobe ColdFusion to access the Java objects, but I'm pretty sure that's not a limiting factor. I did notice that the built-in CF functions for working with PDFs do the same thing, so there may be no workaround.

If there's another way to accomplish the same thing (i.e. end up with an image-based PDF rather than text, to avoid text interpretation issues), that would also be a possible solution. We can't embed fonts in the documents because the file sizes would then be too large to work with across the thousands of individual documents.


Re: Border / Box around images and form elements with backgrounds

Posted by JJ Blodgett <jj...@silvervinesoftware.com>.
Andreas,

Here is a link to the original PDF.

https://drive.google.com/file/d/1SXa4-EHikjXggKTL4NU9H896tQHEu_-r/view

It happens with our production documents as well but this is just an example I created to avoid sharing any proprietary information.

Thanks,
J. J.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Border / Box around images and form elements with backgrounds

Posted by Andreas Lehmkühler <an...@lehmi.de.INVALID>.
Please provide the source pdf you used for rendering as well.

Thanks in advance
Andreas


Re: Border / Box around images and form elements with backgrounds

Posted by Tilman Hausherr <TH...@t-online.de>.
Hi,

Using losslessFactory is the only idea I had. I don't know where the 
line comes from. I suspect it's a "feature" in the graphics engine of 
some products.

Tilman



Re: Border / Box around images and form elements with backgrounds

Posted by JJ Blodgett <jj...@silvervinesoftware.com>.
This image limitation is a pain. But here is some code:
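// Page size in PDF user-space units (points); Int() drops any fractional part.
var mediaBox = newPDPage.getMediaBox();
var imgWidth = Int(mediaBox.getUpperRightX());
var imgHeight = Int(mediaBox.getUpperRightY());

// Render page i at the requested DPI; url.colorSpace selects the image type (RGB or ARGB).
var bufferedImage = objRender.renderImageWithDPI(i, dpi, objImageType[url.colorSpace]);

// Two ways to embed the render: lossy JPEG (newPDImage) or lossless (newPDImageLL).
var newPDImage = newJPEGFactory.createFromImage(newPDDocument, bufferedImage, url.quality, dpi);
var newPDImageLL = newLosslessFactory.createFromImage(newPDDocument, bufferedImage);

// Draw one of them scaled to the full page; the JPEG variant is commented out here.
//newPDPageContentStream.drawImage(newPDImage, 0, 0, imgWidth, imgHeight);
newPDPageContentStream.drawImage(newPDImageLL, 0, 0, imgWidth, imgHeight);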

I forgot that I had also tried LosslessFactory. It does generate and draw images, but the files are much larger, so it isn't an option either. And when I do it at high quality, it still prints the borders around the images and form fields, so it didn't even solve the problem. I just can't figure out which step is introducing the border. The raw PNG files seem to be pristine, so I guess it happens as I'm putting them back together.
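One way to narrow down which step introduces the border might be to inspect the alpha channel of the BufferedImage right before it is handed to a factory. The following is only a sketch using plain java.awt classes, not anything from PDFBox; the class name AlphaStats and its fields are hypothetical helpers made up for illustration. It counts fully transparent, fully opaque, and partially transparent pixels. The tiny image in main() stands in for an ARGB page render.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: summarizes the alpha channel of an image. */
final class AlphaStats {
    final long transparent; // alpha == 0
    final long opaque;      // alpha == 255
    final long partial;     // 0 < alpha < 255

    AlphaStats(long transparent, long opaque, long partial) {
        this.transparent = transparent;
        this.opaque = opaque;
        this.partial = partial;
    }

    /** Walks every pixel and classifies it by its alpha value. */
    static AlphaStats of(BufferedImage image) {
        long t = 0, o = 0, p = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // getRGB returns packed ARGB; the top 8 bits are alpha.
                int alpha = image.getRGB(x, y) >>> 24;
                if (alpha == 0) {
                    t++;
                } else if (alpha == 255) {
                    o++;
                } else {
                    p++;
                }
            }
        }
        return new AlphaStats(t, o, p);
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: four hand-set pixels.
        BufferedImage page = new BufferedImage(4, 1, BufferedImage.TYPE_INT_ARGB);
        page.setRGB(0, 0, 0x00000000); // fully transparent
        page.setRGB(1, 0, 0x80FFFFFF); // half transparent
        page.setRGB(2, 0, 0xFF336699); // opaque
        page.setRGB(3, 0, 0xFF000000); // opaque
        AlphaStats s = AlphaStats.of(page);
        System.out.println("transparent=" + s.transparent
                + " partial=" + s.partial + " opaque=" + s.opaque);
        // prints "transparent=1 partial=1 opaque=2"
    }
}
```

If a real render shows no partially transparent pixels around the logos and form fields, that would suggest the render itself is clean and the border comes from a later step (encoding or the viewer), though this check alone does not prove it.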


Re: Border / Box around images and form elements with backgrounds

Posted by JJ Blodgett <jj...@silvervinesoftware.com>.
Ok. Gotcha. What I'm doing is to create the image on the fly (as buffered image), then directly drawing the image using a JPEGFactory response. So maybe that's where it's getting hosed up. There are options to drawImage from content read from a file but we're trying to avoid that if possible to eliminate disk I/O as a bottleneck when dealing with multiple thousands of images.

[inline image not preserved in the plain-text archive]

Maybe one of these 2 steps (likely the first one) is where this is getting introduced. I haven't found a way to directly draw a bufferedImage of a PNG into a PDF without writing to file first. I'm new to the PDFBox side of things so don't have a firm grasp on all of the possibilities yet.


Re: Border / Box around images and form elements with backgrounds

Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.08.2023 18:11, JJ Blodgett wrote:
> Not sure what you mean about ARGB being jpeg. The examples I provided should have been PNG.

Here's what I mean - the image in the PDF is a JPEG encoded image with a 
b/w JPEG ("DCTDecode" is JPEG) encoded mask. Using JPEG is a weird idea 
for b/w images, CCITT 4 is best. I'm wondering if the non-matching is 
because of the weird compression.


Tilman
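To illustrate the point about JPEG being a poor fit for b/w masks, here is a rough sketch that uses only the JDK's ImageIO, not PDFBox. It is an assumption that this stands in reasonably for what happens to the mask in the PDF; the class and method names are made up for illustration. It builds a hard-edged black/white mask, sends it through a JPEG encode/decode at ImageIO's default quality, and counts pixels that are no longer pure black or pure white.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical demo: what lossy JPEG does to a hard-edged mask. */
final class JpegMaskRoundTrip {

    /** Black (0) everywhere, white (255) inside one rectangle whose edges avoid the 8x8 JPEG block grid. */
    static BufferedImage hardEdgedMask(int width, int height) {
        BufferedImage mask = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 11; y < height - 13; y++) {
            for (int x = 11; x < width - 13; x++) {
                mask.getRaster().setSample(x, y, 0, 255);
            }
        }
        return mask;
    }

    /** Encodes the image as JPEG in memory and decodes it again. */
    static BufferedImage throughJpeg(BufferedImage image) throws IOException {
        ImageIO.setUseCache(false); // keep the round trip in memory
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "jpg", out)) {
            throw new IOException("no JPEG writer available");
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    /** Counts gray values that are neither 0 nor 255. */
    static int countIntermediate(BufferedImage gray) {
        int count = 0;
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int v = gray.getRaster().getSample(x, y, 0);
                if (v != 0 && v != 255) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage mask = hardEdgedMask(64, 64);
        System.out.println("intermediate values before JPEG: " + countIntermediate(mask));
        System.out.println("intermediate values after JPEG:  " + countIntermediate(throughJpeg(mask)));
    }
}
```

The original mask contains only 0 and 255. After the lossy round trip some values near the rectangle's edges typically land in between, which means a mask compressed this way no longer lines up exactly with a hard image edge. Whether that is what produces the visible border in Adobe Reader and Chrome is not established here.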

Re: Border / Box around images and form elements with backgrounds

Posted by JJ Blodgett <jj...@silvervinesoftware.com>.
Not sure what you mean about ARGB being jpeg. The examples I provided should have been PNG.

The use case and reason we're doing this has to do with file sizes. These are insurance documents. There could be 20-50 different individual templates at play, depending on the exact makeup of each insurance policy, and they get populated with data at generation time. Each policy gets a unique combination of documents at issuance. Each day, we take all of the policies that need to print that day, grouped by type and state (which could be hundreds of pages / docs or more), and put them in a single batch for the outsourced print service. There are sometimes hundreds of individual batch files being printed daily. The printer then takes each resulting batch PDF file, converts it to PostScript (or something similar) to embed their print controls, and then converts it back to PDF before sending it through their printers.

If we embed fonts in each individual document, we end up with very large file sizes once they're combined into large batches, which makes the print production and file sharing / storage process much more cumbersome. Because individual policy documents are generated first and have to print in a specific order in the final doc, we don't know which fonts will be involved in each final batch PDF. It also slows down the process to the point where we would have a hard time completing each day's print run before having to start the next day's workload.

We're open to other options but haven't come up with a good solution yet for high-volume clients. I know PDF -> image -> PDF is a little non-traditional, but we were trying to find something that puts the burden of accuracy on our system rather than ending up with garbled character sets in the final product, since fonts are not currently embedded. Maybe we're missing something obvious, but even a few misprinted documents are a huge liability, and we're trying to reduce the likelihood of that happening to as close to zero as possible.



Re: Border / Box around images and form elements with backgrounds

Posted by Tilman Hausherr <TH...@t-online.de>.
Why are the ARGB image and its mask both JPEG?

I can see the effect with Adobe Reader and Chrome, but not with PDF.js 
and PDF-XChange.

The whole thing you're doing sounds weird. You're printing at a low dpi
instead of using vector fonts that will look great at every dpi. The
"font issues" are usually avoided by telling your clients that their
fonts MUST be embedded AND subsetted or else. If your printing is a mass
mailing then the fonts need to be embedded only once for the whole document.

Tilman


Re: Border / Box around images and form elements with backgrounds

Posted by JJ Blodgett <jj...@silvervinesoftware.com>.
It looks like the attachments were stripped out of the email. I'll try to include Google doc links and hope these work:

Example of bad behavior: https://drive.google.com/file/d/1ZU-vvZ1uTTDM0LTRhDJPwqVX5nY2dBL_/view?usp=drive_link

ARGB render image: https://drive.google.com/file/d/1ZwyZejehc6AdiQJHxdJ5QrsvfJbgSq9S/view?usp=drive_link
RGB render image: https://drive.google.com/file/d/1m7Ikf1G65HoGJSHt9PLt6TVgT5qMhpMa/view?usp=drive_link

ARGB output PDF: https://drive.google.com/file/d/1kb-SHEE8xS2PYTWrAgfYgmuKJMF6YUql/view?usp=drive_link
RGB output PDF: https://drive.google.com/file/d/1PpHVEsSGcUltKZY0Gi-Kk1kLIx9XPLIW/view?usp=drive_link

