Posted to users@pdfbox.apache.org by Ethan Huang <yu...@gmail.com> on 2021/04/21 21:44:14 UTC

Why does JDK 11 produce larger file size when rendering?

Hello community,

When testing with JDK 11, we found that rendering PDF pages to images
produces larger files than with JDK 8. I know PDFBox uses the java.awt
library for rendering, but I would like to know why the sizes differ and
whether this is configurable.

I have attached one of our test documents, but I believe this applies to
all documents.

JDK 8
The size of the image produced from the first page: 74137 bytes
The size of the image produced from the second page: 51874 bytes

JDK 11
The size of the image produced from the first page: 102464 bytes
The size of the image produced from the second page: 69454 bytes

Re: Why does JDK 11 produce larger file size when rendering?

Posted by Tilman Hausherr <TH...@t-online.de>.
This was over a month ago, and there has been a new JDK release since then.

I do get a size difference, but that is between an older Oracle JDK 8 and
the latest Amazon JDK 11. With the latest Amazon Corretto builds, it is
the same.

I did not test your code. I would first have to review what it does, and
it is too long for that.

I retried rendering your file. There is a size difference between
Amazon Corretto 8 and 11 when saving as PNG.

Tilman
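
One way to tell whether such a size difference comes from the rendered
pixels or only from the PNG encoder is to decode both files and compare
them pixel by pixel. The following is a minimal sketch of that idea, not
something that was run in this thread. The class name PixelDiff and its
method are made up for illustration, and the two file paths are expected
as command-line arguments.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class PixelDiff
{
    /**
     * Returns the number of pixels whose packed ARGB value differs,
     * or -1 if the two images do not have the same dimensions.
     */
    static long countDifferingPixels(BufferedImage a, BufferedImage b)
    {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight())
        {
            return -1;
        }
        long diff = 0;
        for (int y = 0; y < a.getHeight(); ++y)
        {
            for (int x = 0; x < a.getWidth(); ++x)
            {
                if (a.getRGB(x, y) != b.getRGB(x, y))
                {
                    ++diff;
                }
            }
        }
        return diff;
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 2)
        {
            System.err.println("usage: PixelDiff <first.png> <second.png>");
            return;
        }
        // ImageIO.read returns null when no reader can decode the file
        BufferedImage a = ImageIO.read(new File(args[0]));
        BufferedImage b = ImageIO.read(new File(args[1]));
        if (a == null || b == null)
        {
            System.err.println("could not decode one of the files");
            return;
        }
        System.out.println("differing pixels: " + countDifferingPixels(a, b));
    }
}
```

If the count is 0, the two JDKs rendered identical pixels and only the
encoding differs. A nonzero count means the rendered image itself changed.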

On 03.06.2021 at 09:57, Ethan Huang wrote:
> Hi Tilman,
>
> I was distracted by other work.
> Thanks for sharing the code you experimented with! I ran it on JDK 8, 9,
> 10, and 11, and they all produced files of the same size, 368 KB.
> Which JDK versions did you try?
>
> I just want to check whether this is PDFBox related. I would say it is
> more likely Java related, but I wonder whether some Java changes could
> alter the logic in PDFBox.
>
> Here is the code I used to experiment with PDFBox on different JDK
> versions. For the file I shared earlier, JDK 8 produces smaller files,
> although the PDFBox-free code you shared above produces the same file
> size on JDK 8 and 11.
> https://drive.google.com/file/d/1RLAT6doUXZSGH_81z45Bi5Ly-Cvjb87E/view?usp=sharing


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Why does JDK 11 produce larger file size when rendering?

Posted by Ethan Huang <yu...@gmail.com>.
Hi Tilman,

I was distracted by other work.
Thanks for sharing the code you experimented with! I ran it on JDK 8, 9,
10, and 11, and they all produced files of the same size, 368 KB.
Which JDK versions did you try?

I just want to check whether this is PDFBox related. I would say it is
more likely Java related, but I wonder whether some Java changes could
alter the logic in PDFBox.

Here is the code I used to experiment with PDFBox on different JDK
versions. For the file I shared earlier, JDK 8 produces smaller files,
although the PDFBox-free code you shared above produces the same file
size on JDK 8 and 11.
https://drive.google.com/file/d/1RLAT6doUXZSGH_81z45Bi5Ly-Cvjb87E/view?usp=sharing

On Fri, Apr 23, 2021 at 8:20 AM Tilman Hausherr <TH...@t-online.de>
wrote:

> It's definitely Java and not PDFBox. I first tested whether the
> rendering hints differ, but they do not. Even without antialiasing,
> there are differences in size. With antialiasing, there are differences
> in size and also in color count.

Re: Why does JDK 11 produce larger file size when rendering?

Posted by Tilman Hausherr <TH...@t-online.de>.
It's definitely Java and not PDFBox. I first tested whether the
rendering hints differ, but they do not. Even without antialiasing,
there are differences in size. With antialiasing, there are differences
in size and also in color count.

Try this code that contains no PDFBox:


import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class RenderTest
{
    public static void main(String[] args) throws IOException
    {
        // A4 page at 300 dpi
        int height = 3508;
        int width = 2480;

        // white page, black text
        BufferedImage bimg = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = bimg.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.BLACK);

        RenderingHints r = new RenderingHints(null);
        //r.put(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
        r.put(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g.setRenderingHints(r);

        int fontSize = 50;
        int vertMargin = 200;
        int leftMargin = 160;
        int eachOffset = fontSize * 3 / 2;

        Font f = new Font("Courier New", Font.BOLD, fontSize);
        g.setFont(f);

        // fill the page with numbered lines of text
        int count = 1;
        String text = "123456789 123456789 123456789 123456789 123456789 123456789 ";
        while (vertMargin + (eachOffset * (count - 1)) < height - vertMargin)
        {
            String line = String.format("Line %2d: %s", count, text);
            g.drawChars(line.toCharArray(), 0, line.length(), leftMargin,
                    vertMargin + eachOffset * (count - 1));
            ++count;
        }

        g.dispose();

        // write a PNG named after the running Java version
        Iterator<ImageWriter> imageWriters = ImageIO.getImageWritersByFormatName("png");
        ImageWriter writer = imageWriters.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        if (param.canWriteCompressed())
        {
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(0); // best
        }
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(
                new File("test-" + System.getProperty("java.version") + ".png")))
        {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(bimg, null, null), param);
        }
        writer.dispose();
    }
}


Tilman
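
To compare encoder settings without writing files, the same ImageIO calls
can write into a byte array and report the encoded size. This is a
minimal sketch, not code from this thread: the class PngSize, its methods,
and the small test image are assumptions made for illustration. Whether
canWriteCompressed() returns true for the PNG writer depends on the JDK,
so the quality value may have no effect on some versions.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

final class PngSize
{
    /** Encodes the image as PNG in memory and returns the number of bytes. */
    static int encodedSize(BufferedImage image, float quality)
    {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("png");
        if (!writers.hasNext())
        {
            throw new IllegalStateException("no PNG writer available");
        }
        ImageWriter writer = writers.next();
        try
        {
            ImageWriteParam param = writer.getDefaultWriteParam();
            if (param.canWriteCompressed())
            {
                // only applied when the writer supports it
                param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
                param.setCompressionQuality(quality);
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ImageOutputStream ios = new MemoryCacheImageOutputStream(bytes))
            {
                writer.setOutput(ios);
                writer.write(null, new IIOImage(image, null, null), param);
            }
            return bytes.size();
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        finally
        {
            writer.dispose();
        }
    }

    /** Small antialiased test image; shapes are used so no font is needed. */
    static BufferedImage sampleImage()
    {
        BufferedImage img = new BufferedImage(400, 400, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 400, 400);
        g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        g.setColor(Color.BLACK);
        for (int i = 0; i < 8; ++i)
        {
            g.fillOval(20 + i * 45, 20 + i * 40, 60, 40);
        }
        g.dispose();
        return img;
    }

    public static void main(String[] args)
    {
        BufferedImage img = sampleImage();
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("quality 0: " + encodedSize(img, 0f) + " bytes");
        System.out.println("quality 1: " + encodedSize(img, 1f) + " bytes");
    }
}
```

Running this under each JDK gives sizes that can be compared directly,
with the Java version printed next to them.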

On 23.04.2021 at 05:55, Tilman Hausherr wrote:
> Yes, I can confirm this. I tried with two versions of Amazon Corretto,
> saving as PNG at 100 dpi.
>
> I need to run more tests with different PDF types to find out why and
> when this happens. The two PNG files have different color counts.
> Because PNG is lossless, the higher color count must already exist in
> the rendered image before saving.
>
> Tilman


Re: Why does JDK 11 produce larger file size when rendering?

Posted by Tilman Hausherr <TH...@t-online.de>.
Yes, I can confirm this. I tried with two versions of Amazon Corretto,
saving as PNG at 100 dpi.

I need to run more tests with different PDF types to find out why and
when this happens. The two PNG files have different color counts.
Because PNG is lossless, the higher color count must already exist in
the rendered image before saving.

Tilman
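
The color count mentioned above can be checked with a few lines of Java.
This is a minimal sketch and not the tool that was used here; the class
and method names are made up for illustration. It counts the distinct
packed ARGB values returned by BufferedImage.getRGB.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import javax.imageio.ImageIO;

final class ColorCount
{
    /** Returns the number of distinct ARGB values in the image. */
    static int countColors(BufferedImage image)
    {
        Set<Integer> colors = new HashSet<>();
        for (int y = 0; y < image.getHeight(); ++y)
        {
            for (int x = 0; x < image.getWidth(); ++x)
            {
                colors.add(image.getRGB(x, y));
            }
        }
        return colors.size();
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 1)
        {
            System.err.println("usage: ColorCount <image.png>");
            return;
        }
        // ImageIO.read returns null when no reader can decode the file
        BufferedImage image = ImageIO.read(new File(args[0]));
        if (image == null)
        {
            System.err.println("could not decode " + args[0]);
            return;
        }
        System.out.println(args[0] + ": " + countColors(image) + " colors");
    }
}
```

Running it on the PNG produced by each JDK shows whether the rendered
pixels already differ before the encoder is involved.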

On 23.04.2021 at 01:24, Ethan Huang wrote:
> Hi Tilman,
>
> Thanks for the suggestion! I have tried version 2.0.23, and I think the
> behavior is the same across PDFBox versions.
> To share the file, would this Google Drive link work?
> https://drive.google.com/file/d/1Yizkg97z-xyHk9zQj9y9iqhCXr6PfN2S/view?usp=sharing
>
> I think something changed between JDK 8 and JDK 11 in the parts that
> PDFBox uses to render images from PDFs.
> It would be great if you could point us to anything relevant that helps
> explain the cause.


Re: Why does JDK 11 produce larger file size when rendering?

Posted by Ethan Huang <yu...@gmail.com>.
Hi Tilman,

Thanks for the suggestion! I have tried version 2.0.23, and I think the
behavior is the same across PDFBox versions.
To share the file, would this Google Drive link work?
https://drive.google.com/file/d/1Yizkg97z-xyHk9zQj9y9iqhCXr6PfN2S/view?usp=sharing

I think something changed between JDK 8 and JDK 11 in the parts that
PDFBox uses to render images from PDFs.
It would be great if you could point us to anything relevant that helps
explain the cause.


On Wed, Apr 21, 2021 at 7:40 PM Tilman Hausherr <TH...@t-online.de>
wrote:

> Please upload the files to a file-hosting service. Also make sure you're
> using PDFBox 2.0.23.
>
> Tilman

Re: Why does JDK 11 produce larger file size when rendering?

Posted by Tilman Hausherr <TH...@t-online.de>.
Please upload the files to a file-hosting service. Also make sure you're
using PDFBox 2.0.23.

Tilman

On 21.04.2021 at 23:44, Ethan Huang wrote:
> Hello community,
>
> When testing with JDK 11, we found that rendering PDF pages to images
> produces larger files than with JDK 8. I know PDFBox uses the java.awt
> library for rendering, but I would like to know why the sizes differ and
> whether this is configurable.
>
> I have attached one of our test documents, but I believe this applies to
> all documents.
>
> JDK 8
> The size of the image produced from the first page: 74137 bytes
> The size of the image produced from the second page: 51874 bytes
>
> JDK 11
> The size of the image produced from the first page: 102464 bytes
> The size of the image produced from the second page: 69454 bytes