Posted to users@pdfbox.apache.org by Bob Swanson <rd...@swansongrp.com> on 2012/08/17 22:22:11 UTC

PDFBox Issue with JPG Images

Sorry for the long post, but I think that
the issue needs some discussion.

I wrote to this group several days ago
about the larger PDF file output sizes
when using 1.7.1 vs 1.6.0.

I have done some work and discovered that
the increase in PDF output file size
appeared between 1.6.0 and 1.7.0.
I was incorrect in suggesting that the problem
was with 1.7.1.

I tried some test cases and discovered that
a study of the source was needed.

Turns out that the issue may not really be
a "bug" but more of a "feature". However
the lack of consistency (and documentation?)
has caused some of the problems I've encountered.

There are two obvious ways to instantiate the
PDJpeg object: one with a FileInputStream, and
the other with a BufferedImage.

With the FileInputStream instantiation, everything
looks the same between 1.6.0 and 1.7.0. Files are
the same size.

I cannot use this instantiation for my project,
because I have both monochrome (gray-scale) and
color images. The FileInputStream code sets all
images as color, and that causes the PDF
readers to choke. Some throw an error, others
just won't display the gray-scale image.

To get around that behavior, I changed my
production code to use the BufferedImage
instantiation. Apparently, with version
1.7.0, the PDFBox code checks for gray-scale
vs color, and sets the correct parameters.
This is a good thing, and removes my
work-around.
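
For illustration, here is a minimal sketch of how a gray-scale
BufferedImage can be told apart from a color one using only the
JDK. This is not PDFBox's actual code; the class and method names
(GrayCheck, isGrayscale, pdfColorSpaceName) are made up for this
example, and it ignores other color spaces such as CMYK.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
class GrayCheck {

    /** Returns true if the image's color model is based on a gray color space. */
    public static boolean isGrayscale(BufferedImage image) {
        return image.getColorModel().getColorSpace().getType() == ColorSpace.TYPE_GRAY;
    }

    /** Picks the PDF device color space name that matches the image (gray or RGB only). */
    public static String pdfColorSpaceName(BufferedImage image) {
        return isGrayscale(image) ? "DeviceGray" : "DeviceRGB";
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(10, 10, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage color = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        System.out.println("gray image  -> " + pdfColorSpaceName(gray));
        System.out.println("color image -> " + pdfColorSpaceName(color));
    }
}
```

Declaring a gray-scale JPG as a color image is the kind of mismatch
that makes readers fail, so a check along these lines has to happen
somewhere before the image dictionary is written.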

However, the larger PDF files I was observing
were due to a change in the handling of the
quality (compression) level of the JPG files
so instantiated.

The execution pathways for these two instantiation
methods are quite different, and this leads to
different behavior between the two. One obvious
difference I have already described, as being the
setting of the correct gray vs color parameter.

The other differences very much involve the
JPG quality setting. Higher quality means larger
PDF files (about 3x the size produced under 1.6.0).

In addition, there is a "compression" parameter used
in PDJpeg (or which can be specified in the constructor)
that is not passed onwards to the image writer in 1.7.0.
It appears to be a dead-end value.

The BufferedImage constructor seems to be setting
the quality to 1.0, while the previous system
appears to have set it to 0.75. These quality
differences may explain the larger output
files. However, there seems to be no way to
set the quality explicitly.
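
To see how much the quality value alone can change the output size,
the following sketch encodes the same image at 0.75 and at 1.0 with
the standard javax.imageio API. It does not use PDFBox at all; the
class name JpegQualityDemo, its methods, and the synthetic test
image are assumptions made for this example, and the actual size
ratio depends heavily on the image content.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical demo class, not part of PDFBox.
class JpegQualityDemo {

    /** Encodes the image as JPEG at the given quality (0.0 to 1.0) and returns the bytes. */
    public static byte[] encodeJpeg(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        // The quality value is only honored in explicit compression mode.
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Builds a synthetic RGB image: a red/green gradient with noise in the blue channel. */
    public static BufferedImage sampleImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(42);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = (x * 255) / width;
                int g = (y * 255) / height;
                int b = random.nextInt(256);
                image.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = sampleImage(400, 300);
        System.out.println("quality 0.75: " + encodeJpeg(image, 0.75f).length + " bytes");
        System.out.println("quality 1.00: " + encodeJpeg(image, 1.0f).length + " bytes");
    }
}
```

Running this on a few of the real illustrations would show whether
the step from 0.75 to 1.0 accounts for the size increase seen in
the PDF output.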


In addition, the image writer now used in 1.7.0
(ImageIOUtil.writeImage) is invoked with no "resolution"
parameter, and thus seems to fall back to an arbitrary
72dpi resolution for the images. I do not
understand the function of this "resolution"
entry. Can someone explain it? Is it important?

Were there changes to the JavaDoc to explain the
settings for these two methods of instantiation
for PDJpeg? Unfortunately, I seem to have
mislaid the link for the JavaDoc for PDFBox
1.7.1.

The "bottom line" may be that I know of no
guidelines for choosing which instantiation to
use for PDJpeg. I have no information on the
other settings, such as the "resolution", and
the "quality" of the JPG files has been set
to a fixed value, with no mechanism to alter
it (not necessarily a bug?).

[Part of the reason for asking the above questions is
that I need the illustrations to be good enough to be
zoomed in closely when a user reads the book with
a reader such as Acrobat. If I create what I hope are
good-quality JPGs, but the final image looks poor when
zoomed in, then all the extra work is wasted.]

Thanks, as always for all the good work that has
gone into PDFBox.

Bob Swanson



Re: PDFBox Issue with JPG Images

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 17.08.2012 22:22, Bob Swanson wrote:
> Sorry for the long post, but I think that
> the issue needs some discussion.
>
> I wrote to this group several days ago
> about the larger PDF file output sizes
> when using 1.7.1 vs 1.6.0.
>
> I have done some work and discovered that
> the observation of larger PDF output
> files happened between 1.6.0 and 1.7.0.
> I was incorrect in suggesting that the problem
> was with 1.7.1.
>
> I tried some testcases and discovered that
> a study of the source was needed.
>
> Turns out that the issue may not really be
> a "bug" but more of a "feature". However
> the lack of consistency (and documentation?)
> has caused some of the problems I've encountered.
Every patch is welcome ;-)


> There are 2 obvious ways to instantiate the
> PDJpeg object, one with a FileInputStream, and
> the other with a BufferedImage.
>
> With the FileInputStream instantiation, everything
> looks the same between 1.6.0 and 1.7.0. Files are
> the same size.
>
> I cannot use this instantiation for my project,
> because I have both monochrome (gray-scale) and
> color images. The FileInputStream code sets all
> images as color, and that causes the PDF
> readers to choke. Some throw an error, others
> just won't display the gray-scale image.
>
> To get around that behavior, I changed my
> production code to use the BufferedImage
> instantiation. Apparently, with version
> 1.7.0, the PDFBox code checks for gray-scale
> vs color, and sets the correct parameters.
> This is a good thing, and removes my
> work-around.
>
> However, the larger PDF files I was observing
> were due to a change in the handling of the
> quality (compression) level of the JPG files
> so instantiated.
Correct.

> The execution pathways for these two instantiation
> methods are quite different, and this leads to
> different behavior between the two. One obvious
> difference I have already described, as being the
> setting of the correct gray vs color parameter.
>
> The other differences very much involve the
> JPG quality setting. More quality; larger PDF
> files (about 3x the size under 1.6.0).
>
> In addition, there is a "compression" parameter used
> in PDJpeg (or which can be specified in the constructor)
> that is not passed onwards to the image writer in 1.7.0.
> It appears to be a dead-end value.
>
> The BufferedImage constructor seems to be setting
> the quality to 1.0, while the previous system
> appears to have set it to 0.75. These quality
> differences may explain the larger output
> files. However, there seems to be no way to
> set the quality explicitly.
I fixed that in revisions 1374921 and 1375078 in the current trunk. See [1] and 
[2] for further information.

> The image writer now used in 1.7.0 (ImageIOUtil.writeImage),
> in addition, is invoked with no "resolution"
> parameter, and thus seems to arbitrarily set a
> 72dpi resolution for the images. I do not
> understand the function of this "resolution"
> entry. Can someone explain it? Is it important?
It is the "scaling factor" which connects the image pixels and the paper dots.
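
To make the scaling concrete, here is a small worked example. It
assumes the usual PDF convention that one user-space unit (point)
is 1/72 inch; the class and method names (ImageScale,
printedSizeInPoints) are hypothetical and are not PDFBox API.

```java
// Hypothetical helper, not part of PDFBox.
class ImageScale {

    static final double POINTS_PER_INCH = 72.0;

    /** Converts a pixel count to a size in points for a given resolution in dpi. */
    public static double printedSizeInPoints(int pixels, double dpi) {
        return pixels / dpi * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        int widthInPixels = 1500;
        // At 72 dpi each pixel covers one point, so the image is about 20.8 inches wide.
        System.out.println("72 dpi:  " + printedSizeInPoints(widthInPixels, 72.0) + " pt");
        // At 300 dpi the same pixels are packed into 5 inches.
        System.out.println("300 dpi: " + printedSizeInPoints(widthInPixels, 300.0) + " pt");
    }
}
```

The pixel data is the same in both cases; only the size at which
those pixels are laid out changes, so a higher dpi value means a
smaller but denser image.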

> Were there changes to the JavaDoc to explain the
> settings for these two methods of instantiation
> for PDJpeg? Unfortunately, I seem to have
> mislaid the link for the JavaDoc for PDFBox
> 1.7.1.
As I already said: every patch is welcome ;-)


> The "bottom line" may be that there are no
> guide-lines that I know of, for which instantiation to
> use for PDJpeg. I have no information on the
> other settings, such as the "resolution", and
> the "quality" of the JPG files has been set
> to a specific value, with no mechanism to alter
> it (not necessarily a bug?).
>
> [Part of the reason for asking the above questions, is
> that I need the illustrations to be good enough to be
> zoomed in closely, when a user reads the book with
> a reader such as Acrobat. If I create what I hope are
> good-quality JPG's, but the final image looks poor when
> zoomed in, then all the extra work is wasted.]
>
> Thanks, as always for all the good work that has
> gone into PDFBox.
Thanks for your description and investigation. I really like detailed
bug reports and feature requests. They make our lives easier. :-)

> Bob Swanson

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-1246
[2] https://issues.apache.org/jira/browse/PDFBOX-1392