Posted to dev@pdfbox.apache.org by Timo Boehme <ti...@ontochem.com> on 2014/08/26 14:34:19 UTC
Rendering problem example
Hi,
while comparing the rendering capabilities of PDFBox 1.8 with the current
trunk, I came across a journal that shows severe problems in both, although
different ones: the problems seen in 1.8 are gone in trunk, but new ones
have appeared.
Although the journal (Chemical & Engineering News, C&EN) does not provide
free PDF editions, a sample edition can be downloaded via 'View a sample
issue' at http://cen.acs.org/static/about/digital.html (or directly via
http://www.cendigital.org/cendigital/sample/). I am referring to volume
92, no. 27 of 2014-07-07, which I downloaded yesterday, but the same
problems also show up in other issues of the journal.
The problems (all on Linux, Java 1.6):
- PDFBox 1.8 (svn 1620380)
  - first letters of words in headlines are sometimes missing, e.g. on
    page 2 "Getting ..." reads " et ing ...", "Overview" -> " verview"
  - bad character spacing because of a substituted font
- PDFBox trunk (svn 1620415)
  - no missing letters, but heavily distorted and displaced
    letters in headlines (e.g. page 2)
  - unlike 1.8, the correct font is used
  - picture colors are completely wrong;
    logged warning: org.apache.pdfbox.filter.DCTFilter decode
    WARNUNG: Inconsistent metadata read from JPEG stream
  - transparent background instead of white
- PDFBox trunk, no-awt (svn 1620487)
  - font rendering OK
  - picture/background problems as in trunk
Since these are multiple problems across different versions and the PDF is
not freely distributable, I did not create a JIRA issue. Nevertheless, it
is a widely distributed journal and a good test case for rendering
quality. At the very least, the JPEG rendering problem in the current trunk
should be solved.
Best,
Timo
--
Timo Boehme
OntoChem GmbH
H.-Damerow-Str. 4
06120 Halle/Saale
T: +49 345 4780474
F: +49 345 4780471
timo.boehme@ontochem.com
_____________________________________________________________________
OntoChem GmbH
Managing Director: Dr. Lutz Weber
Registered office: Halle / Saale
Register court: Stendal
Registration number: HRB 215461
_____________________________________________________________________
Re: Rendering problem example
Posted by Timo Boehme <ti...@ontochem.com>.
Dear Arakeri,
On 26.08.2014 17:14, Santosh Arakeri wrote:
> Please don't send me mail.
It seems you have subscribed to the dev@pdfbox.apache.org
mailing list. For information on how to unsubscribe, have a look at
your confirmation email or go to
https://pdfbox.apache.org/mailinglists.html
Best,
Timo
Re: Rendering problem example
Posted by Santosh Arakeri <sa...@gmail.com>.
Please don't send me mail.
Re: Rendering problem example
Posted by Tilman Hausherr <TH...@t-online.de>.
On 26.08.2014 15:32, Tilman Hausherr wrote:
> We have problems recognizing YCCK / CMYK JPEG files.
>
> I can't find the issue quickly; it was reported a few weeks ago by a
> French person and concerned an image of a Porsche event.
https://issues.apache.org/jira/browse/PDFBOX-2128
Tilman
> Anyway, what worked for me (most of the time) is to use Apache Imaging
> to detect the image type.
>
> I wrote "most of the time" because it is not perfect, although better
> than Java.
>
> https://issues.apache.org/jira/browse/IMAGING-136
>
> I'm not sure if Apache Imaging is still an active project.
>
> Tilman
Re: Rendering problem example
Posted by Tilman Hausherr <TH...@t-online.de>.
We have problems recognizing YCCK / CMYK JPEG files.
I can't find the issue quickly; it was reported a few weeks ago by a French
person and concerned an image of a Porsche event.
Anyway, what worked for me (most of the time) is to use Apache Imaging
to detect the image type.
I wrote "most of the time" because it is not perfect, although better
than Java.
https://issues.apache.org/jira/browse/IMAGING-136
I'm not sure if Apache Imaging is still an active project.
Tilman
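For illustration, the kind of image-type detection mentioned above can be
sketched without any external library by reading the JPEG headers directly.
This is only a minimal sketch under stated assumptions, not what PDFBox or
Apache Imaging actually do: the class name JpegColorProbe and its methods are
hypothetical, and the mapping from the Adobe APP14 transform flag (0 = none,
1 = YCbCr, 2 = YCCK) and the SOF component count to a color model is a common
heuristic that can be wrong for unusual files.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of PDFBox or Apache Imaging. */
public class JpegColorProbe {

    /** Component count from the SOF segment and Adobe transform flag (-1 if absent). */
    public static final class Info {
        public final int components;
        public final int adobeTransform;

        Info(int components, int adobeTransform) {
            this.components = components;
            this.adobeTransform = adobeTransform;
        }

        /** Heuristic guess only; assumes the usual JFIF/Adobe conventions. */
        public String guess() {
            switch (components) {
                case 1:  return "Gray";
                case 3:  return adobeTransform == 0 ? "RGB" : "YCbCr";
                case 4:  return adobeTransform == 2 ? "YCCK" : "CMYK";
                default: return "unknown";
            }
        }
    }

    /** Scans the header segments up to the start of scan (SOS) or end of image (EOI). */
    public static Info probe(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readUnsignedShort() != 0xFFD8) {
            throw new IOException("Not a JPEG stream (missing SOI marker)");
        }
        int components = -1;
        int transform = -1;
        while (true) {
            if (data.readUnsignedByte() != 0xFF) {
                throw new IOException("Marker expected");
            }
            int marker = data.readUnsignedByte();
            while (marker == 0xFF) {            // skip optional fill bytes
                marker = data.readUnsignedByte();
            }
            if (marker == 0xDA || marker == 0xD9) {
                break;                          // SOS or EOI: headers are done
            }
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
                continue;                       // standalone markers carry no payload
            }
            int length = data.readUnsignedShort() - 2;
            if (length < 0) {
                throw new IOException("Invalid segment length");
            }
            byte[] payload = new byte[length];
            data.readFully(payload);
            if (marker == 0xEE && length >= 12 && isAdobe(payload)) {
                transform = payload[11] & 0xFF; // last byte of the APP14 Adobe segment
            } else if (isStartOfFrame(marker) && length >= 6) {
                components = payload[5] & 0xFF; // after precision, height, width
            }
        }
        return new Info(components, transform);
    }

    private static boolean isAdobe(byte[] p) {
        return p[0] == 'A' && p[1] == 'd' && p[2] == 'o' && p[3] == 'b' && p[4] == 'e';
    }

    private static boolean isStartOfFrame(int marker) {
        // SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC)
        return marker >= 0xC0 && marker <= 0xCF
                && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
        try {
            Info info = probe(in);
            System.out.println(info.components + " components, Adobe transform "
                    + info.adobeTransform + ", guess: " + info.guess());
        } finally {
            in.close();
        }
    }
}
```

It could be run on a DCT stream extracted from the PDF (the file name is up
to you), e.g. "java JpegColorProbe image.jpg". A four-component result is a
hint that the image needs CMYK or YCCK handling rather than the default
YCbCr path.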