Posted to dev@pdfbox.apache.org by Timo Boehme <ti...@ontochem.com> on 2014/08/26 14:34:19 UTC

Rendering problem example

Hi,

While comparing the rendering capabilities of PDFBOX 1.8 with the current
trunk I came across a journal that shows severe problems in both, although
different ones: the problems of 1.8 are gone in trunk, but new ones have
shown up.

The journal (Chemical & Engineering News, C&EN) does not provide free PDF
editions, but a sample edition can be downloaded via 'View a sample
issue' at http://cen.acs.org/static/about/digital.html (or directly via
http://www.cendigital.org/cendigital/sample/). I am referring to volume
92, no. 27, dated 2014-07-07, which I downloaded yesterday; the same
problems also show up in other issues of the journal.

The problems (all on Linux, Java 1.6):
- PDFBOX 1.8 (svn 1620380)
   - first letters of words in headlines are sometimes missing, e.g. on
     page 2 "Getting ..." reads " et ing ...", "Overview" -> " verview"
   - bad character spacing because of substituted font

- PDFBOX trunk (svn 1620415)
   - no missing letters, but letters in headlines are heavily distorted
     and displaced (e.g. page 2)
   - unlike 1.8, the correct font is used
   - picture colors are completely wrong;
     logged warning: org.apache.pdfbox.filter.DCTFilter decode
                     WARNUNG: Inconsistent metadata read from JPEG stream
   - transparent background instead of white

- PDFBOX trunk, no-awt (svn 1620487)
   - font rendering ok
   - picture/background problems as in trunk
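
For the "transparent background instead of white" symptom, one possible
caller-side workaround (not a fix inside PDFBOX) is to paint the rendered
page onto a white canvas before saving it. A minimal sketch using only the
JDK; the class and method names are made up for illustration, and it
assumes the renderer hands back a BufferedImage with an alpha channel:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class WhiteBackground {
    private WhiteBackground() {}

    /** Returns an opaque RGB copy of src with transparent areas filled white. */
    static BufferedImage flattenOntoWhite(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Fill the canvas first, then composite the page on top of it.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, out.getWidth(), out.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // Tiny stand-in for a rendered page: one transparent, one black pixel.
        BufferedImage page = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        page.setRGB(1, 0, 0xFF000000);
        BufferedImage flat = flattenOntoWhite(page);
        System.out.printf("%06X %06X%n",
                flat.getRGB(0, 0) & 0xFFFFFF, flat.getRGB(1, 0) & 0xFFFFFF);
    }
}
```

This only hides the symptom in the output image; whether the renderer
should produce a white page background itself is a separate question.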

Since these are multiple problems across different versions and the PDF is
not freely distributable, I did not create a JIRA issue. Nevertheless, it
is a widely distributed journal and a good test case for rendering
quality. At the very least, the JPEG rendering problem in the current
trunk should be fixed.
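
To use such a file as a test case without distributing the PDF, one option
is to render the same page with each version and compare the resulting
images locally. A rough sketch, assuming both renderings were saved as
images of identical size; RenderDiff and its method are hypothetical names,
not part of PDFBOX:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class RenderDiff {
    private RenderDiff() {}

    /** Counts pixels whose ARGB value differs between two equally sized images. */
    static long countDifferingPixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("image sizes differ");
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }

    public static void main(String[] args) throws IOException {
        // args: two image files, e.g. page 2 rendered by 1.8 and by trunk
        BufferedImage a = ImageIO.read(new File(args[0]));
        BufferedImage b = ImageIO.read(new File(args[1]));
        if (a == null || b == null) {
            throw new IOException("could not decode one of the input images");
        }
        System.out.println(countDifferingPixels(a, b) + " pixels differ");
    }
}
```

An exact pixel comparison is deliberately strict; anti-aliasing differences
alone will produce a non-zero count, so the number is only a starting point
for a visual check.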


Best,
Timo

-- 

  Timo Boehme
  OntoChem GmbH
  H.-Damerow-Str. 4
  06120 Halle/Saale
  T: +49 345 4780474
  F: +49 345 4780471
  timo.boehme@ontochem.com

_____________________________________________________________________

  OntoChem GmbH
  Managing Director: Dr. Lutz Weber
  Registered office: Halle / Saale
  Register court: Stendal
  Registration number: HRB 215461
_____________________________________________________________________


Re: Rendering problem example

Posted by Timo Boehme <ti...@ontochem.com>.
Dear Arakeri,

On 26.08.2014 17:14, Santosh Arakeri wrote:
> Please don't send me mail.

It seems you have subscribed to the dev@pdfbox.apache.org mailing list.
For information on how to unsubscribe, see your confirmation email or
go to
https://pdfbox.apache.org/mailinglists.html


Best,
Timo




Re: Rendering problem example

Posted by Santosh Arakeri <sa...@gmail.com>.
Please don't send me mail.



Re: Rendering problem example

Posted by Tilman Hausherr <TH...@t-online.de>.
On 26.08.2014 15:32, Tilman Hausherr wrote:
> We have problems recognizing YCCK / CMYK JPEG files.
>
> I can't find the issue quickly; it was reported a few weeks ago by a
> French person and concerned an image of a Porsche event.

https://issues.apache.org/jira/browse/PDFBOX-2128

Tilman






Re: Rendering problem example

Posted by Tilman Hausherr <TH...@t-online.de>.
We have problems recognizing YCCK / CMYK JPEG files.

I can't find the issue quickly; it was reported a few weeks ago by a
French person and concerned an image of a Porsche event.
Anyway, what worked for me (most of the time) is to use Apache Imaging 
to detect the image type.

I wrote "most of the time" because it is not perfect, although better 
than Java.

https://issues.apache.org/jira/browse/IMAGING-136

I'm not sure if Apache Imaging is still an active project.
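
To illustrate what "detecting the image type" involves, here is a JDK-only
sketch that reads the JPEG header segments directly. It is neither Apache
Imaging nor the PDFBOX code: JpegColorProbe and its Encoding enum are
hypothetical names. It relies on the Adobe APP14 segment's transform flag
(0 = no transform, 1 = YCbCr, 2 = YCCK) and on the component count in the
start-of-frame segment. The fallbacks when no Adobe segment is present
(3 components treated as YCbCr, 4 as CMYK) are conventional guesses and,
like any such heuristic, will not be right for every file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class JpegColorProbe {
    enum Encoding { GRAY, YCBCR, RGB, CMYK, YCCK, UNKNOWN }

    private JpegColorProbe() {}

    /** Guesses the colour encoding from the SOF and Adobe APP14 segments. */
    static Encoding probe(byte[] jpeg) {
        if (jpeg.length < 2 || u8(jpeg, 0) != 0xFF || u8(jpeg, 1) != 0xD8) {
            throw new IllegalArgumentException("not a JPEG stream (missing SOI marker)");
        }
        int components = -1;     // component count from the start-of-frame segment
        int adobeTransform = -1; // Adobe APP14 transform flag, -1 if no such segment
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            int marker = u8(jpeg, pos + 1);
            if (u8(jpeg, pos) != 0xFF || marker == 0xFF) {
                pos++;           // resynchronise or skip fill bytes
                continue;
            }
            if (marker == 0xD9 || marker == 0xDA) {
                break;           // EOI or SOS: no more header segments
            }
            if (marker == 0x00 || marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
                pos += 2;        // markers that carry no length field
                continue;
            }
            int length = (u8(jpeg, pos + 2) << 8) | u8(jpeg, pos + 3);
            int payload = pos + 4;
            int end = pos + 2 + length;
            if (length < 2 || end > jpeg.length) {
                break;           // truncated or corrupt segment
            }
            if (isStartOfFrame(marker) && length >= 8) {
                components = u8(jpeg, payload + 5);
            } else if (marker == 0xEE && length >= 14 && isAdobe(jpeg, payload)) {
                adobeTransform = u8(jpeg, payload + 11);
            }
            pos = end;
        }
        switch (components) {
            case 1:  return Encoding.GRAY;
            case 3:  return adobeTransform == 0 ? Encoding.RGB : Encoding.YCBCR;
            case 4:  return adobeTransform == 2 ? Encoding.YCCK : Encoding.CMYK;
            default: return Encoding.UNKNOWN;
        }
    }

    private static int u8(byte[] data, int index) {
        return data[index] & 0xFF;
    }

    // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
    private static boolean isStartOfFrame(int marker) {
        return marker >= 0xC0 && marker <= 0xCF
                && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
    }

    private static boolean isAdobe(byte[] d, int off) {
        return d[off] == 'A' && d[off + 1] == 'd' && d[off + 2] == 'o'
                && d[off + 3] == 'b' && d[off + 4] == 'e';
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a JPEG file, e.g. an image stream extracted from the PDF
        System.out.println(probe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

The probe only classifies the stream; converting YCCK or CMYK samples to
RGB for display is a separate step that this sketch does not attempt.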

Tilman
