Posted to dev@pdfbox.apache.org by "Maruan Sahyoun (Jira)" <ji...@apache.org> on 2020/12/04 11:50:00 UTC

[jira] [Commented] (PDFBOX-5033) CFF FontParser exits with illegal offset in font

    [ https://issues.apache.org/jira/browse/PDFBOX-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243958#comment-17243958 ] 

Maruan Sahyoun commented on PDFBOX-5033:
----------------------------------------

Was the FOP sample document you're referring to generated before or after the fix was made in FOP? FuturaStd-Book_full.pdf works fine.

> CFF FontParser exits with illegal offset in font
> ------------------------------------------------
>
>                 Key: PDFBOX-5033
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5033
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.20, 2.0.21
>            Reporter: Marius Heinzmann
>            Priority: Major
>
> Dear Devs,
> we've encountered an issue with versions 2.0.20 and 2.0.21 of PDFBox when trying to parse a PDF for text extraction. The issue seems to have existed before; see FOP-2751.
> I reproduced this issue with the pdfbox-app and the FuturaStd-Book.pdf of FOP-2751:
> {noformat:title=Console output}
> java -jar pdfbox-app-2.0.21.jar ExtractText FuturaStd-Book.pdf 
> Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
> SCHWERWIEGEND: Can't read the embedded Type1C font FuturaStd-Book
> java.io.IOException: illegal offset value 2949166 in CFF font
>         at org.apache.fontbox.cff.CFFParser.readIndexDataOffsets(CFFParser.java:192)
>         at org.apache.fontbox.cff.CFFParser.readIndexData(CFFParser.java:201)
>         at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:484)
>         at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:122)
>         at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:75)
>         at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:102)
>         at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:74)
>         at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
>         at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:515)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
>         at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:144)
>         at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:397)
>         at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:325)
>         at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:272)
>         at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:377)
>         at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:274)
>         at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:97)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
> Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadDiskCache
> WARNUNG: New fonts found, font cache will be re-built
> Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
> WARNUNG: Building on-disk font cache, this may take a while
> Dez 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
> WARNUNG: Finished building on-disk font cache, found 550 fonts
> Dez 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
> WARNUNG: Using fallback font Courier for FuturaStd-Book
> {noformat}
> Other example fonts causing this issue are:
>  * Can't read the embedded Type1C font COGXUZ+MetaPlusNormal-Caps
>  * Can't read the embedded Type1C font DJTRFS+MetaPlusBold-CapsItalic
>  * Can't read the embedded Type1C font EAFTRP+MetaPlusNormal-Caps
>  * Can't read the embedded Type1C font GQHJVM+MetaPlusNormal-CapsItalic
>  * Can't read the embedded Type1C font GUEVYR+MetaPlusBold-CapsItalic
>  * Can't read the embedded Type1C font HYTBMP+MetaPlusNormal-CapsItalic
>  * Can't read the embedded Type1C font IJCQXI+MetaPlusMedium-Caps
>  * Can't read the embedded Type1C font JRIYJF+MetaPlusNormal-Caps
>  * Can't read the embedded Type1C font JSQSJF+NeuzeitGro-Reg
>  * Can't read the embedded Type1C font KUZTXD+MetaPlusBook-Roman
>  * Can't read the embedded Type1C font LWIPLB+1496148105355.00001Arial.000-1
>  * Can't read the embedded Type1C font MCDJBA+MetaSerif-BoldIta
>  * Can't read the embedded Type1C font UNLUJK+Barmeno-Medium
> I couldn't find another issue about this. Is this already known?
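
For readers unfamiliar with the error in the stack trace: a CFF INDEX stores a 2-byte count, a 1-byte offSize, and then count + 1 offsets of offSize bytes each, all relative to the byte preceding the object data (so the first offset is 1). The following is a minimal sketch of a bounds check on those offsets, assuming the whole font is held in a byte array. The class CffIndexSketch and the method readIndexOffsets are hypothetical names for illustration only. This is not the FontBox implementation, and the exception text merely mirrors the message in the log above.

```java
import java.io.IOException;

/** Hypothetical helper, not part of FontBox: reads the offset array of a CFF INDEX. */
final class CffIndexSketch {

    /**
     * Reads the INDEX that starts at position pos and returns its offsets.
     * Assumed layout: count (2 bytes), offSize (1 byte),
     * count + 1 offsets of offSize bytes each, then the object data.
     */
    static int[] readIndexOffsets(byte[] data, int pos) throws IOException {
        if (pos < 0 || pos + 2 > data.length) {
            throw new IOException("truncated INDEX header");
        }
        int count = (int) readUnsigned(data, pos, 2);
        if (count == 0) {
            return new int[0]; // an empty INDEX consists of the count field only
        }
        if (pos + 3 > data.length) {
            throw new IOException("truncated INDEX header");
        }
        int offSize = data[pos + 2] & 0xFF;
        if (offSize < 1 || offSize > 4) {
            throw new IOException("illegal offSize " + offSize);
        }
        int arrayStart = pos + 3;
        long arrayEnd = arrayStart + (long) (count + 1) * offSize;
        if (arrayEnd > data.length) {
            throw new IOException("truncated offset array");
        }
        // Number of object-data bytes that can follow the offset array.
        long available = data.length - arrayEnd;
        int[] offsets = new int[count + 1];
        for (int i = 0; i <= count; i++) {
            long offset = readUnsigned(data, arrayStart + i * offSize, offSize);
            // Offsets are 1-based; the last one may point one past the data.
            if (offset < 1 || offset > available + 1) {
                throw new IOException("illegal offset value " + offset + " in CFF font");
            }
            offsets[i] = (int) offset;
        }
        return offsets;
    }

    /** Reads a big-endian unsigned integer of the given size in bytes. */
    private static long readUnsigned(byte[] data, int pos, int size) {
        long value = 0;
        for (int i = 0; i < size; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }
}
```

As a purely illustrative input, an INDEX with offSize 3 whose second offset is encoded as the bytes 0x2D 0x00 0x2E decodes to 2949166, which this sketch rejects as soon as it exceeds the data that follows. Whether the fonts in this report are damaged in that way is not something the sketch can tell.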


