Posted to dev@pdfbox.apache.org by "Marius Heinzmann (Jira)" <ji...@apache.org> on 2020/12/04 09:47:00 UTC

[jira] [Created] (PDFBOX-5033) CFF FontParser exits with illegal offset in font

Marius Heinzmann created PDFBOX-5033:
----------------------------------------

             Summary: CFF FontParser exits with illegal offset in font
                 Key: PDFBOX-5033
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5033
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 2.0.21, 2.0.20
            Reporter: Marius Heinzmann


Dear Devs,

We've encountered an issue with versions 2.0.20 and 2.0.21 of PDFBox when trying to parse a PDF for text extraction. The issue seems to have existed before; see FOP-2751.

I reproduced the issue with pdfbox-app and the attached [^FuturaStd-Book.pdf]:
{noformat:title=Console output}
java -jar pdfbox-app-2.0.21.jar ExtractText FuturaStd-Book.pdf 
Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
SCHWERWIEGEND: Can't read the embedded Type1C font FuturaStd-Book
java.io.IOException: illegal offset value 2949166 in CFF font
        at org.apache.fontbox.cff.CFFParser.readIndexDataOffsets(CFFParser.java:192)
        at org.apache.fontbox.cff.CFFParser.readIndexData(CFFParser.java:201)
        at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:484)
        at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:122)
        at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:75)
        at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:102)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:74)
        at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
        at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:515)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
        at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:144)
        at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:397)
        at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:325)
        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:272)
        at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:377)
        at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:274)
        at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:97)
        at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)

Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadDiskCache
WARNUNG: New fonts found, font cache will be re-built
Dez 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNUNG: Building on-disk font cache, this may take a while
Dez 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNUNG: Finished building on-disk font cache, found 550 fonts
Dez 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
WARNUNG: Using fallback font Courier for FuturaStd-Book
{noformat}
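
For context, the exception message in the trace above looks like the result of a bounds check on the offset array of a CFF INDEX structure. Below is a minimal sketch of such a check, assuming the INDEX layout from the CFF specification (a 2-byte count, a 1-byte offSize, then count + 1 offsets of offSize bytes each). The names `CffIndexSketch` and `readIndexOffsets` are hypothetical, the bound used (offset not larger than the input length) is only a loose sanity check, and this is not meant to reproduce the actual FontBox code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not part of PDFBox or FontBox. */
final class CffIndexSketch {

    private CffIndexSketch() {
    }

    /**
     * Reads the offset array of a CFF INDEX that starts at byte 0 of data.
     * Assumed layout: count (2 bytes), offSize (1 byte), count + 1 offsets.
     */
    static int[] readIndexOffsets(byte[] data) throws IOException {
        if (data.length < 2) {
            throw new IOException("truncated INDEX header");
        }
        ByteBuffer in = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int count = in.getShort() & 0xFFFF;
        if (count == 0) {
            return new int[0]; // an empty INDEX carries no offSize and no offsets
        }
        if (!in.hasRemaining()) {
            throw new IOException("truncated INDEX header");
        }
        int offSize = in.get() & 0xFF;
        if (offSize < 1 || offSize > 4) {
            throw new IOException("illegal offSize " + offSize);
        }
        if (in.remaining() < (count + 1) * offSize) {
            throw new IOException("truncated INDEX offset array");
        }
        int[] offsets = new int[count + 1];
        for (int i = 0; i <= count; i++) {
            // Assemble one big-endian offset of offSize bytes.
            int offset = 0;
            for (int b = 0; b < offSize; b++) {
                offset = (offset << 8) | (in.get() & 0xFF);
            }
            // Loose sanity check: an offset cannot point past the end of the input.
            // A negative value means a 4-byte offset overflowed the int range.
            if (offset < 0 || offset > data.length) {
                throw new IOException("illegal offset value " + offset + " in CFF font");
            }
            offsets[i] = offset;
        }
        return offsets;
    }
}
```

With this sketch, a 3-byte offset consisting of the bytes 0x2D 0x00 0x2E decodes to 2949166, the value reported above. A value of that size in a small embedded font would fail the check, which might help when inspecting the raw font data.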

Other examples of the same log message are:
 * Can't read the embedded Type1C font COGXUZ+MetaPlusNormal-Caps
 * Can't read the embedded Type1C font DJTRFS+MetaPlusBold-CapsItalic
 * Can't read the embedded Type1C font EAFTRP+MetaPlusNormal-Caps
 * Can't read the embedded Type1C font GQHJVM+MetaPlusNormal-CapsItalic
 * Can't read the embedded Type1C font GUEVYR+MetaPlusBold-CapsItalic
 * Can't read the embedded Type1C font HYTBMP+MetaPlusNormal-CapsItalic
 * Can't read the embedded Type1C font IJCQXI+MetaPlusMedium-Caps
 * Can't read the embedded Type1C font JRIYJF+MetaPlusNormal-Caps
 * Can't read the embedded Type1C font JSQSJF+NeuzeitGro-Reg
 * Can't read the embedded Type1C font KUZTXD+MetaPlusBook-Roman
 * Can't read the embedded Type1C font LWIPLB+1496148105355.00001Arial.000-1
 * Can't read the embedded Type1C font MCDJBA+MetaSerif-BoldIta
 * Can't read the embedded Type1C font UNLUJK+Barmeno-Medium


