Posted to dev@pdfbox.apache.org by "Marius Heinzmann (Jira)" <ji...@apache.org> on 2020/12/04 09:47:00 UTC
[jira] [Created] (PDFBOX-5033) CFF FontParser exits with illegal offset in font
Marius Heinzmann created PDFBOX-5033:
----------------------------------------
Summary: CFF FontParser exits with illegal offset in font
Key: PDFBOX-5033
URL: https://issues.apache.org/jira/browse/PDFBOX-5033
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.21, 2.0.20
Reporter: Marius Heinzmann
Dear Devs,
We've encountered an issue with PDFBox versions 2.0.20 and 2.0.21 when trying to parse a PDF for text extraction. The issue seems to have existed before; see FOP-2751.
I reproduced it using pdfbox-app and the attached [^FuturaStd-Book.pdf]:
{noformat:title=Console output}
java -jar pdfbox-app-2.0.21.jar ExtractText FuturaStd-Book.pdf
Dec 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
SEVERE: Can't read the embedded Type1C font FuturaStd-Book
java.io.IOException: illegal offset value 2949166 in CFF font
at org.apache.fontbox.cff.CFFParser.readIndexDataOffsets(CFFParser.java:192)
at org.apache.fontbox.cff.CFFParser.readIndexData(CFFParser.java:201)
at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:484)
at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:122)
at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:75)
at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:102)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:74)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:515)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:144)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:397)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:325)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:272)
at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:377)
at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:274)
at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:97)
at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
Dec 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadDiskCache
WARNING: New fonts found, font cache will be re-built
Dec 04, 2020 11:06:00 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Building on-disk font cache, this may take a while
Dec 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Finished building on-disk font cache, found 550 fonts
Dec 04, 2020 11:06:02 AM org.apache.pdfbox.pdmodel.font.PDType1CFont <init>
WARNING: Using fallback font Courier for FuturaStd-Book
{noformat}
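The stack trace shows the exception being thrown from CFFParser.readIndexDataOffsets, that is, while reading the offset array of a CFF INDEX structure. As a rough illustration of what such a check involves, here is a minimal Java sketch that reads an INDEX offset array using the layout described in Adobe Technical Note #5176: a Card16 count, a one-byte offSize between 1 and 4, then count + 1 big-endian offsets relative to the byte before the object data. The class and method names (CffIndexSketch, readIndexOffsets) are hypothetical. The validation rules are assumptions based on the spec and are not a copy of PDFBox's implementation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical, simplified reader for the offset array of a CFF INDEX
 * (layout per Adobe Technical Note #5176). Illustrative only; this is
 * not PDFBox's CFFParser.
 */
final class CffIndexSketch {

    private CffIndexSketch() {
    }

    /** Reads an unsigned big-endian integer that is offSize (1..4) bytes wide. */
    static long readOffset(ByteBuffer buf, int offSize) {
        long value = 0;
        for (int i = 0; i < offSize; i++) {
            value = (value << 8) | (buf.get() & 0xFF);
        }
        return value;
    }

    /**
     * Reads an INDEX header (Card16 count, OffSize offSize) and its count + 1
     * offsets, leaving the buffer positioned at the first byte of object data.
     * Returns an empty array for an empty INDEX (count == 0), which has no
     * further fields.
     */
    static long[] readIndexOffsets(ByteBuffer buf) throws IOException {
        int count = buf.getShort() & 0xFFFF; // ByteBuffer is big-endian by default
        if (count == 0) {
            return new long[0];
        }
        int offSize = buf.get() & 0xFF;
        if (offSize < 1 || offSize > 4) {
            throw new IOException("illegal offSize " + offSize + " in CFF INDEX");
        }
        long[] offsets = new long[count + 1];
        for (int i = 0; i <= count; i++) {
            offsets[i] = readOffset(buf, offSize);
        }
        // Offsets are relative to the byte before the object data,
        // so the first one must be 1.
        if (offsets[0] != 1) {
            throw new IOException("illegal first offset " + offsets[0] + " in CFF INDEX");
        }
        // Bytes left in the buffer are only an upper bound on this INDEX's
        // data length, because other CFF structures may follow it.
        long available = buf.remaining();
        for (int i = 1; i <= count; i++) {
            // Offsets must not decrease and must not point past the available data.
            if (offsets[i] < offsets[i - 1] || offsets[i] - 1 > available) {
                throw new IOException("illegal offset value " + offsets[i] + " in CFF INDEX");
            }
        }
        return offsets;
    }
}
```

For instance, a constructed INDEX with count 1, offSize 3 and offset bytes 00 00 01 2D 00 2E, followed by a single data byte, decodes a second offset of 2949166 (0x2D002E) and is rejected. This input is made up for illustration and is not taken from FuturaStd-Book.pdf.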
Other examples are:
* Can't read the embedded Type1C font COGXUZ+MetaPlusNormal-Caps
* Can't read the embedded Type1C font DJTRFS+MetaPlusBold-CapsItalic
* Can't read the embedded Type1C font EAFTRP+MetaPlusNormal-Caps
* Can't read the embedded Type1C font GQHJVM+MetaPlusNormal-CapsItalic
* Can't read the embedded Type1C font GUEVYR+MetaPlusBold-CapsItalic
* Can't read the embedded Type1C font HYTBMP+MetaPlusNormal-CapsItalic
* Can't read the embedded Type1C font IJCQXI+MetaPlusMedium-Caps
* Can't read the embedded Type1C font JRIYJF+MetaPlusNormal-Caps
* Can't read the embedded Type1C font JSQSJF+NeuzeitGro-Reg
* Can't read the embedded Type1C font KUZTXD+MetaPlusBook-Roman
* Can't read the embedded Type1C font LWIPLB+1496148105355.00001Arial.000-1
* Can't read the embedded Type1C font MCDJBA+MetaSerif-BoldIta
* Can't read the embedded Type1C font UNLUJK+Barmeno-Medium
--
This message was sent by Atlassian Jira
(v8.3.4#803005)