Posted to dev@pdfbox.apache.org by "Manuel Gübeli (JIRA)" <ji...@apache.org> on 2017/02/07 11:21:41 UTC

[jira] [Created] (PDFBOX-3677) NullPointerException in Type1Parser.read

Manuel Gübeli created PDFBOX-3677:
-------------------------------------

             Summary: NullPointerException in Type1Parser.read
                 Key: PDFBOX-3677
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3677
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox
    Affects Versions: 2.0.4, 2.0.3
         Environment: Windows 10, java version "1.8.0_25"
            Reporter: Manuel Gübeli
             Fix For: 2.0.5
         Attachments: StackTrace.txt

Text extraction from certain PDFs fails: PDFBox throws a NullPointerException. Text extraction from the same PDFs works with version 1.8.13.

The issue was originally discovered while using the newest Apache Tika library, 1.14. I cannot downgrade to PDFBox 1.8.13 while staying on Apache Tika 1.14.

Unfortunately, I cannot provide the failing PDFs. However, I did some testing and found that "Token token = lexer.nextToken();" returns null.
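
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the FontBox code: SketchToken, SketchLexer and SketchParser are hypothetical stand-ins, and the assumption is only that a lexer returns null once its input is exhausted and that the caller dereferences the result without checking it.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in for a lexer token; NOT the FontBox Token class.
final class SketchToken {
    enum Kind { NAME, LITERAL }

    final Kind kind;
    final String text;

    SketchToken(Kind kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

// Hypothetical lexer that hands out a fixed list of tokens.
final class SketchLexer {
    private final Deque<SketchToken> tokens;

    SketchLexer(SketchToken... tokens) {
        this.tokens = new ArrayDeque<>(Arrays.asList(tokens));
    }

    // Returns null once the input is exhausted (assumed behaviour).
    SketchToken nextToken() {
        return tokens.poll();
    }
}

// Hypothetical parser with an unguarded and a guarded read.
final class SketchParser {
    // Dereferences the token directly, so a null token causes a NullPointerException.
    static String readUnguarded(SketchLexer lexer, SketchToken.Kind expected) throws IOException {
        SketchToken token = lexer.nextToken();
        if (token.kind != expected) {
            throw new IOException("Found " + token.kind + " but expected " + expected);
        }
        return token.text;
    }

    // Checks for null first and reports the problem as an IOException instead.
    static String readGuarded(SketchLexer lexer, SketchToken.Kind expected) throws IOException {
        SketchToken token = lexer.nextToken();
        if (token == null) {
            throw new IOException("Unexpected end of input, expected " + expected);
        }
        if (token.kind != expected) {
            throw new IOException("Found " + token.kind + " but expected " + expected);
        }
        return token.text;
    }
}
```

With an empty SketchLexer, readUnguarded fails with a NullPointerException, while readGuarded reports the same condition as an IOException that a caller can handle like any other parse error.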

Feb 07, 2017 12:17:40 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
SEVERE: Can't read the embedded Type1 font AAAAAB+Arial-BoldMT
java.io.IOException: Found token=null but expected NAME

Caused by: java.io.EOFException
	at org.apache.pdfbox.io.ScratchFileBuffer.seek(ScratchFileBuffer.java:302)
	at org.apache.pdfbox.pdfparser.COSParser.checkXRefOffset(COSParser.java:1177)
	at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:202)
