Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/02/07 17:17:41 UTC

[jira] [Comment Edited] (PDFBOX-3677) NullPointerException in Type1Parser.read

    [ https://issues.apache.org/jira/browse/PDFBOX-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856351#comment-15856351 ] 

Tilman Hausherr edited comment on PDFBOX-3677 at 2/7/17 5:17 PM:
-----------------------------------------------------------------

[~guebeli] please try with a snapshot:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.5-SNAPSHOT/

The change will avoid the NPE, but the file will still cause trouble. You could extract the (Type 1) font with the PDFDebugger command line application. Go to the resources, then the fonts, and click on each font until it fails. Then search that part of the tree for "FontDescriptor" and then "FontFile"; that is the font file. Right-click to save it. I suspect that the file is too short.
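
One way to test the "too short" suspicion once the font file has been saved is to compare its size with the Length1, Length2 and Length3 entries shown next to "FontFile" in the tree. The following is only a minimal sketch, not part of PDFBox: the class name Type1LengthCheck and the method missingBytes are made up for illustration, and it assumes that the saved file is the decoded stream and that the three lengths are typed in by hand from what the debugger displays.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: checks whether a saved Type 1
// font program is at least as long as its declared segment lengths.
final class Type1LengthCheck {

    /**
     * Returns the number of bytes by which the font program falls short of
     * Length1 + Length2 + Length3, or 0 if it is long enough.
     */
    public static long missingBytes(long actualSize, long length1, long length2, long length3) {
        long expected = length1 + length2 + length3;
        return Math.max(0L, expected - actualSize);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: Type1LengthCheck <font file> <Length1> <Length2> <Length3>");
            return;
        }
        // Assumption: args[0] is the file saved from the "FontFile" node.
        Path fontFile = Paths.get(args[0]);
        long size = Files.size(fontFile);
        long missing = missingBytes(size,
                Long.parseLong(args[1]),
                Long.parseLong(args[2]),
                Long.parseLong(args[3]));
        if (missing > 0) {
            System.out.println("Font program is " + missing + " bytes shorter than declared");
        } else {
            System.out.println("Font program is at least as long as declared");
        }
    }
}
```

If the program reports missing bytes, that would be consistent with the parser running out of data while still expecting a token. Length3 may legitimately be 0, so a shortfall smaller than Length3 is less conclusive than one that cuts into the first two segments.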

Please report back on what happens now.


> NullPointerException in Type1Parser.read
> ----------------------------------------
>
>                 Key: PDFBOX-3677
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3677
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.3, 2.0.4
>         Environment: Windows 10, java version "1.8.0_25"
>            Reporter: Manuel Gübeli
>             Fix For: 2.0.5
>
>         Attachments: StackTrace.txt
>
>
> Text extraction from certain PDFs is not possible; PDFBox responds with a NullPointerException. Text extraction from the same PDFs works with version 1.8.13.
> The issue was originally discovered while using the newest Apache Tika library, 1.14. I cannot downgrade to PDFBox 1.8.13 with Apache Tika 1.14.
> Unfortunately, I cannot provide the failing PDFs. However, I did some testing and found that "Token token = lexer.nextToken();" returns null.
> Feb 07, 2017 12:17:40 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
> SEVERE: Can't read the embedded Type1 font AAAAAB+Arial-BoldMT
> java.io.IOException: Found token=null but expected NAME
> Caused by: java.io.EOFException
> 	at org.apache.pdfbox.io.ScratchFileBuffer.seek(ScratchFileBuffer.java:302)
> 	at org.apache.pdfbox.pdfparser.COSParser.checkXRefOffset(COSParser.java:1177)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:202)
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
