You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pdfbox.apache.org by "Peter Johnson (JIRA)" <ji...@apache.org> on 2018/11/02 17:12:00 UTC

[jira] [Created] (PDFBOX-4367) Error expected floating point number actual='18-5'

Peter Johnson created PDFBOX-4367:
-------------------------------------

             Summary: Error expected floating point number actual='18-5'
                 Key: PDFBOX-4367
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4367
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.12
         Environment: Mac OS X Sierra
            Reporter: Peter Johnson


Able to repeat with command line.  Unfortunately, the only files that repeat this are from a customer, and contain sensitive information.  The file opens without error in Acrobat Reader and Mac Preview.  The desired result is that any corrupt portions of the PDF are skipped, so that we can use what text is extractable.

Unfortunately, I still get an error when using the -force option.

We get the following stack trace:
{code:java}
C02V390UHTD6:Downloads pjohnson$ java -jar pdfbox-app-2.0.12.jar ExtractText 16cccd9af5032a303774f7b87fb95076.pdf
Nov 02, 2018 10:04:54 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSArray
WARNING: Corrupt object reference at offset 19727
Exception in thread "main" java.io.IOException: Error expected floating point number actual='18-5'
at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:78)
at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:110)
at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:947)
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:631)
at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:174)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:510)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:477)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:237)
at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:82)
at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
Caused by: java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:494)
at java.math.BigDecimal.<init>(BigDecimal.java:383)
at java.math.BigDecimal.<init>(BigDecimal.java:806)
at org.apache.pdfbox.cos.COSFloat.<init>(COSFloat.java:59)
... 14 more
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org