Posted to dev@pdfbox.apache.org by "Finn Petersen (Jira)" <ji...@apache.org> on 2022/01/20 10:20:00 UTC

[jira] [Updated] (PDFBOX-5368) begincodespacerange possibly invalid

     [ https://issues.apache.org/jira/browse/PDFBOX-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Finn Petersen updated PDFBOX-5368:
----------------------------------
    Description: 
Hello,

 

I have a PDF for which PDFBox logs an error when extracting its text:
{code:java}
java -jar app/target/pdfbox-app-3.0.0-SNAPSHOT.jar export:text --input=304169_0070732133582_000___000_30_25532590_PDF_content.pdf 
Jan 20, 2022 11:18:23 AM org.apache.pdfbox.pdmodel.font.PDFont loadUnicodeCmap
SEVERE: Could not read ToUnicode CMap in font Tahoma
java.io.IOException: java.lang.IllegalArgumentException: The start and the end values must not have different lengths.
        at org.apache.fontbox.cmap.CMapParser.parseBegincodespacerange(CMapParser.java:282)
        at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:130)
        at org.apache.pdfbox.pdmodel.font.CMapManager.parseCMap(CMapManager.java:74)
        at org.apache.pdfbox.pdmodel.font.PDFont.readCMap(PDFont.java:214)
        at org.apache.pdfbox.pdmodel.font.PDFont.loadUnicodeCmap(PDFont.java:144)
        at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:113)
        at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:70)
        at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:96)
        at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:143)
        at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:511)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:189)
        at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:78)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:511)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:189)
        at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:78)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:511)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:189)
        at org.apache.pdfbox.contentstream.operator.DrawObject.process(DrawObject.java:78)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:511)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:489)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:158)
        at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:153)
        at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:362)
        at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:288)
        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:235)
        at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:296)
        at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:198)
        at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:61)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
Caused by: java.lang.IllegalArgumentException: The start and the end values must not have different lengths.
        at org.apache.fontbox.cmap.CodespaceRange.<init>(CodespaceRange.java:50)
        at org.apache.fontbox.cmap.CMapParser.parseBegincodespacerange(CMapParser.java:278)
        ... 44 more
{code}
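For context on the "Caused by" line: a CMap declares its codespace as pairs of hexadecimal strings between begincodespacerange and endcodespacerange, and the FontBox CodespaceRange constructor rejects a pair whose start and end values have different lengths. The stdlib-only Java sketch below mimics just that length check so a CMap fragment can be inspected outside PDFBox. It is not FontBox's parser. The class and method names (CodespaceRangeCheck, findLengthMismatches, decodeHex) are hypothetical, and the sample fragment is invented for illustration, because the actual ToUnicode CMap of the attached PDF is not shown in this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, stdlib-only illustration (not FontBox code) of the rule behind
 * "The start and the end values must not have different lengths":
 * in a codespace range, start and end must decode to the same number of bytes.
 */
final class CodespaceRangeCheck {

    // Matches one hex string token such as <00> or <FFFF>; whitespace inside is allowed.
    private static final Pattern HEX_TOKEN = Pattern.compile("<([0-9A-Fa-f\\s]*)>");

    /** Decodes hex digits to bytes; an odd final digit is padded with 0, as for PDF hex strings. */
    static byte[] decodeHex(String hex) {
        String digits = hex.replaceAll("\\s", "");
        if (digits.length() % 2 != 0) {
            digits = digits + "0";
        }
        byte[] out = new byte[digits.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(digits.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /**
     * Scans the first begincodespacerange ... endcodespacerange block and
     * describes every (start, end) pair whose byte lengths differ.
     */
    static List<String> findLengthMismatches(String cmapText) {
        List<String> problems = new ArrayList<>();
        int begin = cmapText.indexOf("begincodespacerange");
        if (begin < 0) {
            return problems; // no codespace range block at all
        }
        int bodyStart = begin + "begincodespacerange".length();
        int end = cmapText.indexOf("endcodespacerange", bodyStart);
        if (end < 0) {
            return problems; // unterminated block; not handled in this sketch
        }
        Matcher m = HEX_TOKEN.matcher(cmapText.substring(bodyStart, end));
        List<byte[]> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(decodeHex(m.group(1)));
        }
        // Tokens are consumed as consecutive (start, end) pairs.
        for (int i = 0; i + 1 < tokens.size(); i += 2) {
            int startLen = tokens.get(i).length;
            int endLen = tokens.get(i + 1).length;
            if (startLen != endLen) {
                problems.add("pair " + (i / 2) + ": start has " + startLen
                        + " byte(s), end has " + endLen + " byte(s)");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Invented sample (hypothetical): the second pair mixes a 1-byte start with a 2-byte end.
        String sample = "2 begincodespacerange\n<0000> <FFFF>\n<00> <FFFF>\nendcodespacerange";
        for (String problem : findLengthMismatches(sample)) {
            System.out.println(problem);
        }
    }
}
```

Running main on the invented sample prints a single line, "pair 1: start has 1 byte(s), end has 2 byte(s)", which is the kind of pair that makes the CodespaceRange constructor throw.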
 

The PDF is attached.

 

Best regards,

Finn



> begincodespacerange possibly invalid
> ------------------------------------
>
>                 Key: PDFBOX-5368
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5368
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>            Reporter: Finn Petersen
>            Priority: Minor
>         Attachments: 304169_0070732133582_000___000_30_25532590_PDF_content.pdf
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
