Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2021/04/08 17:28:00 UTC

[jira] [Resolved] (PDFBOX-5156) Error in identification of PDF comment symbol % as a token separator with PDF names

     [ https://issues.apache.org/jira/browse/PDFBOX-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-5156.
-------------------------------------
    Fix Version/s: 3.0.0 PDFBox
         Assignee: Tilman Hausherr
       Resolution: Fixed

Thanks [~pwyatt]!

> Error in identification of PDF comment symbol % as a token separator with PDF names
> -----------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5156
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5156
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Peter Wyatt
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>
> The DARPA-funded SafeDocs research program has developed a Compacted PDF Syntax test case to stress-test PDF lexical analyzers/parsers. See [https://github.com/pdf-association/safedocs/tree/main/CompactedSyntax]. The output of this test PDF was examined in detail using the PDFBOX debugger "view internal structure" feature for both the body and content stream and this is the only error... so well done! 
> PDFBOX 3.0.0-RC1 was tested using this highly targeted test PDF and there is an error in the lexical analysis (token separators) between PDF name objects and PDF comments. As specified in ISO 32000-2:
>  * clause 7.2.3: "The delimiter characters (, ), <, >, [, ], /, and % are special (LEFT PARENTHESIS (28h), RIGHT PARENTHESIS (29h), LESS-THAN SIGN (3Ch), GREATER-THAN SIGN (3Eh), LEFT SQUARE BRACKET (5Bh), RIGHT SQUARE BRACKET (5Dh), SOLIDUS (2Fh) and PERCENT SIGN (25h), respectively). They delimit syntactic entities such as arrays, names, and comments. ... Any of these delimiters terminates the entity preceding it and is not included in the entity."
>  * clause 7.2.4 "Any occurrence of the PERCENT SIGN (25h) outside a string or inside a content stream (see 7.8.2, "Content streams") introduces a comment."
> Offset 3561 (as reported in the output below) is in the middle of this fragment of PDF: {{<</Root 1 0 R/Info%comment after name}}
> Note also that other/earlier versions of PDFBOX were not tested.
> {{java -jar pdfbox-app-3.0.0-RC1.jar debug safedocs\CompactedSyntax\CompactedPDFSyntaxTest.pdf}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser parseDirObject}}
> {{WARNING: Skipped unexpected dir object = 'after' at offset 3561}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSDictionaryNameValuePair}}
> {{WARNING: Bad dictionary declaration at offset 3562}}
> {{Apr. 08, 2021 9:41:24 AM org.apache.pdfbox.pdfparser.BaseParser parseCOSDictionary}}
> {{WARNING: Invalid dictionary, found: 'n' but expected: '/' at offset 3562}}
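
For illustration, the delimiter rule quoted from clauses 7.2.3 and 7.2.4 can be sketched in a few lines of Java. This is not PDFBox code and not the actual fix; the class NameTokenSketch and its methods are hypothetical names. The sketch only recognizes names and comments, does not decode #xx escapes in names, and does not handle strings (where % would not start a comment). The text after the comment in the sample input is made up so that the dictionary closes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of PDFBox.
public class NameTokenSketch {
    // Delimiter characters listed in ISO 32000-2, clause 7.2.3.
    private static final String DELIMITERS = "()<>[]/%";

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    public static boolean isWhitespace(char c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // A name ends at white space or at any delimiter, including '%'.
    public static boolean endsName(char c) {
        return isWhitespace(c) || DELIMITERS.indexOf(c) >= 0;
    }

    // Collects the names in the input and skips comments.
    // Every other character is ignored in this simplified scan.
    public static List<String> readNames(String input) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '%') {
                // A comment runs up to, but not including, the end-of-line marker.
                while (i < input.length()
                        && input.charAt(i) != '\n' && input.charAt(i) != '\r') {
                    i++;
                }
            } else if (c == '/') {
                int start = ++i; // the SOLIDUS itself is not part of the name
                while (i < input.length() && !endsName(input.charAt(i))) {
                    i++;
                }
                names.add(input.substring(start, i));
            } else {
                i++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Fragment from the report, followed by an invented continuation.
        String fragment = "<</Root 1 0 R/Info%comment after name\n 2 0 R>>";
        System.out.println(readNames(fragment)); // prints [Root, Info]
    }
}
```

With '%' treated as a delimiter, the name Info ends at the PERCENT SIGN and "comment after name" is skipped as a comment, rather than "after" being read as a stray object as in the warnings above.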


