Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2016/10/01 17:57:20 UTC
[jira] [Comment Edited] (PDFBOX-3519) COSName is not ascii
[ https://issues.apache.org/jira/browse/PDFBOX-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15538916#comment-15538916 ]
John Hewson edited comment on PDFBOX-3519 at 10/1/16 5:56 PM:
--------------------------------------------------------------
Ok, so:
{{#82l#82r#96#BE#92#A9}}
Consists of the following tokens:
{{#82}} {{l}} {{#82}} {{r}} {{#96}} {{#BE}} {{#92}} {{#A9}}
The {{#xx}} tokens are hex escapes for single bytes, which are meant to be read as UTF-8. However, none of the escaped bytes here form valid UTF-8 sequences, so this name is invalid.
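To make the claim above checkable, here is a minimal sketch in plain Java (no PDFBox involved) that expands the {{#xx}} escapes into raw bytes and runs them through a strict UTF-8 decoder. The class and method names are hypothetical and chosen only for illustration; this is not how PDFBox parses names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class NameEscapeCheck {

    // Expand each #xx hex escape into one raw byte.
    // Assumption: every other character in the name is plain ASCII.
    public static byte[] unescape(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '#' && i + 2 < raw.length() + 0 || c == '#' && i + 2 == raw.length() - 0 - 0 && false) {
                out.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Strict check: report malformed input instead of substituting U+FFFD.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = unescape("#82l#82r#96#BE#92#A9");
        System.out.println("valid UTF-8: " + isValidUtf8(bytes));
    }
}
```

The first byte, 0x82, is a UTF-8 continuation byte and cannot start a sequence, so the strict decoder rejects the name immediately.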
I've noticed that CJK PDFs, such as this one, sometimes contain escape codes that are not valid UTF-8. I assume that in this case the bytes are actually Shift-JIS, the encoding used by the "90ms-RKSJ-H" CMap for the "Adobe-Japan1" character collection.
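The Shift-JIS assumption can be tried out directly. The sketch below decodes the same eight bytes with Java's {{Shift_JIS}} charset; it assumes the runtime ships that charset (it is part of the extended charsets in a full JDK), and the class name is hypothetical. Under that assumption the bytes decode to the four characters ＭＳ明朝 (the font name "MS Mincho"), which is consistent with the name being a Shift-JIS encoded font name.

```java
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
class ShiftJisNameDecode {

    // Decode raw name bytes as Shift-JIS.
    // Assumption: the JRE provides the "Shift_JIS" charset.
    public static String decodeShiftJis(byte[] bytes) {
        return new String(bytes, Charset.forName("Shift_JIS"));
    }

    public static void main(String[] args) {
        // Bytes of the name #82l#82r#96#BE#92#A9 after unescaping.
        byte[] bytes = {
            (byte) 0x82, 'l', (byte) 0x82, 'r',
            (byte) 0x96, (byte) 0xBE, (byte) 0x92, (byte) 0xA9
        };
        System.out.println(decodeShiftJis(bytes));
    }
}
```

Note that the unescaped {{l}} and {{r}} are not separate letters under this reading: each is the trail byte of a two-byte Shift-JIS character.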
> COSName is not ascii
> --------------------
>
> Key: PDFBOX-3519
> URL: https://issues.apache.org/jira/browse/PDFBOX-3519
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.3
> Reporter: simon steiner
>
> Trunk seems ok
> PDF is from PDFBOX-783
> {code}
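> import java.io.File;
> import java.io.IOException;
> import org.apache.pdfbox.cos.COSBase;
> import org.apache.pdfbox.cos.COSDictionary;
> import org.apache.pdfbox.cos.COSName;
> import org.apache.pdfbox.cos.COSObject;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> public class CosNameCheck { // class name is illustrative
>     public static void main(String[] args) throws IOException {
>         PDDocument doc = PDDocument.load(new File("A02Gj780LZ.pdf"));
>         COSDictionary x = doc.getPage(0).getResources().getCOSObject();
>         read(x);
>         doc.close();
>     }
>
>     // Walk the resources recursively; throw on the first name whose first char is above 256.
>     private static void read(COSBase b) {
>         if (b instanceof COSObject) {
>             read(((COSObject) b).getObject());
>         } else if (b instanceof COSDictionary) {
>             for (COSBase x : ((COSDictionary) b).getValues()) {
>                 read(x);
>             }
>         } else if (b instanceof COSName) {
>             if (((COSName) b).getName().charAt(0) > 256) {
>                 throw new RuntimeException(((COSName) b).getName());
>             }
>         }
>     }
> }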
> {code}