Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2018/01/20 16:45:00 UTC

[jira] [Created] (PDFBOX-4076) PDFBox cannot properly handle PDF Name objects containing bytes with values outside the US_ASCII range

Tilman Hausherr created PDFBOX-4076:
---------------------------------------

             Summary: PDFBox cannot properly handle PDF Name objects containing bytes with values outside the US_ASCII range
                 Key: PDFBOX-4076
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4076
             Project: PDFBox
          Issue Type: Bug
            Reporter: Tilman Hausherr


As reported by ~mkl in a Stack Overflow answer:

{quote}The first error in PDF Name handling is that PDFBox internally represents them as strings after a mixed UTF-8 / CP-1252 decoding strategy. This is wrong, according to the PDF specification a name object is an atomic symbol uniquely defined by a sequence of any characters (8-bit values) except null (character code 0).

(...)

The second error is, though, that while serializing the PDF it only properly encodes the characters in the strings representing names which are from US_ASCII, all else are replaced by '?'{quote}
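The two points in the quote can be illustrated without PDFBox. The following is only a minimal sketch, not PDFBox's actual code: the class {{NameEscapeSketch}} and its methods {{asciiLossy}} and {{escapeName}} are hypothetical names introduced here. The first method shows how encoding a string to US_ASCII turns every unmappable character into '?'. The second shows one possible lossless serialization, writing every byte outside the printable range 0x21 to 0x7E (and '#' and the PDF delimiter characters) as a {{#xx}} hex escape, as the PDF specification describes for name objects. Using UTF-8 to obtain the bytes is an assumption made for the example; the specification treats a name as a plain byte sequence.

```java
import java.nio.charset.StandardCharsets;

public class NameEscapeSketch {

    // Lossy step: String.getBytes(US_ASCII) replaces each unmappable
    // character with the charset's replacement byte, which is '?'.
    public static String asciiLossy(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Hypothetical helper (not a PDFBox API): serializes the raw bytes of a
    // name, escaping anything that is not a regular printable character.
    public static String escapeName(byte[] nameBytes) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : nameBytes) {
            int c = b & 0xFF; // treat the byte as an unsigned 8-bit value
            if (c == 0) {
                throw new IllegalArgumentException("a name must not contain a null byte");
            }
            boolean regular = c >= 0x21 && c <= 0x7E
                    && "#/%()<>[]{}".indexOf(c) < 0;
            if (regular) {
                sb.append((char) c);
            } else {
                sb.append('#').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes for the name used in the sample code ("äöüß"),
        // so the source file compiles regardless of its encoding.
        String name = "\u00e4\u00f6\u00fc\u00df";
        System.out.println(asciiLossy(name));
        // Assumption: the name's bytes are taken from its UTF-8 encoding.
        System.out.println(escapeName(name.getBytes(StandardCharsets.UTF_8)));
    }
}
```

With the name from the sample code, the first call prints {{????}} and the second prints {{/#C3#A4#C3#B6#C3#BC#C3#9F}}, which keeps all eight bytes recoverable on reload.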

Sample code:

{code:java}
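import java.io.ByteArrayOutputStream;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Build a one-page document.
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
// Add a catalog entry whose key is a name containing non-ASCII characters.
document.getDocumentCatalog().getCOSObject().setString(COSName.getPDFName("äöüß"), "äöüß");
// Save to memory.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
document.save(baos);
document.close();
// Reload the saved bytes and list the catalog keys.
document = PDDocument.load(baos.toByteArray());
System.out.println(document.getDocumentCatalog().getCOSObject().keySet());
document.close();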
{code}
Output:

{noformat}
[COSName{Type}, COSName{Version}, COSName{Pages}, COSName{????}]
{noformat}



