
Posted to dev@pdfbox.apache.org by Kévin Sailly <ke...@gmail.com> on 2012/03/10 09:21:09 UTC

Encoding troubles

Hello,

I am building a web application that receives text from a rich text editor
component (TinyMCE) and then builds a PDF from this text using PDFBox.
Copying and pasting directly from Word documents is allowed.

As my server runs on Debian, everything is encoded/decoded in UTF-8 (the
server, the JVM and the database storing the text). However, as you can see
from my poor English, I am French, and so is the text. Ultimately, I would
like to handle any language.

First, after unescaping the XML rich text with Apache's StringEscapeUtils,
characters such as the right single quotation mark (&rsquo;) are not
rendered correctly in the PDFs.
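A quick way to see which characters are at fault is to check the unescaped text against a single-byte charset. The JDK-only sketch below does that with CharsetEncoder.canEncode; the class and method names are hypothetical, and using windows-1252 as a stand-in for the WinAnsiEncoding of the standard PDF fonts is an assumption that depends on which font and encoding the PDF actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical diagnostic helper, not part of PDFBox.
final class UnencodableChars {

    // Returns the distinct chars of `text` that `charset` cannot encode, in order of appearance.
    // Checks one UTF-16 char at a time, so with the single-byte charsets used here a character
    // outside the BMP (e.g. an emoji) is reported as its two surrogate halves.
    static Set<Character> unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        Set<Character> result = new LinkedHashSet<>();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample: "l&rsquo;&eacute;t&eacute;" after unescaping.
        String text = "l\u2019\u00e9t\u00e9";
        for (String name : new String[] {"ISO-8859-1", "windows-1252"}) {
            StringBuilder missing = new StringBuilder();
            for (char c : unencodable(text, Charset.forName(name))) {
                missing.append(String.format(" U+%04X", (int) c));
            }
            System.out.println(name + ":" + missing);
        }
    }
}
```

Run as-is, it reports U+2019 for ISO-8859-1 and nothing for windows-1252: the right quote is outside Latin-1 but has a slot in code page 1252, while the accented letters fit in both.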

Re-encoding the text in UTF-16 seemed to me to be a solution, but I am
running into this bug:
https://issues.apache.org/jira/browse/PDFBOX-1242
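Note that a java.lang.String is already a sequence of UTF-16 code units, so "re-encoding" it only means something at the byte level. The JDK-only sketch below (class and helper names are hypothetical) prints the bytes produced for the right quote: Java's UTF-16 charset writes a big-endian byte order mark (FE FF) before the code units, while UTF-16BE writes none. It only illustrates the encoding itself and does not reproduce the linked PDFBox issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of PDFBox.
final class Utf16Bytes {

    // Formats bytes as space-separated uppercase hex, e.g. "FE FF 20 19".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String quote = "\u2019"; // RIGHT SINGLE QUOTATION MARK

        // "UTF-16": big-endian byte order mark, then the code unit.
        System.out.println(hex(quote.getBytes(StandardCharsets.UTF_16)));   // FE FF 20 19
        // "UTF-16BE": the same code unit, without a byte order mark.
        System.out.println(hex(quote.getBytes(StandardCharsets.UTF_16BE))); // 20 19
    }
}
```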

As nobody seems to be fixing this bug, is there any workaround I could use?

Thanks,
Kévin