Posted to user@jspwiki.apache.org by Christophe Dupriez <ch...@poisoncentre.be> on 2008/01/24 16:48:54 UTC

Internationalization of Wiki2PDFServlet.java: GOT IT!

Hi Pål!

The output of JTidy already contains question marks in place of the Microsoft-specific (Windows-1252) characters: http://www.cs.tut.fi/~jkorpela/www/windows-chars.html

I tried to add switches to JTidy:
        Tidy tidy = new Tidy();
        tidy.setXmlOut(true);
        tidy.setRawOut(true);
        tidy.setTidyMark(false);
        tidy.setCharEncoding(3);  // 3 = UTF-8 in JTidy R7
        Document xmlDocument = tidy.parseDOM(in, null);
But that was not enough. The real fix (possibly in addition to the switches above) is to encode the JTidy input string as "UTF-8" and NOT in the encoding of the HTTP response (which is ISO-8859-1 here). PDF readers seem to ignore the response encoding, but it should probably be set to "UTF-8" as well:
        InputStream in = new ByteArrayInputStream(("<title>" + nameOfPage + "</title>" + htmlOfPage)
            .getBytes("UTF-8"));
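
To illustrate why the encoding passed to getBytes matters, here is a minimal sketch using only the JDK. The class EncodingDemo and its roundTrip method are hypothetical names made up for this example; they are not part of Wiki2PDFServlet or JTidy. It only shows the general effect: a character that ISO-8859-1 cannot represent is replaced by a question mark when the string is turned into bytes, whereas UTF-8 preserves it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Wiki2PDFServlet or JTidy.
public class EncodingDemo {

    // Encode the text with the given charset, then decode it with the same charset.
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+2019 (right single quotation mark) is typical of text pasted from Word.
        String html = "<title>It\u2019s a test</title>";

        // ISO-8859-1 cannot represent U+2019, so getBytes substitutes '?'.
        // prints: <title>It?s a test</title>
        System.out.println(roundTrip(html, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every Unicode character, so the text survives.
        // prints: true
        System.out.println(roundTrip(html, StandardCharsets.UTF_8).equals(html));
    }
}
```

Once the question mark is in the byte stream, nothing downstream (JTidy or the PDF renderer) can recover the original character, which is consistent with what I observed in the tidy output.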

Please find the modified source code attached. I would greatly appreciate it if you could publish a new JAR, as that would let me normalize my setup (I currently patch the JAR with the compiled class!)

Have a nice evening!

Christophe Dupriez