Posted to general@jakarta.apache.org by Roland Weber <os...@dubioso.net> on 2007/08/19 13:37:31 UTC

Re: svn commit: r567258

Hi Sebastian,

> The u-umlaut characters were replaced by ?
> 
> [But I don't know exactly how the mangled version was generated.]
> 
> The output is currently generated in iso-8859-1 (or iso-8859-15); the
> input is specified using either an actual u-umlaut, or &#252;

That's a nasty one to track down. Apart from the encoding specs in
the style sheet, there's also the encoding declared in the <?xml?>
line of the source file to consider. The source file specifies
ISO-8859-1. I wonder whether svn might mangle the charset on
checkout/commit. Isn't there also a tool that does some
postprocessing in order to normalize the XML? If one XML processor
generates a UTF encoding instead of the specified ISO-8859-1, and
the next processor expects ISO-8859-* as input, the data could get
corrupted. You'd have to trace the whole chain from input to final
output.
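
To illustrate the kind of mismatch described above, here is a minimal
Python sketch. It is not taken from the actual site build; the sample
string and the helper name reencode are assumptions made up for the
illustration. It shows three ways a u-umlaut can be damaged when two
stages disagree about the charset, one of which yields the "?" that
was reported.

```python
# Hypothetical illustration; none of these names come from the real build.
TEXT = "M\u00fcller"  # sample string containing a u-umlaut


def reencode(text, written_as, read_as):
    """Encode text with one charset, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


# One stage writes UTF-8, the next reads ISO-8859-1:
# the two UTF-8 bytes of the umlaut show up as two separate characters.
utf8_read_as_latin1 = reencode(TEXT, "utf-8", "iso-8859-1")

# One stage writes ISO-8859-1, the next reads UTF-8:
# the single byte 0xFC is invalid UTF-8 and becomes U+FFFD.
latin1_read_as_utf8 = reencode(TEXT, "iso-8859-1", "utf-8")

# A writer whose target charset cannot represent the character
# substitutes a literal question mark.
ascii_output = TEXT.encode("ascii", errors="replace").decode("ascii")

if __name__ == "__main__":
    for label, value in [
        ("UTF-8 read as ISO-8859-1", utf8_read_as_latin1),
        ("ISO-8859-1 read as UTF-8", latin1_read_as_utf8),
        ("written as ASCII", ascii_output),
    ]:
        print(label, "->", ascii(value))
```

Which of these cases (if any) applies here is not known; the sketch
only shows why every stage of the chain needs to be checked.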

> I'll see about adding a check - should be easy enough to generate a
> dummy html file from an xml containing some accented characters and
> check that the result is as expected.

That's probably the best approach.
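
A check along those lines could look roughly like the following
Python sketch. It is only an assumption about how such a test might
be written: the function name output_preserves, the file names and
the sample markup are hypothetical, and the temporary files stand in
for whatever the real XML-to-HTML step would produce.

```python
import tempfile
from pathlib import Path

EXPECTED = "\u00fc"  # u-umlaut, the character reported as mangled


def output_preserves(path, encoding, expected=EXPECTED):
    """Return True if the file decodes in `encoding` and still contains
    `expected`, either literally or as a numeric character reference."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # Only reachable for encodings that reject some byte sequences,
        # such as UTF-8; ISO-8859-1 accepts every byte.
        return False
    return expected in text or "&#%d;" % ord(expected) in text


def demo():
    """Run the check on a correct and a mangled stand-in output file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.html"
        bad = Path(tmp) / "bad.html"
        # Stand-ins for generated output (hypothetical content).
        good.write_bytes("<p>M\u00fcller</p>".encode("iso-8859-1"))
        bad.write_bytes(b"<p>M?ller</p>")
        return (
            output_preserves(good, "iso-8859-1"),
            output_preserves(bad, "iso-8859-1"),
        )


if __name__ == "__main__":
    print(demo())
```

In a real check the two stand-in files would be replaced by the HTML
generated from a dummy XML file containing the accented characters.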

cheers,
  Roland


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@jakarta.apache.org
For additional commands, e-mail: general-help@jakarta.apache.org


Re: svn commit: r567258

Posted by Roland Weber <os...@dubioso.net>.
The JDK version used may also have something to do with it:
http://issues.apache.org/bugzilla/show_bug.cgi?id=38781

cheers,
  Roland
