Posted to user@commons.apache.org by David Linsin <D....@cluetec.de> on 2004/10/01 14:58:55 UTC

[Betwixt] UTF-8 / UTF-16

Hello,

I'd like to know how Betwixt handles UTF-16 character encoding. Java's
XMLEncoder only handles UTF-8 character encoding; non-UTF-8 characters are
represented by a platform-dependent value. How does Betwixt handle this?

Thank you for your help.

----------------------
David Linsin


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [Betwixt] UTF-8 / UTF-16

Posted by robert burrell donkin <ro...@blueyonder.co.uk>.
hi david

ah, the vexed issue of platform dependent encodings :)

betwixt doesn't address this issue directly but AFAIK it shouldn't 
really need to. betwixt deals only with java strings (which are 
unicode) and leaves all matters of encoding to the writer it is given. 
the output encoding is limited only by the charsets your java platform 
supports. you should be able to output UTF-16 from betwixt by 
configuring the writer appropriately (for example, an 
OutputStreamWriter created with an explicit encoding) before it's 
passed to the BeanWriter.
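a minimal sketch of that writer configuration, using only the JDK so it 
runs without Betwixt on the classpath. the class name 
`Utf16WriterSketch` and the method `writeXml` are hypothetical; the XML 
body stands in for what a BeanWriter wrapped around the same Writer 
would produce. the point is that the encoding is chosen once, on the 
OutputStreamWriter, and the XML declaration names that same encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16WriterSketch {

    // Writes an XML fragment through a Writer whose encoding is set explicitly,
    // instead of relying on the JVM's platform-dependent default.
    static byte[] writeXml(String xmlBody, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            // The declaration should name the encoding the Writer actually uses.
            writer.write("<?xml version='1.0' encoding='" + charset.name() + "' ?>");
            // With Betwixt, this is roughly where the configured writer would be
            // handed over, e.g. new BeanWriter(writer).write(bean) (not run here).
            writer.write(xmlBody);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Non-Latin text written as escapes so the source file's own encoding is irrelevant.
        String body = "<name>Gr\u00fc\u00dfe, \u65e5\u672c</name>";
        byte[] utf16 = writeXml(body, StandardCharsets.UTF_16);
        String decoded = new String(utf16, StandardCharsets.UTF_16);
        System.out.println(decoded.endsWith(body));
    }
}
```

note that the JDK's "UTF-16" encoder writes big-endian output with a 
byte-order mark, so the first two bytes are FE FF.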

at the risk of being pedantic, AIUI your statement about UTF-8 is not 
strictly correct: UTF-16 and UTF-8 are both encodings of unicode, so 
any character expressible in UTF-16 is also expressible in UTF-8. i 
suspect the real problem is that java's default encoding is platform 
dependent and not necessarily UTF-8. therefore, unless care is taken 
to specify the encoding explicitly, the output will contain platform 
dependent encodings for some characters (typically the non-latin 
ones).
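a small sketch of that point: the same string round-trips through both 
UTF-8 and UTF-16 (only the byte counts differ), while a legacy 
single-byte charset loses the non-latin characters. ISO-8859-1 is used 
here purely as an assumed stand-in for an older platform default; the 
actual default depends on the JVM and OS (since JDK 18 it is UTF-8, 
per JEP 400). the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Encodes the text with the given charset and decodes it back again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Number of bytes the text occupies when encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Latin, accented Latin, Cyrillic and CJK characters, written as escapes.
        String text = "a\u00e9\u0416\u65e5";
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16, StandardCharsets.ISO_8859_1
        };
        for (Charset cs : charsets) {
            // Unmappable characters are replaced (with '?') rather than preserved.
            System.out.println(cs.name() + ": round-trips=" + text.equals(roundTrip(text, cs))
                    + ", bytes=" + byteLength(text, cs));
        }
        // Whatever this JVM picked up from the platform (or file.encoding).
        System.out.println("default: " + Charset.defaultCharset().name());
    }
}
```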

- robert
