Posted to c-dev@xerces.apache.org by Andy Heninger <an...@jtcsv.com> on 2001/02/25 00:43:00 UTC
Serialization (was Re: XMLPlatform Utils)

Hi Dean,

  The only approach that I've been able to think of for getting halfway
efficient transcoding to the target encoding on serialization is to
optimize for the case where no NCRs (numeric character references) are
required, and let the other cases pay a nasty cost.  People (servers)
needing good throughput would either need to know that the range of
characters appearing in their messages worked with their chosen encoding,
or would need to use a Unicode encoding (UTF-8 or UTF-16), but these are
perfectly reasonable restrictions.

  So we could build up a decent sized bufferload of XML in UTF-16 format,
as if everything was going to work, transcode it all at once, and if it
goes OK, fine.  If not, we'll have to back up and convert it a character
at a time, because not all platforms (Windows!) will tell you where the
conversion failed.  And, associated with the buffer, we'll have to keep
track of ranges of locations that allow NCRs.  All of the different
platform transcoding libraries will transcode a block - they just can't
all report detailed error information.
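
A minimal sketch of the scheme described above, under stated assumptions:
transcodeBlock is a hypothetical stand-in for a platform transcoder (here
hard-wired to Latin-1) that reports only success or failure, and NcrRange,
ncrAllowed and serializeBuffer are illustrative names, not Xerces APIs.
The fast path transcodes the whole buffer in one call; only on failure
does it fall back to converting a character at a time, emitting an NCR
where the recorded ranges allow one.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a platform block transcoder (UTF-16 to Latin-1).
// Like the weaker platform APIs, it reports only success or failure, never
// the position of the failure. On failure, `out` is left unchanged.
bool transcodeBlock(const std::u16string& src, std::string& out) {
    std::string tmp;
    tmp.reserve(src.size());
    for (char16_t c : src) {
        if (c > 0xFF) return false;  // not representable in Latin-1
        tmp.push_back(static_cast<char>(c));
    }
    out += tmp;
    return true;
}

// Half-open range [begin, end) of buffer offsets where an NCR is legal,
// e.g. character data or attribute values, but not element names.
struct NcrRange {
    std::size_t begin;
    std::size_t end;
};

bool ncrAllowed(const std::vector<NcrRange>& ranges, std::size_t pos) {
    for (const NcrRange& r : ranges)
        if (pos >= r.begin && pos < r.end) return true;
    return false;
}

// Fast path: transcode the whole buffer at once. Slow path: back up and
// convert one character at a time, substituting NCRs where permitted.
// Returns false (leaving `out` untouched) if an unrepresentable character
// appears at a position where no NCR is allowed.
bool serializeBuffer(const std::u16string& buf,
                     const std::vector<NcrRange>& ranges,
                     std::string& out) {
    if (transcodeBlock(buf, out)) return true;

    std::string result;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        std::size_t len = 1;
        char32_t cp = buf[i];
        // Combine a surrogate pair into one code point for the NCR.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < buf.size() &&
            buf[i + 1] >= 0xDC00 && buf[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (buf[i + 1] - 0xDC00);
            len = 2;
        }
        if (!transcodeBlock(buf.substr(i, len), result)) {
            if (!ncrAllowed(ranges, i)) return false;
            char ncr[16];
            std::snprintf(ncr, sizeof ncr, "&#x%lX;",
                          static_cast<unsigned long>(cp));
            result += ncr;
        }
        i += len - 1;  // skip the low surrogate if one was consumed
    }
    out += result;
    return true;
}
```

With this shape, a buffer that fits the target encoding costs a single
block call, and the range bookkeeping is only consulted after a failure.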

Andy Heninger
IBM XML Technology Group, Cupertino, CA
heninger@us.ibm.com


----- Original Message -----
From: "Dean Roddey" <dr...@charmedquark.com>
To: <xe...@xml.apache.org>
Sent: Friday, February 23, 2001 5:47 PM
Subject: Re: XMLPlatform Utils


> When I did the original code for this, I really couldn't see any faster
> way, other than to build support for this into the transcoding systems
> at a fundamental level, so that they will do it as they are going. But
> that has about zero likelihood of happening, since for us it would mean
> ICU, Win32, and all of the Unix transcoders implementing it. Without
> that, all you can really do is to transcode until you can't represent
> something. Any attempt to get around this would probably start down the
> road of creating a mini-ICU, which would be an awful waste of effort
> given that there are already transcoding systems we are using.
>
> Maybe I'm missing something, but I thought about it a good bit at the
> time, and didn't really see any fast way that didn't either involve a
> lot of replication of what the transcoders do, or getting the
> transcoders to do this extra stuff.
>