Posted to c-dev@xerces.apache.org by Steven Green <St...@surfcast.com> on 2001/09/03 14:07:08 UTC

Transcoder Default Value with Unrep_Replace

I am using XMLFormatter to convert from Unicode to other character sets such
as ISO-8859-1, Windows-1252 and UTF-8.  I use the flag UnRep_Replace so that
characters that cannot be represented in the target encoding are replaced by
a default character.

However, XML88591Transcoder and XMLASCIITranscoder replace unknown
characters with 0x1a (ASCII SUB, the DOS EOF marker), whereas
XML256TableTranscoder replaces them with 0x3f ('?') and XMLUTF8Transcoder
replaces them with chSpace.  This seems inconsistent.  Ideally there should
be a way to specify the character used, or at least a way to query a
transcoder for its default value.  Currently the values are mostly
hard-coded in each transcoder with code like *outPtr++ = 0x3f; and there is
no way to find out what they are without looking at the source code.
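
To make the request concrete, here is a minimal sketch of what a settable and
queryable replacement character might look like.  It is standalone C++ and
does not use the Xerces-C API: SketchLatin1Transcoder, setReplacementChar and
getReplacementChar are hypothetical names invented for illustration, and the
'?' default is just an assumption for the sketch.

```cpp
#include <string>

// Hypothetical stand-in for a Unicode -> ISO-8859-1 transcoder.
// NOT part of Xerces-C; every name here is invented for this sketch.
class SketchLatin1Transcoder
{
public:
    // Assumed default of '?' (0x3F); a real transcoder could pick another.
    explicit SketchLatin1Transcoder(unsigned char replacement = 0x3F)
        : fReplacement(replacement) {}

    // Let the caller choose the substitute byte...
    void setReplacementChar(unsigned char c) { fReplacement = c; }

    // ...and query it instead of having to read the source.
    unsigned char getReplacementChar() const { return fReplacement; }

    // UTF-16 code units above 0xFF have no ISO-8859-1 equivalent and are
    // substituted.  Surrogate pairs are not combined in this sketch, so
    // each half of a pair is replaced separately.
    std::string transcodeTo(const std::u16string& src) const
    {
        std::string out;
        out.reserve(src.size());
        for (char16_t ch : src)
        {
            if (ch <= 0xFF)
                out.push_back(static_cast<char>(ch));
            else
                out.push_back(static_cast<char>(fReplacement));
        }
        return out;
    }

private:
    unsigned char fReplacement;
};
```

With something like this, a caller could set the same replacement byte on
every transcoder it uses, or at least read back what each one will emit.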

Alternatively, another flag that simply skips unrepresentable characters
could be useful.  It would also be good to add some more Doxygen comments to
the XMLFormatter class describing what the different values for UnRepFlags
and EscapeFlags really mean.
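
The "ignore" behaviour suggested above could be sketched as one more policy
value next to "replace".  Again this is standalone C++ rather than Xerces-C
code: SketchUnRepPolicy and sketchToAscii are hypothetical names, and the
three policies shown are assumptions for illustration, not the actual
UnRepFlags values.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical policy enum; NOT the real XMLFormatter::UnRepFlags.
enum class SketchUnRepPolicy
{
    Fail,     // throw on the first unrepresentable character
    Replace,  // emit a substitute byte
    Skip      // drop the character silently (the proposed new flag)
};

// Convert UTF-16 code units to 7-bit ASCII under the chosen policy.
std::string sketchToAscii(const std::u16string& src,
                          SketchUnRepPolicy policy,
                          char replacement = '?')
{
    std::string out;
    for (char16_t ch : src)
    {
        if (ch < 0x80)
        {
            out.push_back(static_cast<char>(ch));
            continue;
        }
        switch (policy)
        {
        case SketchUnRepPolicy::Fail:
            throw std::runtime_error("unrepresentable character");
        case SketchUnRepPolicy::Replace:
            out.push_back(replacement);
            break;
        case SketchUnRepPolicy::Skip:
            break;  // nothing is written for this character
        }
    }
    return out;
}
```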

- Steven Green
Surfcast Aps.
