Posted to c-users@xalan.apache.org by th...@ascentialsoftware.com on 2002/06/12 01:16:10 UTC

Supported encoding

I am trying to get the exact list of encodings supported by Xalan
(without the ICU library). I did not find an obvious answer in the
documentation or FAQ. Any help would be appreciated.
Here is what I have found so far, and the questions I have about it.
1) In h:/xml-xalan/c/src/PlatformSupport/XalanTranscodingServices.cpp, the
initMaximumCharacterValueMap function seems to define the following list of
encodings:
WINDOWS-1250, UTF-[8, 16], US-ASCII, ISO-8859-[1-9], ISO-2022-JP, SHIFT_JIS,
EUC-JP, GB2312, BIG5, EUC-KR, ISO-2022-KR, KOI8-R, EBCDIC-CP-[US, CA, NL,
DK, NO, FI, SE, IT, ES, GB, FR, AR1, HE, CH, ROECE, YU, IS, AR2]
Does this mean that all of those encodings are supported by Xalan (that is,
can I process an XML document whose XML declaration has its encoding
attribute set to any of those values)?
2) I also found the following statement in the Xerces FAQ:
Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16 (Big/Small Endian),
UCS4 (Big/Small Endian), EBCDIC code pages IBM037 and IBM1140 encoding,
ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can parse input
XML files in these above mentioned encoding.
How does this relate to the Xalan encoding support?
3) Is the list of supported encodings the same for the XML document being
transformed and for the stylesheet itself?
Thanks in advance for the help.
Thomas