Posted to log4cxx-user@logging.apache.org by Curt Arnold <ca...@houston.rr.com> on 2004/07/07 08:30:05 UTC
Unicode issues (was Re: Questions about include/log4cxx/helpers/tchar.h)
Another troubling scenario is an XML configuration file that specifies
a filename containing a character not representable in the current
locale's code page, when a non-Unicode build of log4cxx is in use. I
assume the filename would be mangled.
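To make that scenario concrete, here is a minimal C++ sketch (not
log4cxx code) that models narrowing a wide filename, as an XML parser
might hand it back, to an ASCII-only code page. The function name
narrowToAsciiCodePage is hypothetical, and the '?' substitution is an
assumption: a real locale-dependent conversion such as wcstombs may
instead fail outright by returning (size_t)-1.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a locale-dependent wide-to-narrow conversion.
// It models an ASCII-only code page: any character outside 0x00-0x7F
// cannot be represented and is replaced with '?', which is one way a
// filename gets "mangled" (a real conversion might fail instead).
std::string narrowToAsciiCodePage(const std::wstring& wide) {
    std::string narrow;
    narrow.reserve(wide.size());
    for (wchar_t ch : wide) {
        // Cast so the range check works whether wchar_t is signed or not.
        if (static_cast<unsigned long>(ch) < 0x80UL) {
            narrow.push_back(static_cast<char>(ch));
        } else {
            narrow.push_back('?');  // the original character is lost here
        }
    }
    assert(narrow.size() == wide.size());
    return narrow;
}
```

With this model, L"caf\u00e9.log" becomes "caf?.log", so the appender
would open a different file than the configuration asked for.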
I don't think using UTF-8 internally would be practical, since that
would probably require mucking with the locale settings (on at least
some platforms), and a logging library should not be doing that. My
current take is that log4cxx should expose logging methods for char*,
wchar_t*, std::string& and std::wstring&, and that the "standard" build
should use wchar_t internally (or possibly unsigned short). Adding
logging methods for unsigned short* and
std::basic_string<unsigned short> might also be desirable.
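A minimal sketch of that proposal, assuming a hypothetical SketchLogger
class rather than the actual log4cxx Logger API: the four overloads all
funnel into std::wstring storage, and narrow input is widened with
std::mbstowcs using whatever C locale the application has already set
(the library itself never calls setlocale).

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical logger: every public overload ends up in one
// std::wstring-based path, so wchar_t is the internal character type.
class SketchLogger {
public:
    void debug(const wchar_t* msg) { append(std::wstring(msg)); }
    void debug(const std::wstring& msg) { append(msg); }
    void debug(const char* msg) { append(widen(msg)); }
    void debug(const std::string& msg) { append(widen(msg.c_str())); }

    const std::vector<std::wstring>& messages() const { return messages_; }

private:
    // Decodes narrow text with the application's current C locale.
    // A multibyte sequence never yields more wide chars than bytes, so
    // strlen(msg) + 1 is always a large enough buffer.
    static std::wstring widen(const char* msg) {
        assert(msg != nullptr);
        std::vector<wchar_t> buf(std::strlen(msg) + 1);
        std::size_t len = std::mbstowcs(buf.data(), msg, buf.size());
        if (len == static_cast<std::size_t>(-1)) {
            // Bytes invalid in this locale: do not guess at a decoding.
            return L"<unconvertible narrow message>";
        }
        return std::wstring(buf.data(), len);
    }

    void append(const std::wstring& msg) { messages_.push_back(msg); }

    std::vector<std::wstring> messages_;
};
```

Only the narrow overloads depend on the locale; wide input passes
through untouched, which is the point of making wchar_t the internal
type.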
Macros could be used to configure a build that uses char internally,
but that would not be the product build, and it would be expected to
behave undesirably when it encounters characters that cannot be
represented in the current code page.
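One way such a macro switch could look, as a sketch:
SKETCH_USE_CHAR_INTERNALS, logchar_t, SKETCH_STR and InternalString are
hypothetical names chosen for illustration, not existing log4cxx
identifiers. Library code is written once against the typedef, and the
literal macro adds the L prefix only in the wchar_t build.

```cpp
#include <cassert>
#include <string>

// Hypothetical build switch: define SKETCH_USE_CHAR_INTERNALS to get the
// char-based (non-product) build; leave it undefined for the "standard"
// wchar_t build.
#if defined(SKETCH_USE_CHAR_INTERNALS)
typedef char logchar_t;
#define SKETCH_STR(literal) literal
#else
typedef wchar_t logchar_t;
#define SKETCH_STR(literal) L##literal
#endif

// All internal strings use the configured character type.
typedef std::basic_string<logchar_t> InternalString;

// Example of library code written once against the configured type.
InternalString makeAppenderName(const InternalString& base) {
    assert(!base.empty());
    return base + SKETCH_STR(".appender");
}
```

The same source compiles in both configurations; only the char build
inherits the code-page limitations described above.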
http://xml.apache.org/xerces-c/build-misc.html#XMLChInfo has some
details on why Xerces-C does not use wchar_t internally but instead uses
unsigned shorts containing UTF-16. Having all internal representations
be UTF-16 on all platforms greatly simplifies the fairly significant
amount of character checking needed in XML. However, I think
compatibility with the wchar_t RTL functions would be significantly
more important for log4cxx, and the need to check specific code points
would be nearly nonexistent. The key criterion would be that the
internal character type be able to represent every character that might
be encountered. Exposing debug(unsigned short*), etc., could be useful
when applications that use Xerces-C need to log content held in
XMLCh*.
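As a sketch of how a debug(unsigned short*) overload could bring XMLCh*
content into a wchar_t-based build, the hypothetical helper utf16ToWide
below assumes null-terminated UTF-16 in unsigned short code units. It
combines surrogate pairs only when wchar_t is at least 32 bits wide, the
case where wchar_t can hold every code point directly; with a 16-bit
wchar_t the UTF-16 units are copied through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts null-terminated UTF-16 (unsigned short code units, as in
// XMLCh) to std::wstring. With a 32-bit wchar_t, surrogate pairs become
// single code points; with a 16-bit wchar_t the units are kept as-is.
std::wstring utf16ToWide(const unsigned short* src) {
    assert(src != nullptr);
    std::wstring out;
    for (std::size_t i = 0; src[i] != 0; ++i) {
        unsigned long unit = src[i];
        if (sizeof(wchar_t) >= 4 && unit >= 0xD800 && unit <= 0xDBFF) {
            // Reading src[i + 1] is safe: at worst it is the terminator.
            unsigned long next = src[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                unsigned long cp =
                    0x10000UL + ((unit - 0xD800UL) << 10) + (next - 0xDC00UL);
                out.push_back(static_cast<wchar_t>(cp));
                ++i;  // consumed the low surrogate too
                continue;
            }
        }
        // BMP character, unpaired surrogate, or 16-bit wchar_t platform.
        out.push_back(static_cast<wchar_t>(unit));
    }
    return out;
}
```

A debug(const unsigned short*) overload would then only need to call
utf16ToWide and hand the result to the wide logging path.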