Posted to log4cxx-user@logging.apache.org by Marshall Powers <mp...@appsecinc.com> on 2007/06/25 18:20:33 UTC
Problem with iconv charsets...
I'm trying to use APR-1.2.7 in Log4Cxx 0.10 on AIX 5.3. When I run my
program, I get an exception "APR_LOCALE_CHARSET" in the createDefaultEncoder
method. I think this problem is related to the iconv that is installed on
this machine. When I run iconv -l, among the various charsets I see
"ISO8859-1". However, in the source for Log4Cxx and APR, the only string
literals I see are for "ISO-8859-1" (note the extra dash). Is there any
simple way to work around this problem? Is this potentially a portability
issue with APR/log4cxx (that is, if I distribute some app that uses APR, and
my user doesn't have "ISO-8859-1" in their iconv, is my app going to crash?)
Thanks,
Marshall Powers
Re: Problem with iconv charsets...
Posted by Martin Sebor <se...@roguewave.com>.
William A. Rowe, Jr. wrote:
> William A. Rowe, Jr. wrote:
>> Some thoughts:
>>
>> * At run-time this should probably be determined by parsing first the
>> LC_CTYPE, or LC_ALL in its absence, or the fallback to the LANG
>> envvar if neither LC_ variable is defined. The codepage follows
>> the period, e.g. LANG=en_US.UTF-8 would be parsed as 'UTF-8'.
>
> FYI - I pondered LC_COLLATE, but it didn't seem to particularly apply.
>
> The obvious question: if LC_CTYPE specifies a language but no charset,
> do we then drill down to LC_ALL, LANG, etc.?
The character set of a locale is determined by the LC_CTYPE
category. On POSIX platforms it can be retrieved by passing
the CODESET constant to nl_langinfo().
Martin
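A minimal sketch of that approach, assuming a POSIX platform. The helper name codeset_for_locale() is hypothetical and not part of APR or log4cxx. Note that setlocale() changes process-wide state, so a library would have to think carefully about whether it may call it on the application's behalf.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not an APR function): switch LC_CTYPE to the named
 * locale and report its codeset. Pass "" to adopt whatever the environment
 * (LC_ALL / LC_CTYPE / LANG) selects. Returns NULL if the C library does
 * not support the requested locale. The returned string is owned by the
 * C library and may be overwritten by later locale calls. */
static const char *codeset_for_locale(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        return NULL;
    }
    return nl_langinfo(CODESET);
}
```

Calling codeset_for_locale("") reports the codeset of the user's environment. The exact spelling of the result is platform-specific, so it may still need to be matched against the names that the local iconv accepts.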
Re: Problem with iconv charsets...
Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
William A. Rowe, Jr. wrote:
> Some thoughts:
>
> * At run-time this should probably be determined by parsing first the
> LC_CTYPE, or LC_ALL in its absence, or the fallback to the LANG
> envvar if neither LC_ variable is defined. The codepage follows
> the period, e.g. LANG=en_US.UTF-8 would be parsed as 'UTF-8'.
FYI - I pondered LC_COLLATE, but it didn't seem to particularly apply.
The obvious question: if LC_CTYPE specifies a language but no charset,
do we then drill down to LC_ALL, LANG, etc.?
Re: Problem with iconv charsets...
Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Jeff Trawick wrote:
>
> apr_os_default_encoding() needs to return something that can be passed
> to apr_xlate_open() on the current platform in order to translate
> compiled-in strings.
Similarly, it's used to present filesystem names. So, this needs to be
taken a step further, perhaps? Multiple apr_os_*_encoding() results by
their function?
Re: Problem with iconv charsets...
Posted by Jeff Trawick <tr...@gmail.com>.
On 6/26/07, William A. Rowe, Jr. <wr...@rowe-clan.net> wrote:
> Eric Covener wrote:
> > On 6/25/07, William A. Rowe, Jr. <wr...@rowe-clan.net> wrote:
> >> * At run-time this should probably be determined by parsing first the
> >> LC_CTYPE, or LC_ALL in its absence, or the fallback to the LANG
> >> envvar if neither LC_ variable is defined. The codepage follows
> >> the period, e.g. LANG=en_US.UTF-8 would be parsed as 'UTF-8'.
> >
> > Wouldn't runtime checks mean xlate/xlate.c needs to find a new
> > way to figure out what the codepage of the source code was (to
> > translate compiled-in strings)?
> >
> > Perhaps APR_DEFAULT_CHARSET could be split into two different
> > identifiers APR_CURRENT_CHARSET/APR_BUILD_CHARSET that xlate callers
> > would have to think about.
>
> I'm confused. APR messages are all English (regrettably) in US-ASCII.
Taking that and massaging just a bit: APR strings in the source code
are either in US-ASCII or EBCDIC (simplifying just a bit on the
latter).
apr_os_default_encoding() needs to return something that can be passed
to apr_xlate_open() on the current platform in order to translate
compiled-in strings.
Re: Problem with iconv charsets...
Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Eric Covener wrote:
> On 6/25/07, William A. Rowe, Jr. <wr...@rowe-clan.net> wrote:
>> * At run-time this should probably be determined by parsing first the
>> LC_CTYPE, or LC_ALL in its absence, or the fallback to the LANG
>> envvar if neither LC_ variable is defined. The codepage follows
>> the period, e.g. LANG=en_US.UTF-8 would be parsed as 'UTF-8'.
>
> Wouldn't runtime checks mean xlate/xlate.c needs to find a new
> way to figure out what the codepage of the source code was (to
> translate compiled-in strings)?
>
> Perhaps APR_DEFAULT_CHARSET could be split into two different
> identifiers APR_CURRENT_CHARSET/APR_BUILD_CHARSET that xlate callers
> would have to think about.
I'm confused. APR messages are all English (regrettably) in US-ASCII.
clib-errstring messages should respect LC_CTYPE for most modern, dynamic
C libraries, no?
Re: Problem with iconv charsets...
Posted by Eric Covener <co...@gmail.com>.
On 6/25/07, William A. Rowe, Jr. <wr...@rowe-clan.net> wrote:
> * At run-time this should probably be determined by parsing first the
> LC_CTYPE, or LC_ALL in its absence, or the fallback to the LANG
> envvar if neither LC_ variable is defined. The codepage follows
> the period, e.g. LANG=en_US.UTF-8 would be parsed as 'UTF-8'.
Wouldn't runtime checks mean xlate/xlate.c needs to find a new
way to figure out what the codepage of the source code was (to
translate compiled-in strings)?
Perhaps APR_DEFAULT_CHARSET could be split into two different
identifiers APR_CURRENT_CHARSET/APR_BUILD_CHARSET that xlate callers
would have to think about.
--
Eric Covener
covener@gmail.com
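To illustrate the proposed split, here is a rough sketch under stated assumptions. The function names build_charset() and current_charset() are hypothetical stand-ins for the suggested APR_BUILD_CHARSET / APR_CURRENT_CHARSET identifiers, and the character-literal test only mirrors the existing trick in charset.c. Nothing here is actual APR code.

```c
#include <langinfo.h>
#include <locale.h>

/* Charset the compiled-in string literals are encoded in. This is fixed
 * at build time, so it can be decided from the value of a character
 * literal, the same trick charset.c already uses. */
static const char *build_charset(void)
{
    if ('A' == 0xC1) {
        return "IBM-1047";    /* some EBCDIC code page; exact page is a guess */
    }
    if ('A' == 0x41) {
        return "ISO-8859-1";  /* ASCII-compatible; not necessarily Latin-1 */
    }
    return "unknown";
}

/* Charset of the user's current locale. This can only be known at run
 * time and assumes the caller has already called setlocale(LC_CTYPE, "");
 * otherwise it reports the codeset of the default "C" locale. */
static const char *current_charset(void)
{
    return nl_langinfo(CODESET);
}
```

An xlate caller translating compiled-in strings for display would then convert from build_charset() to current_charset(), which makes the two roles explicit instead of overloading a single default.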
Re: Problem with iconv charsets...
Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Marshall Powers wrote:
> The string literal "ISO-8859-1" appears in APR and log4cxx source code. For
> example, from apr-1.2.7/misc/unix/charset.c:
>
> APR_DECLARE(const char*) apr_os_default_encoding (apr_pool_t *pool)
> {
> #ifdef __MVS__
> # ifdef __CODESET__
>     return __CODESET__;
> # else
>     return "IBM-1047";
> # endif
> #endif
>
>     if ('}' == 0xD0) {
>         return "IBM-1047";
>     }
>
>     if ('{' == 0xFB) {
>         return "EDF04";
>     }
>
>     if ('A' == 0xC1) {
>         return "EBCDIC"; /* not useful */
>     }
>
>     if ('A' == 0x41) {
>         return "ISO-8859-1"; /* not necessarily true */
>     }
>
> Are these files generated by configure scripts/ant build files? It doesn't
> seem like they are...
Nope. That is raw, native hackery in an effort not to think through the
problem set. As with all APR code, patches are welcome.
Some thoughts:
* At run-time this should probably be determined by parsing first the
LC_CTYPE, or LC_ALL in its absence, or the fallback to the LANG
envvar if neither LC_ variable is defined. The codepage follows
the period, e.g. LANG=en_US.UTF-8 would be parsed as 'UTF-8'.
* It's reasonably trivial, if iconv is present, to validate the -fallback-
charset name against iconv within autoconf, presuming this even should
be ISO-8859-1.
Comments?
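The parsing idea in the first bullet could be sketched roughly as follows. Both function names are hypothetical and none of this is APR code. Two assumptions differ from the wording above and are worth flagging: the sketch consults LC_ALL before LC_CTYPE, because POSIX gives LC_ALL precedence over the individual LC_ categories, and it stops at an optional "@modifier" suffix (as in de_DE.ISO8859-15@euro). It also uses only the first non-empty variable and does not drill down further when that variable names no codeset, which is one possible answer to the open question rather than a settled one.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy the codeset part of a locale name such as "en_US.UTF-8" or
 * "de_DE.ISO8859-15@euro" into buf. Returns 1 on success, 0 if the name
 * has no codeset part or buf is too small to hold it. */
static int codeset_from_locale_name(const char *name, char *buf, size_t buflen)
{
    const char *start, *end;
    size_t len;

    if (name == NULL || (start = strchr(name, '.')) == NULL) {
        return 0;
    }
    start++;                              /* skip the period */
    end = strchr(start, '@');             /* optional "@modifier" suffix */
    len = end ? (size_t)(end - start) : strlen(start);
    if (len == 0 || len >= buflen) {
        return 0;
    }
    memcpy(buf, start, len);
    buf[len] = '\0';
    return 1;
}

/* Parse the first non-empty variable among LC_ALL, LC_CTYPE and LANG.
 * Returns 0 if none is set or the governing one names no codeset. */
static int codeset_from_environment(char *buf, size_t buflen)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    size_t i;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0') {
            return codeset_from_locale_name(value, buf, buflen);
        }
    }
    return 0;
}
```

A name parsed this way is still only the user's spelling of the charset, so the second bullet applies at run time as well: the result would need to be checked against what the local iconv accepts.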
RE: Problem with iconv charsets...
Posted by Marshall Powers <mp...@appsecinc.com>.
The string literal "ISO-8859-1" appears in APR and log4cxx source code. For
example, from apr-1.2.7/misc/unix/charset.c:
APR_DECLARE(const char*) apr_os_default_encoding (apr_pool_t *pool)
{
#ifdef __MVS__
# ifdef __CODESET__
    return __CODESET__;
# else
    return "IBM-1047";
# endif
#endif

    if ('}' == 0xD0) {
        return "IBM-1047";
    }

    if ('{' == 0xFB) {
        return "EDF04";
    }

    if ('A' == 0xC1) {
        return "EBCDIC"; /* not useful */
    }

    if ('A' == 0x41) {
        return "ISO-8859-1"; /* not necessarily true */
    }

    return "unknown";
}
Also, in log4cxx/src/charsetencoder.cpp:
CharsetEncoderPtr CharsetEncoder::getEncoder(const std::string& charset) {
    if (StringHelper::equalsIgnoreCase(charset, "US-ASCII", "us-ascii") ||
        StringHelper::equalsIgnoreCase(charset, "ISO646-US", "iso646-US") ||
        StringHelper::equalsIgnoreCase(charset, "ANSI_X3.4-1968",
                                       "ansi_x3.4-1968")) {
        return new USASCIICharsetEncoder();
    } else if (StringHelper::equalsIgnoreCase(charset, "ISO-8859-1",
                                              "iso-8859-1") ||
               StringHelper::equalsIgnoreCase(charset, "ISO-LATIN-1",
                                              "iso-latin-1")) {
        return new ISOLatinCharsetEncoder();
    } else if (StringHelper::equalsIgnoreCase(charset, "UTF-8", "utf-8")) {
        return new UTF8CharsetEncoder();
    } else if (StringHelper::equalsIgnoreCase(charset, "UTF-16BE",
                                              "utf-16be")
               || StringHelper::equalsIgnoreCase(charset, "UTF-16", "utf-16")) {
        return new UTF16BECharsetEncoder();
    } else if (StringHelper::equalsIgnoreCase(charset, "UTF-16LE",
                                              "utf-16le")) {
        return new UTF16LECharsetEncoder();
    }
#if defined(_WIN32)
    throw IllegalArgumentException(charset);
#else
    return new APRCharsetEncoder(charset.c_str());
#endif
}
Are these files generated by configure scripts/ant build files? It doesn't
seem like they are...
-----Original Message-----
From: dev-return-18563-mpowers=appsecinc.com@apr.apache.org
[mailto:dev-return-18563-mpowers=appsecinc.com@apr.apache.org] On Behalf Of
William A. Rowe, Jr.
Sent: 2007-Jun-25 Mon 12:33 PM
To: Marshall Powers
Cc: dev@apr.apache.org; 'Log4CXX User'
Subject: Re: Problem with iconv charsets...
Marshall Powers wrote:
> I'm trying to use APR-1.2.7 in Log4Cxx 0.10 on AIX 5.3. When I run my
> program, I get an exception "APR_LOCALE_CHARSET" in the createDefaultEncoder
> method. I think this problem is related to the iconv that is installed on
> this machine. When I run iconv -l, among the various charsets I see
> "ISO8859-1". However, in the source for Log4Cxx and APR, the only string
> literals I see are for "ISO-8859-1" (note the extra dash). Is there any
> simple way to work around this problem? Is this potentially a portability
> issue with APR/log4cxx (that is, if I distribute some app that uses APR, and
> my user doesn't have "ISO-8859-1" in their iconv, is my app going to crash?)
Unfortunately, aliases are within the domain of iconv.
The question is, where did it pull out "ISO-8859-1" from as the default
locale on your box? If *that* is from apr-util, we need to unwind where
it was resolved. If that was an envvar set on your login, well, that would
be called shooting oneself in one's foot.
Re: Problem with iconv charsets...
Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Marshall Powers wrote:
> I'm trying to use APR-1.2.7 in Log4Cxx 0.10 on AIX 5.3. When I run my
> program, I get an exception "APR_LOCALE_CHARSET" in the createDefaultEncoder
> method. I think this problem is related to the iconv that is installed on
> this machine. When I run iconv -l, among the various charsets I see
> "ISO8859-1". However, in the source for Log4Cxx and APR, the only string
> literals I see are for "ISO-8859-1" (note the extra dash). Is there any
> simple way to work around this problem? Is this potentially a portability
> issue with APR/log4cxx (that is, if I distribute some app that uses APR, and
> my user doesn't have "ISO-8859-1" in their iconv, is my app going to crash?)
Unfortunately, aliases are within the domain of iconv.
The question is, where did it pull out "ISO-8859-1" from as the default
locale on your box? If *that* is from apr-util, we need to unwind where
it was resolved. If that was an envvar set on your login, well, that would
be called shooting oneself in one's foot.
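Since the accepted spellings are up to each platform's iconv, one possible workaround on the caller's side is to probe a few candidate spellings and use the first one that iconv_open() accepts. The sketch below is only an illustration of that idea: first_supported_charset() is a hypothetical helper, the candidate list is an assumption, and neither APR nor log4cxx is claimed to do this.

```c
#include <iconv.h>
#include <stddef.h>

/* Return the first name in the NULL-terminated candidates[] array that the
 * local iconv accepts as a source charset for conversion to to_charset,
 * or NULL if iconv rejects all of them. */
static const char *first_supported_charset(const char *const candidates[],
                                           const char *to_charset)
{
    size_t i;

    for (i = 0; candidates[i] != NULL; i++) {
        iconv_t cd = iconv_open(to_charset, candidates[i]);
        if (cd != (iconv_t)-1) {
            iconv_close(cd);    /* only probing; no conversion is done */
            return candidates[i];
        }
    }
    return NULL;
}
```

With a candidate list such as { "ISO-8859-1", "ISO8859-1", NULL }, this would be expected to pick the second spelling on a system like the AIX box described above and the first on systems that accept the dashed form. The target charset name passed as to_charset is subject to the same spelling differences.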