Posted to c-users@xerces.apache.org by Rudolfs Mazurs <ru...@gmail.com> on 2019/01/11 10:23:32 UTC
What is a correct way to set a locale for xerces?
Hi,
I have a service that uses xerces-c and has to be started under the C
locale for LANG and LC_*. I need xerces to be able to parse XML with UTF-8
characters, so I used this workaround:
setlocale(LC_CTYPE,"en_US.UTF-8");
XMLPlatformUtils::Initialize();
And while it seems to work, I noticed that the Initialize function has a
parameter “const char *const locale”, which I assume [1] overrides any
system variables. However,
XMLPlatformUtils::Initialize("en_US.UTF-8");
this code compiles, but still throws exceptions when it encounters UTF-8
characters.
Am I correct to assume that the “locale” parameter of “Initialize”
overrides the LC_CTYPE, LC_ALL and LANG variables when choosing how to
interpret characters? Or is there a bug in the function, or did I write
the “locale” argument wrong?
Is my current workaround "good practice"?
[1] http://xerces.apache.org/xerces-c/apiDocs-3/classXMLPlatformUtils.html
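A minimal sketch of the workaround above with the result of setlocale checked. setlocale returns a null pointer when it cannot honour the request (for example, when the named locale is not installed), and an unchecked failure leaves the process in the C locale. The helper name set_ctype_locale and the idea of trying several candidate names are hypothetical, not something from the Xerces-C documentation. The call to XMLPlatformUtils::Initialize() is left out so the snippet depends only on the standard library.

```cpp
#include <clocale>
#include <string>
#include <vector>

// Hypothetical helper: try each candidate locale name for LC_CTYPE only and
// return the first name that setlocale accepted, or an empty string if none
// was accepted. The other categories (LC_NUMERIC, LC_TIME, ...) are untouched,
// so the rest of the service keeps running under the C locale.
std::string set_ctype_locale(const std::vector<std::string>& candidates) {
    for (const std::string& name : candidates) {
        if (std::setlocale(LC_CTYPE, name.c_str()) != nullptr) {
            return name;  // accepted by the C library
        }
    }
    return std::string();  // nothing accepted; LC_CTYPE is unchanged
}
```

It would be called once at startup, before XMLPlatformUtils::Initialize(), for example as set_ctype_locale({"en_US.UTF-8", "C.UTF-8"}). Which of those names exist depends on the host, so an empty return value is worth logging.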
Re: What is a correct way to set a locale for xerces?
Posted by Rudolfs Mazurs <ru...@gmail.com>.
On Fri, 11 Jan 2019 at 13:25, Roger Leigh (<rleigh@codelibre.net>) wrote:
> My understanding is that this locale parameter only affects the
> selection of the message catalogue used for printing messages. Since
> there is only a single en_US message catalogue, overriding it won't do
> anything useful. So in terms of UTF-8 processing, I think this is a red
> herring.
>
That is a shame. It looked like a good option.
> Which transcoder have you configured Xerces-C to use? I notice that GNU
> iconv does some querying of the current charset with setlocale (but
> doesn't use the simpler and more correct nl_langinfo). If you're using
> gnuiconv, maybe try ICU instead?
>
I am using gnuiconv. I tried iconv, but that one silently drops
non-ASCII characters. My build system doesn't have ICU components. I guess
I'll have to have a conversation with the responsible colleagues.
> For the software I maintain, we were forced to mandate the use of UTF-8
> locale for correct operation.
>
Thanks for the advice!
Re: What is a correct way to set a locale for xerces?
Posted by Roger Leigh <rl...@codelibre.net>.
On 11/01/2019 10:23, Rudolfs Mazurs wrote:
> Hi,
> I have a service that uses xerces-c and has to be started under the C
> locale for LANG and LC_*. I need xerces to be able to parse XML with UTF-8
> characters, so I used this workaround:
>
> setlocale(LC_CTYPE,"en_US.UTF-8");
> XMLPlatformUtils::Initialize();
>
> And while it seems to work, I noticed that the Initialize function has a
> parameter “const char *const locale”, which I assume [1] overrides any
> system variables. However,
My understanding is that this locale parameter only affects the
selection of the message catalogue used for printing messages. Since
there is only a single en_US message catalogue, overriding it won't do
anything useful. So in terms of UTF-8 processing, I think this is a red
herring.
I would have hoped that Xerces-C would behave in a locale-independent
manner and work the same in all locales except maybe with respect to the
locale-defined stream encoding (which might be part of the problem).
Which transcoder have you configured Xerces-C to use? I notice that GNU
iconv does some querying of the current charset with setlocale (but
doesn't use the simpler and more correct nl_langinfo). If you're using
gnuiconv, maybe try ICU instead?
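For reference, the nl_langinfo query mentioned above looks roughly like this on a POSIX system. This is only an illustration of the call, not code taken from Xerces-C; the helper names current_codeset and codeset_for are made up for the example. The exact string returned for a given locale is platform-dependent, so none is shown.

```cpp
#include <clocale>
#include <langinfo.h>  // POSIX header, not part of standard C++
#include <string>

// Name of the character encoding of the current LC_CTYPE locale,
// as reported by the C library.
std::string current_codeset() {
    const char* cs = nl_langinfo(CODESET);
    return cs ? std::string(cs) : std::string();
}

// Hypothetical helper: report the codeset the C library associates with the
// locale `name`, then restore the previous LC_CTYPE setting.
// Returns an empty string if the locale could not be selected.
std::string codeset_for(const char* name) {
    const char* cur = std::setlocale(LC_CTYPE, nullptr);  // query only
    const std::string saved = cur ? cur : "C";
    std::string result;
    if (std::setlocale(LC_CTYPE, name) != nullptr) {
        result = current_codeset();
    }
    std::setlocale(LC_CTYPE, saved.c_str());  // put the old locale back
    return result;
}
```

Comparing codeset_for("C") with codeset_for("en_US.UTF-8") on the machine in question shows which encoding a locale-sensitive transcoder would be told to expect in each case.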
For the software I maintain, we were forced to mandate the use of UTF-8
locale for correct operation.
Regards,
Roger