Posted to c-dev@axis.apache.org by John Hawkins <HA...@uk.ibm.com> on 2004/12/22 10:54:20 UTC

Re: platform support: internationalization and EBCDIC vs ASCII




Well, you've taken on something that we have been thinking about for some
time but avoided :-)

When we did think about this, the natural choice was ICU4C: it's full of
libraries that do this kind of stuff (although I have never used it).
ICU4C is already needed for the date-time stuff that Fred is going to do, so
there are no issues there. We kinda got approval to use it from dims - we
could get no response from apache licensing, despite trying very hard!

ICU4C is a big API but it does appear to be regarded quite well, as far as
I can tell. If you think you can get away without it then fair enough (you
seem to be saying below that you can?)

I think your idea of a transcoding flag in the config file is good.
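
To make the flag idea concrete, here is a rough sketch of what it might
gate on the outbound path. It assumes, purely for illustration, that the
process character set is ISO-8859-1 (Latin-1); TranscodeConfig,
latin1ToUtf8 and toWire are hypothetical names, not existing Axis C++
code, and a real implementation would more likely sit on ICU4C or the
platform's converters. Note that only bytes below 0x80 pass through
unchanged; Latin-1 bytes from 0x80 upward become two UTF-8 bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a setting read from the property file.
struct TranscodeConfig {
    bool enabled;
};

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte value equals its
// Unicode code point, so bytes >= 0x80 need a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));                 // ASCII: unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation byte
        }
    }
    return out;
}

// Outbound path: transcode only when the flag is switched on.
std::string toWire(const std::string& local, const TranscodeConfig& cfg) {
    return cfg.enabled ? latin1ToUtf8(local) : local;
}

// Small usage example: e-acute (0xE9 in Latin-1) is 0xC3 0xA9 in UTF-8.
void exampleUsage() {
    assert(toWire("\xE9", TranscodeConfig{true}) == "\xC3\xA9");
    assert(toWire("\xE9", TranscodeConfig{false}) == "\xE9");
}
```

The inbound direction (UTF-8 back to the process character set) would be
gated the same way, with some policy for characters the local character
set cannot represent.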





John Hawkins



Nadir Amra <am...@us.ibm.com> wrote on 22/12/2004 08:37:47:

> Correct me if I am wrong....and sorry for the long note but it is
> necessary.
>
> The AXIS code assumes everything is in UTF-8, which in effect requires the
> locale of the process to be UTF-8.  Thus the code works only in processes
> where the locale is set to UTF-8 or to a single-byte ASCII-based character
> set such as the Latin-1 locales (since the 7-bit ASCII portion of such a
> character set is a subset of UTF-8).  For locales that are neither single
> byte nor UTF-8, the code does not work so well.  Obviously the code does
> not work on EBCDIC-based systems such as OS/400.
>
> I need this restriction removed in version 1.5.
>
> To remove the restriction, the code needs to be sensitive to the locale of
> the process that the client is running in and assume any data received
> from the client that is to be passed to a web service is in the character
> set of the locale, and thus needs to be converted to UTF-8.  Similarly,
> any data received from the web service needs to be converted to the
> character set of the running process, since the various C-runtime string
> functions are dependent on the locale of the process in order for the
> functions to work properly.
>
> The XML parsers can handle the data coming in from the Web service no
> matter what the encoding, and there is no problem on that side of things.
> I am assuming the data obtained by the XML parser is being transcoded to
> UTF-8.
>
> In addition, there are hard-coded literal strings that are assumed to be
> in ASCII.  These would also need to be changed.
>
> I plan on spending a lot of time in the next 4 weeks to get the
> infrastructure built into the code to allow the code to run on OS/400.
> Hopefully, the work I put in can easily be extended to other platforms so
> that if someone wanted to run in a Japanese locale, it would work with
> minor changes.
>
> My thoughts are that a user can indicate whether transcoding should be
> enabled via a configuration property in the property file.  When it is
> enabled, the code will create transcoders to convert data from the locale
> of the process to UTF-8 and from UTF-8 to the locale of the process.  I
> still have to investigate whether it is possible to use the XML parser
> transcoders.  I am looking for direction from you all on what a good
> implementation would be and where in the code you think this support
> would need to be added.
>
> As far as the literal strings that should be in the Latin-1 character set,
> this is easily worked around by putting the string in a buffer and
> converting it using the PLATFORM_STRTOASC() macro (currently in each
> PlatformSpecificXXXX.hpp file).  In addition, if data in a buffer is known
> to be in the Latin-1 character set and needs to be converted to the
> character set of the process, PLATFORM_ASCTOSTR() can be used.  For
> ASCII-based systems, these macros are identity macros.  I plan on doing
> this as a first stage, which should be a benign change.
>
> What are your thoughts?
>
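
For the literal-string stage described in the quoted note, a minimal
sketch of the pattern might look like the following. The macro names come
from the note; their definitions here are only an assumption about how an
ASCII-based platform header could define them (as identity macros), and
asciiLiteral is a hypothetical helper, not existing Axis C++ code. On an
EBCDIC platform the header would map the macros to real conversion
routines instead.

```cpp
#include <cassert>
#include <cstring>

// Assumed ASCII-platform definitions: the macros expand to their argument.
#ifndef PLATFORM_STRTOASC
#define PLATFORM_STRTOASC(x) (x)
#endif
#ifndef PLATFORM_ASCTOSTR
#define PLATFORM_ASCTOSTR(x) (x)
#endif

// Hypothetical helper: copy a source literal into a caller-supplied buffer
// and pass the buffer through the platform macro, so the bytes that go on
// the wire are ASCII regardless of the compiler's native character set.
// The result is truncated if the buffer is too small.
const char* asciiLiteral(const char* literal, char* buf, std::size_t bufSize) {
    if (bufSize == 0) {
        return nullptr;
    }
    std::strncpy(buf, literal, bufSize - 1);
    buf[bufSize - 1] = '\0';
    return PLATFORM_STRTOASC(buf);
}

// Usage example: with the identity definitions above the text is unchanged.
void exampleLiteral() {
    char buf[16];
    const char* wire = asciiLiteral("POST", buf, sizeof buf);
    assert(std::strcmp(wire, "POST") == 0);
}
```

Copying into a buffer first matters because a platform conversion may need
writable storage, which a string literal does not provide.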