Posted to c-user@axis.apache.org by Tim Bartley <tb...@au1.ibm.com> on 2005/02/28 03:11:19 UTC

Multi-byte locale limitations?

Hi,

A little while ago there was a thread on the list alluding to issues with 
running Axis in processes running in non-UTF8 or single-byte locales. Can 
someone please elaborate on what limitations Axis has running in such 
locales, particularly on the client side?

Maybe I have misinterpreted the issue - perhaps it was more about 
serializing/deserializing string data encoded in non-UTF-8 character sets 
- this would be no issue for me - it's easy enough for me to ensure 
everything going in or out is encoded in UTF8.

Thanks and regards,

Tim
--
IBM Tivoli Access Manager Development
Gold Coast Development Lab, Australia
+61-7-5552-4001 phone
+61-7-5571-0420 fax
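
One way to "ensure everything going in or out is encoded in UTF8" is to check buffers before handing them to the library. The following is a minimal sketch, not part of the Axis API: `is_valid_utf8` is a hypothetical helper name, and it only checks that the bytes are well-formed UTF-8 (it cannot tell whether the text was meant to be in some other encoding).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an Axis function).
 * Returns true if buf[0..len) is well-formed UTF-8. Rejects overlong
 * forms, surrogates (U+D800..U+DFFF) and values above U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;      /* continuation bytes expected after the lead byte */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point allowed for this length */

        if (b < 0x80) { i++; continue; }  /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < extra) return false;  /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80) return false;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

A lone ISO-8859-1 byte such as 0xE9, or a Shift_JIS pair such as 0x93 0xFA, fails this check, so it can catch data that was never converted from the process locale's encoding.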

Re: Multi-byte locale limitations?

Posted by Toshiyuki Kimura <to...@apache.org>.
If you write a custom serializer/deserializer that encodes data
from any locale to ISO-8859-1, you will be able to send and receive
multi-byte characters on your client. Even in that case, though,
you had better use '<?xml version="1.0" encoding="utf-8"?>',
because of the XML and Web Services specifications.

In Axis (Java), you can use any multi-byte characters, such as
Japanese, Chinese and Korean, if you use UTF-8 as the encoding.
Please see the related information below:
<http://issues.apache.org/jira/browse/AXIS-1815?page=all>

Thanks,
Toshi

On Sun, 27 Feb 2005, Nadir Amra wrote:

> A lot of the string functions depend on the locale.
>
> I do not have a definite answer, but from what I know, things will work
> if you are running in a locale that has a character set that is the same
> in UTF-8.  Thus, running in an ISO-8859-1 locale will work since the
> character set is the same in UTF-8.
>
> However, I would not be confident if a process was running in a Japanese
> or Chinese locale, even if the data is in UTF-8.  But others may have a
> better handle on this. It may work.

Re: Multi-byte locale limitations?

Posted by John Hawkins <ha...@uk.ibm.com>.
And what if the server is not UTF8? I guess it's supposed to send back 
UTF8 because it got UTF8 coming in?

Nadir Amra <am...@us.ibm.com> wrote on 28/02/2005 03:31:

> A lot of the string functions depend on the locale.
>
> I do not have a definite answer, but from what I know, things will work
> if you are running in a locale that has a character set that is the same
> in UTF-8.  Thus, running in an ISO-8859-1 locale will work since the
> character set is the same in UTF-8.
>
> However, I would not be confident if a process was running in a Japanese
> or Chinese locale, even if the data is in UTF-8.  But others may have a
> better handle on this. It may work.


Re: Multi-byte locale limitations?

Posted by Nadir Amra <am...@us.ibm.com>.
A lot of the string functions depend on the locale.

I do not have a definite answer, but from what I know, things will work 
if you are running in a locale that has a character set that is the same 
in UTF-8.  Thus, running in an ISO-8859-1 locale will work since the 
character set is the same in UTF-8.

However, I would not be confident if a process was running in a Japanese 
or Chinese locale, even if the data is in UTF-8.  But others may have a 
better handle on this. It may work.
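
To illustrate why the process locale can matter even when the data is UTF-8: C library functions such as `mbstowcs` interpret bytes according to the `LC_CTYPE` category set through `setlocale`, so UTF-8 bytes handed to them in, say, a Shift_JIS locale may be split at the wrong boundaries or rejected. The sketch below contrasts that with a routine that never consults the locale. Both function names are made up for this example, and whether Axis itself calls locale-dependent functions on message data is an assumption here, not something the thread confirms.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts code points in a NUL-terminated UTF-8
 * string by skipping continuation bytes (10xxxxxx). It does not consult
 * the process locale. Assumes the input is already well-formed UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;  /* lead byte or ASCII byte starts a new code point */
    }
    return n;
}

/* The same question asked of the C library. The answer depends on the
 * LC_CTYPE locale in effect; (size_t)-1 means the bytes are not valid in
 * that locale's encoding. Passing NULL as the destination to obtain only
 * the count is POSIX behaviour. */
size_t locale_chars(const char *s)
{
    return mbstowcs(NULL, s, 0);
}
```

For pure ASCII input the two agree in any locale. For multi-byte input, `utf8_codepoints` gives the same answer everywhere, while `locale_chars` only matches it when the process is running in a UTF-8 locale.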


Tim Bartley <tb...@au1.ibm.com> wrote on 02/27/2005 09:24:29 PM:

> 
> Thanks, 
> 
> If I ensure that everything I pass in or out of Axis is UTF-8 will I
> still encounter problems if the process itself is running in a non-
> UTF-8 (or subset) locale? Why does the process locale even matter? 
> 
> Tim
> --
> IBM Tivoli Access Manager Development
> Gold Coast Development Lab, Australia
> +61-7-5552-4001 phone
> +61-7-5571-0420 fax 


Re: Multi-byte locale limitations?

Posted by Tim Bartley <tb...@au1.ibm.com>.
Thanks,

If I ensure that everything I pass in or out of Axis is UTF-8 will I still 
encounter problems if the process itself is running in a non-UTF-8 (or 
subset) locale? Why does the process locale even matter?

Tim
--
IBM Tivoli Access Manager Development
Gold Coast Development Lab, Australia
+61-7-5552-4001 phone
+61-7-5571-0420 fax



Nadir Amra <am...@us.ibm.com> wrote on 28/02/2005 13:18:

> The AXIS code currently assumes that everything coming in and everything
> going out is UTF-8.  Thus, if you are running in a locale that is a subset
> of UTF-8 (e.g. ISO-8859-1 character set), then you should be OK.  However,
> if you are running in a locale such as Japanese or Chinese, then you are
> out of luck.
>
> This is something that will hopefully be fixed in the near future.


Re: Multi-byte locale limitations?

Posted by Nadir Amra <am...@us.ibm.com>.
The AXIS code currently assumes that everything coming in and everything 
going out is UTF-8.  Thus, if you are running in a locale that is a subset 
of UTF-8 (e.g. ISO-8859-1 character set), then you should be OK.  However, 
if you are running in a locale such as Japanese or Chinese, then you are 
out of luck. 

This is something that will hopefully be fixed in the near future. 
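
A note on the "subset" wording, with a sketch: at the byte level only the ASCII range (0x00 to 0x7F) is encoded identically in ISO-8859-1 and UTF-8. ISO-8859-1 bytes 0x80 to 0xFF map to the Unicode code points of the same value, which UTF-8 encodes as two bytes. So if the library really does assume UTF-8, Latin-1 text containing accented characters would presumably need converting first. The helper below is hypothetical (it is not an Axis function) and shows that conversion under those assumptions.

```c
#include <stddef.h>

/* Hypothetical helper (not an Axis function).
 * Converts len bytes of ISO-8859-1 in `in` to UTF-8 in `out`.
 * Returns the number of bytes written, or (size_t)-1 if `out`
 * (capacity out_cap) is too small. The output is not NUL-terminated. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out, size_t out_cap)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII: identical single byte in both encodings */
            if (out_cap - o < 1) return (size_t)-1;
            out[o++] = b;
        } else {
            /* U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx */
            if (out_cap - o < 2) return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (b >> 6));
            out[o++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return o;
}
```

For example, the Latin-1 byte 0xE9 (e with acute accent) comes out as the two bytes 0xC3 0xA9. For encodings other than ISO-8859-1, such as Shift_JIS or EUC-JP, a table-driven converter like POSIX `iconv` would be needed instead of this arithmetic mapping.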

Tim Bartley <tb...@au1.ibm.com> wrote on 02/27/2005 08:11:19 PM:

> 
> Hi, 
> 
> A little while ago there was a thread on the list alluding to issues with 
> running Axis in processes running in non-UTF8 or single-byte 
> locales. Can someone please elaborate on what limitations Axis has 
> running in such locales, particularly on the client side? 
> 
> Maybe I have misinterpreted the issue - perhaps it was more about 
> serializing/deserializing string data encoded in non-UTF-8 character
> sets - this would be no issue for me - it's easy enough for me to 
> ensure everything going in or out is encoded in UTF8. 
> 
> Thanks and regards, 
> 
> Tim
> --
> IBM Tivoli Access Manager Development
> Gold Coast Development Lab, Australia
> +61-7-5552-4001 phone
> +61-7-5571-0420 fax