Posted to soap-user@ws.apache.org by Rajesh J Advani <ra...@wipro.com> on 2001/05/01 18:38:34 UTC

Support for serialization of Japanese strings

Hi,

Before I elaborate on my question, is there a searchable archive of this
list somewhere? 

In case my question hasn't been asked before, here goes - 

I need to send Strings holding Japanese data to the SOAP server. If I
let the installation use the default Serializer classes, what reaches
the server is a series of question marks (like so: '????').

My guess is that the marshalling/unmarshalling does not support Unicode,
or multiple encodings, or some such.

I see two options for myself - 

1. Use a different Serializer class that supports different encodings.
2. Convert the String into a byte array, and use the Base64Serializer
class.

I don't like option 2; in fact, I don't even know whether it
will work. Is there a Serializer class I can use instead?
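
For reference, option 2 can be sketched roughly as follows. This is only an
illustration under stated assumptions: it uses the JDK's java.util.Base64
(Java 8 and later), not Apache-SOAP's Base64Serializer, and the class and
method names (Base64Option, toBase64, fromBase64) are hypothetical. The point
is that the string has to be turned into bytes with an explicit charset
before it is Base64-encoded, and decoded with the same charset on the other
side.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Option {
    // Encode the string as UTF-8 bytes, then as ASCII-only Base64 text.
    static String toBase64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse the steps: Base64 text -> bytes -> string, using the same charset.
    static String fromBase64(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes for three Japanese characters, so the source file stays ASCII.
        String original = "\u65E5\u672C\u8A9E";
        String wire = toBase64(original);
        System.out.println(wire);
        System.out.println(fromBase64(wire).equals(original));
    }
}
```

The Base64 text contains only ASCII characters, so it would survive a
transport that mangles anything else, at the cost of a larger payload and a
value that is no longer readable as a string on the wire.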

Thanks in advance.

-- 
Rajesh J Advani
-----------------------------------
Intolerance will not be tolerated!

---------------------------------------------------------------------
To unsubscribe, e-mail: soap-user-unsubscribe@xml.apache.org
For additional commands, email: soap-user-help@xml.apache.org

Re: Support for serialization of Japanese strings

Posted by Rajesh J Advani <ra...@wipro.com>.

> http://marc.theaimsgroup.com/?l=soap-user&r=1&w=2

Thanks. I think it would be a good idea to put this single line in the
FAQ/Info served by the ezmlm program.
It's better than having nothing.
 
> Conversion of the Japanese strings to Java's 
> 16-bit Unicode strings happens in your application. 
> It is then sent UTF-8 encoded by the Apache-SOAP 
> client over the network, and AFAIK properly decoded 
> into 16-bit Unicode before it is handed to the 
> XML parser, and then the unmarshalling process. More 
> or less the same is true for the marshalling process, 
> which produces 16-bit Unicode content directly.

> I've been unable to reproduce any reported problems 
> with transport-layer conversion. That doesn't mean 
> that they're not out there, though. See 
> http://marc.theaimsgroup.com/?l=soap-dev&m=98832035626643&w=2 
> for a test application.

One of my problems seemed to be that, although I'm using Japanese Windows
95, my locale was set to English (US), so I got those '???'s.

After I changed the locale to Japanese, the TcpTunnelGui program showed me
the correct string. So the problem was not the Serialiser or the XML
Parser.
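
A small sketch of how a charset that cannot represent Japanese produces
question marks. It assumes, for illustration only, that the English (US)
locale led the JVM to pick a Latin-1 style default encoding; the thread does
not say which encoding was actually in effect. The class name QuestionMarks
and the method roundTrip are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarks {
    // Encode the string with the given charset, then decode it with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String japanese = "\u65E5\u672C\u8A9E";

        // ISO-8859-1 has no mapping for these characters, so String.getBytes
        // substitutes the charset's replacement byte, which is '?'.
        System.out.println(roundTrip(japanese, StandardCharsets.ISO_8859_1)); // prints ???

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(roundTrip(japanese, StandardCharsets.UTF_8).equals(japanese)); // prints true

        // The charset the JVM falls back to when none is given explicitly.
        System.out.println(Charset.defaultCharset());
    }
}
```

Once the characters have been replaced, the original text cannot be
recovered, which is why the receiving side sees real '?' characters.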

It _was_ the transport layer. Once I had made the fix above, the server
started throwing an IndexOutOfBoundsException from
BufferedReader.read().
So I went through the source code of the RPCRouterServlet and
BufferedReader classes, and realised what my problem was.

It's very important to state your platform and versions of the software
you're using when you're making a query.

I'm using WebLogic 5.1 and the SOAP installation BEA provides at their
XML Techtrack, which uses Apache-SOAP 2.0.
Setting up the system to use Apache-SOAP 2.1 fixed the problem.

The Content-Length was previously being set incorrectly, hence the
IndexOutOfBoundsException.
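
One way a Content-Length can go wrong with non-ASCII text is sketched below.
This is an assumption for illustration: the thread does not say how
Apache-SOAP 2.0 actually computed the header. The sketch only shows that the
number of Java chars in a body and the number of UTF-8 bytes on the wire
differ as soon as Japanese characters are involved. The class name
ContentLength, its methods, and the sample body are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ContentLength {
    // Number of 16-bit Java chars in the body (not what goes on the wire).
    static int charCount(String body) {
        return body.length();
    }

    // Number of bytes the body occupies once encoded as UTF-8.
    static int utf8ByteCount(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A tiny XML fragment holding three Japanese characters.
        String body = "<v>\u65E5\u672C\u8A9E</v>";
        System.out.println(charCount(body));     // prints 10
        System.out.println(utf8ByteCount(body)); // prints 16
    }
}
```

Each of these three characters takes three bytes in UTF-8, so a header
computed from the char count would understate the body length, and a reader
sized from that header could run into trouble.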

> Now, you're offering the suggestion that the problem 
> does not lie in the transport layer, but in the 
> serialisation step. I don't think that's possible, 
> as all data is kept strictly in 16-bit Unicode Java 
> characters until encoded as UTF-8 by the transport 
> layer. Another culprit could be the XML parser, which 
> is also never exposed to anything but 16-bit
> characters, but I doubt that too.

Well, I was obviously wrong.

> Do you have any specific test cases 
> that can demonstrate this problem?
> Are you really getting the ASCII code 
> for the question mark on the other
> end, or is the content perhaps viewed 
> through a non-Unicode capable medium 
> (a text field in a window on a system 
> with no Unicode font installed or configured, 
> would very likely result in question marks, 
> for instance)?

It's a Japanese OS, and running 'dir' displayed Japanese characters well
enough. I guess the JVM was messing things up because of the wrong
locale.

I guess I've bored you enough.

Thanks.

-- 
Rajesh J Advani
-----------------------------------
Intolerance will not be tolerated!

Re: Support for serialization of Japanese strings

Posted by Wouter Cloetens <wo...@mind.be>.

On Tue, May 01, 2001 at 10:08:34PM +0530, Rajesh J Advani wrote:
> Before I elaborate on my question, is there a searchable archive of this
> list somewhere? 

http://marc.theaimsgroup.com/?l=soap-user&r=1&w=2

> I need to send Strings, holding Japanese data, to the SOAP server. If I
> let the installation use the default Serializer classes, what reaches
> the Server, is a series of question marks (like so - '????'). 
> 
> My guess is that the marshalling/unmarshalling does not support Unicode,
> or multiple encodings, or some such.

Conversion of the Japanese strings to Java's 16-bit Unicode strings happens
in your application. It is then sent UTF-8 encoded by the Apache-SOAP client
over the network, and AFAIK properly decoded into 16-bit Unicode before it
is handed to the XML parser, and then the unmarshalling process. More or less
the same is true for the marshalling process, which produces 16-bit Unicode
content directly.
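
The encode and decode steps described above can be sketched with plain JDK
streams. This is not Apache-SOAP's transport code; the class name
Utf8Transport and the methods send and receive are hypothetical, and a byte
array stands in for the network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Transport {
    // Sender side: 16-bit Java chars are encoded to UTF-8 bytes as they are written.
    static byte[] send(String text) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(wire, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wire.toByteArray();
    }

    // Receiver side: UTF-8 bytes are decoded back to Java chars before any parsing.
    static String receive(byte[] wire) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(wire), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E";
        byte[] wire = send(original);
        System.out.println(wire.length);                     // prints 9
        System.out.println(receive(wire).equals(original));  // prints true
    }
}
```

Both ends name the charset explicitly. If either side relied on the platform
default instead, the result would depend on the machine's locale settings.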

I've been unable to reproduce any reported problems with transport-layer
conversion. That doesn't mean that they're not out there, though. 
See http://marc.theaimsgroup.com/?l=soap-dev&m=98832035626643&w=2 for
a test application.

Now, you're offering the suggestion that the problem does not lie in the
transport layer, but in the serialisation step. I don't think that's
possible, as all data is kept strictly in 16-bit Unicode Java characters
until encoded as UTF-8 by the transport layer. Another culprit could be
the XML parser, which is also never exposed to anything but 16-bit
characters, but I doubt that too.

Do you have any specific test cases that can demonstrate this problem?
Are you really getting the ASCII code for the question mark on the other
end, or is the content perhaps viewed through a non-Unicode capable
medium (a text field in a window on a system with no Unicode font installed
or configured would very likely show question marks, for instance)?
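
One way to answer that question on the receiving end is to print the numeric
code points instead of the characters themselves, so that the display font
plays no part. A minimal sketch, with a hypothetical class name
(CodePointDump) and method (dump); it requires Java 8 or later for
String.codePoints().

```java
public class CodePointDump {
    // Render each code point as U+XXXX so the result is readable on any display.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Intact Japanese text shows its real code points.
        System.out.println(dump("\u65E5\u672C\u8A9E")); // prints U+65E5 U+672C U+8A9E

        // Text that was damaged in transit shows U+003F, a real question mark.
        System.out.println(dump("???"));                // prints U+003F U+003F U+003F
    }
}
```

If the dump shows U+003F, the data was already damaged before it arrived; if
it shows the original code points, only the display is at fault.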

bfn, Wouter
