Posted to soap-user@ws.apache.org by Mike Spreitzer <ms...@us.ibm.com> on 2001/04/10 21:34:59 UTC

Can SOAP 2.1 and Xerces 1.2.2 transport a String containing an arbitrary Unicode character?

I just tried to send (from client to server) a java.lang.String containing 
U+C5 and U+F6.  The SOAP 2.1 TcpTunnelGui prints this:
...
<id xsi:type="xsd:string">᠀ngstrtem 
xsi:type="xsd:string">street1</item>
...

(ellipsis mine).  Note that the <id> "eats" into the following <item>!  At 
the server, I get this message in stdout/stderr:

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x0) was found in the element content of the document.
        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1016)
        at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:643)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1355)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:380)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:908)
        at org.apache.soap.util.xml.XercesParserLiaison.read(XercesParserLiaison.java:85)
        at org.apache.soap.transport.TransportMessage.unmarshall(TransportMessage.java:267)
        at org.apache.soap.server.ServerUtils.readEnvelopeFromInputStream(ServerUtils.java:118)
        at org.apache.soap.server.http.ServerHTTPUtils.readEnvelopeFromRequest(ServerHTTPUtils.java:150)
        at org.apache.soap.server.http.PubRPCRouterServlet.doPost(PubRPCRouterServlet.java:220)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:760)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
        at org.apache.tomcat.core.ServletWrapper.doService(ServletWrapper.java:405)
        at org.apache.tomcat.core.Handler.service(Handler.java:287)
        at org.apache.tomcat.core.ServletWrapper.service(ServletWrapper.java:372)
        at org.apache.tomcat.core.ContextManager.internalService(ContextManager.java:797)
        at org.apache.tomcat.core.ContextManager.service(ContextManager.java:743)
        at org.apache.tomcat.service.http.HttpConnectionHandler.processConnection(HttpConnectionHandler.java:213)
        at org.apache.tomcat.service.TcpWorkerThread.runIt(PoolTcpEndpoint.java:416)
        at org.apache.tomcat.util.ThreadPool$ControlRunnable.run(ThreadPool.java:498)
        at java.lang.Thread.run(Thread.java:498)
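The error complains about U+0000, yet neither U+C5 nor U+F6 contains a zero byte in UTF-8 or ISO-8859-1. This sketch shows one hypothetical way a NUL can still appear. Suppose one side writes the text as UTF-16 and the other reads the same bytes with a single-byte charset. In that case, every character below U+0100 contributes a 0x00 byte.

This is an illustration only. It is not a claim about what Apache SOAP 2.1 or Xerces 1.2.2 actually do. The class name, method names and sample string are made up, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Encode as UTF-16BE, then (wrongly) decode the same bytes as ISO-8859-1,
    // i.e. one byte per character.
    static String misreadUtf16AsLatin1(String s) {
        byte[] wire = s.getBytes(StandardCharsets.UTF_16BE);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Count U+0000 characters, which XML 1.0 forbids everywhere.
    static int countNuls(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\u0000') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical sample string built around U+00C5 and U+00F6.
        String sample = "\u00C5ngstr\u00F6m";
        String misread = misreadUtf16AsLatin1(sample);
        System.out.println("UTF-8 bytes:   " + sample.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("misread chars: " + misread.length());
        System.out.println("NULs:          " + countNuls(misread));
    }
}
```

Running main reports 10 UTF-8 bytes, 16 misread characters and 8 NULs. UTF-8 never emits a 0x00 byte for anything other than U+0000 itself. A NUL in the parser's input therefore hints that some component along the way used a different encoding than its peer.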

I'm running my SOAP server in Tomcat 3.2.2b3 "direct".  My SOAP client is 
also a servlet in a different instance of Tomcat 3.2.2b3.  Both look like 
they have Apache SOAP 2.1's soap.jar in their classpath, along with Xerces 
1.2.2.  I'm using IBM JDK (or whatever it's called now) 1.3, on vanilla 
AIX 4.3.3.  The TcpTunnelGui is using IBM JDK 1.3 on Win2K.

Any clues appreciated.

Thanks,
Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: soap-user-unsubscribe@xml.apache.org
For additional commands, email: soap-user-help@xml.apache.org


Re: Can SOAP 2.1 and Xerces 1.2.2 transport a String containing an arbitrary Unicode character?

Posted by Wouter Cloetens <wo...@mind.be>.
Mike,

I wouldn't trust the tunnel GUI on this; it may not render the string
properly. I'm not sure how it translates the raw bytes to Unicode and then
to the native GUI, but it probably wasn't designed with non-ASCII
characters in mind. That could explain the 'eating in' effect (a wrongly
decoded multi-byte character swallowing the characters that follow it).
Could you capture the raw data stream and post it as a hex/ASCII dump?
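Here is a small, self-contained Java helper that produces the kind of hex/ASCII dump asked for above. It only formats bytes you already have. Capturing them is left open: you could save the tunnel's traffic or log the request body in the servlet. The class name and the sample input in main are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Format bytes as "offset  hex bytes  |ascii|", 16 bytes per row.
    static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            out.append(String.format("%08x  ", off));
            StringBuilder ascii = new StringBuilder();
            for (int i = 0; i < 16; i++) {
                if (off + i < data.length) {
                    int b = data[off + i] & 0xFF;
                    out.append(String.format("%02x ", b));
                    // Printable ASCII as-is, everything else as '.'
                    ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    out.append("   "); // pad a short final row
                }
            }
            out.append(" |").append(ascii).append("|\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical fragment; in practice pass the captured bytes instead.
        byte[] sample = "<id>\u00C5</id>".getBytes(StandardCharsets.UTF_8);
        System.out.print(dump(sample));
    }
}
```

A dump like this shows which encoding was actually used on the wire. U+C5 sent as UTF-8 appears as c3 85, while in ISO-8859-1 it is the single byte c5. Likewise, U+F6 is c3 b6 in UTF-8 and f6 in ISO-8859-1.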
 
Apache-SOAP is aware of XML-hostile characters such as < and > and escapes
them properly, so I don't think that explains a 0x00 character in the
parser's input stream. I'd be very interested in hearing about any breakage
caused by incorrect encoding/decoding...
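To make the escaping point concrete, here is a minimal Java sketch that escapes <, > and &. It also checks each code point against the XML 1.0 Char production. This is not Apache SOAP's actual serializer, and the names are illustrative.

U+C5 and U+F6 are ordinary legal XML characters that need no escaping. U+0000, by contrast, is illegal in XML 1.0 even as a character reference (&#0;). No amount of escaping on the sending side would make a NUL acceptable to the parser.

```java
public class XmlText {
    // XML 1.0 "Char" production, checked per code point.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Escape markup characters for element content; reject characters
    // that no escaping can make legal (U+0000 among them).
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException(
                    String.format("U+%04X is not allowed in XML 1.0", cp));
            }
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:  out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \u00C5\u00F6"));
        try {
            escape("bad\u0000char");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In main, the first call leaves U+C5 and U+F6 untouched and escapes only < and &. The second call is rejected outright, which mirrors the parser's complaint about Unicode 0x0.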

bfn, Wouter
