Posted to java-user@axis.apache.org by Mike Burati <mb...@bowstreet.com> on 2005/02/02 18:05:39 UTC

UTF-8 char issue with 1.2RC2 ?

I was looking into an issue a coworker ran into calling the Xmethods Babelfish translation service (specifically to test out non-ascii chars), and decided to try a simple WSDL2Java based client against the service to see how it compared to how he was using AXIS (1.2RC2), and I was able to reproduce nearly the same problem.
 
I figured I'd ask here before digging too deep on this one - hoping that maybe someone can spot either what I'm doing wrong or what the service itself is doing wrong, or what may be a bug in 1.2RC2?  I'm sure I've seen non-ascii chars work just fine in AXIS response envelopes before, so I'm not sure what's up with this one...
 
WSDL:  http://www.xmethods.net/sd/2001/BabelFishService.wsdl
 
Service is at:  http://services.xmethods.net:80/perl/soaplite.cgi
 
Sample request envelope:
 
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><soapenv:Body><ns1:BabelFish soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:ns1="urn:xmethodsBabelFish"><translationmode xsi:type="xsd:string">en_fr</translationmode><sourcedata xsi:type="xsd:string">I'm going to the beach.</sourcedata></ns1:BabelFish></soapenv:Body></soapenv:Envelope>
 
Sample response envelope:
 
<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Body><namesp1:BabelFishResponse xmlns:namesp1="urn:xmethodsBabelFish"><return xsi:type="xsd:string">je vais à la plage </return></namesp1:BabelFishResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>

With an HTTP content type header of:
Content-Type: text/xml; charset=utf-8

The AXIS fault I'm getting back in the simple WSDL2Java-generated client (the unit test, hand-modified to run as a standalone client without JUnit) differs slightly from what my coworker is getting, but both stacks are the same below Message.getSOAPEnvelope, where AXIS is trying to parse the XML of the response envelope via the DeserializationContext.
 
The response envelope is marked as UTF-8, as are the HTTP response headers associated with it.  The envelope looks fairly valid by eye, and a simple Java program that posted the above request to the same service, then read and dumped out the response using a UTF-8 encoding, didn't complain about any of the bytes in the response.
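
A possible reason the simple program "didn't complain" is that Java's convenience decoding paths (new String(bytes, charset), InputStreamReader) silently replace malformed input with U+FFFD instead of throwing, whereas an XML parser has to reject it. The sketch below contrasts the two behaviours. The class name, method name and sample bytes are made up for illustration; the thread does not show the actual test program or the raw bytes on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of AXIS or of the original test program.
class StrictUtf8Check {

    /** Returns true only if the bytes form well-formed UTF-8. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed payload: "vais ", a lone 0xE0 byte (Latin-1 'à'), then " la".
        byte[] suspect = {'v', 'a', 'i', 's', ' ', (byte) 0xE0, ' ', 'l', 'a'};

        // Lenient decoding never throws; the bad byte becomes U+FFFD.
        String lenient = new String(suspect, StandardCharsets.UTF_8);
        System.out.println(lenient.indexOf('\uFFFD') >= 0); // true

        // Strict decoding reports the malformed sequence.
        System.out.println(isWellFormedUtf8(suspect));      // false
    }
}
```

If the dumped response shows a replacement character (often rendered as "?" or a box) where the accented letter should be, that would point at the bytes rather than at the parser.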

Note:  I don't get the error if the response contains just ascii chars.
 
Here's one exception I'm getting in the standalone test (MustUnderstand checker asks for the SOAP env):
 
	{http://xml.apache.org/axis/}stackTrace:org.xml.sax.SAXParseException: Character conversion error: &quot;Malformed UTF-8 char -- is an XML encoding declaration missing?&quot; (line number may be too low).
	at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
	at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
	at org.apache.crimson.parser.InputEntity.isXmlDeclOrTextDeclPrefix(InputEntity.java:914)
	at org.apache.crimson.parser.Parser2.maybeXmlDecl(Parser2.java:1048)
	at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:520)
	at org.apache.crimson.parser.Parser2.parse(Parser2.java:318)
	at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
	at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
	at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:226)
	at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:645)
	at org.apache.axis.Message.getSOAPEnvelope(Message.java:424)

And here's another, almost identical one from another batch of client code using AXIS directly (without WSDL2Java):

AxisFault
 faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
 faultSubcode:
 faultString: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
 faultActor:
 faultNode:
 faultDetail:
        {http://xml.apache.org/axis/}stackTrace:java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence.
        at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(Unknown Source)
        at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:226)
        at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:645)
        at org.apache.axis.Message.getSOAPEnvelope(Message.java:424)
        at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
        at org.apache.axis.client.AxisClient.invoke(AxisClient.java:173)
        at org.apache.axis.client.Call.invokeEngine(Call.java:2719)
        at org.apache.axis.client.Call.invoke(Call.java:2702)
        at org.apache.axis.client.Call.invoke(Call.java:2378)
        at org.apache.axis.client.Call.invoke(Call.java:2301)
        at org.apache.axis.client.Call.invoke(Call.java:1758)

Re: UTF-8 char issue with 1.2RC2 ?

Posted by Aleksander Slominski <as...@cs.indiana.edu>.
Mike Burati wrote:

>I was looking into an issue a coworker ran into calling the Xmethods Babelfish translation service (specifically to test out non-ascii chars), and decided to try a simple WSDL2Java based client against the service to see how it compared to how he was using AXIS (1.2RC2), and I was able to reproduce nearly the same problem.
> 
>I figured I'd ask here before digging too deep on this one - hoping that maybe someone can spot either what I'm doing wrong or what the service itself is doing wrong, or what may be a bug in 1.2RC2?  I'm sure I've seen non-ascii chars work just fine in AXIS response envelopes before, so I'm not sure what's up with this one...
>  
>
i can confirm that this is a UTF-8 related problem with the service and 
not with AXIS.

i have run the same test with the XSUL SOAP stack using SDI and it failed in 
my UTF8 decoder as well.

alek

ps. here is what i got. it looks like à is incorrectly encoded as 
\u00e0 ? the server sends Content-Type: text/xml; charset=utf-8 and the xml 
declaration says UTF-8, so really it should have been utf8 ...

<?xml version=\"1.0\" encoding=\"UTF-8\"?><SOAP-ENV:Envelope 
xmlns:SOAP-ENC=\"http://schemas.xmlsoap.org/soap/encoding/\" 
SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\" 
xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\" 
xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\" 
xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\"><SOAP-ENV:Body><namesp1:BabelFishResponse 
xmlns:namesp1=\"urn:xmethodsBabelFish\"><return 
xsi:type=\"xsd:string\">je vais \u00e0 la plage 
</return></namesp1:BabelFishResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>---

could not read XML document prolog; nested exception is:
java.io.UTFDataFormatException: UTF8 encoding
    at xsul.util.Utf8Reader.read(Utf8Reader.java:108)
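
To make the byte-level picture concrete: in UTF-8 the character à (U+00E0) is the two bytes C3 A0, while in ISO-8859-1 it is the single byte E0. A UTF-8 decoder treats E0 as the lead byte of a 3-byte sequence, so if the next byte is a plain space (20) rather than a continuation byte, the decoder fails on byte 2, which is consistent with the Xerces message "Invalid byte 2 of 3-byte UTF-8 sequence". The sketch below only prints the two encodings; that the service actually sent a raw E0 byte is an assumption based on the dump above, and the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how 'à' is encoded in UTF-8 versus ISO-8859-1.
class AGraveBytes {

    /** Formats bytes as space-separated upper-case hex, e.g. "C3 A0". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aGrave = "\u00e0";
        // Correct UTF-8 encoding: two bytes.
        System.out.println(hex(aGrave.getBytes(StandardCharsets.UTF_8)));      // C3 A0
        // Latin-1 encoding: one byte, which is not valid UTF-8 on its own.
        System.out.println(hex(aGrave.getBytes(StandardCharsets.ISO_8859_1))); // E0
    }
}
```

Comparing a hex dump of the raw HTTP response body against these values would show which encoding the service really used for the accented character.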



-- 
The best way to predict the future is to invent it - Alan Kay