Posted to c-user@axis.apache.org by Iwan Tomlow <iw...@seagha.com> on 2007/03/28 13:22:00 UTC

Invalid byte 2 of 2-byte UTF-8 sequence?

Hi,

When connecting an Axis-C client (v1.5) to an Axis-Java web service, the response contains the following fault:

<soapenv:Fault>
  <faultcode>soapenv:Server.userException</faultcode>
   <faultstring>java.io.UTFDataFormatException: Invalid byte 2 of 2-byte UTF-8 sequence.</faultstring>

The only difference from all the other requests, which do work, seems to be that the company name contains "GÜTER".
When debugging, the U-umlaut displays correctly everywhere, but apparently the receiving web service does not get it correctly.
The request message generated by Axis-C++ looks like this in TCPMonitor:

Content-Type: text/xml; charset=UTF-8

<?xml version='1.0' encoding='utf-8' ?>
...
                  <ns1:company>ROTTMANN HEINRICH G TER</ns1:company>

So it seems the character is indeed being encoded incorrectly by Axis-C?
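A short illustration of why the Java side reports "byte 2". In UTF-8, the byte 0xDC (the ISO-8859-1 code for Ü) has the bit pattern 110xxxxx, which announces a 2-byte sequence, so the decoder expects the next byte to be a continuation byte (10xxxxxx). The next byte is 'T' (0x54), so the second byte of the sequence is invalid. The validator below is a simplified, hypothetical sketch written only to show this; it is not part of Axis, and it assumes the client sent the raw ISO-8859-1 byte, which the fault text and the TCPMonitor capture suggest but do not prove.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration, not Axis code. Returns the index of the first
// byte that breaks UTF-8 structure, or std::string::npos if none does.
// Simplified: checks lead/continuation bit patterns only; it does not reject
// overlong encodings or surrogate code points.
std::size_t firstInvalidUtf8Byte(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (lead < 0x80) {
            len = 1;                      // 0xxxxxxx: plain ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            len = 2;                      // 110xxxxx: e.g. 0xDC ("Ü" in ISO-8859-1)
        } else if ((lead & 0xF0) == 0xE0) {
            len = 3;                      // 1110xxxx
        } else if ((lead & 0xF8) == 0xF0) {
            len = 4;                      // 11110xxx
        } else {
            return i;                     // stray continuation byte or invalid lead
        }
        for (std::size_t k = 1; k < len; ++k) {
            if (i + k >= s.size()) return i + k;   // sequence cut off
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return i + k;  // this byte is not 10xxxxxx
        }
        i += len;
    }
    return std::string::npos;
}
```

With these definitions, firstInvalidUtf8Byte("G\xDCTER") returns 2 (the 'T' right after 0xDC, i.e. byte 2 of the 2-byte sequence), while the correctly encoded UTF-8 form "G\xC3\x9CTER" returns std::string::npos.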


Does anyone have any hints to get this working?

By the way, I receive this company name from another web service, and I noticed that it sent me this:

<employerName>ROTTMANN HEINRICH G&#xDC;TER</employerName>

Maybe there is a way to configure Axis-C to do the same, i.e. send a character reference, to avoid the encoding mismatch?
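A minimal sketch of what the other service appears to be doing: writing every non-ASCII character as a hexadecimal character reference, so the payload is pure ASCII and decodes the same whether the receiver assumes UTF-8 or ISO-8859-1. The function name is hypothetical and is not an Axis C++ option; it assumes the application strings are ISO-8859-1 (one byte per character, byte value equal to the Unicode code point). It would have to run at the point where the serializer writes element text: passing an already-escaped value such as "G&#xDC;TER" in as data would most likely get its '&' escaped again to "&amp;".

```cpp
#include <cstdio>
#include <string>

// Hypothetical serializer-side helper (not an Axis API). Escapes XML markup
// characters and writes every byte >= 0x80 as &#xHH;. Assumes ISO-8859-1
// input, so each byte value is also the character's Unicode code point.
std::string escapeLatin1ForXml(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        switch (b) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:
                if (b < 0x80) {
                    out += static_cast<char>(b);  // ASCII passes through unchanged
                } else {
                    char ref[8];                  // "&#xDC;" plus terminator fits
                    std::snprintf(ref, sizeof ref, "&#x%02X;",
                                  static_cast<unsigned>(b));
                    out += ref;
                }
                break;
        }
    }
    return out;
}
```

For example, escapeLatin1ForXml("ROTTMANN HEINRICH G\xDCTER") yields "ROTTMANN HEINRICH G&#xDC;TER", the same form as the employerName element quoted above.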

Kind regards,
Iwan Tomlow

---------------------------------------------------------------------
To unsubscribe, e-mail: axis-c-user-unsubscribe@ws.apache.org
For additional commands, e-mail: axis-c-user-help@ws.apache.org


RE: Invalid byte 2 of 2-byte UTF-8 sequence?

Posted by Nadir Amra <am...@us.ibm.com>.
Iwan,

That would be a fix if you were only running in processes that are consistent
with ISO-8859-1 encoding, but that is not the case. On the client side,
the easiest fix would be to provide a toUTF8() function, as OS/400 does.
The proper fix for both client and server is to ensure that all data stored
in the various classes is wchar, and to translate wchar to UTF-8.
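A rough sketch of the two approaches described above. latin1ToUtf8() plays the role of the client-side toUTF8() stopgap, under the assumption that the client's char data is ISO-8859-1; wideToUtf8() corresponds to keeping the data as wchar and translating it to UTF-8 on output. All names here are hypothetical; this is neither the OS/400 implementation nor an existing Axis C++ API.

```cpp
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one Unicode code point.
// Values above U+10FFFF and surrogates (U+D800..U+DFFF) become U+FFFD.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 1 byte
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Stopgap in the spirit of toUTF8(): assumes the input is ISO-8859-1,
// where every byte value equals its Unicode code point.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char b : in) appendUtf8(out, b);
    return out;
}

// "Store wchar, translate to UTF-8": assumes each wchar_t holds a whole
// code point (true where wchar_t is 32-bit, e.g. typical Linux). On Windows
// wchar_t is UTF-16, so surrogate pairs would have to be combined first;
// this sketch would turn each half into U+FFFD.
std::string wideToUtf8(const std::wstring& in) {
    std::string out;
    for (wchar_t wc : in) appendUtf8(out, static_cast<char32_t>(wc));
    return out;
}
```

For instance, latin1ToUtf8("G\xDCTER") produces "G\xC3\x9CTER": the single byte 0xDC becomes the two bytes 0xC3 0x9C that a UTF-8 parser such as the one on the Java side expects.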
 
Nadir K. Amra


"Iwan Tomlow" <iw...@seagha.com> wrote on 03/29/2007 09:04:32 AM:

> Thanks for pointing that out; I should have been able to find that one
> myself.
> The most reasonable thing for me to do seems to be to apply the
> workaround proposed in AXISCPP-964, i.e.
> 
> In SoapSerializer.cpp, turn
> serialize( "<?xml version='1.0' encoding='utf-8' ?>", NULL);
> into
> serialize( "<?xml version='1.0' encoding='ISO-8859-1' ?>", NULL); 
> 
> I'm no expert on character encoding, but since "AxisChar" is defined
> in Gdefine.hpp as a plain "char" anyway, isn't this simply the best
> way to fix the issue completely?
> 
> Kind regards,
> Iwan


RE: Invalid byte 2 of 2-byte UTF-8 sequence?

Posted by Iwan Tomlow <iw...@seagha.com>.
Thanks for pointing that out; I should have been able to find that one myself.
The most reasonable thing for me to do seems to be to apply the workaround proposed in AXISCPP-964, i.e.

In SoapSerializer.cpp, turn
serialize( "<?xml version='1.0' encoding='utf-8' ?>", NULL);
into
serialize( "<?xml version='1.0' encoding='ISO-8859-1' ?>", NULL);  

I'm no expert on character encoding, but since "AxisChar" is defined in Gdefine.hpp as a plain "char" anyway, isn't this simply the best way to fix the issue completely?
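One thing to check with this workaround: the TCPMonitor capture in the original post shows the HTTP header still declaring "Content-Type: text/xml; charset=UTF-8". For text/xml, RFC 3023 gives the charset parameter precedence over the XML declaration, so a receiver that follows it could still decode the body as UTF-8. The hypothetical sketch below (plain C++, not Axis C++ API) simply derives both values from a single encoding name so they cannot disagree. As noted elsewhere in this thread, declaring ISO-8859-1 is only correct if the char data really is ISO-8859-1.

```cpp
#include <string>

// Hypothetical sketch, not Axis C++ API: the HTTP header line and the XML
// declaration for one SOAP request, built from the same encoding name.
struct SoapFraming {
    std::string contentTypeHeader;
    std::string xmlDeclaration;
};

SoapFraming makeFraming(const std::string& encoding) {
    return SoapFraming{
        "Content-Type: text/xml; charset=" + encoding,
        "<?xml version='1.0' encoding='" + encoding + "' ?>"
    };
}
```

Keeping the encoding name in one place is the design point here; where the two strings are actually emitted inside Axis C++ is not shown in this thread and would need to be checked in the source.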

Kind regards,
Iwan


-----Original Message-----
From: Nadir Amra [mailto:amra@us.ibm.com] 
Sent: Thursday, 29 March 2007 0:52
To: Apache AXIS C User List
Cc: Apache AXIS C User List
Subject: Re: Invalid byte 2 of 2-byte UTF-8 sequence?

Iwan,

This is an existing problem. See AXISCPP-964. If anyone wants to provide a patch, I can include the patch. If you have a solution to fix your particular problem, then please provide the patch.

Nadir K. Amra


Re: Invalid byte 2 of 2-byte UTF-8 sequence?

Posted by Nadir Amra <am...@us.ibm.com>.
Iwan,

This is an existing problem. See AXISCPP-964. If anyone wants to provide
a patch, I can include the patch. If you have a solution to fix your
particular problem, then please provide the patch.

Nadir K. Amra
