Posted to java-user@axis.apache.org by Amandeep Singh <as...@quark.com> on 2008/06/09 22:18:32 UTC
Invalid UTF-8 character encoding in SOAP response
Hi All,
I am using Axis 1.3. If the response contains a CJK character in UTF-8,
Axis converts it into an XML character reference. On the receiver side,
XML parsing fails, reporting an invalid XML entity.
The character has the UTF-8 encoding F0AA989A. Axis converts it into
numeric character references, and the parser fails at the first one.
Any ideas/hints would be greatly appreciated.
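For readers trying to reproduce the failure: XML 1.0 excludes the surrogate blocks (D800 to DFFF) from its Char production, and a character reference must refer to a legal character, so a reference to either half of a surrogate pair is a well-formedness error. Below is a minimal sketch, assuming the receiving side behaves like the JAXP parser bundled with the JDK (the thread does not say which parser is actually in use). The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for experimenting with character references.
final class CharRefCheck {

    // Parses the given XML and returns the text content of the root
    // element, or null when the parser rejects the document.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // not well-formed, e.g. an illegal character reference
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        // Two references to the halves of a surrogate pair: rejected.
        System.out.println(parseText("<a>&#xD869;&#xDE1A;</a>") == null);
        // One reference to the supplementary code point: accepted.
        System.out.println(parseText("<a>&#x2A61A;</a>") != null);
    }
}
```

Both lines should print true with the JDK parser; it may also log the fatal error for the first document to stderr.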
Thanks,
Aman
RE: Invalid UTF-8 character encoding in SOAP response
Posted by Amandeep Singh <as...@quark.com>.
Posting solution.
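The issue is with the UTF8Encoder class of Axis. The class does not
handle surrogate pairs, so each half of a pair is written out as its own
character reference. The solution is to override that class so that it
combines a surrogate pair into a single code point before escaping it.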
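A minimal sketch of that kind of surrogate handling follows. It is not the Axis UTF8Encoder source and not a drop-in override: the class name and methods are hypothetical, and it assumes the encoder's job is to escape markup characters and write every non-ASCII character as a hexadecimal character reference.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for an overridden encoder; NOT the Axis source.
final class SurrogateAwareEscaper {

    // Writes text to out, escaping markup characters and replacing every
    // non-ASCII character with a hexadecimal character reference.
    static void writeEscaped(Writer out, String text) throws IOException {
        int i = 0;
        while (i < text.length()) {
            // codePointAt joins a high/low surrogate pair into one code point.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // 2 chars for a pair, otherwise 1

            if (cp >= 0xD800 && cp <= 0xDFFF) {
                // Still a surrogate here means it had no partner.
                throw new IOException("Unpaired surrogate at index " + (i - 1));
            } else if (cp == '&') {
                out.write("&amp;");
            } else if (cp == '<') {
                out.write("&lt;");
            } else if (cp == '>') {
                out.write("&gt;");
            } else if (cp > 0x7F) {
                out.write("&#x");
                out.write(Integer.toHexString(cp).toUpperCase());
                out.write(";");
            } else {
                out.write(cp); // plain ASCII passes through unchanged
            }
        }
    }

    // Convenience wrapper that returns the escaped text as a String.
    static String escape(String text) {
        StringWriter buffer = new StringWriter();
        try {
            writeEscaped(buffer, text);
        } catch (IOException e) {
            throw new IllegalArgumentException(e.getMessage(), e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\uD869\uDE1A")); // prints &#x2A61A;
    }
}
```

Iterating by code point rather than by char is the whole fix here: the pair D869 DE1A reaches the escaping step as the single value 2A61A.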
Is this fixed in the latest version of Axis? Just curious.
Thanks,
Aman
---------------------------------------------------------------------
To unsubscribe, e-mail: axis-user-unsubscribe@ws.apache.org
For additional commands, e-mail: axis-user-help@ws.apache.org
RE: Invalid UTF-8 character encoding in SOAP response
Posted by Amandeep Singh <as...@quark.com>.
Thanks Andreas.
My bad. The entity being produced is &#xD869;&#xDE1A;
So, does anyone with Axis 1 experience have any suggestions on how to
force Axis to output the correct entity?
Thanks,
Aman
Re: Invalid UTF-8 character encoding in SOAP response
Posted by Andreas Veithen <an...@skynet.be>.
Aman,
D869 DE1A is actually the surrogate pair for the character with code
point 2A61A, which is encoded as F0AA989A in UTF-8 (see
http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi). The two other character
references correspond to another character. I'm not an expert, but the
XML specs don't mention surrogate pairs and I think that the correct way
of encoding the character as a character reference should be &#x2A61A;
in this case.
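The arithmetic behind those numbers can be checked with a few lines of Java. This is only a worked example of the standard UTF-16 and UTF-8 rules; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the surrogate pair arithmetic.
final class SurrogateDemo {

    // Combines a high and a low surrogate into the code point they encode.
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Returns the UTF-8 bytes of a code point as an uppercase hex string.
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        int cp = combine('\uD869', '\uDE1A');
        System.out.println(Integer.toHexString(cp).toUpperCase()); // prints 2A61A
        System.out.println(utf8Hex(cp));                           // prints F0AA989A
    }
}
```

The JDK's own Character.toCodePoint(high, low) performs the same combination as combine above.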
This definitely looks like a bug in the XML parser. I would try
replacing the XML parser with a newer version of the same parser or with
another parser. I'm not familiar with Axis 1, so I don't know what
kind of parser (SAX or StAX) it uses. Maybe somebody else on the list
can give a hint?
Andreas