Posted to java-dev@axis.apache.org by "Rajani Gundimeda (JIRA)" <ax...@ws.apache.org> on 2012/05/17 22:59:09 UTC
[jira] [Commented] (AXIS-2342) Reopen issue: Character entities are
escaped too aggressively
[ https://issues.apache.org/jira/browse/AXIS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278256#comment-13278256 ]
Rajani Gundimeda commented on AXIS-2342:
----------------------------------------
We ran into this issue in our product; the workaround I found is described below.
I hope it helps, and please let me know if there is a better approach.
We have a web service operation whose response parameter is an XML string (data type xs:string).
When the response XML string contains a multi-byte character, Axis escapes it as a sequence of separate hex values rather than as a single character.
For example:
1) The response contains the Japanese character 功, whose UTF-8 bytes are E5 8A 9F.
2) I checked in the Java code just before we return the response string, and its hex byte representation was E5 8A 9F.
3) But Axis 1.3 (RPCProvider) escaped the 功 character to E5 160 178, i.e. 功, which was not expected.
4) When I set the JVM argument -Dfile.encoding=UTF-8, Axis escaped the character to 0x529F, which was not expected either.
5) But when I started the server with the JVM argument -Dfile.encoding=ISO-8859-1, Axis escaped the 功 character to E5 8A 9F, i.e. 功.
I used -Dfile.encoding=ISO-8859-1 as a workaround.
But I wonder: why didn't the UTF-8 setting work?
Is there anything else I can do or configure so that Axis escapes the bytes as UTF-8 only?
For the above testing I used JBoss, Axis 1.3, and Windows 7 (the default Windows code page was Cp1252).
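
The three results in steps 3 to 5 are what one gets when the bytes E5 8A 9F are decoded with Cp1252, UTF-8 and ISO-8859-1 respectively. The sketch below only demonstrates that arithmetic. It assumes (this is not confirmed from the Axis source) that the bytes are decoded with the JVM default charset somewhere before the escaping happens; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Axis.
class CharsetDecodeDemo {

    // Decode raw bytes with the given charset and return the resulting
    // UTF-16 code units as space-separated upper-case hex values.
    static String hexCodeUnits(byte[] bytes, Charset charset) {
        String decoded = new String(bytes, charset);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < decoded.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(Integer.toHexString(decoded.charAt(i)).toUpperCase());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // UTF-8 encoding of the character U+529F, as reported in step 1.
        byte[] utf8Bytes = {(byte) 0xE5, (byte) 0x8A, (byte) 0x9F};

        // Assumption: each charset stands in for one -Dfile.encoding setting.
        System.out.println("Cp1252:     " + hexCodeUnits(utf8Bytes, Charset.forName("windows-1252")));
        System.out.println("UTF-8:      " + hexCodeUnits(utf8Bytes, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + hexCodeUnits(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Under these assumptions the output is E5 160 178 for Cp1252, 529F for UTF-8, and E5 8A 9F for ISO-8859-1. Note that 0x529F is the Unicode code point of 功 itself, so the UTF-8 setting yields one value for the whole character, while ISO-8859-1 yields one value per byte.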
Thanks
Rajani
> Reopen issue: Character entities are escaped too aggressively
> -------------------------------------------------------------
>
> Key: AXIS-2342
> URL: https://issues.apache.org/jira/browse/AXIS-2342
> Project: Axis
> Issue Type: Bug
> Components: Serialization/Deserialization
> Affects Versions: 1.0
> Environment: Operating System: All
> Platform: All
> Reporter: Thiago Jung Bauermann
> Attachments: AXIS_2342.diff, PATCH_2342.txt, TESTCASE_2342.txt, TEST_2342.diff
>
>
> We are using SOAP to send XML documents from client to server and back. The
> documents contain a lot of non-ASCII data. This is encoded as UTF-8 by us.
> However, when retrieved from an Axis server, Axis will escape almost all of our
> characters into character entities (so &#...;). This means messages become about
> three times as big as they have to be for 'international' documents, which for us
> is a large performance problem. I narrowed down the problem to
> XMLUtils::xmlEncodeString,
> which has the code:
> if (((int)chars[i]) > 127) {
>     strBuf.append("&#");
>     strBuf.append((int)chars[i]);
>     strBuf.append(";");
> }
> This seems unnecessary to me, as Axis will send all messages in UTF-8 anyway,
> for which no encoding is necessary (and should encoding be configurable, I feel
> this should be escaped elsewhere).
> Is there any reason for this code? I commented it out and it seemed to have no
> adverse effect on our application (apart from reduced network traffic).
> Tested with 1.0, also looked up in the sources of 1.1-rc2.
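
The size effect described in the quoted report can be reproduced with a small standalone sketch. It is not the Axis implementation: the class and method names are hypothetical, the else branch that copies ASCII characters unchanged is an assumption, and escaping of &, < and > is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch modelled on the quoted snippet, not Axis code.
class EscapeSketch {

    // Replace every char above 127 with a decimal character reference,
    // mirroring the quoted condition; ASCII chars are copied as-is (assumption).
    static String escapeNonAscii(String input) {
        char[] chars = input.toCharArray();
        StringBuilder strBuf = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            if (((int) chars[i]) > 127) {
                strBuf.append("&#");
                strBuf.append((int) chars[i]);
                strBuf.append(";");
            } else {
                strBuf.append(chars[i]);
            }
        }
        return strBuf.toString();
    }

    // Number of bytes the string occupies when sent as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String original = "\u529F"; // the character from the comment above
        String escaped = escapeNonAscii(original);
        System.out.println(escaped);
        System.out.println(utf8Length(original) + " bytes raw, "
                + utf8Length(escaped) + " bytes escaped");
    }
}
```

For this one character the raw UTF-8 form takes 3 bytes and the escaped form takes 8, which is in line with the reported growth of roughly a factor of three.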