Posted to java-dev@axis.apache.org by "Wilbert Pol (JIRA)" <ax...@ws.apache.org> on 2007/07/19 13:04:04 UTC
[jira] Commented: (AXIS-2342) Reopen issue: Character entities are
escaped too aggressively
[ https://issues.apache.org/jira/browse/AXIS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513876 ]
Wilbert Pol commented on AXIS-2342:
-----------------------------------
We ran into this issue with Axis 1.4 in a hybrid Java/Perl/.NET environment while trying to transmit a euro sign (Unicode U+20AC, UTF-8 bytes E2 82 AC). The Axis 1.4 service advertised its output as UTF-8, but the euro sign was emitted as a numeric character entity (&#...;) instead of the raw UTF-8 bytes, which IMO looks more like a dirty hack.
What actually helped was removing all the special encoding code from the default case of the writeEncoded method in org.apache.axis.components.encoding.UTF8Encoder. This made Axis output a proper UTF-8 euro sign. It looks like some final encoding happens at a higher level in Axis, but I didn't look into it further.
The relevant section of UTF8Encoder becomes:
        case '\t':
            writer.write(TAB);
            break;
        default:
            if (character < 0x20) {
                throw new IllegalArgumentException(Messages.getMessage(
                        "invalidXmlCharacter00",
                        Integer.toHexString(character),
                        xmlString.substring(0, i)));
            } else {
                writer.write(character);
            }
            break;
    }
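For anyone who wants to see the difference outside Axis, here is a minimal standalone sketch. It is not the Axis code: the class and method names (EuroEncodingSketch, writeEscaped, toUtf8Bytes) are made up for illustration, the hex entity form is an assumption about what the stock encoder emits, and markup escaping (&, <, >) is left out on purpose. It only shows that a Writer configured for UTF-8 already encodes non-ASCII characters correctly, so escaping them first just inflates the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class EuroEncodingSketch {

    // Hypothetical stand-in for the "escape everything above 0x7F" behaviour.
    // The hex entity form used here is an assumption, not taken from Axis.
    public static void writeEscaped(Writer writer, String text) throws IOException {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                writer.write("&#x");
                writer.write(Integer.toHexString(c).toUpperCase());
                writer.write(";");
            } else {
                writer.write(c);
            }
        }
    }

    // Serializes text through a UTF-8 writer, with or without entity escaping.
    public static byte[] toUtf8Bytes(String text, boolean escapeNonAscii) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            if (escapeNonAscii) {
                writeEscaped(writer, text);
            } else {
                // Pass the character through; the writer does the UTF-8 encoding.
                writer.write(text);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println("raw bytes:     " + toUtf8Bytes(euro, false).length);
        System.out.println("escaped bytes: " + toUtf8Bytes(euro, true).length);
    }
}
```

With the pass-through branch the euro sign comes out as the three bytes E2 82 AC; with escaping it becomes eight ASCII bytes, which is valid XML but larger and not what a consumer expecting raw UTF-8 text sees on the wire.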
> Reopen issue: Character entities are escaped too aggressively
> -------------------------------------------------------------
>
> Key: AXIS-2342
> URL: https://issues.apache.org/jira/browse/AXIS-2342
> Project: Axis
> Issue Type: Bug
> Components: Serialization/Deserialization
> Affects Versions: 1.0
> Environment: Operating System: All
> Platform: All
> Reporter: Thiago Jung Bauermann
> Assignee: Axis Developers Mailing List
> Attachments: AXIS_2342.diff, PATCH_2342.txt, TEST_2342.diff, TESTCASE_2342.txt
>
>
> We are using SOAP to send XML documents from client to server and back. The
> documents contain a lot of non-ASCII data. This is encoded as UTF-8 by us.
> However, when retrieved from an Axis server, Axis will escape almost all of our
> characters into character entities (so &#...;). This means messages become about
> three times as big as they have to be for 'international' documents, which for us
> is a large performance problem. I narrowed down the problem to
> XMLUtils::xmlEncodeString
> that has the code:
>     if (((int)chars[i]) > 127) {
>         strBuf.append("&#");
>         strBuf.append((int)chars[i]);
>         strBuf.append(";");
>     }
> This seems unnecessary to me, as Axis will send all messages in UTF-8 anyway,
> for which no encoding is necessary (and should encoding be configurable, I feel
> this should be escaped elsewhere).
> Is there any reason for this code? I commented it out and it seemed to have no
> adverse effect on our application (apart from reduced network traffic).
> Tested with 1.0, also looked up in the sources of 1.1-rc2.