Posted to commons-dev@ws.apache.org by "Grzegorz Grzybek (JIRA)" <ji...@apache.org> on 2007/10/23 09:04:57 UTC
[jira] Reopened: (WSCOMMONS-260) Invalid XML after serialization
[ https://issues.apache.org/jira/browse/WSCOMMONS-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grzegorz Grzybek reopened WSCOMMONS-260:
----------------------------------------
Hello!
Well, it's almost OK, but there is one more specific case: XML documents WITHOUT an XML declaration are assumed to be encoded in "UTF-8". So there should be no:
write(new OutputStreamWriter(out))
in the else clause, when this.inputEncoding is null or empty.
I've added a simple XML file with UTF-8 characters and NO XML declaration to the other_encoding directory (not in Apache's SVN of course :) and the test fails (the XML produced contains Windows-1250 characters on my Windows/Polish OS).
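To illustrate the mismatch described above, here is a minimal Java sketch. It is only an illustration, not code from XmlSchema: the class name CharsetMismatchDemo and its methods are hypothetical, and ISO-8859-1 stands in for a single-byte platform default such as Windows-1250 because it is guaranteed to be available on every JVM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Writes text through an OutputStreamWriter using the given charset
    // and returns the bytes that end up in the stream.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write(text);
        writer.flush();
        return out.toByteArray();
    }

    // Decodes the bytes as UTF-8, which is what an XML parser assumes when
    // the document has no XML declaration (and no byte order mark).
    static String readAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00f3"; // the Polish letter o-acute

        // Stand-in for "platform default" output versus explicit UTF-8 output.
        byte[] singleByte = encode(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        System.out.println("single-byte length: " + singleByte.length);
        System.out.println("UTF-8 length: " + utf8.length);
        System.out.println("single-byte output survives a UTF-8 read: "
                + text.equals(readAsUtf8(singleByte)));
    }
}
```

The point of the sketch is that the same character produces different bytes depending on the charset handed to OutputStreamWriter, so a reader that assumes UTF-8 cannot recover text that was written with a single-byte default.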
I think the write methods should look like this:
if (this.inputEncoding == null || "".equals(this.inputEncoding)) {
    this.inputEncoding = "utf-8";
}
try {
    write(new OutputStreamWriter(out, this.inputEncoding), options);
} catch (UnsupportedEncodingException e) {
    // log the error and just write it without the encoding
    // TO CHECK: maybe use "utf-8"?
    write(new OutputStreamWriter(out));
}
But setting this.inputEncoding = "utf-8" may not be such a good idea; I don't know whether this will cause problems in other parts of ws-commons-XmlSchema.
Anyway, assuming "UTF-8" encoding is in accordance with the XML Specification.
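One way to avoid touching this.inputEncoding at all is to resolve the encoding in a local variable. The following is only a sketch under assumptions: EncodingFallbackSketch, writerFor and serialize are hypothetical names, not part of XmlSchema, and on an unsupported encoding it falls back to UTF-8 (the "TO CHECK" option) rather than to the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class EncodingFallbackSketch {

    // Hypothetical helper: picks the writer's encoding without modifying
    // any field. A null or empty encoding is treated as UTF-8.
    static Writer writerFor(OutputStream out, String inputEncoding) {
        String encoding = (inputEncoding == null || inputEncoding.isEmpty())
                ? "UTF-8"
                : inputEncoding;
        try {
            return new OutputStreamWriter(out, encoding);
        } catch (UnsupportedEncodingException e) {
            // Assumption: fall back to UTF-8 instead of the platform default.
            return new OutputStreamWriter(out, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical stand-in for a write(...) method: serializes the text
    // and returns the resulting bytes so they can be inspected.
    static byte[] serialize(String text, String inputEncoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = writerFor(out, inputEncoding);
        writer.write(text);
        writer.flush();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u0142"; // the Polish letter l with stroke
        System.out.println("no encoding given: " + serialize(text, null).length + " byte(s)");
        System.out.println("unknown encoding: " + serialize(text, "no-such-encoding").length + " byte(s)");
    }
}
```

Because the resolved encoding lives in a local variable, other code that reads the field still sees the original value, which may address the worry about side effects elsewhere in the library.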
with best regards
Grzegorz Grzybek
> Invalid XML after serialization
> -------------------------------
>
> Key: WSCOMMONS-260
> URL: https://issues.apache.org/jira/browse/WSCOMMONS-260
> Project: WS-Commons
> Issue Type: Bug
> Components: XmlSchema
> Environment: any
> Reporter: Grzegorz Grzybek
> Assignee: Ajith Harshana Ranabahu
> Priority: Critical
>
> org.apache.ws.commons.schema.XmlSchema.write() methods use the wrong version of the OutputStreamWriter constructor. When the Schema's XML Document contains characters outside the standard 'us-ascii' charset, OutputStreamWriter assumes the default platform encoding (e.g. Windows-1250 on Windows/Poland). Thus, when the document is UTF-8, it is serialized in Windows-1250 encoding.
> The immediate result is an error during "pretty printing" of the serialized document.
> Please use the OutputStreamWriter(OutputStream os, String encoding) constructor!!!
> with best regards
> Grzegorz Grzybek
--
This message is automatically generated by JIRA.