Posted to issues@maven.apache.org by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org> on 2012/11/05 08:20:13 UTC
[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in
xpp3dom attribute incorrect/different from p-u xpp3dom
Kristian Rosenvold created MSHARED-258:
------------------------------------------
Summary: PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom
Key: MSHARED-258
URL: https://jira.codehaus.org/browse/MSHARED-258
Project: Maven Shared Components
Issue Type: Bug
Components: maven-shared-utils
Affects Versions: maven-shared-utils-0.1
Reporter: Kristian Rosenvold
When porting Surefire to maven-shared-utils (m-s-u), I came across a case where Unicode \u0000 gets encoded as &#0; in an XML attribute by PrettyPrintXmlWriter. This is probably the reason why PrettyPrintXmlWriter was forked into Surefire originally.
According to SUREFIRE-456, this encoding appears to be illegal per the XML specification, but it does preserve the character value of non-printable characters.
So the more I type on this issue, the more it seems like the forked PPXW actually does the best-effort "right" thing when it comes to XML-encoding "any" string, as long as we want to stay human-readable?
I need some input on this one ;)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://jira.codehaus.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in
xpp3dom attribute incorrect/different from p-u xpp3dom
Posted by "Jörg Schaible (JIRA)" <ji...@codehaus.org>.
[ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312868#comment-312868 ]
Jörg Schaible commented on MSHARED-258:
---------------------------------------
Facts in short: Encoding the null character is always illegal in XML. Encoded control characters are illegal in XML 1.0 but valid in XML 1.1 (simply write the appropriate XML header).
Long story: Ages ago, Jason used the Xpp3Dom code in XStream and also took the PrettyPrintWriter into Plexus. The code has kept evolving in XStream over the years as well, though ;-)
See the special note and the linked XML specs in http://xstream.codehaus.org/javadoc/com/thoughtworks/xstream/io/xml/PrettyPrintWriter.html
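To make the difference between the two spec versions concrete, here is a minimal sketch of the Char productions from XML 1.0 and XML 1.1, which also govern what a numeric character reference may expand to. The class and method names are made up for illustration; they are not part of XStream, Plexus or maven-shared-utils.

```java
// Hypothetical helper, for illustration only.
final class XmlCharRefs {
    private XmlCharRefs() {}

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isLegalInXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if the code point matches the Char production of XML 1.1. */
    static boolean isLegalInXml11(int cp) {
        // XML 1.1 admits the C0 controls from 0x1 upwards, but never 0x0.
        return (cp >= 0x1 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        int[] samples = {0x0, 0x1, 0x9, 0x1F, 0x41};
        for (int cp : samples) {
            System.out.printf("U+%04X  xml1.0=%b  xml1.1=%b%n",
                cp, isLegalInXml10(cp), isLegalInXml11(cp));
        }
    }
}
```

Note that in XML 1.1 most control characters are only allowed as character references, not as literal characters, so a writer would still have to escape them.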
[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in
xpp3dom attribute incorrect/different from p-u xpp3dom
Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
[ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ]
Kristian Rosenvold commented on MSHARED-258:
--------------------------------------------
OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in xml.
Since the xml file tries to stay compatible with ant, this is not really going to change. There are basically only 2 options:
Encode illegally (which is the current solution, giving &#0;), or just strip it away.
I think this issue is a won't fix, I'll let it simmer for a day or two before giving up ;)
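As a quick way to see the "no legal way" problem from the reader's side, the sketch below feeds a document containing &#0; to the JAXP parser bundled with the JDK. The class name and the sample documents are invented for illustration; the expectation that a conforming parser rejects &#0; follows from the XML specs, but behaviour of other parsers is an assumption and not something verified in this issue.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, for illustration only.
final class NulReferenceCheck {
    private NulReferenceCheck() {}

    /** Returns true if the given document parses, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Not well-formed according to this parser.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = "<testcase name=\"a&#0;b\"/>";  // what the writer emits today
        String stripped = "<testcase name=\"ab\"/>";     // the "strip it away" option
        System.out.println("encoded parses:  " + parses(encoded));
        System.out.println("stripped parses: " + parses(stripped));
    }
}
```

The parser may also print its own fatal-error message to stderr for the rejected document; that is harmless for this demo.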
[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in
xpp3dom attribute incorrect/different from p-u xpp3dom
Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
[ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ]
Kristian Rosenvold edited comment on MSHARED-258 at 11/5/12 2:00 AM:
---------------------------------------------------------------------
OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in xml.
Since the xml file tries to stay compatible with ant, this is not really going to change. There are basically only 2 options:
Encode illegally (which is the current solution, giving &#0;), or just strip it away.
I think this issue is a won't fix, I'll let it simmer for a day or two before giving up ;)
[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in
xpp3dom attribute incorrect/different from p-u xpp3dom
Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
[ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312869#comment-312869 ]
Kristian Rosenvold commented on MSHARED-258:
--------------------------------------------
It seems that using XML 1.1 and stripping the 0 would be an option that stays reasonably legal. There is really no use case for preserving the 0, as far as I know.
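A rough sketch of what that option could look like for attribute values follows. This is not the PrettyPrintXmlWriter code from maven-shared-utils, plexus-utils or Surefire; the class, constant and method names are hypothetical, and dropping unpaired surrogates and U+FFFE/U+FFFF is an extra assumption on top of what was proposed above.

```java
// Hypothetical escaper, for illustration only.
final class Xml11AttributeEscaper {
    /** Header a writer would need to emit so that control-character references are legal. */
    static final String XML11_HEADER = "<?xml version=\"1.1\" encoding=\"UTF-8\"?>";

    private Xml11AttributeEscaper() {}

    /** Escapes a value for use inside a double-quoted attribute of an XML 1.1 document. */
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == 0) {
                continue; // NUL has no legal representation in XML 1.0 or 1.1: strip it
            }
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 0x20 || (cp >= 0x7F && cp <= 0x9F)) {
                        // Control characters: write as numeric references (legal in XML 1.1)
                        out.append("&#").append(cp).append(';');
                    } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp == 0xFFFE || cp == 0xFFFF) {
                        // Unpaired surrogates and U+FFFE/U+FFFF are not XML Chars: drop them
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "a" + (char) 0 + "b" + (char) 1 + "\"<";
        System.out.println(XML11_HEADER);
        System.out.println("<testcase name=\"" + escapeAttribute(raw) + "\"/>");
    }
}
```

The trade-off is the one discussed in this issue: everything except the 0 character round-trips, but only for consumers that accept XML 1.1 documents.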
[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in
xpp3dom attribute incorrect/different from p-u xpp3dom
Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
[ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ]
Kristian Rosenvold edited comment on MSHARED-258 at 11/5/12 2:01 AM:
---------------------------------------------------------------------
OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in xml.
Since the xml file tries to stay compatible with ant, this is not really going to change. There are basically only 2 options:
Encode illegally (which is the current solution, giving &#0;), or just strip it away.
I think this issue is a won't fix, I'll let it simmer for a day or two before giving up ;)
[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in
xpp3dom attribute incorrect/different from p-u xpp3dom
Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
[ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ]
Kristian Rosenvold edited comment on MSHARED-258 at 11/5/12 2:01 AM:
---------------------------------------------------------------------
OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in xml.
Since the xml file tries to stay compatible with ant, this is not really going to change. There are basically only 2 options:
Encode illegally (which is the current solution, giving &#0;), or just strip it away.
I think this issue is a won't fix, I'll let it simmer for a day or two before giving up ;)