Posted to issues@maven.apache.org by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org> on 2012/11/05 08:20:13 UTC

[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom

Kristian Rosenvold created MSHARED-258:
------------------------------------------

             Summary: PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom
                 Key: MSHARED-258
                 URL: https://jira.codehaus.org/browse/MSHARED-258
             Project: Maven Shared Components
          Issue Type: Bug
          Components: maven-shared-utils
    Affects Versions: maven-shared-utils-0.1
            Reporter: Kristian Rosenvold


When porting surefire to m-s-u I came across the case where Unicode \u0000 gets encoded as &#0; in an XML attribute by the PrettyPrintXmlWriter. This is probably the reason why the PrettyPrintXmlWriter was forked into surefire originally.

Now, from SUREFIRE-456 it seems that this encoding is illegal according to the specification, but it does actually preserve the character value of non-printable characters.

So the more I type on this issue, the more it seems like the forked PPXW (PrettyPrintXmlWriter) actually does the best-effort "right" thing when it comes to XML-encoding "any" string, as long as we want to stay human readable?
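
To make the behaviour under discussion concrete, here is a minimal sketch of a "best-effort" attribute escaper. It is not the actual PrettyPrintXmlWriter code from surefire or maven-shared-utils; the class and method names (AttributeEscaperSketch, escapeAttribute) are hypothetical and only illustrate how a writer ends up emitting &#0; for U+0000.

```java
// Hypothetical sketch: NOT the real PrettyPrintXmlWriter implementation.
final class AttributeEscaperSketch {
    private AttributeEscaperSketch() {}

    /**
     * Escapes markup characters and writes every character below U+0020
     * as a decimal numeric character reference.
     */
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (c < 0x20) {
                        // Keeps the character value readable in the output,
                        // but the reference for U+0000 is not legal XML.
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttribute("a\u0000b")); // prints a&#0;b
    }
}
```

The output stays human readable and keeps the character value, which is the trade-off described above, but a conforming parser is expected to reject the resulting document.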

I need some input on this one ;)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://jira.codehaus.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom

Posted by "Jörg Schaible (JIRA)" <ji...@codehaus.org>.
    [ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312868#comment-312868 ] 

Jörg Schaible commented on MSHARED-258:
---------------------------------------

Facts in short: Encoding the null character is always illegal in XML. Character references to the other control characters (apart from tab, LF and CR) are illegal in XML 1.0, but valid in XML 1.1 (simply write the appropriate XML header).

Long story: Ages ago, Jason used the code for Xpp3Dom in XStream and also took the PrettyPrintWriter into Plexus. The code has evolved in XStream over the years as well, though ;-)

See the special note and the linked XML specs in http://xstream.codehaus.org/javadoc/com/thoughtworks/xstream/io/xml/PrettyPrintWriter.html
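
The rules above can be written down as two small predicates. This is a sketch based on the "Char" productions of the XML 1.0 and XML 1.1 specifications; the class and method names (XmlCharRefRules, isLegalInXml10, isLegalInXml11) are made up for illustration and do not exist in XStream or Plexus.

```java
// Hypothetical helper: a character reference such as &#1; is only
// well-formed if the referenced code point matches the "Char"
// production of the XML version in use.
final class XmlCharRefRules {
    private XmlCharRefRules() {}

    /** XML 1.0: the only control characters allowed are tab, LF and CR. */
    static boolean isLegalInXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * XML 1.1: everything from U+0001 upwards except surrogates, U+FFFE
     * and U+FFFF. Most C0 control characters may only be written as
     * character references, not literally.
     */
    static boolean isLegalInXml11(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        int[] samples = {0x0, 0x1, 0x9, 0x1F, 0x41};
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " XML 1.0: " + isLegalInXml10(cp)
                    + " XML 1.1: " + isLegalInXml11(cp));
        }
    }
}
```

U+0000 fails both checks, which is why no choice of XML version makes &#0; legal.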
                

       

[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom

Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
    [ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ] 

Kristian Rosenvold commented on MSHARED-258:
--------------------------------------------

OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in XML.

Since the XML file tries to stay compatible with Ant, this is not really going to change. There are basically only 2 options: encode illegally (which is the current solution, giving &#0;) or just strip it away.

I think this issue is a won't fix; I'll let it simmer for a day or two before giving up ;)
                

        

[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom

Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
    [ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ] 

Kristian Rosenvold edited comment on MSHARED-258 at 11/5/12 2:00 AM:
---------------------------------------------------------------------

OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in XML.

Since the XML file tries to stay compatible with Ant, this is not really going to change. There are basically only 2 options: encode illegally (which is the current solution, giving &#0;) or just strip it away.

I think this issue is a won't fix; I'll let it simmer for a day or two before giving up ;)
                

        

[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom

Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
    [ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312869#comment-312869 ] 

Kristian Rosenvold commented on MSHARED-258:
--------------------------------------------

It seems that using XML 1.1 and stripping the 0 character would be an option that stays reasonably legal. As far as I know, there is really no use case for preserving the 0.
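
A rough sketch of what that option could look like, assuming the writer emits an XML 1.1 declaration and escapes attribute values itself. The names here (Xml11AttributeSketch, XML_11_HEADER, escapeAttribute) are hypothetical and not part of maven-shared-utils; lone surrogates and U+FFFE/U+FFFF are ignored for brevity.

```java
// Hypothetical sketch of "XML 1.1 plus stripping U+0000".
final class Xml11AttributeSketch {
    private Xml11AttributeSketch() {}

    /** Declaration that makes references to control characters legal. */
    static final String XML_11_HEADER = "<?xml version=\"1.1\" encoding=\"UTF-8\"?>";

    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == 0) {
                // U+0000 cannot be represented in any XML version: drop it.
                continue;
            }
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (c < 0x20 || (c >= 0x7F && c <= 0x9F)) {
                        // Control characters are written as character
                        // references, which XML 1.1 accepts.
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(XML_11_HEADER);
        System.out.println("<testcase name=\"" + escapeAttribute("a\u0000b\u0001") + "\"/>");
    }
}
```

The cost is that the 0 character is lost on a round trip, and that consumers which only understand XML 1.0 may refuse the 1.1 declaration.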
                

        

[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom

Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
    [ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ] 

Kristian Rosenvold edited comment on MSHARED-258 at 11/5/12 2:01 AM:
---------------------------------------------------------------------

OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in XML.

Since the XML file tries to stay compatible with Ant, this is not really going to change. There are basically only 2 options: encode illegally (which is the current solution, giving &#0;) or just strip it away.

I think this issue is a won't fix; I'll let it simmer for a day or two before giving up ;)
                

        

[jira] (MSHARED-258) PrettyPrintXmlWriter encoding of \u0000 in xpp3dom attribute incorrect/different from p-u xpp3dom

Posted by "Kristian Rosenvold (JIRA)" <ji...@codehaus.org>.
    [ https://jira.codehaus.org/browse/MSHARED-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=312867#comment-312867 ] 

Kristian Rosenvold edited comment on MSHARED-258 at 11/5/12 2:01 AM:
---------------------------------------------------------------------

OK, this turned out to be a can of worms, since there is no legal way to represent the 0 character in XML.

Since the XML file tries to stay compatible with Ant, this is not really going to change. There are basically only 2 options: encode illegally (which is the current solution, giving &#0;) or just strip it away.

I think this issue is a won't fix; I'll let it simmer for a day or two before giving up ;)
                