Posted to issues@commons.apache.org by "Sergey Bushik (JIRA)" <ji...@apache.org> on 2012/11/21 17:21:58 UTC

[jira] [Created] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Sergey Bushik created LANG-859:
----------------------------------

             Summary: org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification
                 Key: LANG-859
                 URL: https://issues.apache.org/jira/browse/LANG-859
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 2.6
            Reporter: Sergey Bushik


According to the XML 1.0 specification, some Unicode characters are not allowed in the content of an XML document: http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion (JUnit argument order: expected, actual)
    assertEquals("good", StringEscapeUtils.escapeXml(xmlValidText));
    
    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion: the 0x2 character is returned unescaped
    assertEquals("&#x2;", StringEscapeUtils.escapeXml(xmlInvalidText));
    
    System.out.println("Is invalid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // Parsing fails: An invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}
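
For reference, the Char production cited in the description can be written as a small predicate. This is only a sketch in plain Java with no Commons Lang or Xerces dependency; the class name Xml10Chars and its methods are made up for illustration and are not part of any library.

```java
/** Hypothetical helper mirroring the XML 1.0 Char production (http://www.w3.org/TR/xml/#charsets). */
final class Xml10Chars {
    private Xml10Chars() {}

    /** Returns true if the code point is allowed in an XML 1.0 document. */
    static boolean isValidChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first disallowed code point, or -1 if there is none. */
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // an unpaired surrogate comes back as itself and is rejected
            if (!isValidChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Iterating by code point rather than by char keeps a well-formed surrogate pair from being flagged, while an unpaired surrogate is still reported.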

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502246#comment-13502246 ] 

Gary Gregory edited comment on LANG-859 at 11/21/12 7:52 PM:
-------------------------------------------------------------

In version 3.x, you can do:

{code:java}
    @Test
    public void testEscapeXmlAllCharacters() {
        // http://www.w3.org/TR/xml/#charsets says:
        // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character,
        // excluding the surrogate blocks, FFFE, and FFFF. */
        CharSequenceTranslator escapeXml = StringEscapeUtils.ESCAPE_XML
                .with(NumericEntityEscaper.below(9), NumericEntityEscaper.between(0xB, 0xC), NumericEntityEscaper.between(0xE, 0x1F),
                        NumericEntityEscaper.between(0xD800, 0xDFFF), NumericEntityEscaper.between(0xFFFE, 0xFFFF), NumericEntityEscaper.above(0x110000));

        assertEquals("&#0;&#1;&#2;&#3;&#4;&#5;&#6;&#7;&#8;", escapeXml.translate("\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008"));
        assertEquals("\t", escapeXml.translate("\t")); // 0x9
        assertEquals("\n", escapeXml.translate("\n")); // 0xA
        assertEquals("&#11;&#12;", escapeXml.translate("\u000B\u000C"));
        assertEquals("\r", escapeXml.translate("\r")); // 0xD
        assertEquals("Hello World!", escapeXml.translate("Hello World!"));
    }
{code}

See testEscapeXmlAllCharacters in https://svn.apache.org/repos/asf/commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/StringEscapeUtilsTest.java
                
      was (Author: garydgregory):
    In version 3.x, you can do:

{code:java}
    @Test
    public void testEscapeXmlAllCharacters() {
        // http://www.w3.org/TR/xml/#charsets says:
        // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character,
        // excluding the surrogate blocks, FFFE, and FFFF. */
        CharSequenceTranslator escapeXml = StringEscapeUtils.ESCAPE_XML
                .with(NumericEntityEscaper.below(9), NumericEntityEscaper.between(0xB, 0xC), NumericEntityEscaper.between(0xE, 0x1F),
                        NumericEntityEscaper.between(0xD800, 0xDFFF), NumericEntityEscaper.between(0xFFFE, 0xFFFF), NumericEntityEscaper.above(0x110000));

        assertEquals("&#0;&#1;&#2;&#3;&#4;&#5;&#6;&#7;&#8;", escapeXml.translate("\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008"));
        assertEquals("\t", escapeXml.translate("\t")); // 0x9
        assertEquals("\n", escapeXml.translate("\n")); // 0xA
        assertEquals("&#11;&#12;", escapeXml.translate("\u000B\u000C"));
        assertEquals("\r", escapeXml.translate("\r")); // 0xD
        assertEquals("Hello World!", escapeXml.translate("Hello World!"));
    }
{code}
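
One caveat before relying on numeric entities for these characters: the XML 1.0 specification also requires a character referred to by a character reference to match the Char production, so a conforming parser is expected to reject &#2; just as it rejects the raw character. The sketch below checks this with the JDK's built-in parser only; the class name CharRefCheck is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical check: does the default JAXP parser accept the given document? */
final class CharRefCheck {
    private CharRefCheck() {}

    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("&#9; parses: " + parses("<a>&#9;</a>"));
        System.out.println("&#2; parses: " + parses("<a>&#2;</a>"));
    }
}
```

If the second document is rejected, escaping alone does not make the output parseable as XML 1.0, and stripping or replacing the offending characters may be the safer choice.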
                  


[jira] [Comment Edited] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Sergey Bushik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502104#comment-13502104 ] 

Sergey Bushik edited comment on LANG-859 at 11/21/12 4:32 PM:
--------------------------------------------------------------

Fixed org.apache.commons.lang.Escape.escape() method for XML    

{code}
    protected void escape(Writer writer, String text) throws IOException {
        int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            String entity = entityName(c);
            if (entity == null) {
                // TODO: add escaping for invalid characters
                if (c > 0x7F || XMLChar.isInvalid(c)) {
                    writer.write("&#");
                    writer.write(Integer.toString(c, 10));
                    writer.write(';');
                } else {
                    writer.write(c);
                }
            } else {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            }
        }
    }
{code}
                
      was (Author: tazija):
    Fixed {code}org.apache.commons.lang.Escape.escape(){code} method for XML    

{code}
    protected void escape(Writer writer, String text) throws IOException {
        int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            String entity = entityName(c);
            if (entity == null) {
                // TODO: add escaping for invalid characters
                if (c > 0x7F || XMLChar.isInvalid(c)) {
                    writer.write("&#");
                    writer.write(Integer.toString(c, 10));
                    writer.write(';');
                } else {
                    writer.write(c);
                }
            } else {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            }
        }
    }
{code}
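
The loop above walks the string one char at a time, so a supplementary character (a surrogate pair) would be written as two separate numeric references. Below is a sketch of a code-point-based variant with the same escaping policy (named entity if one exists, otherwise a decimal reference for non-ASCII or invalid characters). It assumes only the five predefined XML entities and replaces XMLChar.isInvalid with an inline range check; XmlEscapeSketch and its methods are hypothetical names, not Commons Lang API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical code-point-aware variant of the escape loop. */
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    /** XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Name of the predefined entity for cp, or null if there is none. */
    static String entityName(int cp) {
        switch (cp) {
            case '&': return "amp";
            case '<': return "lt";
            case '>': return "gt";
            case '"': return "quot";
            case '\'': return "apos";
            default: return null;
        }
    }

    static void escape(Writer writer, String text) throws IOException {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // advance by 2 for a surrogate pair
            String entity = entityName(cp);
            if (entity != null) {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            } else if (cp > 0x7F || !isXml10Char(cp)) {
                writer.write("&#");
                writer.write(Integer.toString(cp));
                writer.write(';');
            } else {
                writer.write(cp); // plain ASCII, written unchanged
            }
        }
    }

    static String escape(String text) {
        StringWriter out = new StringWriter();
        try {
            escape(out, text);
        } catch (IOException e) {
            throw new IllegalStateException(e); // StringWriter does not throw in practice
        }
        return out.toString();
    }
}
```

For example, XmlEscapeSketch.escape("a<b") gives "a&lt;b", and U+1F600 comes out as a single reference &#128512; instead of two surrogate references.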
                  


[jira] [Commented] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Gary Gregory (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502246#comment-13502246 ] 

Gary Gregory commented on LANG-859:
-----------------------------------

In version 3.x, you can do:

{code:java}
    @Test
    public void testEscapeXmlAllCharacters() {
        // http://www.w3.org/TR/xml/#charsets says:
        // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character,
        // excluding the surrogate blocks, FFFE, and FFFF. */
        CharSequenceTranslator escapeXml = StringEscapeUtils.ESCAPE_XML
                .with(NumericEntityEscaper.below(9), NumericEntityEscaper.between(0xB, 0xC), NumericEntityEscaper.between(0xE, 0x1F),
                        NumericEntityEscaper.between(0xD800, 0xDFFF), NumericEntityEscaper.between(0xFFFE, 0xFFFF), NumericEntityEscaper.above(0x110000));

        assertEquals("&#0;&#1;&#2;&#3;&#4;&#5;&#6;&#7;&#8;", escapeXml.translate("\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008"));
        assertEquals("\t", escapeXml.translate("\t")); // 0x9
        assertEquals("\n", escapeXml.translate("\n")); // 0xA
        assertEquals("&#11;&#12;", escapeXml.translate("\u000B\u000C"));
        assertEquals("\r", escapeXml.translate("\r")); // 0xD
        assertEquals("Hello World!", escapeXml.translate("Hello World!"));
    }
{code}
                


[jira] [Commented] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Sergey Bushik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502104#comment-13502104 ] 

Sergey Bushik commented on LANG-859:
------------------------------------

Fixed org.apache.commons.lang.Escape.escape() method for XML    

    protected void escape(Writer writer, String text) throws IOException {
        int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            String entity = entityName(c);
            if (entity == null) {
                // TODO: add escaping for invalid characters
                if (c > 0x7F || XMLChar.isInvalid(c)) {
                    writer.write("&#");
                    writer.write(Integer.toString(c, 10));
                    writer.write(';');
                } else {
                    writer.write(c);
                }
            } else {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            }
        }
    }
                


[jira] [Updated] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Sergey Bushik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Bushik updated LANG-859:
-------------------------------

    Description: 
According to specification of XML version 1.0 there are Unicode characters that are not allowed in the content of the XML document http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

<pre>
public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion
    assertEquals(StringEscapeUtils.escapeXml("good"), "good");
    
    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion
    assertEquals(StringEscapeUtils.escapeXml(xmlInvalidText), "&#x2;");
    
    System.out.println("Is valid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // An invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}
</pre>

  was:
According to specification of XML version 1.0 there are Unicode characters that are not allowed in the content of the XML document http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion
    assertEquals(StringEscapeUtils.escapeXml("good"), "good");
    
    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion
    assertEquals(StringEscapeUtils.escapeXml(xmlInvalidText), "&#x2;");
    
    System.out.println("Is valid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // An invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}

    


[jira] [Updated] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Sebb (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated LANG-859:
----------------------

    Description: 
According to specification of XML version 1.0 there are Unicode characters that are not allowed in the content of the XML document http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

{code}
public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion
    assertEquals(StringEscapeUtils.escapeXml("good"), "good");
    
    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion
    assertEquals(StringEscapeUtils.escapeXml(xmlInvalidText), "&#x2;");
    
    System.out.println("Is valid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // An invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}
{code}


  was:
According to specification of XML version 1.0 there are Unicode characters that are not allowed in the content of the XML document http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion
    assertEquals(StringEscapeUtils.escapeXml("good"), "good");
    
    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion
    assertEquals(StringEscapeUtils.escapeXml(xmlInvalidText), "&#x2;");
    
    System.out.println("Is valid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // An invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}

    


[jira] [Comment Edited] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Sergey Bushik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502104#comment-13502104 ] 

Sergey Bushik edited comment on LANG-859 at 11/21/12 4:32 PM:
--------------------------------------------------------------

Fixed org.apache.commons.lang.Escape.escape() method for XML    

{code}
    protected void escape(Writer writer, String text) throws IOException {
        int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            String entity = entityName(c);
            if (entity == null) {
                // TODO: notice escaping for invalid characters added
                if (c > 0x7F || XMLChar.isInvalid(c)) {
                    writer.write("&#");
                    writer.write(Integer.toString(c, 10));
                    writer.write(';');
                } else {
                    writer.write(c);
                }
            } else {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            }
        }
    }
{code}
                
      was (Author: tazija):
    Fixed org.apache.commons.lang.Escape.escape() method for XML    

{code}
    protected void escape(Writer writer, String text) throws IOException {
        int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            String entity = entityName(c);
            if (entity == null) {
                // TODO: add escaping for invalid characters
                if (c > 0x7F || XMLChar.isInvalid(c)) {
                    writer.write("&#");
                    writer.write(Integer.toString(c, 10));
                    writer.write(';');
                } else {
                    writer.write(c);
                }
            } else {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            }
        }
    }
{code}
                  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Sergey Bushik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502104#comment-13502104 ] 

Sergey Bushik edited comment on LANG-859 at 11/21/12 4:32 PM:
--------------------------------------------------------------

Fixed {code}org.apache.commons.lang.Escape.escape(){code} method for XML    

{code}
    protected void escape(Writer writer, String text) throws IOException {
        int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            String entity = entityName(c);
            if (entity == null) {
                // TODO: add escaping for invalid characters
                if (c > 0x7F || XMLChar.isInvalid(c)) {
                    writer.write("&#");
                    writer.write(Integer.toString(c, 10));
                    writer.write(';');
                } else {
                    writer.write(c);
                }
            } else {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            }
        }
    }
{code}
                
      was (Author: tazija):
    Fixed org.apache.commons.lang.Escape.escape() method for XML    

    protected void escape(Writer writer, String text) throws IOException {
        int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            String entity = entityName(c);
            if (entity == null) {
                // TODO: add escaping for invalid characters
                if (c > 0x7F || XMLChar.isInvalid(c)) {
                    writer.write("&#");
                    writer.write(Integer.toString(c, 10));
                    writer.write(';');
                } else {
                    writer.write(c);
                }
            } else {
                writer.write('&');
                writer.write(entity);
                writer.write(';');
            }
        }
    }
                  


[jira] [Updated] (LANG-859) org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification

Posted by "Sergey Bushik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Bushik updated LANG-859:
-------------------------------

    Description: 
According to specification of XML version 1.0 there are Unicode characters that are not allowed in the content of the XML document http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion
    assertEquals(StringEscapeUtils.escapeXml("good"), "good");
    
    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion
    assertEquals(StringEscapeUtils.escapeXml(xmlInvalidText), "&#x2;");
    
    System.out.println("Is invalid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // An invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}
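
The parse failure described above can be reproduced with the JDK alone, without Commons Lang or a direct Xerces import. The following is a minimal sketch; the class name InvalidCharParseSketch and the helper parses() are hypothetical. It also feeds the parser the numeric reference &#2;, which an XML 1.0 parser is expected to reject as well, because the specification requires referenced characters to match Char.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class InvalidCharParseSketch {

    /** Returns true if the JDK's default DOM parser accepts the document. */
    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. an invalid XML character was found
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected here
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "<chars><invalid>" + (char) 0x2 + "</invalid></chars>";
        String ref = "<chars><invalid>&#2;</invalid></chars>";
        System.out.println(parses("<chars><valid>good</valid></chars>")); // true
        System.out.println(parses(raw)); // false: raw U+0002 in element content
        System.out.println(parses(ref)); // false: reference to U+0002
    }
}
```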

  was:
According to specification of XML version 1.0 there are Unicode characters that are not allowed in the content of the XML document http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as &#x<hex-code>; or &#<dec-code>;

<pre>
public static void main(String[] args) throws Exception {
    String xmlValidText = "good";
    // Passes assertion
    assertEquals(StringEscapeUtils.escapeXml("good"), "good");
    
    char xmlInvalidChar = (char) 0x2;
    String xmlInvalidText = String.valueOf(xmlInvalidChar);
    // Fails assertion
    assertEquals(StringEscapeUtils.escapeXml(xmlInvalidText), "&#x2;");
    
    System.out.println("Is valid: " + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
    String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<chars>" +
            "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
            "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
            "</chars>";
    // An invalid XML character (Unicode: 0x2) was found in the element content of the document
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(document);
}
</pre>

    
