Posted to log4net-dev@logging.apache.org by "Nicko Cadell (JIRA)" <ji...@apache.org> on 2005/04/10 20:42:22 UTC

[jira] Created: (LOG4NET-22) XmlLayout allows output of invalid control characters

XmlLayout allows output of invalid control characters 
------------------------------------------------------

         Key: LOG4NET-22
         URL: http://issues.apache.org/jira/browse/LOG4NET-22
     Project: Log4net
        Type: Bug
  Components: Appenders  
    Versions: 1.2.9    
    Reporter: Nicko Cadell


XmlLayout allows output of invalid control characters.

Reported by Mike Blake-Knox with additional comments from Curt Arnold.


The XmlLayout encodes the character 0x1e as &#x1E; using the standard XML numeric character reference.

This character code is in a range that XML 1.0 does not allow, either as an unencoded value or as a numeric character reference.
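
The rejection can be reproduced with any conforming XML 1.0 parser. The sketch below is only an illustration and is not part of log4net: it assumes Python's bundled expat-based parser as a stand-in for a consumer of the XmlLayout output, and the helper name parses_as_xml10 is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_as_xml10(fragment: str) -> bool:
    """Return True if the parser accepts the fragment as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# 0x1E written as a numeric character reference, as XmlLayout emits it.
print(parses_as_xml10("<message>&#x1E;</message>"))

# 0x1E written as a raw, unencoded character.
print(parses_as_xml10("<message>\x1e</message>"))

# A tab (0x9) is in the allowed set, so its reference is accepted.
print(parses_as_xml10("<message>&#x9;</message>"))
```

Only the last fragment should be accepted; the first two fail because 0x1E is outside the XML 1.0 character set in either form.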

The valid character ranges are defined here in the XML recommendation:
http://www.w3.org/TR/REC-xml/#charsets

They are:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Numeric character references are not able to express characters from outside these ranges.
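
As a rough sketch, the ranges above translate directly into a membership test. The names XML10_CHAR_RANGES and is_xml10_char are made up for this illustration and do not correspond to anything in log4net.

```python
# XML 1.0 Char production, transcribed from the ranges listed above.
XML10_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml10_char(code_point: int) -> bool:
    """Return True if the code point may appear in an XML 1.0 document.

    The same test applies to numeric character references, because a
    reference may only name a character that is itself allowed.
    """
    return any(low <= code_point <= high for low, high in XML10_CHAR_RANGES)


print(is_xml10_char(0x1E))  # the character from this report
print(is_xml10_char(0x09))  # tab
```

The first call reports that 0x1E is not allowed, while the tab character is.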

The System.Xml.XmlTextWriter does not verify whether a Unicode character is valid in XML, but it does encode the character as a numeric character reference if it cannot be expressed in the output encoding.

To complicate matters, XML 1.1 additionally allows so-called restricted characters to be included in the output, provided they are encoded as numeric character references. These ranges are:

[#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]

See http://www.w3.org/TR/2004/REC-xml11-20040204/#charsets for details.
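
A minimal sketch of how a writer might classify a code point under XML 1.1 follows. The restricted ranges are the ones listed above; the overall Char ranges are taken from the same section of the XML 1.1 recommendation. The function name classify_xml11 and its three return labels are hypothetical, chosen only for this illustration.

```python
# XML 1.1 Char production (from the XML 1.1 recommendation, section 2.2).
XML11_CHAR_RANGES = (
    (0x1, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)

# XML 1.1 RestrictedChar production, as listed above.
XML11_RESTRICTED_RANGES = (
    (0x1, 0x8),
    (0xB, 0xC),
    (0xE, 0x1F),
    (0x7F, 0x84),
    (0x86, 0x9F),
)


def _in_ranges(code_point: int, ranges) -> bool:
    return any(low <= code_point <= high for low, high in ranges)


def classify_xml11(code_point: int) -> str:
    """Classify a code point for output in an XML 1.1 document.

    "invalid"        - cannot appear at all, not even as a reference
    "reference-only" - restricted; must be written as &#x...;
    "literal"        - may be written directly
    """
    if not _in_ranges(code_point, XML11_CHAR_RANGES):
        return "invalid"
    if _in_ranges(code_point, XML11_RESTRICTED_RANGES):
        return "reference-only"
    return "literal"


for cp in (0x0, 0x1E, 0x41, 0x85):
    print(hex(cp), classify_xml11(cp))
```

Under this classification 0x1E becomes expressible in XML 1.1 as a reference, while NULL (0x0) remains invalid in both versions.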

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


[jira] Commented: (LOG4NET-22) XmlLayout allows output of invalid control characters

Posted by "Nicko Cadell (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LOG4NET-22?page=comments#action_62521 ]
     
Nicko Cadell commented on LOG4NET-22:
-------------------------------------

The System.Xml.XmlTextWriter does not know which version of XML is being generated. There is no API to configure it one way or the other. The XmlLayout does not generate a full XML document, only a fragment which must be included in a document.

If the XML output is included in an XML 1.1 document, then the numeric character references in the additional ranges allowed by the 1.1 spec will be valid. However, this is outside the scope of log4net to enforce.

The XmlLayout must be told which XML version is being targeted and must default to 1.0, not 1.1.

For invalid characters such as 0x1e there are 3 possible solutions:

1) Discard the character from the output.

2) Replace the character with a numeric representation e.g. "0x1E".

3) Replace the character with an XML element e.g. <char code="30"/>

Regardless of the output version selected (1.0 or 1.1), one of the above choices will need to be made. XML version 1.1 does not allow a NULL (0x0) character to appear unencoded or as a numeric character reference, so it will need to be represented in some way.

Note that the invalid characters cannot be included in a CDATA block. However, some parsers incorrectly accept them there.

I favour option 3 above because information is not lost. In option 1 the character is lost outright, and in option 2 the encoding is not reversible. With option 3 the application reading the data requires additional smarts to pick up on the values encoded in elements, but all the original information is preserved. If the app just asks for the text nodes, ignoring the child elements, then it will get back the same result as from option 1.
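
The three options can be compared side by side with a small sketch. This is illustrative only and does not reflect log4net code: the function names are hypothetical, the `char` element with its decimal `code` attribute simply follows the example given in option 3, and the validity test assumes XML 1.0.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def _is_xml10_char(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def discard(text: str) -> str:
    """Option 1: drop invalid characters, then escape markup."""
    return escape("".join(ch for ch in text if _is_xml10_char(ord(ch))))


def as_numeric_text(text: str) -> str:
    """Option 2: replace invalid characters with text such as 0x1E."""
    return escape("".join(
        ch if _is_xml10_char(ord(ch)) else "0x%02X" % ord(ch)
        for ch in text))


def as_element(text: str) -> str:
    """Option 3: replace invalid characters with <char code="N"/>."""
    return "".join(
        escape(ch) if _is_xml10_char(ord(ch)) else '<char code="%d"/>' % ord(ch)
        for ch in text)


sample = "a\x1eb<"
print(discard(sample))
print(as_numeric_text(sample))
print(as_element(sample))

# A reader that only collects text nodes sees the same text as option 1.
root = ET.fromstring("<message>" + as_element(sample) + "</message>")
print("".join(root.itertext()))
```

Option 2 is not reversible because a reader cannot tell a substituted "0x1E" from the same four characters appearing in the original message, whereas the element in option 3 cannot be confused with escaped text.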




[jira] Resolved: (LOG4NET-22) XmlLayout allows output of invalid control characters

Posted by "Niall Daley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LOG4NET-22?page=all ]
     
Niall Daley resolved LOG4NET-22:
--------------------------------

    Fix Version: 1.2.10
     Resolution: Fixed
      Assign To: Niall Daley

By default, characters that cannot be represented in XML are now masked with a ?. This can be changed by setting InvalidCharReplacement to a different string. Alternatively, set Base64EncodeMessage or Base64EncodeProperties to true, as appropriate, to Base64 encode the data. This allows all values to be output safely.

