Posted to log4net-dev@logging.apache.org by "Atsushi Suzuki (JIRA)" <ji...@apache.org> on 2009/09/19 01:04:16 UTC

[jira] Created: (LOG4NET-229) Japanese characters get garbled with log4net.Layout.XmlLayoutSchemaLog4j

Japanese characters get garbled with log4net.Layout.XmlLayoutSchemaLog4j 
-------------------------------------------------------------------------

                 Key: LOG4NET-229
                 URL: https://issues.apache.org/jira/browse/LOG4NET-229
             Project: Log4net
          Issue Type: Bug
          Components: Appenders
    Affects Versions: 1.2.10
         Environment: log4net 1.2.10, .net 2.0
            Reporter: Atsushi Suzuki
             Fix For: v.Next


With XmlLayoutSchemaLog4j, all Japanese characters (as far as I can see) are replaced with '?'
because the log4net.Util.Transform.INVALIDCHARS regular expression is incorrect.
This issue may also affect other languages, such as Chinese and Korean.



http://issues.apache.org/jira/browse/LOG4NET-22 says that the permitted characters are

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

but the regex for invalid characters is

private static Regex INVALIDCHARS=new Regex(@"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]",RegexOptions.Compiled);

so 0x0800 ~ 0xD7FF are mistakenly treated as invalid characters.
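
The gap can be reproduced outside log4net with a minimal sketch like the one below. The pattern is copied from the line quoted above; the class name InvalidCharsDemo and the helper Mask are hypothetical and only imitate the '?' replacement described in this report, they are not log4net API.

```csharp
using System;
using System.Text.RegularExpressions;

static class InvalidCharsDemo
{
    // Pattern copied from the issue text (log4net 1.2.10, Transform.INVALIDCHARS).
    static readonly Regex Current = new Regex(
        @"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]");

    // Hypothetical helper: replaces every flagged UTF-16 code unit with '?'.
    public static string Mask(string text)
    {
        return Current.Replace(text, "?");
    }

    static void Main()
    {
        // ASCII falls inside \x20-\xFF, so it is left alone.
        Console.WriteLine(Mask("abc"));                 // abc

        // U+65E5 U+672C U+8A9E all lie in 0x0800 ~ 0xD7FF,
        // which none of the permitted ranges cover.
        Console.WriteLine(Mask("\u65E5\u672C\u8A9E"));  // ???
    }
}
```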

0xD800 ~ 0xDFFF should also be permitted, because these values are used as surrogates to represent 0x10000 ~ 0x10FFFF in UTF-16
(0xD800 ~ 0xDFFF are not valid Unicode characters on their own, but they are valid code units in UTF-16).

So the INVALIDCHARS regex should be "[^\x09\x0A\x0D\x20-\uFFFD]"
(the above code is NOT TESTED)
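
As a rough check of the reasoning above, the sketch below tries a pattern that permits tab, LF, CR and every UTF-16 code unit from 0x20 through 0xFFFD, surrogates included. The pattern is an assumption that follows from the ranges discussed here, and it has not been run against log4net itself. The names CandidatePatternDemo and Mask are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CandidatePatternDemo
{
    // Assumed candidate, not taken from log4net: permits tab, LF, CR and every
    // UTF-16 code unit from 0x0020 through 0xFFFD, surrogates included.
    static readonly Regex Candidate = new Regex(@"[^\x09\x0A\x0D\x20-\uFFFD]");

    // Hypothetical helper: replaces every flagged code unit with '?'.
    public static string Mask(string text)
    {
        return Candidate.Replace(text, "?");
    }

    static void Main()
    {
        string japanese = "\u65E5\u672C\u8A9E";   // three characters in 0x0800 ~ 0xD7FF
        string supplementary = "\uD842\uDF9F";    // U+20B9F encoded as a surrogate pair
        string control = "a\u0001b";              // contains a disallowed control character

        Console.WriteLine(Mask(japanese) == japanese);           // True
        Console.WriteLine(Mask(supplementary) == supplementary); // True
        Console.WriteLine(Mask(control));                        // a?b
    }
}
```

Note that a pattern this permissive also lets unpaired surrogates through, which are not valid in XML, so a stricter check may still be needed for those.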
