Posted to log4net-dev@logging.apache.org by "Atsushi Suzuki (JIRA)" <ji...@apache.org> on 2009/09/19 01:04:16 UTC
[jira] Created: (LOG4NET-229) Japanese characters get garbled with log4net.Layout.XmlLayoutSchemaLog4j
Japanese characters get garbled with log4net.Layout.XmlLayoutSchemaLog4j
-------------------------------------------------------------------------
Key: LOG4NET-229
URL: https://issues.apache.org/jira/browse/LOG4NET-229
Project: Log4net
Issue Type: Bug
Components: Appenders
Affects Versions: 1.2.10
Environment: log4net 1.2.10, .NET 2.0
Reporter: Atsushi Suzuki
Fix For: v.Next
With XmlLayoutSchemaLog4j, all Japanese characters (as far as I can see) are replaced with '?'
because the log4net.Util.Transform.INVALIDCHARS regular expression is incorrect.
This issue probably affects other languages as well, such as Chinese and Korean.
http://issues.apache.org/jira/browse/LOG4NET-22 says that the permitted characters are
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
but the regex for invalid characters is
private static Regex INVALIDCHARS=new Regex(@"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]",RegexOptions.Compiled);
so 0x0800 to 0xD7FF are wrongly treated as invalid characters.
0xD800 to 0xDFFF should also be permitted, because these code units are used in pairs to represent 0x10000 to 0x10FFFF in UTF-16
(0xD800 to 0xDFFF are not valid Unicode characters on their own, but they are legitimate code units in a UTF-16 string).
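For illustration, here is a minimal C# sketch that reproduces the mismatch described above. The class and method names are hypothetical. Only the pattern is taken from this report, and replacing each match with '?' is an assumption based on the observed output.

```csharp
using System;
using System.Text.RegularExpressions;

static class InvalidCharsDemo
{
    // Pattern quoted in this report for log4net 1.2.10.
    static readonly Regex Reported = new Regex(
        @"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]");

    // Hypothetical helper: replaces every character the pattern flags with '?'.
    public static string Mask(string text)
    {
        return Reported.Replace(text, "?");
    }

    static void Main()
    {
        // ASCII falls inside \x20-\xFF, so it passes through unchanged.
        Console.WriteLine(Mask("abc"));          // prints "abc"

        // U+3042 (hiragana) and U+6F22 (kanji) lie in 0x0800 to 0xD7FF,
        // which the pattern does not list as permitted.
        Console.WriteLine(Mask("\u3042\u6F22")); // prints "??"
    }
}
```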
so the INVALIDCHARS regex should be "[^\x09\x0A\x0D\x20-\uFFFD]"
(the above code is NOT TESTED)
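A hedged sketch of two possible corrections, assuming the intent is to accept every UTF-16 code unit from 0x20 through 0xFFFD. The lenient variant follows that reading and therefore also lets unpaired surrogates through. The strict variant is an alternative not proposed in this report: it additionally flags surrogates that are not part of a well-formed pair. All class and method names are hypothetical, and neither pattern is confirmed as the eventual log4net fix.

```csharp
using System;
using System.Text.RegularExpressions;

static class InvalidCharsSketch
{
    // Lenient (assumed reading of the proposal): everything from 0x20 to 0xFFFD
    // is accepted, including lone surrogate code units.
    static readonly Regex Lenient = new Regex(@"[^\x09\x0A\x0D\x20-\uFFFD]");

    // Strict (alternative, not from the report): also matches a high surrogate
    // with no low surrogate after it, or a low surrogate with no high one before it.
    static readonly Regex Strict = new Regex(
        @"[^\x09\x0A\x0D\x20-\uFFFD]" +
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]");

    public static string MaskLenient(string text)
    {
        return Lenient.Replace(text, "?");
    }

    public static string MaskStrict(string text)
    {
        return Strict.Replace(text, "?");
    }

    static void Main()
    {
        // U+0001 is a control character outside the permitted set.
        Console.WriteLine(MaskLenient("\u0001b")); // prints "?b"

        // A high surrogate followed by 'a' is not a valid pair.
        Console.WriteLine(MaskStrict("\uD842a"));  // prints "?a"

        // Japanese text and a well-formed surrogate pair (U+20BB7) survive both.
        string sample = "\u3042\uD842\uDFB7";
        Console.WriteLine(MaskLenient(sample) == sample); // prints "True"
        Console.WriteLine(MaskStrict(sample) == sample);  // prints "True"
    }
}
```

The strict variant relies on .NET regular expressions matching one UTF-16 code unit at a time, which is why surrogates can be addressed directly with \uD800-\uDFFF.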