Posted to log4net-user@logging.apache.org by "Garg, Mayank" <ma...@hp.com> on 2009/01/20 18:02:04 UTC

Issues with Korean/Chinese locales in XmlLayout...

Hi,
I am using v1.2.10 and having issues with Korean and Chinese locales. I am using XmlLayout, and in the XML log file, messages (the contents of the "message" element) in Korean and Chinese appear as "????". I am using the correct encoding, "utf-8"; I also tried other encodings that support Unicode, but the problem persisted.

I debugged the code and found that in the Transform class, the MaskXmlInvalidCharacters function replaces valid Korean and Chinese characters with "?" because it treats them as invalid XML Unicode characters. The following regular expression, defined in the Transform class, is what causes the problem:

private static Regex INVALIDCHARS=new Regex(@"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]",RegexOptions.Compiled);

Could anybody suggest a workaround for this? If I bypass the masking step, the message appears correctly in the log file. Or is this a bug that should be fixed in some particular way?
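As a rough illustration of why the quoted pattern masks Korean and Chinese text, the sketch below compares it with a character class that follows the BMP part of the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]). The class name InvalidCharsDemo, the Widened pattern, and the Mask helper are illustrative assumptions, not log4net code, and the widened pattern is only one possible correction.

```csharp
using System.Text.RegularExpressions;

public static class InvalidCharsDemo
{
    // Pattern as quoted in the post. The allowed set stops at U+07FF and
    // resumes at U+E000, so Hangul syllables (U+AC00-U+D7A3) and CJK
    // ideographs (U+4E00-U+9FFF) are treated as invalid.
    public static readonly Regex Quoted = new Regex(
        @"[^\x09\x0A\x0D\x20-\xFF\u00FF-\u07FF\uE000-\uFFFD]");

    // Assumed correction: allow everything from U+0020 up to U+D7FF.
    // Limitation: surrogate code units (U+D800-U+DFFF) still match, so
    // characters outside the BMP would still be masked by this sketch.
    public static readonly Regex Widened = new Regex(
        @"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]");

    // Replaces every character the given pattern rejects with "?".
    public static string Mask(Regex invalidChars, string text)
    {
        return invalidChars.Replace(text, "?");
    }
}
```

With these definitions, InvalidCharsDemo.Mask(InvalidCharsDemo.Quoted, "\uC751\uC6A9") yields "??", while the widened pattern leaves the two Hangul syllables untouched.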

One more issue: a long time ago we added the log4net dll to our sources as a reference.
Now, if I rebuild the dll from the log4net sources and replace the older dll with it, our sources no longer build and the following compiler error is thrown:

Error     796       Unknown build error, 'The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)'

I have tried the following things:
1. Building the log4net sources with VS2005, with the conditional compilation symbols set to NET;NET_2_0 as well as NET;NET_1_0
2. Building the log4net sources with VS2008, with the target framework set to 3.5
But nothing seems to work.

The old log4net dll seems to have been built with VS2005 using the conditional compilation symbols NET;NET_2_0, and our sources are built with VS 2008 targeting framework 3.5.

What might I be doing wrong? Initially I was getting errors related to strong names, but I resolved them by generating the .snk file as described on the Apache website and signing the dll.

Please help.

Regards,
Mayank








Re: Issues with Korean/Chinese locales in XmlLayout...

Posted by Curt Arnold <ca...@apache.org>.
There were several issues with the log4j XMLLayout that have been  
addressed in the last few years.  It is very likely that these same  
issues were inherited by log4net.

See the following bugs:

http://issues.apache.org/bugzilla/show_bug.cgi?id=35691
http://issues.apache.org/bugzilla/show_bug.cgi?id=34875
http://issues.apache.org/bugzilla/show_bug.cgi?id=43078

Trying to pre-escape the message to work around the issue with  
XMLLayout is an undesirable kludge.

Part of the issue is that the architecture splits XML serialization
into two halves: an XML document to character string conversion
(XMLLayout) and a character string to byte stream conversion
(character set encoding). The right way would be a single XML
document to byte stream converter, since the XMLLayout may need to
escape characters that cannot be represented in the current code
page. The only consistent way of making sure that the output of
XMLLayout is properly rendered is to escape everything but US-ASCII
characters as character references (for example: &#x6FE;).
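A minimal sketch of that idea in C#, assuming a hypothetical helper named AsciiXmlEscaper (it is not part of log4j or log4net): markup characters are escaped as usual, every character above US-ASCII is written as a numeric character reference, and characters that XML 1.0 does not allow are replaced with "?". The output is then independent of the code page used to write the file.

```csharp
using System.Text;

public static class AsciiXmlEscaper
{
    // Returns text content that uses only US-ASCII characters.
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '&') sb.Append("&amp;");
            else if (c == '<') sb.Append("&lt;");
            else if (c == '>') sb.Append("&gt;");
            else if (c == '\t' || c == '\n' || c == '\r') sb.Append(c);
            else if (c < 0x20) sb.Append('?');   // control characters are not valid XML 1.0
            else if (c < 0x80) sb.Append(c);     // plain US-ASCII passes through
            else if (char.IsHighSurrogate(c) && i + 1 < text.Length
                     && char.IsLowSurrogate(text[i + 1]))
            {
                // Combine a surrogate pair into one code point, then skip the low half.
                AppendReference(sb, char.ConvertToUtf32(c, text[i + 1]));
                i++;
            }
            else if (char.IsSurrogate(c) || c > 0xFFFD) sb.Append('?'); // unpaired surrogate, U+FFFE, U+FFFF
            else AppendReference(sb, c);
        }
        return sb.ToString();
    }

    // Writes a hexadecimal numeric character reference such as &#xC751;
    private static void AppendReference(StringBuilder sb, int codePoint)
    {
        sb.Append("&#x").Append(codePoint.ToString("X")).Append(';');
    }
}
```

For example, AsciiXmlEscaper.Escape("\uC751\uC6A9") returns "&#xC751;&#xC6A9;". The result is already escaped text, so it must be written to the output as is and not passed through another escaping step.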

These changes might be helpful:

svn diff -r311427:HEAD http://svn.apache.org/repos/asf/logging/log4j/trunk/src/main/java/org/apache/log4j/xml/XMLLayout.java
svn diff -r311427:HEAD http://svn.apache.org/repos/asf/logging/log4j/trunk/src/main/java/org/apache/log4j/helpers/Transform.java


RE: Issues with Korean/Chinese locales in XmlLayout...

Posted by Nick Durcholz <ND...@e-farmcredit.com>.
1) I'm not quite sure I fully understand what is going on here. Is the layout class you are using stripping off the leading ampersand?

2) You are exactly right.  If you escape the string before the logger ever gets hold of it, you will have the XML-escaped characters in your text file too.  The only way around this is to not escape the log strings yourself.
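To illustrate why a pre-escaped message cannot survive an XML writer, here is a small sketch that uses the standard System.Xml.XmlWriter rather than log4net itself. The class and method names are hypothetical. Any conforming writer treats the "&" in an already escaped string as literal text and escapes it again.

```csharp
using System.IO;
using System.Xml;

public static class DoubleEscapeDemo
{
    // Writes <message>...</message> the way a generic XML writer would.
    public static string WriteMessage(string message)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            ConformanceLevel = ConformanceLevel.Fragment
        };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            // WriteElementString escapes markup characters in the value.
            writer.WriteElementString("message", message);
        }
        return output.ToString();
    }
}
```

Under these assumptions, DoubleEscapeDemo.WriteMessage("&#xC751;") returns "<message>&amp;#xC751;</message>", which matches the kind of output reported for the pre-escaped Korean string.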

--

At this point, it sounds like you need to correct the problem with XmlLayout and not escape the log strings yourself.  You probably already went down this path when debugging in the log4net source code.

However, you don't need to compile log4net yourself to fix this issue.  You can create your own XmlLayout class and reference it instead of the default log4net XmlLayout class in the config file.  Try this:


 * In the log4net source code, take src\Layout\XmlLayout.cs and src\Layout\XmlLayoutBase.cs and copy them into a separate solution.
 * Change the namespace of these classes and correct the underlying character escaping issue.
 * In the config file for your app, replace the type name in <layout type="log4net.Layout.XmlLayout"> with the full Assembly Qualified Name of your fixed XmlLayout class.  e.g. <layout type="MyAssembly.XmlLayout, MyAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=...">
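A smaller variant of the steps above can be sketched as follows. Instead of copying both source files, it derives from the XmlLayoutBase class that ships in log4net.dll and overrides FormatXml. This assumes the FormatXml(XmlWriter, LoggingEvent) override point as it appears in the log4net sources, needs a reference to log4net.dll, and writes a simplified element layout of my own choosing rather than the stock XmlLayout output. The MyAssembly namespace matches the hypothetical name used in the config example.

```csharp
using System.Xml;
using log4net.Core;

namespace MyAssembly
{
    // Hypothetical replacement layout. It writes the message through the
    // XmlWriter directly, so the masking step discussed in this thread
    // is not applied to it.
    public class XmlLayout : log4net.Layout.XmlLayoutBase
    {
        protected override void FormatXml(XmlWriter writer, LoggingEvent loggingEvent)
        {
            writer.WriteStartElement("event");
            writer.WriteAttributeString("logger", loggingEvent.LoggerName);
            writer.WriteAttributeString("timestamp",
                XmlConvert.ToString(loggingEvent.TimeStamp, XmlDateTimeSerializationMode.Local));
            writer.WriteAttributeString("level", loggingEvent.Level.DisplayName);

            // WriteString escapes &, < and > but leaves Korean and Chinese text alone.
            writer.WriteStartElement("message");
            writer.WriteString(loggingEvent.RenderedMessage);
            writer.WriteEndElement();

            writer.WriteEndElement();
        }
    }
}
```

Because masking is skipped entirely here, characters that are genuinely invalid in XML are not filtered by this sketch. A production version would still want a corrected check.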



When you get this to work, it would save others time and headache in the future if you file a bug report on the log4net jira:

http://logging.apache.org/log4net/issue-tracking.html








RE: Issues with Korean/Chinese locales in XmlLayout...

Posted by "Garg, Mayank" <ma...@hp.com>.
Hi Nick,
Thanks for your reply.

Two queries:
1. I am able to do this, but the string appears in a CDATA section as is. For example, a Korean string appears as follows in the XML file:

<message><![CDATA[&amp;#xC751;&amp;#xC6A9;&amp;#x20; ]] ></message>

The string passed to the code is: "&#xC751;&#xC6A9;&#x20;"

I looked at the code: it calculates certain weights and, depending on their values, writes the string either as an escaped string or as a CDATA section. My actual string is longer, and because of the large number of "&"s, my weightStringEscapes exceeds weightCData (which is only 12 because there are no CDATA end section characters in the string).

To avoid this, I forced my code to still write it as an escaped string, and the result was the same except for the missing CDATA section begin and end characters.

I even tried writing decimal escapes, i.e. writing &#<decimal value>; instead of &#x<hex value>;, but the results are the same.

What is the problem here?

2. We use two types of appenders, one text and one XML. Won't this solution create a problem with the text file, where the message would appear as an XML-escaped string? What is the workaround to prevent this?

Regards,
Mayank









RE: Issues with Korean/Chinese locales in XmlLayout...

Posted by Nick Durcholz <ND...@e-farmcredit.com>.
To output Chinese or Korean characters using XmlLayout you could try escaping log messages before even sending them to the log4net ILog object.  In C#, something like the following is what I'm talking about:

using log4net;

public class Foo {
    private static readonly ILog log = LogManager.GetLogger(typeof(Foo));

    private void LogSomething(string message) {
        // Replace the yen sign (U+00A5) with its XML numeric character reference.
        // See http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
        // for more info on escape sequences.
        log.Debug(message.Replace("\u00A5", "&#x00A5;"));
    }
}

That is a workaround that might work for you, but it sounds like an issue with XmlLayout class IMO.  XmlLayout is responsible for converting log messages to xml, so it should correctly escape unicode characters for you and not require the caller to have knowledge of the log output format.

For the build issue, it sounds like you are running into a strong naming issue.  The private key used to sign log4net.dll is not distributed with the source download packages.  If you compile an application against log4net.dll that is distributed by Apache and later replace that with a modified log4net.dll that you compiled, then you will get this error (because the strong name keys are not the same).

I would suggest that if you are going to modify the log4net source and compile it yourself that you create a new strong name key and use that to sign your modified version.  You will then need to recompile everything that references log4net in your application (make sure it references the customized dll at compile time).
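To see whether two builds of log4net.dll really differ in identity, something like the following sketch can print the name, version, and public key token of each file. A strong-named reference has to match on all of these, so a difference in version or token would be consistent with the 0x80131040 error. The StrongNameCheck class and its method names are hypothetical; AssemblyName.GetAssemblyName and GetPublicKeyToken are standard System.Reflection calls.

```csharp
using System.Reflection;
using System.Text;

public static class StrongNameCheck
{
    // Formats a public key token as lowercase hex; empty when the assembly is unsigned.
    public static string TokenToHex(byte[] token)
    {
        if (token == null) return string.Empty;
        var sb = new StringBuilder(token.Length * 2);
        foreach (byte b in token) sb.Append(b.ToString("x2"));
        return sb.ToString();
    }

    // Summarises the parts of an assembly identity that a reference must match.
    public static string Describe(AssemblyName name)
    {
        return name.Name + ", Version=" + name.Version
            + ", PublicKeyToken=" + TokenToHex(name.GetPublicKeyToken());
    }

    // Reads the identity from a file on disk without loading the assembly for execution.
    public static string DescribeFile(string path)
    {
        return Describe(AssemblyName.GetAssemblyName(path));
    }
}
```

Calling StrongNameCheck.DescribeFile on the old dll and on the rebuilt one and comparing the two strings shows which part of the identity changed.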