Posted to log4cxx-user@logging.apache.org by LECHNER Martin <Ma...@frequentis.com> on 2005/02/01 11:44:47 UTC

Unicode Logging questions

Thanks to your help I was finally able to compile the log4cxxd.dll.

I tried it out by logging some Unicode characters (Japanese and German)
and have a few questions:

Is there a way to get readable characters instead of the Unicode code point
(\u4ECA) of the character?
What about UTF-8 encoding?
Did I miss a setting somewhere?

Best regards

ML



The result in text mode was:
2005-01-31 16:28:53,440[0x0000071D] INFO  Main null - Configured test Application: version:
2005-01-31 16:28:55,002[0x0000071D] INFO  ***SCRAP*** null - Here we go!!! 
2005-01-31 16:28:55,033[0x0000071D] ERROR ***SCRAP*** null - \u4ECAƶ 

and for XML:
<log4j:event logger="Main" timestamp="1107185333440" level="INFO"
thread="0x0000071D">
<log4j:message><![CDATA[Configured test Application: version:
]]></log4j:message>
</log4j:event>
<log4j:event logger="***SCRAP***" timestamp="1107185335002" level="INFO"
thread="0x0000071D">
<log4j:message><![CDATA[Here we go!!!]]></log4j:message>
</log4j:event>
<log4j:event logger="***SCRAP***" timestamp="1107185335033" level="ERROR"
thread="0x0000071D">
<log4j:message><![CDATA[\u4ECAƶ]]></log4j:message>
</log4j:event>




Re: Re[2]: Unicode Logging questions

Posted by Curt Arnold <ca...@apache.org>.
On Feb 1, 2005, at 11:54 PM, Martin Lechner wrote:

> Hello Curt,
>
> Thanks for the info and tips.
> I can live with the \u representation for the moment, when I know that
> there will be readable output in the future.
>
> But I have another small question:
> I am currently replacing std::string with std::wstring in an
> application. Previously it was possible to log using stream syntax, but
> with wstrings it's not possible at the moment.
>
> works: LOG4CXX_DEBUG(loggerPtr_, "hello" << someString << 5 );
> does not work: LOG4CXX_DEBUG(loggerPtr_, L"hello" << someWstring << 5 );
>
> Is this planned, or do I have to log things like this some other
> way?
>

With the current CVS HEAD, neither is acceptable.  The log4cxx 0.9.7
macros were forced to create a stream to support this syntax, even if
the argument was just a char* or std::string.
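To make that cost concrete, here is a minimal sketch of the macro pattern, under the assumption that such a macro pastes its argument into a narrow std::ostringstream. DemoLogger and DEMO_LOG_DEBUG are hypothetical names for illustration, not log4cxx API, and this is not the actual 0.9.7 macro text.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a logger: an enabled flag and a captured message.
struct DemoLogger {
    bool debugEnabled = true;
    std::string lastMessage;
    void forcedLog(const std::string& msg) { lastMessage = msg; }
};

// Sketch of a stream-building macro: every enabled call constructs a narrow
// std::ostringstream, even when `message` is a single const char*.
#define DEMO_LOG_DEBUG(logger, message)        \
    do {                                       \
        if ((logger).debugEnabled) {           \
            std::ostringstream oss_;           \
            oss_ << message;                   \
            (logger).forcedLog(oss_.str());    \
        }                                      \
    } while (0)
```

With this design, `DEMO_LOG_DEBUG(logger, "hello " << someString << 5)` works. The wide variant, `DEMO_LOG_DEBUG(logger, L"hello" << someWstring << 5)`, does not compile, because a char stream has no inserter for std::wstring.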

The current CVS provides a stream wrapper for logging in
<log4cxx/stream.h>.  The logstream class provides STL stream-like 
semantics on top of a logger.  The implementation supports 
short-circuiting expressions when the level is not enabled.  Due to the 
non-atomic nature of stream operations, logstream is not thread-safe 
(and can't be made so), so do not share it between threads.  The 
expected usage pattern is to have a static LoggerPtr as a class member 
and logstream wrappers to be created on method entry.  In addition, you 
can insert either wchar_t or char strings into logstream; it will
transcode them on the fly.
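The two properties described above can be illustrated with a small self-contained wrapper: formatting is skipped when the level is disabled, and both narrow and wide strings are accepted. This is a hypothetical sketch and not the log4cxx::logstream implementation. DemoSink, DemoLogStream and demoEndMsg are invented names, and narrow strings are widened byte by byte, which is only correct for ASCII.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a logger plus its level check.
struct DemoSink {
    bool enabled = true;                 // is the requested level enabled?
    std::vector<std::wstring> messages;  // completed log messages
};

// Hypothetical end-of-message marker (log4cxx itself uses LOG4CXX_ENDMSG).
struct DemoEndMsg {};
const DemoEndMsg demoEndMsg{};

class DemoLogStream {
public:
    explicit DemoLogStream(DemoSink& sink) : sink_(sink) {}

    // Wide text and numbers are formatted only when the level is enabled.
    DemoLogStream& operator<<(const wchar_t* s) { if (sink_.enabled) buf_ << s; return *this; }
    DemoLogStream& operator<<(const std::wstring& s) { if (sink_.enabled) buf_ << s; return *this; }
    DemoLogStream& operator<<(int v) { if (sink_.enabled) buf_ << v; return *this; }

    // Narrow text is widened one char at a time: correct for ASCII only.
    DemoLogStream& operator<<(const char* s) { return *this << std::string(s); }
    DemoLogStream& operator<<(const std::string& s) {
        if (sink_.enabled) buf_ << std::wstring(s.begin(), s.end());
        return *this;
    }

    // The end marker hands the buffered message to the sink and resets the buffer.
    DemoLogStream& operator<<(const DemoEndMsg&) {
        if (sink_.enabled) sink_.messages.push_back(buf_.str());
        buf_.str(L"");
        return *this;
    }

private:
    DemoSink& sink_;
    std::wostringstream buf_;  // partial message: per-instance, unsynchronized state
};
```

The insertion arguments are still evaluated when the level is disabled; only the formatting into the buffer is skipped. Because each instance holds a partial message between insertions, it should be created per method call and not shared between threads.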

There is a sample in examples/stream.cpp and unit tests in 
tests/src/streamtestcase.cpp.  Your code fragment should look
something like:

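#include <log4cxx/logger.h>
#include <log4cxx/stream.h>
#include <string>

class MyClass {
private:
    // One logger shared by all instances of the class.
    static log4cxx::LoggerPtr logger;

public:
    void doSomething(const std::wstring& someWstring) {
        // Create the logstream on method entry; it is not thread-safe,
        // so never share it between threads.
        log4cxx::logstream logstream(logger, log4cxx::Level::DEBUG);

        // Wide and narrow strings may be mixed; logstream transcodes them.
        logstream << L"hello" << someWstring << 5 << LOG4CXX_ENDMSG;
    }
};

// Static data members are defined outside the class body in C++.
log4cxx::LoggerPtr MyClass::logger(log4cxx::Logger::getLogger("MyClass"));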

There has been a substantial amount of debate on the topic, and there are
still profoundly different opinions on the desirable semantics, which I do
not think can be reconciled in one implementation.  I don't expect that the
semantics of log4cxx::logstream will change, but am open to 
implementation improvements and am willing to consider including other 
implementations, but I think we need to gather more experience and have 
more platforms to test before doing that.  You may want to review the 
dev list archives and/or the Jira entry
(http://issues.apache.org/jira/browse/LOGCXX-18).


Re[2]: Unicode Logging questions

Posted by Martin Lechner <br...@sbox.tugraz.at>.
Hello Curt,

Thanks for the info and tips.
I can live with the \u representation for the moment, when I know that
there will be readable output in the future.

But I have another small question:
I am currently replacing std::string with std::wstring in an
application. Previously it was possible to log using stream syntax, but
with wstrings it's not possible at the moment.

works: LOG4CXX_DEBUG(loggerPtr_, "hello" << someString << 5 );
does not work: LOG4CXX_DEBUG(loggerPtr_, L"hello" << someWstring << 5 );

Is this planned, or do I have to log things like this some other
way?

Best regards
ML




-- 
Best regards,
 Martin                            mailto:bruce_np@sbox.tugraz.at


Re: Unicode Logging questions

Posted by Curt Arnold <ca...@apache.org>.
On Feb 1, 2005, at 4:44 AM, LECHNER Martin wrote:

> Thanks to your help I was finally able to compile the log4cxxd.dll.
>
> I tried it out by logging some Unicode characters (Japanese and German)
> and have a few questions:
>
> Is there a way to get readable characters instead of the Unicode code point
> (\u4ECA) of the character?

That is the intent behind the Unicode rework, but not everything is 
fleshed out.  The "\u4ECA" construct is generated by 
Transcoder::encode(const LogString& src, std::string& dst) when a 
character in src cannot be represented in the current code page.
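Here is a minimal sketch of the escaping behavior described here. It assumes the current code page is 7-bit US-ASCII and that wchar_t holds a whole code point (no surrogate pairs). encodeWithEscapes is a hypothetical helper that mimics the described output; it is not log4cxx's Transcoder::encode.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Copy characters the (assumed US-ASCII) code page can represent; replace
// every other character with a \uXXXX escape of its code point.
std::string encodeWithEscapes(const std::wstring& src) {
    std::string dst;
    for (wchar_t wc : src) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            dst += static_cast<char>(cp);  // representable: copy as-is
        } else {
            char buf[16];
            std::snprintf(buf, sizeof buf, "\\u%04lX", cp);  // not representable
            dst += buf;
        }
    }
    return dst;
}
```

For example, `encodeWithEscapes(L"\u4ECA")` returns the six ASCII characters `\u4ECA`, which is the form seen in the text log above.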

In the case of the text file, you will get that form unless you specify 
an encoding that can represent the character (UTF-16 or UTF-8) using 
WriterAppender::setEncoding (or the equivalent configuration entry).  
However, support for arbitrary encodings hasn't been completed.  I 
expect to implement it using APR-iconv shortly.
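A UTF-8 (or UTF-16) encoding avoids the escape because it can represent every code point as bytes. For example, 今 (U+4ECA) becomes E4 BB 8A in UTF-8, and German ö (U+00F6) becomes C3 B6. The minimal encoder below illustrates this. toUtf8 is a hypothetical helper, not part of log4cxx, and it does no validation (surrogates and values above U+10FFFF are not rejected).

```cpp
#include <cassert>
#include <string>

// Encode a single Unicode code point as UTF-8 bytes.
std::string toUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```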

I can probably fix the XML faster.  Even though you could encode those 
characters as UTF-8 (since all XML processors are required to support 
UTF-8), it is probably preferable to represent non-US-ASCII characters
as numeric character references (for example, &#x4ECA;), since it would not be
uncommon for users to try to open up a log file in a non-XML aware text 
editor or use a command line tool like cat.
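Here is a sketch of the character-reference approach, assuming the message is written as ordinary escaped character data. One caveat from the XML specification is that character references are not recognized inside CDATA sections. A layout that wraps messages in <![CDATA[...]]>, as in the XML output above, would therefore have to escape the message this way instead. toAsciiXml is a hypothetical helper, and it assumes wchar_t holds a whole code point.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escape a wide message as XML character data whose bytes are pure US-ASCII:
// markup characters become predefined entities, and every non-ASCII character
// becomes a hexadecimal numeric character reference (&#xHHHH;).
std::string toAsciiXml(const std::wstring& src) {
    std::string out;
    for (wchar_t wc : src) {
        unsigned long cp = static_cast<unsigned long>(wc);
        switch (cp) {
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '&': out += "&amp;"; break;
            case '"': out += "&quot;"; break;
            default:
                if (cp < 0x80) {
                    out += static_cast<char>(cp);
                } else {
                    char buf[16];
                    std::snprintf(buf, sizeof buf, "&#x%lX;", cp);
                    out += buf;
                }
        }
    }
    return out;
}
```

Because the result is pure ASCII, cat or a non-XML-aware editor shows `&#x4ECA;` as plain text rather than garbled bytes, while any XML parser decodes it back to the original character.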