Posted to log4cxx-user@logging.apache.org by Ken <yo...@gmail.com> on 2006/02/28 07:29:36 UTC

How to write Asian character to file?

Hi,
  Forgive me if this is a bother.
  Below is my test source code. I use the setEncoding() method to set the
  encoding, but the Asian characters in the log are still not written
  correctly; I only get question marks in the file. I tried US-ASCII,
  ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16 with setEncoding(), but
  the result is always the same.
  Can anyone tell me the right way to get Asian characters to show up in the
  log file?
  I got the SVN source code on Feb. 19 and built the static lib with:
  ant -Ddebug=false -Dlib.type=static build


int main()
{
    PatternLayoutPtr layout = new PatternLayout("%d{ISO8601} [%t] %l %p - %m%n");
    RollingFileAppenderPtr rfa = new RollingFileAppender();
    rfa->setName("sizeROLLING");
    rfa->setLayout(layout);
    rfa->setFile("tsizeBased-test.log");

    SizeBasedTriggeringPolicyPtr sbtp = new SizeBasedTriggeringPolicy();
    sbtp->setMaxFileSize(1024 * 1024 * 10);

    FixedWindowRollingPolicyPtr swrp = new FixedWindowRollingPolicy();
    swrp->setMaxIndex(10);
    swrp->setMinIndex(1);
    swrp->setFileNamePattern("tsizeBased-test.log.%i");

    rfa->setRollingPolicy(swrp);
    rfa->setTriggeringPolicy(sbtp);
    rfa->setEncoding("UTF-16");
    //cout << __LINE__ << ": " << rfa->getEncoding() << endl;

    Pool p;
    rfa->activateOptions(p);

    LoggerPtr sizeroll = Logger::getLogger("sizeLogger");
    sizeroll->setLevel(Level::DEBUG);
    sizeroll->addAppender(rfa);

    logstream lc_logstream(sizeroll, Level::DEBUG);
    // lc_logstream << L"koko你好test12+" << LOG4CXX_ENDMSG;
    lc_logstream << LOG4CXX_STR("koko你好test34-") << LOG4CXX_ENDMSG;
    lc_logstream << LogString("koko你好test56*") << LOG4CXX_ENDMSG;
    LOG4CXX_DEBUG(sizeroll, "koko你好test78/");
    LOG4CXX_DEBUG(sizeroll, LOG4CXX_STR("koko你好test78/"));
    LOG4CXX_DEBUG(sizeroll, LogString("koko你好test78/"));

    exit(1);
}

  Thanks in advance...



--
Ken

Re: How to write Asian character to file?

Posted by Curt Arnold <ca...@apache.org>.
On Mar 4, 2006, at 7:00 AM, Ken wrote:

> Your reply was a great help to me. I resolved the problem: when I call
> setEncoding("ISO-8859-1") and use literals with the L prefix, the log
> file shows the right result. It was my fault, not log4cxx's.
> I am wondering why I must call setEncoding("ISO-8859-1") when my LANG is
> already "en_US"?


ISO-8859-1 is incapable of representing Asian characters.  If setting  
ISO-8859-1 makes things work, it is only because the encoding  
expectations were messed up earlier.  It is most likely that:

a) your source file is encoded in UTF-8
b) you are not explicitly setting the encoding for gcc, so it is
assuming the en_US default of ISO-8859-1.
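
A rough sketch of how conditions (a) and (b) can cancel each other out when the appender is then set to ISO-8859-1. This does not use log4cxx or gcc; the two function names are made up for illustration, and each one only models the behavior assumed above (a compiler reading source bytes as ISO-8859-1, and a writer emitting ISO-8859-1).

```cpp
#include <string>

// Model of a compiler that assumes ISO-8859-1 source: every source byte
// becomes one wide character with the same numeric value.
std::wstring widenAsLatin1(const std::string& sourceBytes) {
    std::wstring wide;
    for (unsigned char byte : sourceBytes) {
        wide.push_back(static_cast<wchar_t>(byte));
    }
    return wide;
}

// Model of an ISO-8859-1 writer: code points up to 0xFF map to one byte,
// anything larger cannot be represented and is replaced by '?'.
std::string encodeAsLatin1(const std::wstring& text) {
    std::string out;
    for (wchar_t ch : text) {
        unsigned long cp = static_cast<unsigned long>(ch);
        out.push_back(cp <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(cp))
                          : '?');
    }
    return out;
}
```

Feeding the six UTF-8 bytes of 你好 (E4 BD A0 E5 A5 BD) through widenAsLatin1 gives six wide characters, and encodeAsLatin1 writes the same six bytes back, so a UTF-8 viewer shows the expected text even though neither step handled the characters correctly. A genuine U+4F60, by contrast, comes out of encodeAsLatin1 as '?'.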

I'd suspect that if you were to use your Asian characters as part of  
a file name or output to the console using std::wcout, the characters  
would not be as you expected.

I think it is very likely that the problem is on your end, with the
encoding during compilation.  However, if you are not able to resolve
it, please log a JIRA issue (http://issues.apache.org/JIRA) and attach
the source for a sample app and its output to the issue as a .tar.gz or
.zip.  Do not paste the source code into the issue text or an email,
since that hides the encoding of the source file, which is significant.
Attaching a tarball or zip file should also help preserve the encodings
of the original files.



> Another question is about RollingFileAppender: 0.9.7 can keep many
> history files, but the 0.9.8 SVN code allows only 12.
> I know renaming many files when rolling is not a good idea, but
> how about suffix 0 for the oldest file and suffix MAX for the newest
> file? Is there currently any way to keep more than 12 history files?
>

The limit of 12 was ported over from log4j which has that limit.  The  
new RollingFileAppender design (introduced in the log4j 1.3 branch  
and ported to log4cxx) allows pluggable naming policies and I have  
considered adding one that uses incrementing suffixes and does not  
rename.
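
To make the difference concrete, here is a small sketch that only computes which renames each scheme would need. It is not log4cxx code: the function names, the Rename type and the "base.N" naming are all assumptions made for this example.

```cpp
#include <string>
#include <utility>
#include <vector>

// One rename operation: (from, to).
typedef std::pair<std::string, std::string> Rename;

// Fixed-window style: shift every existing archive up by one index, then
// move the active file into the lowest slot. Deleting an archive that
// would fall out of the window is left to the caller.
std::vector<Rename> fixedWindowRenames(const std::string& base,
                                       int minIndex, int archivesToShift) {
    std::vector<Rename> ops;
    for (int i = minIndex + archivesToShift - 1; i >= minIndex; --i) {
        ops.push_back(Rename(base + "." + std::to_string(i),
                             base + "." + std::to_string(i + 1)));
    }
    ops.push_back(Rename(base, base + "." + std::to_string(minIndex)));
    return ops;
}

// Incrementing style: archives are never renamed. The active file takes
// the next unused index, so the newest archive has the highest suffix.
Rename incrementingRename(const std::string& base, int highestIndexSoFar) {
    return Rename(base, base + "." + std::to_string(highestIndexSoFar + 1));
}
```

With three existing archives, fixedWindowRenames("app.log", 1, 3) yields four renames per rollover, and the count grows with the window size. incrementingRename("app.log", 3) always yields one, which is why the number of history files would not need a small cap.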

Re: How to write Asian character to file?

Posted by Ken <yo...@gmail.com>.
Your reply was a great help to me. I resolved the problem: when I call
setEncoding("ISO-8859-1") and use literals with the L prefix, the log
file shows the right result. It was my fault, not log4cxx's.
I am wondering why I must call setEncoding("ISO-8859-1") when my LANG is
already "en_US"? Another question is about RollingFileAppender: 0.9.7
can keep many history files, but the 0.9.8 SVN code allows only 12.
I know renaming many files when rolling is not a good idea, but
how about suffix 0 for the oldest file and suffix MAX for the newest
file? Is there currently any way to keep more than 12 history files?



--
Ken

Re: How to write Asian character to file?

Posted by Curt Arnold <ca...@apache.org>.
On Feb 28, 2006, at 5:11 AM, Ken wrote:

>   What is the right syntax if I want to use a char array variable with
> wide characters?
>         char  lc_str[] =  "hello你好hello01";
>         logstream << lc_str << LOG4CXX_ENDMSG;  // does not work
>   since L can only be used with a literal.
>  A foolish question, but I just cannot figure it out. I hope somebody
> can tell me.
>

        const wchar_t  lc_str[] =  L"hello你好hello01";
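
If the text only exists as a char array at run time, so an L literal is not an option, one possibility is to decode the bytes into wide characters before logging them. The sketch below assumes the bytes are UTF-8 and covers only 1- to 3-byte sequences (enough for the Chinese text here). utf8ToWide and isContinuation are illustrative names, not log4cxx functions, and the decoder does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// True when position i exists and holds a UTF-8 continuation byte (10xxxxxx).
static bool isContinuation(const std::string& s, std::size_t i) {
    return i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80;
}

// Decodes UTF-8 bytes into wide characters. Anything that is not a valid
// 1- to 3-byte sequence is replaced by U+FFFD, one byte at a time.
std::wstring utf8ToWide(const std::string& bytes) {
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned int b0 = static_cast<unsigned char>(bytes[i]);
        unsigned int cp = 0xFFFD;   // replacement character by default
        std::size_t len = 1;
        if (b0 < 0x80) {
            cp = b0;
        } else if ((b0 & 0xE0) == 0xC0 && isContinuation(bytes, i + 1)) {
            cp = ((b0 & 0x1F) << 6)
               | (static_cast<unsigned char>(bytes[i + 1]) & 0x3F);
            len = 2;
        } else if ((b0 & 0xF0) == 0xE0 && isContinuation(bytes, i + 1)
                   && isContinuation(bytes, i + 2)) {
            cp = ((b0 & 0x0F) << 12)
               | ((static_cast<unsigned char>(bytes[i + 1]) & 0x3F) << 6)
               | (static_cast<unsigned char>(bytes[i + 2]) & 0x3F);
            len = 3;
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += len;
    }
    return out;
}
```

The resulting std::wstring can then be passed to the logging call in place of the char array, which matches the advice elsewhere in this thread to hand log requests std::wstring or std::string.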

There is a unit test for encoding support (tests/src/encodingtest.cpp)
that should be run as part of the Ant build.  Did the unit tests pass
on your platform?

What happens if you try to output the test string from the unit test:

         //   arbitrary, hopefully meaningless, characters from
         //     Latin, Arabic, Armenian, Bengali, CJK and Cyrillic
         const wchar_t greeting[] = { L'A', 0x0605, 0x0530, 0x986,
                                      0x4E03, 0x400, 0 };

If that works, I'd suspect that there is a mismatch between source
file encoding and the encoding expectations of your compiler.  For
example, your source file might be in UTF-8 and the compiler expects
ISO-8859-1.  The string in the unit test would not be affected by a
source file encoding mismatch.
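
The same trick can be applied to the string from the question by spelling the two Chinese characters as universal character names, so the source file contains only ASCII. This is a sketch that assumes the compiler supports \u escapes in wide literals; kEscapedGreeting and countNonAscii are names invented for the example.

```cpp
#include <cstddef>

// "hello你好hello01" written with universal character names (U+4F60 and
// U+597D), so the compiler's idea of the source encoding does not matter.
const wchar_t kEscapedGreeting[] = L"hello\u4F60\u597Dhello01";

// Counts the characters outside the ASCII range, as a quick sanity check
// that the wide string holds what was intended.
std::size_t countNonAscii(const wchar_t* text) {
    std::size_t count = 0;
    for (std::size_t i = 0; text[i] != L'\0'; ++i) {
        if (static_cast<unsigned long>(text[i]) > 0x7F) {
            ++count;
        }
    }
    return count;
}
```

If logging kEscapedGreeting produces the right characters while the literal typed directly in Chinese does not, that points at the source file encoding rather than at log4cxx.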

You mentioned that your LANG is "en_US".  Plain en_US on other Linux
distributions indicates ISO-8859-1 as the default encoding, whereas
en_US.UTF-8 would indicate UTF-8 as the default encoding.  ISO-8859-1
cannot represent Asian characters, and log4cxx will substitute '?'
for any character it cannot represent in the current encoding.
Seeing '?' in the output would be the expected (and desirable)
behavior if the encoding is ISO-8859-1.

What happens if you set the LANG environment variable to en_US.UTF-8  
before running the program?  What happens if you explicitly call  
setEncoding("UTF-8") on the appender?
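
To see which encoding the environment actually selects, a small POSIX-only check like the following can be run under different LANG values. It only reports what the C library derives from the environment; how log4cxx picks its own default is not shown here, and environmentCodeset is a name made up for this sketch.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>   // POSIX only: nl_langinfo, CODESET

// Returns the character encoding name of the locale selected by the
// environment (LANG, LC_ALL, LC_CTYPE), for example "UTF-8" or
// "ISO-8859-1". Note that this changes the process-wide locale.
std::string environmentCodeset() {
    std::setlocale(LC_ALL, "");                 // adopt the environment's locale
    const char* codeset = nl_langinfo(CODESET);
    return codeset ? std::string(codeset) : std::string();
}
```

Printing the result once with LANG=en_US and once with LANG=en_US.UTF-8 shows whether the two settings really differ on a given machine. If the requested locale is not installed, setlocale fails and the C locale's codeset is reported instead.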






Re: How to write Asian character to file?

Posted by Ken <yo...@gmail.com>.
  What is the right syntax if I want to use a char array variable with wide
  characters?
        char  lc_str[] =  "hello你好hello01";
        logstream << lc_str << LOG4CXX_ENDMSG;  // does not work
  since L can only be used with a literal.
  A foolish question, but I just cannot figure it out. I hope somebody can
  tell me.


Re: How to write Asian character to file?

Posted by Ken <yo...@gmail.com>.
Thanks for replying so fast. My platform info is:
 Slackware 10.1.0  gcc (GCC) 3.3.4  Kernel: 2.6.8.1  host system LANG=en_US
 The wide literals in the source code are Chinese.

 I added a FileAppender, a RollingFileAppender and a ConsoleAppender to the
 logger, and never call setEncoding() on any appender.

 With the following source code:
        cout << "hello你好hello01" << endl;
        lc_logstream << L"hello你好hello02" << LOG4CXX_ENDMSG;
        lc_logstream << "hello你好hello03"  << LOG4CXX_ENDMSG;

 the output from FileAppender and RollingFileAppender is the following
 (they are the same):
 hell?????hello02
 ????hello03

 The output from ConsoleAppender is the following:
 hello

 but cout prints the right result on screen.
 The generated files are plain text files; four question marks take the
 place of my 2 Chinese characters.
 Do I need to add some compile options to my test program, or when building
 the log4cxx lib?




--
Ken

Re: How to write Asian character to file?

Posted by Curt Arnold <ca...@houston.rr.com>.


Can you reproduce the problem with a simpler appender, for example,  
does the problem occur with a FileAppender?

What platform are you running on?  The code used to support character  
encoding is different between platforms and if there is a bug, it may  
only appear for certain platforms.

Have you examined the generated files with a hex editor?  There
should be obvious differences between an ISO-8859-1 and a UTF-16
encoded file at the byte level.  Obviously, the ISO-8859-1 file can
only contain placeholder characters since it cannot represent Asian
characters.
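
As a rough guide to what to look for in the hex editor, the sketch below prints the bytes that one character would occupy in each encoding. The helper names are invented for this example, the encoders only handle code points up to U+FFFF (no surrogate pairs), and the UTF-16 output carries no byte order mark.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats each byte as two uppercase hex digits separated by spaces.
std::string hexDump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned int value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Encodes one code point in the range U+0000..U+FFFF as UTF-8.
std::string utf8Bytes(unsigned int cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encodes one code point in the range U+0000..U+FFFF as big-endian UTF-16.
std::string utf16beBytes(unsigned int cp) {
    std::string out;
    out += static_cast<char>((cp >> 8) & 0xFF);
    out += static_cast<char>(cp & 0xFF);
    return out;
}
```

For U+4F60 (你), hexDump(utf8Bytes(0x4F60)) gives "E4 BD A0" and hexDump(utf16beBytes(0x4F60)) gives "4F 60", while a '?' placeholder shows up as the single byte 3F.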

Have you had success compiling other programs containing Asian string
literals?  Are you sure that your compiler's encoding expectation is
correct?  Could you try explicitly specifying the source code
encoding to the compiler (for example, with gcc's -finput-charset
option)?

What happened when you used wide literals?

p.s. LOG4CXX_STR and LogString should not be used for log
requests.  They represent the internal string representation in
log4cxx.  The log request methods use the external string types
std::wstring and std::string.  LogString may be assignment compatible
with the external string types; however, the encoding expectations may
be different.