Posted to log4cxx-user@logging.apache.org by Ken <yo...@gmail.com> on 2006/02/28 07:29:36 UTC
How to write Asian character to file?
Hi,
Forgive me if this is a bother.
Below is my test source code. I use the setEncoding() method to set the
encoding, but the Asian characters in the log are still not output
correctly; I only get question marks in the file. I tried US-ASCII,
ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16 with the setEncoding()
call, but the result is always the same.
Can anyone tell me the right way to get Asian characters to show up in the
log file?
I got the SVN source code on Feb. 19 and built the static lib with:
ant -Ddebug=false -Dlib.type=static build
int main()
{
    PatternLayoutPtr layout = new PatternLayout("%d{ISO8601} [%t] %l %p - %m%n");
    RollingFileAppenderPtr rfa = new RollingFileAppender();
    rfa->setName("sizeROLLING");
    rfa->setLayout(layout);
    rfa->setFile("tsizeBased-test.log");

    SizeBasedTriggeringPolicyPtr sbtp = new SizeBasedTriggeringPolicy();
    sbtp->setMaxFileSize(1024 * 1024 * 10);

    FixedWindowRollingPolicyPtr swrp = new FixedWindowRollingPolicy();
    swrp->setMaxIndex(10);
    swrp->setMinIndex(1);
    swrp->setFileNamePattern("tsizeBased-test.log.%i");

    rfa->setRollingPolicy(swrp);
    rfa->setTriggeringPolicy(sbtp);
    rfa->setEncoding("UTF-16");
    //cout << __LINE__ << ": " << rfa->getEncoding() << endl;

    Pool p;
    rfa->activateOptions(p);

    LoggerPtr sizeroll = Logger::getLogger("sizeLogger");
    sizeroll->setLevel(Level::DEBUG);
    sizeroll->addAppender(rfa);

    logstream lc_logstream(sizeroll, Level::DEBUG);
    // lc_logstream << L"koko你好test12+" << LOG4CXX_ENDMSG;
    lc_logstream << LOG4CXX_STR("koko你好test34-") << LOG4CXX_ENDMSG;
    lc_logstream << LogString("koko你好test56*") << LOG4CXX_ENDMSG;
    LOG4CXX_DEBUG(sizeroll, "koko你好test78/");
    LOG4CXX_DEBUG(sizeroll, LOG4CXX_STR("koko你好test78/"));
    LOG4CXX_DEBUG(sizeroll, LogString("koko你好test78/"));

    exit(1);
}
Thanks in advance...
--
Ken
Re: How to write Asian character to file?
Posted by Curt Arnold <ca...@apache.org>.
On Mar 4, 2006, at 7:00 AM, Ken wrote:
> Your words were a great help to me. I resolved the problem: after
> calling setEncoding("ISO-8859-1") and using literals with the L
> prefix, I can see the right result in the log file. It was my fault,
> not log4cxx's.
> I am wondering why I must call setEncoding("ISO-8859-1") when my LANG
> is already "en_US".
ISO-8859-1 is incapable of representing Asian characters. If setting
ISO-8859-1 makes things work, it is only because the encoding
expectations were messed up earlier. It is most likely that:
a) your source file is encoded in UTF-8
b) you are not explicitly setting the encoding for gcc, so it is
assuming the en_US default of ISO-8859-1.
I'd suspect that if you were to use your Asian characters as part of
a file name or output to the console using std::wcout, the characters
would not be as you expected.
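To illustrate why the ISO-8859-1 setting can appear to work, here is a minimal sketch in plain C++ (no log4cxx involved). The helper names widenAsLatin1 and encodeLatin1 are hypothetical, and the sketch assumes the compiler turns each source byte into one wide character, as it would when it believes the source is ISO-8859-1:

```cpp
#include <string>

// Simulates a compiler that assumes ISO-8859-1 source text: every source
// byte becomes one wide character with the same numeric value.
std::wstring widenAsLatin1(const std::string& sourceBytes) {
    std::wstring wide;
    for (unsigned char b : sourceBytes) {
        wide.push_back(static_cast<wchar_t>(b));
    }
    return wide;
}

// Encodes wide characters as ISO-8859-1, substituting '?' for any
// character above U+00FF (which ISO-8859-1 cannot represent).
std::string encodeLatin1(const std::wstring& text) {
    std::string out;
    for (wchar_t ch : text) {
        const unsigned long cp = static_cast<unsigned long>(ch);
        out.push_back(cp <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(cp))
                          : '?');
    }
    return out;
}
```

Feeding the six UTF-8 bytes of the two Chinese characters (E4 BD A0 E5 A5 BD) through widenAsLatin1 and then encodeLatin1 gives back the same six bytes, so a UTF-8 aware viewer shows the right text even though the program never held the real characters in memory.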
I think it is very likely that it is a problem on your end with the
encoding during compilation. However, if you are not able to resolve
the problem, please log a JIRA issue (http://issues.apache.org/JIRA)
and attach the source for a sample app and its output to the issue as
a .tar.gz or .zip. Do not copy the source code into the issue text or
into an email, since that hides the encoding of the source file, which
is significant. Attaching a tarball or zip file should also help
preserve the encodings of the original files.
> Another question is about RollingFileAppender: 0.9.7 can keep many
> history files, but the 0.9.8 SVN code allows only 12. I know renaming
> many files when rolling is not a good idea, but how about using
> suffix 0 for the oldest file and suffix MAX for the newest? Is there
> currently any way to keep more than 12 history files?
>
The limit of 12 was ported over from log4j which has that limit. The
new RollingFileAppender design (introduced in the log4j 1.3 branch
and ported to log4cxx) allows pluggable naming policies and I have
considered adding one that uses incrementing suffixes and does not
rename.
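As a rough illustration of the difference between the two naming schemes, here is a simplified sketch in plain C++. It is not the log4cxx or log4j implementation; the function names are hypothetical, and it assumes every slot in the window is already occupied and that the file at maxIndex has been deleted before the renames start:

```cpp
#include <string>
#include <utility>
#include <vector>

using Rename = std::pair<std::string, std::string>;  // (from, to)

// Fixed-window scheme: shift every archived file up by one index, then
// move the active file into the lowest slot. Each roll touches every
// file in the window.
std::vector<Rename> fixedWindowRenames(const std::string& base,
                                       int minIndex, int maxIndex) {
    std::vector<Rename> ops;
    for (int i = maxIndex - 1; i >= minIndex; --i) {
        ops.emplace_back(base + "." + std::to_string(i),
                         base + "." + std::to_string(i + 1));
    }
    ops.emplace_back(base, base + "." + std::to_string(minIndex));
    return ops;
}

// Incrementing scheme (hypothetical): the active file already carries its
// final name, so a roll only opens the next name and renames nothing.
std::string nextIncrementingName(const std::string& base, int currentIndex) {
    return base + "." + std::to_string(currentIndex + 1);
}
```

With a window of 1 to 3, a fixed-window roll of "app.log" needs three renames, while the incrementing scheme only computes the next file name, so the number of history files would not affect the cost of a roll.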
Re: How to write Asian character to file?
Posted by Ken <yo...@gmail.com>.
Your words were a great help to me. I resolved the problem: after
calling setEncoding("ISO-8859-1") and using literals with the L prefix,
I can see the right result in the log file. It was my fault, not
log4cxx's.
I am wondering why I must call setEncoding("ISO-8859-1") when my LANG is
already "en_US".
Another question is about RollingFileAppender: 0.9.7 can keep many
history files, but the 0.9.8 SVN code allows only 12. I know renaming
many files when rolling is not a good idea, but how about using suffix 0
for the oldest file and suffix MAX for the newest? Is there currently
any way to keep more than 12 history files?
--
Ken
Re: How to write Asian character to file?
Posted by Curt Arnold <ca...@apache.org>.
On Feb 28, 2006, at 5:11 AM, Ken wrote:
> What is the right syntax if I want to use a char array variable with
> wide characters?
> char lc_str[] = "hello你好hello01";
> logstream << lc_str << LOG4CXX_ENDMSG; // does not work
> This fails since L can only be used with a literal.
> It's a foolish question, but I just cannot figure it out. I hope
> somebody can tell me.
>
const wchar_t lc_str[] = L"hello你好hello01";
There is a unit test for encoding support
(tests/src/encodingtest.cpp) that should be run as part of the Ant
build. Did the unit tests pass on your platform?
What happens if you try to output the test string from the unit test:
// arbitrary, hopefully meaningless, characters from
// Latin, Arabic, Armenian, Bengali, CJK and Cyrillic
const wchar_t greeting[] = { L'A', 0x0605, 0x0530, 0x986,
0x4E03, 0x400, 0 };
If that works, I'd suspect that there is a mismatch between source
file encoding and the encoding expectations of your compiler. For
example, your source file might be in UTF-8 and the compiler expects
ISO-8859-1. The string in the unit test would not be affected by a
source file encoding mismatch.
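The test string is immune because it is written as numeric code points, so the compiler never has to decode any non-ASCII source bytes. As a minimal sketch of what correct UTF-8 output for such an array looks like, the following plain C++ helpers (hypothetical names, not part of log4cxx) encode it by hand. The sketch assumes each wchar_t holds one complete code point, as with the 32-bit wchar_t used by gcc on Linux:

```cpp
#include <string>

// Appends the UTF-8 encoding of one code point (assumed <= U+10FFFF).
void appendUtf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Encodes a NUL-terminated wide string as UTF-8, one code point per wchar_t.
std::string wideToUtf8(const wchar_t* text) {
    std::string out;
    for (; *text != 0; ++text) {
        appendUtf8(out, static_cast<unsigned long>(*text));
    }
    return out;
}
```

For the CJK character 0x4E03 in the greeting array this yields the bytes E4 B8 83, which is what a hex editor should show in a log file written with the UTF-8 encoding.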
You mentioned that your LANG is "en_US". Plain en_US on other Linux
distributions indicates ISO-8859-1 as the default encoding, whereas
en_US.UTF-8 would indicate UTF-8 as the default encoding. ISO-8859-1
cannot represent Asian characters, and log4cxx will substitute '?'
for any character it cannot represent in the current encoding.
Seeing '?' in the output would be the expected (and desirable)
behavior if the encoding is ISO-8859-1.
What happens if you set the LANG environment variable to en_US.UTF-8
before running the program? What happens if you explicitly call
setEncoding("UTF-8") on the appender?
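One way to check which character set the C library derives from LANG is sketched below. This is a POSIX-only illustration using setlocale and nl_langinfo; the function name currentCodeset is hypothetical, and whether log4cxx consults the same information on a given platform is not confirmed here:

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX only: nl_langinfo, CODESET

// Adopts the locale named by the environment (LANG, LC_ALL, LC_CTYPE) and
// returns the character set name the C library associates with it.
std::string currentCodeset() {
    // If the requested locale is not installed, setlocale returns NULL and
    // the process stays in the default "C" locale.
    std::setlocale(LC_CTYPE, "");
    return std::string(nl_langinfo(CODESET));
}
```

Printing the result once with LANG=en_US and once with LANG=en_US.UTF-8 should show whether the two settings really select different default encodings on the machine in question, provided both locales are installed.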
Re: How to write Asian character to file?
Posted by Ken <yo...@gmail.com>.
What is the right syntax if I want to use a char array variable with wide characters?
char lc_str[] = "hello你好hello01";
logstream << lc_str << LOG4CXX_ENDMSG; // does not work
This fails since L can only be used with a literal.
It's a foolish question, but I just cannot figure it out. I hope somebody can tell me.
Re: How to write Asian character to file?
Posted by Ken <yo...@gmail.com>.
Thanks for replying so fast. My platform info is:
Slackware 10.1.0, gcc (GCC) 3.3.4, kernel 2.6.8.1, host system LANG=en_US.
The wide literals in the source code are Chinese.
I added a FileAppender, a RollingFileAppender, and a ConsoleAppender to the
logger, and never called setEncoding() on any appender.
With the following source code:
cout << "hello你好hello01" << endl;
lc_logstream << L"hello你好hello02" << LOG4CXX_ENDMSG;
lc_logstream << "hello你好hello03" << LOG4CXX_ENDMSG;
the output from the FileAppender and the RollingFileAppender is as follows
(they are the same):
hell?????hello02
????hello03
The output from the ConsoleAppender is:
hello
But cout outputs the right result on screen.
The generated files are plain text files, with four question marks taking
the place of my 2 Chinese characters.
Do I need to add some compile options for my test program, or when
building the log4cxx lib?
--
Ken
Re: How to write Asian character to file?
Posted by Curt Arnold <ca...@houston.rr.com>.
Can you reproduce the problem with a simpler appender, for example,
does the problem occur with a FileAppender?
What platform are you running on? The code used to support character
encoding is different between platforms and if there is a bug, it may
only appear for certain platforms.
Have you examined the generated files with a hex editor? There
should be obvious differences between an ISO-8859-1 and a UTF-16
encoded file at the byte level. Obviously, the ISO-8859-1 file can
only output placeholder characters since it cannot represent Asian
characters.
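As a sketch of what to look for in the hex editor, the following plain C++ helpers (hypothetical names, not log4cxx code) produce the bytes of a short string in UTF-16BE without a byte order mark, and format bytes the way a hex dump would:

```cpp
#include <cstdio>
#include <string>

// Formats bytes as space-separated lowercase hex, e.g. "00 41".
std::string hexDump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(b));
        if (!out.empty()) out.push_back(' ');
        out += buf;
    }
    return out;
}

// Encodes wide characters as UTF-16BE with no byte order mark. Code points
// above U+FFFF (possible with a 32-bit wchar_t) become surrogate pairs.
std::string encodeUtf16BE(const std::wstring& text) {
    std::string out;
    auto putUnit = [&out](unsigned long unit) {
        out.push_back(static_cast<char>((unit >> 8) & 0xFF));
        out.push_back(static_cast<char>(unit & 0xFF));
    };
    for (wchar_t ch : text) {
        unsigned long cp = static_cast<unsigned long>(ch);
        if (cp >= 0x10000) {
            cp -= 0x10000;
            putUnit(0xD800 + (cp >> 10));
            putUnit(0xDC00 + (cp & 0x3FF));
        } else {
            putUnit(cp);
        }
    }
    return out;
}
```

For the letter A followed by U+4F60, the UTF-16BE bytes are 00 41 4f 60, whereas an ISO-8859-1 file with a placeholder would contain just 41 3f. Every ASCII character taking two bytes is the quickest sign that the UTF-16 setting actually took effect.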
Have you had success compiling other programs containing Asian string
literals? Are you sure that your compiler's encoding expectation is
correct? Could you try explicitly specifying the source code
encoding to the compiler (for gcc, the -finput-charset option in
releases that support it)?
What happened when you used wide literals?
p.s. LOG4CXX_STR and LogString should not be used for log
requests. They represent the internal string representation in
log4cxx. The log request methods use the external string types
std::wstring and std::string. LogString may be assignment compatible
with the external string types; however, the encoding expectations may
be different.