Posted to commits@logging.apache.org by ts...@apache.org on 2020/08/06 08:36:24 UTC

[logging-log4cxx] 05/05: Improved the FAQ regarding support of Unicode with additional details from GHPR #31.

This is an automated email from the ASF dual-hosted git repository.

tschoening pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/logging-log4cxx.git

commit c65490f37e0f9ae830534770823b620988ea368b
Author: Thorsten Schöning <ts...@am-soft.de>
AuthorDate: Thu Aug 6 10:36:09 2020 +0200

    Improved the FAQ regarding support of Unicode with additional details from GHPR #31.
---
 src/changes/changes.xml  |  3 ++-
 src/site/markdown/faq.md | 49 ++++++++++++++++++++++++++++--------------------
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/src/changes/changes.xml b/src/changes/changes.xml
index 4e7166d..9334c50 100644
--- a/src/changes/changes.xml
+++ b/src/changes/changes.xml
@@ -24,7 +24,7 @@
 
 	<body>
 		<release	version="0.11.0"
-					date="2020-02-09"
+					date="XXXX-XX-XX"
 					description="Maintenance release.">
 			<action issue="LOGCXX-506" type="fix">CachedDateFormat reuses timestamps without updating milliseconds after formatting timestamp with ms == 654</action>
 			<action issue="LOGCXX-503" type="update">Checksums/Signatures don't match for log4cxx binaries</action>
@@ -34,6 +34,7 @@
 			<action issue="LOGCXX-493" type="fix">Wrong usage of milli- vs. micro- and non- vs. milliseconds in some docs.</action>
 			<action issue="LOGCXX-488" type="fix">Space after log level hides messages</action>
 			<action issue="LOGCXX-484" type="fix">Spelling error s/excute/execute</action>
+			<action issue="LOGCXX-483" type="update">Not able to see hebrew values when logging in log4cxx</action>
 			<action issue="LOGCXX-482" type="fix">Build failure with GCC-6</action>
 			<action issue="LOGCXX-464" type="fix">TimeBasedRollingPolicy should append as configured on rollover</action>
 			<action issue="LOGCXX-446" type="fix">make install fails, trying to overwrite header files</action>
diff --git a/src/site/markdown/faq.md b/src/site/markdown/faq.md
index fadfc9b..190ff4d 100644
--- a/src/site/markdown/faq.md
+++ b/src/site/markdown/faq.md
@@ -45,33 +45,42 @@ caller are using different C RTL's, the program will likely crash at the point.
 DLL" with release builds of log4cxx and "Multithread DLL Debug" with debug builds.
 
 ## <a name="unicode_supported"></a>Does Apache log4cxx support Unicode?
-### Multiple string flavors
 
-Yes. Apache log4cxx exposes API methods in multiple string flavors `const char*`, `std::string`,
-`wchar_t*`, `std::wstring`, `CFStringRef` et al. `const char*` and `std::string` are interpreted
-according to the current locale settings. Applications should call `setlocale(LC_ALL, "")` on
-startup or the C RTL will assume `US-ASCII`. Before being processed internally, all these are
-converted to the `LogString` type which is one of several supported Unicode representations selected
-by the `--with-logchar` option. When using methods that take `LogString` as arguments, the macro
-`LOG4CXX_STR()` can be used to convert ASCII literals to the current `LogString` type. FileAppenders
-support an encoding property which should be explicitly specified to `UTF-8` or `UTF-16` for XML
-files.
-
-### Example of wrong non-English logging
-
-For example, here is some Hebrew text which says "People with disabilities":
+Yes. Apache log4cxx exposes API methods in multiple string flavors supporting differently encoded
+textual content, such as `char*`, `std::string`, `wchar_t*`, `std::wstring`, `CFStringRef` et al. All
+provided text is converted to the `LogString` type before further processing, which is one of
+several supported Unicode representations selected by the `--with-logchar` option. When using methods
+that take `LogString` arguments, the macro `LOG4CXX_STR()` can be used to convert literals
+to the current `LogString` type. FileAppenders also support an encoding property, which should be
+explicitly set to `UTF-8` or `UTF-16` for XML files, for example. The important point is to get the
+chain of input, internal processing and output right, which may need some additional setup in
+the app using log4cxx.
+
+According to the [libc documentation](https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html),
+all programs start in the `C` locale by default, which is the [same as ANSI_X3.4-1968](https://stackoverflow.com/questions/48743106/whats-ansi-x3-4-1968-encoding)
+and commonly known as the encoding `US-ASCII`. That encoding supports only a very limited set of
+characters, so Unicode text passed in while that encoding is in effect can't be output
+properly. For example, here is some Hebrew text which says "Women with disabilities":
 
 	נשים עם מוגבלות
 
-If you are to log this information on a system with a locale of `en_US.UTF-8`, the log message might
-look something like the following, because the given characters can't be converted to `US-ASCII`:
+If you log this text, the output on a console might look like the following, simply
+because the app uses `US-ASCII` by default, which can't represent those characters:
 
 ```
 loggername - ?????????? ???? ??????????????
 ```
 
-Executing `std::setlocale(LC_ALL, "")` either before actually logging the text above or at the app-
-startup will allow the message to be logged appropriately. See issue [LOG4CXX-483][1] for more
-information.
+The important thing to understand is that this default is always applied for backwards compatibility,
+even when the current environment sets a locale like `en_US.UTF-8`. One might
+need to explicitly tell the app at startup to use the locale of the environment to make things
+compatible with Unicode. See also [this SO post](https://stackoverflow.com/questions/571359/how-do-i-set-the-proper-initial-locale-for-a-c-program-on-windows)
+on setting the default locale in C++.
+
+```
+std::setlocale(LC_ALL, "");           /* Set locale for C functions */
+std::locale::global(std::locale("")); /* Set locale for C++ functions */
+```
 
-[1]:https://issues.apache.org/jira/browse/LOGCXX-483
+See [LOGCXX-483](https://issues.apache.org/jira/browse/LOGCXX-483) or [GHPR #31](https://github.com/apache/logging-log4cxx/pull/31#issuecomment-668870727)
+for additional details.
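
The startup behaviour described in the FAQ change above can be sketched with the standard library alone. This is only an illustration, not code from the commit: the helper names `current_c_locale` and `adopt_environment_locale` are hypothetical, and no log4cxx calls are involved. It shows that a program starts in the `C` locale and has to opt in to the locale of the environment explicitly.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the name of the C locale currently in effect.
// Passing nullptr only queries the locale and does not change it.
std::string current_c_locale()
{
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Hypothetical helper: adopts the locale of the environment for both the
// C and the C++ library, using the same two calls as the FAQ snippet.
// Returns false if the environment names a locale the system can't provide.
bool adopt_environment_locale()
{
    // "" selects the locale configured in the environment (e.g. LANG, LC_ALL).
    if (std::setlocale(LC_ALL, "") == nullptr)
        return false;

    try {
        // std::locale("") throws std::runtime_error for an unknown locale.
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

An app would call `adopt_environment_locale()` once at startup, before the first log statement. Until that call, `current_c_locale()` reports `C` regardless of what the environment specifies.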