UNICODE
-------

Log4cplus uses the term "UNICODE" in at least two distinct meanings:

1. the Unicode standard as defined by the Unicode Consortium

2. the compiler's and/or C++ standard library's support for strings of
   wchar_t and their manipulation


WCHAR_T SUPPORT
---------------

Log4cplus aims to be portable and to have as few third-party
dependencies as possible. To fulfill this goal, it has to use
facilities offered by the operating systems and standard libraries it
runs on. To offer the best possible support for national characters,
it has to support wchar_t and it has to use the wchar_t support
(especially on Windows) provided by the operating system and the
standard C and C++ libraries.

This approach to portability has some limitations. One of them is the
lack of support for C++ locales in various operating systems and
standard C++ libraries. Some standard C++ libraries support only the
"C" and "POSIX" locales. This usually means that wchar_t <-> char
conversion using the codecvt<> facet is impossible. On such deficient
platforms, log4cplus can use either standard C locale support or
iconv() (through libiconv or a built-in implementation).
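As an illustration of the "standard C locale support" fallback, the
following is a minimal sketch of a wchar_t -> char conversion built on
std::wcsrtombs(). The helper name narrow_with_c_locale is hypothetical
and is not part of log4cplus; how log4cplus implements its own fallback
is not shown here.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper (not a log4cplus function): narrow a wide string
// using the conversion rules of the current C locale (LC_CTYPE).
// Returns false when a character has no representation in that locale.
// Input is treated as a C string, so it stops at the first embedded null.
bool narrow_with_c_locale(const std::wstring& in, std::string& out)
{
    const std::size_t failure = static_cast<std::size_t>(-1);

    // Pass 1: a null destination only measures the required byte count.
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = in.c_str();
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == failure)
        return false;

    // Pass 2: convert into a buffer with room for the terminating null.
    std::vector<char> buf(needed + 1);
    state = std::mbstate_t();
    src = in.c_str();
    const std::size_t written =
        std::wcsrtombs(buf.data(), &src, buf.size(), &state);
    if (written == failure)
        return false;

    out.assign(buf.data(), written);
    return true;
}
```

The result depends on the locale selected with std::setlocale(). In the
default "C" locale, plain ASCII text converts, while a character such as
U+4E2D is expected to be rejected because it has no single-byte
representation.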


UNICODE AND FILE APPENDERS
--------------------------

Another limitation related to Unicode support is the inability to
write wchar_t messages to log files using FileAppender when they
contain national characters that do not map to any code point in a
single-byte code page. This is a problem mainly on Windows. Linux and
other *NIX systems can avoid it because they do not need to use
wchar_t interfaces to have Unicode-aware applications. They usually
(as of 2012) use UTF-8 based locales. With proper C++ locale setup in
client applications, national characters can pass into log files
unharmed. However, applications that choose to use wchar_t strings
face the problem as well.


*NIX
----

To support output of non-ASCII characters in wchar_t messages on *NIX
platforms, it is necessary to use a UTF-8 based locale (e.g.,
en_US.UTF-8) and either to set up the global locale with a
std::codecvt facet or to imbue individual FileAppenders with that
facet. The following code can be used to obtain such a std::locale
instance and to install it as the global locale:

    std::locale::global (     // set global locale
        std::locale (         // using std::locale constructed from
            std::locale (),   // global locale
                              // and codecvt facet from user locale
            new std::codecvt_byname<wchar_t, char, std::mbstate_t>("")));
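The same facet can also be attached to a single stream instead of the
global locale. The following is a minimal sketch on a plain
std::wofstream, on the assumption that imbuing an individual
FileAppender follows the same pattern. The helpers
write_with_named_codecvt and read_bytes are hypothetical and are not
log4cplus API.

```cpp
#include <cwchar>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` through a stream imbued
// with the codecvt facet of the named locale. An empty name selects the
// user's locale from the environment, as in the snippet above.
bool write_with_named_codecvt(const std::string& path,
                              const std::wstring& text,
                              const char* locale_name = "")
{
    typedef std::codecvt_byname<wchar_t, char, std::mbstate_t> facet_type;

    std::wofstream out;
    // Imbue before opening so the facet is in effect from the first
    // character written.
    out.imbue(std::locale(std::locale(), new facet_type(locale_name)));
    out.open(path.c_str(),
             std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    out << text;
    out.flush();
    // A failed wchar_t -> char conversion leaves the stream in a bad state.
    return out.good();
}

// Hypothetical helper: read a file back as raw bytes for inspection.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Constructing the facet may throw std::runtime_error if the named locale
is not available on the system, so real code would probably want to
catch that and fall back to the classic locale.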


WINDOWS
-------

Windows does not support UTF-8 based locales. The above approach will
yield a std::locale instance that converts wchar_t strings to the
current process' code page. Such a locale will not be able to convert
Unicode code points outside the process' code page. This is true at
least for the std::codecvt facet implemented in Visual Studio 2010.
Instead, with Visual Studio 2010 and later, it is possible to use the
std::codecvt_utf8 facet:

    std::locale::global (     // set global locale
        std::locale (         // using std::locale constructed from
            std::locale (),   // global locale
                              // and codecvt_utf8 facet
            new std::codecvt_utf8<tchar, 0x10FFFF,
                static_cast<std::codecvt_mode>(std::consume_header
                    | std::little_endian)>));
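To see what the std::codecvt_utf8 facet does to output, the following
minimal sketch imbues a plain std::wofstream with it and reads the
resulting bytes back. The helpers write_utf8 and read_bytes are
hypothetical, the facet is used with its default template arguments
rather than the mode flags shown above, and <codecvt> has been
deprecated since C++17, so newer compilers may emit a warning.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` as UTF-8, independent of
// the process locale or code page.
bool write_utf8(const std::string& path, const std::wstring& text)
{
    std::wofstream out;
    // Imbue before opening so the facet applies to everything written.
    out.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>));
    // Binary mode keeps the runtime from translating line endings.
    out.open(path.c_str(),
             std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out)
        return false;

    out << text;
    out.flush();
    return out.good();
}

// Hypothetical helper: read a file back as raw bytes for inspection.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, writing the single character U+4E2D should produce the
three bytes E4 B8 AD in the file, whatever the current code page is.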