C++/Unicode: Notes on Strings in C and C++ Programs (Unicode)

Notes on Strings in C and C++ Programs

Preparing an application for an international market raises character-set considerations. Windows NT and Windows 2000 support the Unicode character set, which on these systems is stored as two-byte (16-bit) wide characters. Windows 98 and Windows 95 use either a single-byte ANSI character set, which covers many Western languages, or a double-byte character set (DBCS), also referred to as a multibyte character set (MBCS), for languages that need more than one byte to represent a single character.

To make code as portable as possible across these character sets, use the Microsoft-specific generic-text mappings defined in Microsoft Visual C++ (declared in the tchar.h header). Generic-text mappings define names for various library functions so that each name is mapped at compile time to the single-byte, multibyte, or wide-character (Unicode) variant of the function. For example, _tprintf becomes wprintf when the program is compiled with _UNICODE defined, and printf otherwise. The mapping extends to data types as well: _TCHAR chVariable declares chVariable as a char (single-byte character) under ANSI and as a wchar_t (two-byte character) when compiled with _UNICODE defined. The leading underscore (_) indicates that the function, macro, or data type is not part of the standard ANSI C/C++ language definition. Microsoft Visual C++ uses the _t prefix for its generic-text macros.
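The compile-time mapping described above can be illustrated with a small portable sketch. The names GtChar, GT_TEXT, gt_strlen, kWideBuild, and greeting_length are hypothetical stand-ins invented for this example so that it builds with any C++ compiler. They are not the actual contents of tchar.h, which may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for generic-text names (illustration only,
// not the real <tchar.h> definitions).
#ifdef _UNICODE
typedef wchar_t GtChar;                 // wide build: character type is wchar_t
#define GT_TEXT(s) L##s                 // paste an L prefix onto the literal
const bool kWideBuild = true;
inline std::size_t gt_strlen(const GtChar* s) {
    assert(s != nullptr);
    return std::wcslen(s);              // wide-character variant
}
#else
typedef char GtChar;                    // ANSI build: character type is char
#define GT_TEXT(s) s                    // leave the literal unchanged
const bool kWideBuild = false;
inline std::size_t gt_strlen(const GtChar* s) {
    assert(s != nullptr);
    return std::strlen(s);              // single-byte variant
}
#endif

// Caller code is written once against the generic names and compiles
// unchanged in either configuration.
inline std::size_t greeting_length() {
    const GtChar* greeting = GT_TEXT("Hello");
    return gt_strlen(greeting);
}
```

In Visual C++ code the corresponding real names would be _TCHAR, _T (or _TEXT), and _tcslen, obtained by including tchar.h rather than by defining anything by hand.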

To complicate matters, COM strings, including all ADSI and Active Directory strings, must be Unicode strings. Even if you compile the program for Windows 98 using the ANSI character set, the strings passed to and returned from COM functions must still be Unicode. For string literals in C and C++, you specify this by placing the letter L immediately before the opening quotation mark. For example, L"This is a wide string" tells the compiler to generate a wide (Unicode) string regardless of whether _UNICODE is defined.
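As a rough illustration of the L prefix, the sketch below checks at compile time that an L"..." literal always has wchar_t elements, and passes one to a function that accepts only wide strings. The function count_wide_chars is a hypothetical stand-in for a COM-style call, and widen_ascii is a hypothetical helper that handles ASCII text only. Neither is part of COM or ADSI, and the string "LDAP://rootDSE" is used purely as sample text.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <type_traits>

// An L"..." literal is an array of const wchar_t whether or not
// _UNICODE is defined; this check holds in both configurations.
static_assert(std::is_same<decltype(L"x"), const wchar_t (&)[2]>::value,
              "L-prefixed literals are wide strings");

// Hypothetical stand-in for a COM-style function: it accepts only
// wide strings, so callers must supply wchar_t data.
inline std::size_t count_wide_chars(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);
}

// Hypothetical helper: widens an ASCII-only narrow string one
// character at a time. It does not handle DBCS/MBCS text.
inline std::wstring widen_ascii(const char* s) {
    assert(s != nullptr);
    std::wstring out;
    for (; *s != '\0'; ++s) {
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*s)));
    }
    return out;
}
```

A call such as count_wide_chars(L"LDAP://rootDSE") compiles in both ANSI and Unicode builds because the literal is wide either way. For narrow strings that are only known at run time, Windows code would more typically convert with an API such as MultiByteToWideChar, which takes the active code page into account.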

