| Tutorial Using Unicode in Visual Basic 6.0 Updated 25-August-2008 03:48 |
1 |
Although Visual Basic 6.0 stores strings internally as Unicode (UTF-16), it has several limitations.
The purpose of this tutorial is to resolve these issues and provide working VB code solutions. The level of difficulty of these solutions varies, but in general they require intimate knowledge of ActiveX controls and classes. Subclassing and API programming are a must to gain functionality that VB does not directly support.
The amount of information gathered during development of Unicode-aware controls was so overwhelming that it made sense to organize it, and this tutorial proved to be an ideal place to bring everything together under one roof.
| Tutorial Development Tools |
| Microsoft® FrontPage® 2003 |
| Microsoft® Platform SDK Feb 2007 Edition for Vista |
| Visual Basic 6.0 - |
Note: Dates in this tutorial are displayed as dd-mmm-yyyy (Example: 11-Mar-2004) to eliminate any ambiguities.
* These issues are resolved in VB.NET, although you will have to go through a learning curve to get up to speed with the language.
Review these tables to determine the minimum system requirements:
|
2 |
[Flowchart: basic program flow]
|
3 |
| Character Set | Range | Codepage | Byte Order Mark |
| OEM (DOS) | 0..255 | 437 OEM - United States | None |
| ANSI (Windows) | 0..255 | 1252 ANSI - Latin I etc. See SBCS-DBCS | None |
| EBCDIC (Mainframe) | 0..255 | 1047 IBM EBCDIC - Latin 1/Open System | |
| UTF-8 | | 65001 | EF BB BF |
| UTF-16LE (little endian, low byte first, x86 Processor and Microsoft Windows) | 0..65535(FFFF) - 2 bytes | 1200 | FF FE |
| UTF-16BE (big endian, high byte first, PowerPC Processor and Mac OS) | 0..65535(FFFF) - 2 bytes | 1201 | FE FF |
| UTF-32LE (little endian, low byte first) | 0..10FFFF - 4 bytes | 12000 | FF FE 00 00 |
| UTF-32BE (big endian, high byte first) | 0..10FFFF - 4 bytes | 12001 | 00 00 FE FF |
| DBCS - Double-Byte Character Set | 0..65535(FFFF) - 2 bytes. Chars 0-127 are 1 byte | See DBCS | |
Note: You should have a utility on your system called CharMap.exe, which lets you browse and select Unicode characters.
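The byte order marks in the table above can be checked in code. The following is a minimal sketch, not part of the tutorial's own source: `DetectBOM` is a hypothetical helper name, and it assumes the byte array passed in has already been dimensioned (for example, filled from a file opened in Binary mode).

```vb
Option Explicit

' Hypothetical helper: returns the name of the encoding whose byte order mark
' starts b(), or "None". Assumes b() has been dimensioned by the caller.
Public Function DetectBOM(b() As Byte) As String
    Dim n As Long
    Dim i As Long
    Dim h As String

    ' Hex-encode up to the first four bytes so the marks compare as text.
    n = UBound(b) - LBound(b) + 1
    If n > 4 Then n = 4
    For i = 0 To n - 1
        h = h & Right$("0" & Hex$(b(LBound(b) + i)), 2)
    Next

    ' Test the 4-byte UTF-32 marks first: FF FE 00 00 also begins with FF FE.
    If Left$(h, 8) = "FFFE0000" Then
        DetectBOM = "UTF-32LE"
    ElseIf Left$(h, 8) = "0000FEFF" Then
        DetectBOM = "UTF-32BE"
    ElseIf Left$(h, 6) = "EFBBBF" Then
        DetectBOM = "UTF-8"
    ElseIf Left$(h, 4) = "FFFE" Then
        DetectBOM = "UTF-16LE"
    ElseIf Left$(h, 4) = "FEFF" Then
        DetectBOM = "UTF-16BE"
    Else
        DetectBOM = "None"
    End If
End Function
```

The order of the tests matters: a UTF-16LE file whose first character is a null would be misreported as UTF-32LE, so treat the result as a hint rather than proof of the encoding.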
|
4 |
SBCS(Single-Byte) and DBCS(Double-Byte) Character Sets are different character sets from Unicode.
Character codes for "A" in ANSI, Unicode, and DBCS
| ANSI character "A" | &H41 | A |
| Unicode character "A" | &H41 &H00 | A |
| DBCS character that represents a Japanese wide-width "A" | &H82 &H60 | Ａ |
| Unicode wide-width "A" | &H21 &HFF | Ａ |
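The byte values in this table can be reproduced from VB itself. The sketch below is illustrative only: `HexDump` and `ShowBytesOfA` are hypothetical names. It relies on two standard VB6 behaviors: assigning a String to a dynamic Byte array copies its raw UTF-16 bytes, and `StrConv` with `vbFromUnicode` converts to the system ANSI code page.

```vb
Option Explicit

' Hypothetical helper: returns the bytes of b() as space-separated hex pairs.
Public Function HexDump(b() As Byte) As String
    Dim i As Long
    Dim sOut As String

    For i = LBound(b) To UBound(b)
        sOut = sOut & Right$("0" & Hex$(b(i)), 2) & " "
    Next
    HexDump = RTrim$(sOut)
End Function

Public Sub ShowBytesOfA()
    Dim b() As Byte

    b = "A"                            ' raw UTF-16 bytes, low byte first
    Debug.Print HexDump(b)             ' 41 00

    b = StrConv("A", vbFromUnicode)    ' ANSI byte in the system code page
    Debug.Print HexDump(b)             ' 41

    b = ChrW$(&HFF21)                  ' fullwidth "A", U+FF21
    Debug.Print HexDump(b)             ' 21 FF
End Sub
```

The DBCS row (&H82 &H60) is not reproduced here because it depends on the Japanese code page being installed.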
CharSet from Platform SDK wingdi.h
| Font Character Sets | ||
| Number | Name | Info |
| 0 | ANSI_CHARSET | West, Occidental(United States, Western Europe) |
| 1 | DEFAULT_CHARSET | |
| 2 | SYMBOL_CHARSET | Standard symbol charset |
| 77 | MAC_CHARSET | Macintosh |
| 128 * | SHIFTJIS_CHARSET | Shift-JIS (Japanese Industry Standard) |
| 129 * | HANGEUL_CHARSET | Korea (Wansung) |
| 129 * | HANGUL_CHARSET | Korea (Wansung) |
| 130 * | JOHAB_CHARSET | Korea (Johab) |
| 134 * | GB2312_CHARSET | Simplified Chinese - Mainland China(PRC) and Singapore |
| 136 * | CHINESEBIG5_CHARSET | Traditional Chinese - Taiwan and Hong Kong |
| 161 | GREEK_CHARSET | Greek |
| 162 | TURKISH_CHARSET | Turkish |
| 163 | VIETNAMESE_CHARSET | Vietnamese |
| 177 | HEBREW_CHARSET | Hebrew |
| 178 | ARABIC_CHARSET | Arabic |
| 186 | BALTIC_CHARSET | Baltic |
| 204 | RUSSIAN_CHARSET | Cyrillic - Russia, Belarus, Ukraine and some other Slavic countries. |
| 222 | THAI_CHARSET | Thai |
| 238 | EASTEUROPE_CHARSET | |
| 255 | OEM_CHARSET | |
* DBCS - Double-Byte Character Set
DBCS is actually not the correct terminology for what Windows uses. It is really MBCS (Multi-Byte Character Set), where a character can be 1 or 2 bytes. To illustrate this, the following code takes a Unicode string of English and Chinese characters, converts it to a byte array of MBCS Chinese, dumps the byte array to the Immediate window, and finally converts it back to a Unicode string for display in a Unicode-aware textbox. When converted using the Chinese (PRC) LCID 2052, the byte array contains single bytes for the English characters and double bytes for the Chinese characters. This shows that it is MBCS and not DBCS:

    Option Explicit

    Private Sub Form_Load()
        Dim sUni As String
        Dim sMBCS As String
        Dim b() As Byte
        Dim i As Long

        ' English digits mixed with Chinese characters
        sUni = "2006" & ChrW$(&H6B22) & "9" & ChrW$(&H8FCE) & "12" & ChrW$(&H6B22) & " 8:04"
        UniTextBoxEx1 = sUni

        ' Convert to MBCS using the Chinese (PRC) LCID
        b = StrConv(sUni, vbFromUnicode, 2052)
        sMBCS = StrConv(sUni, vbFromUnicode, 2052)
        Debug.Print sUni, sMBCS
        Text1 = sMBCS

        ' Dump the byte array, one byte per line
        For i = 0 To UBound(b)
            Debug.Print b(i)
        Next

        ' Convert back to Unicode
        sUni = StrConv(b, vbUnicode, 2052)
        UniTextBoxEx2 = sUni
    End Sub

UniTextBoxEx1, UniTextBoxEx2: 2006欢9迎12欢 8:04

Debug Window:

    50      ' 2
    48      ' 0
    48      ' 0
    54      ' 6
    187     '
    182     ' 欢
    57      ' 9
    211     '
    173     ' 迎
    49      ' 1
    50      ' 2
    187     '
    182     ' 欢
    32      ' space
    56      ' 8
    58      ' :
    48      ' 0
    52      ' 4
The following demos show how to display Chinese on an English-U.S. system without changing your Regional settings.
Example of Vb TextBox using DBCS Charset 134, CHINESE_GB2312
Set your TextBox to any font that supports the CHINESE_GB2312 script (TextBox1.Font.Charset = 134), for example "Arial Unicode MS". Make sure you select the CHINESE_GB2312 script in the Font dialog box.
Add this DBCS string:
TextBox1 = "CHS: ¡¡ÄãÕæµÄÄܹ»ÔÚÎı¾¿òÖÐʹÓñ³¾°Í¼Æ¬£¬"

Warning: If you apply an XP Theme to the TextBox below via an IDE Manifest it will display ANSI instead of DBCS.
Example of Vb ListView using DBCS Charset 134, CHINESE_GB2312
Set your ListView to any font that supports the CHINESE_GB2312 script (Listview1.Font.Charset = 134), for example "Arial Unicode MS". Make sure you select the CHINESE_GB2312 script in the Font dialog box.
In report mode add this DBCS string:
ListItems.Add , , "CHS: ¡¡ÄãÕæµÄÄܹ»ÔÚÎı¾¿òÖÐʹÓñ³¾°Í¼Æ¬£¬"

See MSDN for more information about DBCS:
Issues Specific to the Double-Byte Character Set (DBCS)
ANSI, DBCS, and Unicode: Definitions
Calling Windows API Functions
DBCS Sort Order and String Comparison
DBCS String Manipulation Functions
DBCS-Enabled KeyPress Event
Designing an International-Aware User Interface
Font, Display and Print Considerations in a DBCS Environment
Identifiers in a DBCS Environment
Processing Files That Use Double-Byte Characters
The global options of the StrConv function are converting uppercase to lowercase, and vice versa. In addition to those options, the function has several DBCS-specific options. For example, you can convert narrow letters to wide letters by specifying vbWide in the second argument of this function. You can convert one character type to another, such as hiragana to katakana in Japanese. StrConv enables you to specify a LocaleID for the string, if different than the system's LocaleID.
You can also use the StrConv function to convert Unicode characters to ANSI/DBCS characters, and vice versa. Usually, a string in Visual Basic consists of Unicode characters. When you need to handle strings in ANSI/DBCS (for example, to calculate the number of bytes in a string before writing the string into a file), you can use this functionality of the StrConv function.
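The byte-count use case mentioned above can be sketched as follows. `AnsiByteLen` is a hypothetical helper name, not part of the tutorial's code; the optional LCID argument only works when the matching code page is installed on the system.

```vb
Option Explicit

' Hypothetical helper: number of bytes s occupies after conversion to
' ANSI/DBCS. With lcid = 0 the system locale is used.
Public Function AnsiByteLen(ByVal s As String, Optional ByVal lcid As Long = 0) As Long
    If lcid = 0 Then
        AnsiByteLen = LenB(StrConv(s, vbFromUnicode))
    Else
        AnsiByteLen = LenB(StrConv(s, vbFromUnicode, lcid))
    End If
End Function
```

For a string such as "2006" followed by one Chinese character, `Len` reports 5 characters and `LenB` reports 10 bytes of UTF-16, while `AnsiByteLen(s, 2052)` should report 6 bytes on a system with the Chinese (PRC) code page available, since the four digits take one byte each and the Chinese character takes two.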
|
5 |
The easiest way to add Unicode test strings to your project is to make a resource file with a Unicode-aware editor and compile it with RC.exe. That way you can test your controls without having to cut and paste the strings when you need them.
Use Notepad if you are on NT or later, WordPad or UltraEdit if using Win9x.
Download the complete resource file with source here.
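Once the compiled .res file has been added to the project, the strings can be read back with `LoadResString`. The sketch below assumes the resource IDs 101 to 117 used in this tutorial and assumes that `LoadResString` returns the UTF-16 text unchanged on NT-based systems; `CodeUnitsHex` and `DumpWelcomeStrings` are hypothetical names. Because the Immediate window is ANSI-only, the code prints code units as hex rather than the characters themselves.

```vb
Option Explicit

' Hypothetical helper: returns the UTF-16 code units of s as 4-digit hex
' values separated by spaces.
Public Function CodeUnitsHex(ByVal s As String) As String
    Dim i As Long
    Dim sOut As String

    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it into the range 0..65535.
        sOut = sOut & Right$("000" & Hex$(AscW(Mid$(s, i, 1)) And &HFFFF&), 4) & " "
    Next
    CodeUnitsHex = RTrim$(sOut)
End Function

Public Sub DumpWelcomeStrings()
    Dim id As Long

    ' 101..117 are the UTF-16 "Welcome" resource IDs used in this tutorial.
    For id = 101 To 117
        Debug.Print id, CodeUnitsHex(LoadResString(id))
    Next
End Sub
```

If the hex values match the `ChrW$` code points listed later in this section, the resource round trip preserved the Unicode text.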
"Welcome" in several languages
| Resource ID | "Welcome" UTF-16 Unicode | Resource ID | "Welcome" UTF-8 Encoded |
| 101 | "ARA: مـرحبــاً" | 151 | "ARA: Ù…Ù€Ø±ØØ¨Ù€Ù€Ø§Ù‹" |
| 102 | "CHS: 欢迎" | 152 | "CHS: 欢迎" |
| 103 | "CHT: 歡迎" | 153 | "CHT: æ¡è¿Ž" |
| 104 | "ENG: Welcome" | 154 | "ENG: Welcome" |
| 105 | "GEO: სასურველი" | 155 | "GEO: სáƒáƒ¡áƒ£áƒ ველი" |
| 106 | "GRK: Καλώς ήλθατε" | 156 | "GRK: Καλώς ήλθατε" |
| 107 | "HEB: בִּרוּבִים חַבָּאִים" | 157 | "HEB: ברוכי×? הב×?×™×?" |
| 108 | "HIN: रवागत" | 158 | "HIN: रवागत" |
| 109 | "JPN: ようこそ" | 159 | "JPN: よã?†ã?“ã??" |
| 110 | "KOR: 여보세요" | 160 | "KOR: 여보세요" |
| 111 | "PAN: ਜੀ ਆਇਆਂ ਨੂੰ" | 161 | "PAN: ਜੀ ਆਇਆਂ ਨੂੰ" |
| 112 | "PTB: Bem-vindo" | 162 | "PTB: Bem-vindo" |
| 113 | "RUS: Добро пожаловать" | 163 | "RUS: Добро пожаловать" |
| 114 | "TAM: அங்கிகரி" | 164 | "TAM: à®…à®™à¯?கிகரி" |
| 115 | "THA: การต้อนรับ" | 165 | "THA: à¸à¸²à¸£à¸•้à¸à¸™à¸£à¸±à¸š" |
| 116 | "URD: स्वागत" | 166 | "URD: सà¥à¤µà¤¾à¤—त" |
| 117 | "VIE: tính từ" | 167 | "VIE: tÃnh từ" |
"Hello" in several languages
* Needs Code2000 Font to see this
| Language | "Hello" UTF-16 Unicode | "Hello" UTF-8 Encoded |
| Arabic | السلام عليكم | السلام عليكم |
| Bengali (বাঙ্লা) | ষাগতোম | ষাগতোম |
| * Burmese | မ္ရန္မာ | (မ္ရန္မာ) |
| Cantonese (粵語,廣東話) | 早晨, 你好 | 早晨, ä½ å¥½ |
| * Cherokee (ᏣᎳᎩ) | ᎣᏏᏲ | Ꭳáá² |
| Chinese (中文,普通话,汉语) | 你好 | ä½ å¥½ |
| Czech (česky) | Dobrý den | Dobrý den |
| Danish (Dansk) | Hej, Goddag | Hej, Goddag |
| English | Hello | Hello |
| Esperanto | Saluton | Saluton |
| Estonian | Tere, Tervist | Tere, Tervist |
| Finnish (Suomi) | Hei | Hei |
| French (Français) | Bonjour, Salut | Bonjour, Salut |
| German (Deutsch Nord) | Guten Tag | Guten Tag |
| German (Deutsch Süd) | Grüß Gott | Grüß Gott |
| Georgian (ქართველი) | გამარჯობა | გáƒáƒ›áƒáƒ ჯáƒáƒ‘რ|
| Gujarati | (ગુજરાતિ) | (ગà«àªœàª°àª¾àª¤àª¿) |
| Greek (Ελληνικά) | Γειά σας | Γειά σας |
| Hebrew | שלום | ×©×œ×•× |
| Hindi | नमस्ते, नमस्कार। | नमसà¥à¤¤à¥‡, नमसà¥à¤•ार। |
| Italiano | Ciao, Buon giorno | Ciao, Buon giorno |
| Japanese (日本語) | こんにちは, コンニチハ | ã“ã‚“ã«ã¡ã¯, コï¾ï¾†ï¾ï¾Š |
| Korean (한글) | 안녕하세요, 안녕하십니까 | 안녕하세요, 안녕하ì‹ë‹ˆê¹Œ |
| Maltese | Ċaw, Saħħa | ÄŠaw, Saħħa |
| Nederlands | Vlaams Hallo, Dag | Vlaams Hallo, Dag |
| Norwegian (Norsk) | Hei, God dag | Hei, God dag |
| Punjabi | (ਪੁਂਜਾਬਿ) | (ਪà©à¨‚ਜਾਬਿ) |
| Polish | Dzień dobry, Hej | DzieÅ„ dobry, Hej |
| Russian (Русский) | Здравствуйте! | ЗдравÑтвуйте! |
| Slovak | Dobrý deň | Dobrý deň |
| Spanish (Español) | ¡Hola! | ‎¡Hola!‎ |
| Swedish (Svenska) | Hej, Goddag | Hej, Goddag |
| Thai (ภาษาไทย) | สวัสดีครับ, สวัสดีค่ะ | สวัสดีครับ, สวัสดีค่ะ |
| Tamil (தமிழ்) | வணக்கம் | வணகà¯à®•ம௠|
| Turkish (Türkçe) | Merhaba | Merhaba |
| Vietnamese (Tiếng Việt) | Xin Chào | Xin Chà o |
| Yiddish (ײַדישע) | דאָס הײַזעלע | ד×ָס הײַזעלע |
Other methods for creating a string at Vb Runtime:
| Sample Output | Vb String |
| ARA: مـرحب | "ARA: " & ChrW$(&H645) & ChrW$(&H640) & ChrW$(&H631) & ChrW$(&H62D) & ChrW$(&H628) |
| ARM: ԱԲԳԴԵԶԷԸԹ | "ARM: " & ChrW$(&H531) & ChrW$(&H532) & ChrW$(&H533) & ChrW$(&H534) & ChrW$(&H535) & ChrW$(&H536) & ChrW$(&H537) & ChrW$(&H538) & ChrW$(&H539) |
| CHS: 欢迎 | "CHS: " & ChrW$(&H6B22) & ChrW$(&H8FCE) |
| CHT: 歡迎 | "CHT: " & ChrW$(&H6B61) & ChrW$(&H8FCE) |
| ENG: Welcome | "ENG: Welcome" |
| GEO: სასურველი | "GEO: " & ChrW$(&H10E1) & ChrW$(&H10D0) & ChrW$(&H10E1) & ChrW$(&H10E3) & ChrW$(&H10E0) & ChrW$(&H10D5) & ChrW$(&H10D4) & ChrW$(&H10DA) & ChrW$(&H10D8) |
| GRK: Καλώς ήλθατε | "GRK: " & ChrW$(&H39A) & ChrW$(&H3B1) & ChrW$(&H3BB) & ChrW$(&H3CE) & ChrW$(&H3C2) & " " & ChrW$(&H3AE) & ChrW$(&H3BB) & ChrW$(&H3B8) & ChrW$(&H3B1) & ChrW$(&H3C4) & ChrW$(&H3B5) |
| HEB: ברוכים הבאים | "HEB: " & ChrW$(&H5D1) & ChrW$(&H5E8) & ChrW$(&H5D5) & ChrW$(&H5DB) & ChrW$(&H5D9) & ChrW$(&H5DD) & " " & ChrW$(&H5D4) & ChrW$(&H5D1) & ChrW$(&H5D0) & ChrW$(&H5D9) & ChrW$(&H5DD) |
| HIN: रवागत | "HIN: " & ChrW$(&H930) & ChrW$(&H935) & ChrW$(&H93E) & ChrW$(&H917) & ChrW$(&H924) |
| JPN: ようこそ | "JPN: " & ChrW$(&H3088) & ChrW$(&H3046) & ChrW$(&H3053) & ChrW$(&H305D) |
| KOR: 여보세요 | "KOR: " & ChrW$(&HC5EC) & ChrW$(&HBCF4) & ChrW$(&HC138) & ChrW$(&HC694) |
| PAN: ਜੀ ਆਇਆਂ ਨੂੰ | "PAN: " & ChrW$(&HA1C) & ChrW$(&HA40) & " " & ChrW$(&HA06) & ChrW$(&HA07) & ChrW$(&HA06) & ChrW$(&HA02) & " " & ChrW$(&HA28) & ChrW$(&HA42) & ChrW$(&HA70) |
| PTB: Bem-vindo | "PTB: Bem-vindo" |
| RUS: Добро пожаловать | "RUS: " & ChrW$(&H414) & ChrW$(&H43E) & ChrW$(&H431) & ChrW$(&H440) & ChrW$(&H43E) & " " & ChrW$(&H43F) & ChrW$(&H43E) & ChrW$(&H436) & ChrW$(&H430) & ChrW$(&H43B) & ChrW$(&H43E) & ChrW$(&H432) & ChrW$(&H430) & ChrW$(&H442) & ChrW$(&H44C) |
| TAM: அங்கிகரி | "TAM: " & ChrW$(&HB85) & ChrW$(&HB99) & ChrW$(&HBCD) & ChrW$(&HB95) & ChrW$(&HBBF) & ChrW$(&HB95) & ChrW$(&HBB0) & ChrW$(&HBBF) |
| THA: การต้อนรับ | "THA: " & ChrW$(&HE01) & ChrW$(&HE32) & ChrW$(&HE23) & ChrW$(&HE15) & ChrW$(&HE49) & ChrW$(&HE2D) & ChrW$(&HE19) & ChrW$(&HE23) & ChrW$(&HE31) & ChrW$(&HE1A) |
| URD: स्वागत | "URD: " & ChrW$(&H938) & ChrW$(&H94D) & ChrW$(&H935) & ChrW$(&H93E) & ChrW$(&H917) & ChrW$(&H924) |
| VIE: tính từ | "VIE: tính t" & ChrW$(&H1EEB) |
Note: Under the hood StrConv inserts a BOM (FEFF) before the CJK Unified Ideographs.
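Long chains of `ChrW$` calls like those in the table above can be wrapped in a small helper. This is a sketch only: `FromCodePoints` is a hypothetical name, and it handles only code points in the range 0..FFFF (one UTF-16 code unit each).

```vb
Option Explicit

' Hypothetical helper: builds a string from a list of UTF-16 code units.
' Accepts positive values as well as the negative Integers that VB produces
' for hex literals above &H7FFF (for example &H8FCE).
Public Function FromCodePoints(ParamArray cp() As Variant) As String
    Dim i As Long
    Dim sOut As String

    For i = LBound(cp) To UBound(cp)
        ' Masking maps a negative Integer such as -30050 to 35486 (&H8A9E).
        sOut = sOut & ChrW$(CLng(cp(i)) And &HFFFF&)
    Next
    FromCodePoints = sOut
End Function
```

With this helper, `"CHS: " & FromCodePoints(&H6B22, &H8FCE)` builds the same string as the CHS row above. The masking step also explains the negative arguments that appear in some samples in this tutorial: `ChrW$(-30050)` and `ChrW$(&H8A9E)` denote the same character.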
More stuff to play with:
| Nº | Sample | Font.Name | Font.Charset | String |
| 1 | English | Tahoma | ANSI_CHARSET | "English" |
| 2 | româneşte | " | EASTEUROPE_CHARSET | ChrW$(114) & ChrW$(111) & ChrW$(109) & ChrW$(226) & ChrW$(110) & ChrW$(101) & ChrW$(351) & ChrW$(116) & ChrW$(101) |
| 3 | ภาษาไทย | " | THAI_CHARSET | ChrW$(3616) & ChrW$(3634) & ChrW$(3625) & ChrW$(3634) & ChrW$(3652) & ChrW$(3607) & ChrW$(3618) |
| 4 | Հայերեն | Arial Unicode MS | | ChrW$(1344) & ChrW$(1377) & ChrW$(1397) & ChrW$(1381) & ChrW$(1408) & ChrW$(1381) & ChrW$(1398) |
| 5 | Tiếng Việt | " | VIETNAMESE_CHARSET | ChrW$(84) & ChrW$(105) & ChrW$(234) & ChrW$(769) & ChrW$(110) & ChrW$(103) & ChrW$(32) & ChrW$(86) & ChrW$(105) & ChrW$(234) & ChrW$(803) & ChrW$(116) |
| 6 | עברית | " | HEBREW_CHARSET | ChrW$(1506) & ChrW$(1489) & ChrW$(1512) & ChrW$(1497) & ChrW$(1514) |
| 7 | मराठी | Arial Unicode MS | | ChrW$(2350) & ChrW$(2352) & ChrW$(2366) & ChrW$(2336) & ChrW$(2368) |
| 8 | 中文 (台灣) | PMingLiU | CHINESEBIG5_CHARSET | ChrW$(20013) & ChrW$(25991) & " (" & ChrW$(21488) & ChrW$(28771) & ")" |
| 9 | नेपाली | Arial Unicode MS | | ChrW$(2344) & ChrW$(2375) & ChrW$(2346) & ChrW$(2366) & ChrW$(2354) & ChrW$(2368) |
| 10 | Русский | " | RUSSIAN_CHARSET | ChrW$(1056) & ChrW$(1091) & ChrW$(1089) & ChrW$(1089) & ChrW$(1082) & ChrW$(1080) & ChrW$(1081) |
| 11 | ირუკსაბ | Arial Unicode MS | | StrReverse(ChrW$(4305) & ChrW$(4304) & ChrW$(4321) & ChrW$(4313) & ChrW$(4323) & ChrW$(4320) & ChrW$(4312)) |
| 12 | 日本語 | Arial Unicode MS | SHIFTJIS_CHARSET | ChrW$(26085) & ChrW$(26412) & ChrW$(-30050) |
| 13 | ଉଡିଯା | Arial Unicode MS | | ChrW$(2825) & ChrW$(2849) & ChrW$(2879) & ChrW$(2863) & ChrW$(2878) |
| 14 | Ελληνικά | " | GREEK_CHARSET | ChrW$(917) & ChrW$(955) & ChrW$(955) & ChrW$(951) & ChrW$(957) & ChrW$(953) & ChrW$(954) & ChrW$(940) |
| 15 | हिन्दी | Arial Unicode MS | | ChrW$(2361) & ChrW$(2367) & ChrW$(2344) & ChrW$(2381) & ChrW$(2342) & ChrW$(2368) |
| 16 | 한국어 | GulimChe | HANGEUL_CHARSET | ChrW$(-10916) & ChrW$(-21139) & ChrW$(-14924) |
| 17 | తెలుగు | Arial Unicode MS | | ChrW$(3108) & ChrW$(3142) & ChrW$(3122) & ChrW$(3137) & ChrW$(3095) & ChrW$(3137) |
| 18 | Čeština | " | EASTEUROPE_CHARSET | ChrW$(268) & ChrW$(101) & ChrW$(353) & ChrW$(116) & ChrW$(105) & ChrW$(110) & ChrW$(97) |
| 19 | ಕನ್ನಡ | Arial Unicode MS | | ChrW$(3221) & ChrW$(3240) & ChrW$(3277) & ChrW$(3240) & ChrW$(3233) |
| 20 | 中文(中国) | SimSun | GB2312_CHARSET | ChrW$(20013) & ChrW$(25991) & "(" & ChrW$(20013) & ChrW$(22269) & ")" |
| 21 | ગુજરાતી | Arial Unicode MS | | ChrW$(2711) & ChrW$(2753) & ChrW$(2716) & ChrW$(2736) & ChrW$(2750) & ChrW$(2724) & ChrW$(2752) |
| 22 | Türkçe | " | TURKISH_CHARSET | ChrW$(84) & ChrW$(252) & ChrW$(114) & ChrW$(107) & ChrW$(231) & ChrW$(101) |
| 23 | தமிழ் | Arial Unicode MS | | ChrW$(2980) & ChrW$(2990) & ChrW$(3007) & ChrW$(2996) & ChrW$(3021) |
|
6 |
| OS/ | Unicode | API | Fonts | Additional Requirements |
| Vb5/6 | Yes. Uses Unicode to store and manipulate strings. | Intrinsic controls, Properties Window (IDE), Clipboard, and PropertyBag are ANSI only. | | |
| NT/2000/XP/Vista | Yes. Uses Unicode to store and manipulate strings. | Uses Unicode: DrawTextW Lib "user32" - TextOutW Lib "gdi32" | Installed. You may need to enable Far East language support via Control Panel, Regional Options, Languages if this was not done at install time. | None |
| 98/ME | No. Uses ANSI or * DBCS to store and manipulate strings. | Uses ANSI: DrawTextA Lib "user32" - TextOutA Lib "gdi32" or DrawTextW Lib "Unicows" - TextOutW Lib "Unicows" | You need to install at least one Unicode font. Arial Unicode MS used to be a free (23 Mb) download from Microsoft. It is installed automatically with Office XP Pro or Frontpage 2002. | Microsoft Layer for Unicode on Win9x Systems (MSLU). Unicows.DLL (269.7kb download) available free from Microsoft. The current FileVersion is "1.0.4018.0" April 21, 2003. |
| Automation 95/98/ME & NT/2000/XP/Vista | Yes. Uses Unicode to pass the strings back and forth. | | | |
XP now supports a total of 136 locales, which includes the 126 locales supported by Windows 2000 and adds the following:
Other international features New to XP:
Windows XP Service Pack 2 introduces 25 additional locales and another 11 with Service Pack 2 Update:
| Windows XP Service Pack 2 Locales | ||
| Bengali (India) | Quechua (Bolivia) | Sami, Northern (Sweden) |
| Bosnian (Latin, Bosnia and Herzegovina) | Quechua (Ecuador) | Sami, Skolt (Finland) |
| Croatian (Latin, Bosnia and Herzegovina) | Quechua (Peru) | Sami, Southern (Norway) |
| isiXhosa (South Africa) | Sami, Inari (Finland) | Sami, Southern (Sweden) |
| isiZulu (South Africa) | Sami, Lule (Norway) | Serbian (Cyrillic, Bosnia and Herzegovina) |
| Malayalam (India) | Sami, Lule (Sweden) | Serbian (Latin, Bosnia and Herzegovina) |
| Maltese (Malta) | Sami, Northern (Finland) | Sesotho sa Leboa (South Africa) |
| Maori (New Zealand) | Sami, Northern (Norway) | Setswana (South Africa) |
| | | Welsh (United Kingdom) |
| Windows XP Service Pack 2 Update Locales | ||
| Bosnian (Cyrillic, Bosnia and Herzegovina) | Irish (Ireland) | Nepali (Nepal) |
| Filipino (Philippines) | Luxembourgish (Luxembourg) | Pashto (Afghanistan) |
| Frisian (Netherlands) | Mapudungun (Chile) | Romansh (Switzerland) |
| Inuktitut (Latin, Canada) | Mohawk (Canada) | |
| Windows Vista New locales | ||
| Alsatian (France) | Hausa (Latin, Nigeria) | Spanish (United States) |
| Amharic (Ethiopia) | Igbo (Nigeria) | Tajik (Cyrillic, Tajikistan) |
| Assamese (India) | Inuktitut (Syllabics, Canada) | Tamazight (Latin, Algeria) |
| Bashkir (Russia) | Khmer (Cambodia) | Tibetan (PRC) |
| Bengali (Bangladesh) | K'iche (Guatemala) | Turkmen (Turkmenistan) |
| Breton (France) | Kinyarwanda (Rwanda) | Uighur (PRC) |
| Corsican (France) | Lao (Lao P.D.R.) | Upper Sorbian (Germany) |
| Dari (Afghanistan) | Lower Sorbian (Germany) | Wolof (Senegal) |
| English (India) | Mongolian (Traditional Mongolian, PRC) | Yakut (Russia) |
| English (Malaysia) | Occitan (France) | Yi (PRC) |
| English (Singapore) | Oriya (India) | Yoruba (Nigeria) |
| Greenlandic (Greenland) | Sinhala (Sri Lanka) | |
These locales are automatically installed when you update Windows XP to SP2. You can select new locales in Regional and Language Options. They are not supported in Windows Server 2003.
|
7 |
| Unicode Only LCIDs | ||
|---|---|---|
| Identifier | Language | Platform |
| 0x042b | Armenian | 2000/XP |
| 0x0465 | Divehi | XP |
| 0x0437 | Georgian | 2000/XP |
| 0x0447 | Gujarati | XP |
| 0x0439 | Hindi | 2000/XP |
| 0x044b | Kannada | XP |
| 0x0457 | Konkani | 2000/XP |
| 0x044e | Marathi | 2000/XP |
| 0x0446 | Punjabi | XP |
| 0x044f | Sanskrit | 2000/XP |
| 0x045a | Syriac | XP |
| 0x0449 | Tamil | 2000/XP |
| 0x044a | Telugu | XP |
|
8 |
Table of Known Code Pages
| CP_ACP | 0 | WesternEuropean_Mac | 10000 | UserDefined | 50000 |
| CP_OEMCP | 1 | Japanese_Mac | 10001 | AutoSelect | 50001 |
| CP_MACCP | 2 | Arabic_Mac | 10004 | Japanese_JIS | 50220 |
| CP_THREAD_ACP | 3 | Greek_Mac | 10006 | Japanese_JIS_Allow1byteKana | 50221 |
| CP_SYMBOL | 42 | Cyrillic_Mac | 10007 | Japanese_JIS_Allow1byteKanaSOSI | 50222 |
| OEM_UnitedStates | 437 | Latin2_Mac | 10029 | Korean_ISO | 50225 |
| Arabic_ASMO708 | 708 | Turkish_Mac | 10081 | Japanese_AutoSelect | 50932 |
| Arabic_DOS | 720 | Chinese_Traditional_CNS | 20000 | Chinese_Simplified_AutoSelect | 50936 |
| Greek_DOS | 737 | Chinese_Traditional_Eten | 20002 | Korean_AutoSelect | 50949 |
| Baltic_DOS | 775 | WesternEuropean_IA5 | 20105 | Chinese_Traditional_Auto_Select | 50950 |
| WesternEuropean_DOS | 850 | German_IA5 | 20106 | Cyrillic_Auto_Select | 51251 |
| Central_European_DOS | 852 | Swedish_IA5 | 20107 | Greek_AutoSelect | 51253 |
| Icelandic_DOS | 861 | Norwegian_IA5 | 20108 | Arabic_AutoSelect | 51256 |
| Hebrew_DOS | 862 | US_ASCII | 20127 | Japanese_EUC | 51932 |
| Cyrillic_DOS | 866 | Cyrillic_KOI8R | 20866 | Chinese_Simplified_EUC | 51936 |
| Greek_DOS_Modern | 869 | Cyrillic_KOI8U | 21866 | Korean_EUC | 51949 |
| Thai_Windows | 874 | WesternEuropean_ISO | 28591 | Chinese_Simplified_HZ | 52936 |
| IBM_EBCDIC_GreekModern | 875 | Central_European_ISO | 28592 | CP_UTF7 | 65000 ** |
| Japanese_ShiftJIS | 932 * | Baltic_ISO | 28594 | CP_UTF8 | 65001 ** |
| Chinese_Simplified_GB2312 | 936 * | Cyrillic_ISO | 28595 | ||
| Korean | 949 * | Arabic_ISO | 28596 | ||
| Chinese_Traditional_Big5 | 950 * | Greek_ISO | 28597 | ||
| Unicode | 1200 | Latin3_ISO | 28593 | ||
| Unicode_BigEndian | 1201 | Hebrew_ISO_Visual | 28598 | ||
| Central_European_Windows | 1250 | Turkish_ISO | 28599 | ||
| Cyrillic_Windows | 1251 | Latin9_ISO | 28605 | ||
| WesternEuropean_Windows | 1252 | Europa | 29001 | ||
| Greek_Windows | 1253 | Hebrew_ISO_Logical | 38598 | ||
| Turkish_Windows | 1254 | ||||
| Hebrew_Windows | 1255 | ||||
| Arabic_Windows | 1256 | ||||
| Baltic_Windows | 1257 | ||||
| Vietnamese_Windows | 1258 | ||||
| Korean_Johab | 1361 |
* DBCS - Double-Byte Character Set
** This is a pseudo codepage. There is no corresponding NLS file. This code page ID can only be used with WideCharToMultiByte() and MultiByteToWideChar() API calls.
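As an illustration of using the UTF-8 pseudo codepage with `MultiByteToWideChar`, here is a minimal sketch that decodes a UTF-8 byte array into a VB string. `Utf8ToString` is a hypothetical helper name; the sketch assumes the array has been dimensioned and that the platform supports code page 65001 in this API.

```vb
Option Explicit

Private Const CP_UTF8 As Long = 65001

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

' Hypothetical helper: decodes the UTF-8 bytes in b() into a VB string.
' Returns an empty string if the conversion fails.
Public Function Utf8ToString(b() As Byte) As String
    Dim cb As Long
    Dim cch As Long
    Dim sOut As String

    cb = UBound(b) - LBound(b) + 1
    If cb <= 0 Then Exit Function

    ' First call: ask how many UTF-16 code units are needed.
    cch = MultiByteToWideChar(CP_UTF8, 0, VarPtr(b(LBound(b))), cb, 0, 0)
    If cch = 0 Then Exit Function

    ' Second call: convert into a pre-sized string buffer.
    sOut = String$(cch, vbNullChar)
    MultiByteToWideChar CP_UTF8, 0, VarPtr(b(LBound(b))), cb, StrPtr(sOut), cch
    Utf8ToString = sOut
End Function
```

For example, the six bytes E6 AC A2 E8 BF 8E decode to the two characters U+6B22 and U+8FCE, the CHS "Welcome" sample used in this tutorial.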
|
9 |
Fonts used on Windows XP-SP1:
| Method | API | Parameter | Font Name | API | Face Name |
| GetStockFont | GetStockObject | SYSTEM_FONT | MS Sans Serif | GetTextFace | System |
| GetStockFont | GetStockObject | DEFAULT_GUI_FONT | MS Sans Serif | GetTextFace | MS Shell Dlg |
| GetSysFontA | SystemParametersInfo | cf_Caption | Arial | ||
| GetSysFontA | SystemParametersInfo | cf_Menu | Tahoma | ||
| GetSysFontA | SystemParametersInfo | cf_Message | Tahoma | ||
| GetSysFontA | SystemParametersInfo | cf_SmallCaption | Tahoma | ||
| GetSysFontA | SystemParametersInfo | cf_Status | Tahoma |
Win2000/XP/Office2000/Office2003 should already have Arial Unicode MS installed.
Win98 users will need a Unicode font to render Unicode glyphs.
| Font (not redistributable) | Version | Glyphs | Size | Comments |
| Arial Unicode MS | 1.00 | 51,180 | 22.19 Mb | Installed with Win2000/Office2000 or later. |
| Lucida Sans Unicode | 2.00 | 1,776 | 316.4 kb | |
| Bitstream Cyberbit | beta v2.0 | 29,934 | 13.04 Mb | Complete - Download Font and DOC here. |
| Bitstream Cyberbase | beta v2.0 | 1,249 | 302 kb | No CJK - Download Font and DOC here. |
| Bitstream CyberCJK | beta v2.0 | 28686 | 12.74 Mb | CJK only - Download Font and DOC here. |
| Code2000 | 1.13 | 34,810 | 3.01 Mb | Shareware. US$5.00. Download here. |
| TITUS Cyberbit Basic | 2000; 3.0 | 9,568 | 1.82 Mb | Non-Commercial use only. UNICODE 4.0 compliant. Download here. |
| Doulos SIL | 4.010.2004 | | 674 kb | DoulosSIL4.0.10.zip. Non-Commercial use only. License/DistributionRestrictions |
| Ezra SIL Hebrew Unicode | | | 3.20 Mb | EzrSIL20.zip. Non-Commercial use only. License/DistributionRestrictions |
Sample Font Support for several Languages
Test Platform WinXP-SP1
| Sample | Arial Unicode MS | Code 2000 | * Microsoft Tahoma or Arial | * Microsoft Sans Serif | Bitstream Cyberbit | TITUS Cyberbit Basic |
| ARA: مـرحبــاً | ||||||
| CHS: 欢迎 | ||||||
| CHT: 歡迎 | ||||||
| GEO: სასურველი | ||||||
| GRK: Καλώς ήλθατε | ||||||
| HEB: בִּרוּבִים חַבָּאִים | ||||||
| HIN: रवागत | ||||||
| JPN: ようこそ | ||||||
| KOR: 여보세요 | ||||||
| PAN: ਜੀ ਆਇਆਂ ਨੂੰ | ||||||
| RUS: Добро пожаловать | ||||||
| TAM: அங்கிகரி | ||||||
| THA: การต้อนรับ | ||||||
| URD: स्वागत | ||||||
| VIE: tính từ |
* Tahoma and Arial do not support all these languages but work due to Uniscribe Font Fallback. Apparently MS Sans Serif does not use or support Font Fallback. Font Fallback is only available on Win2000 or later.
|
10 |
Microsoft Layer for Unicode Technology(UNICOWS.DLL)

While you can make separate programs for specific platforms, it is often desirable to make one program that works on all platforms. By using Microsoft Layer for Unicode Technology (Unicows.DLL, 240kb), a single executable can run on both NT-based and Win9x platforms. In this case you can use DrawTextW or TextOutW Lib "Unicows" for all platforms.
The Unicows.Dll forwards calls to the system API if you are running on NT, 2000, or XP platforms.
MSLU does not support the display of characters that the system cannot display. Therefore do not expect to see Chinese under Win98 English even though you have installed a font that supports Chinese(Arial Unicode MS for example). For more info see Newsgroup microsoft.public.platformsdk.mslayerforunicode and http://trigeminal.com/usenet/usenet035.asp.
Use this conditional compilation directive to test your code with and without Unicows:
Under Project/Properties/Make set conditional compilation arguments to UNICOWS = -1 or UNICOWS = 0.
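A sketch of how the UNICOWS compilation constant might select the API declarations is shown below. The exact declarations used by the tutorial's controls are not shown in this section, so the parameter names here are assumptions; the strings are passed as pointers (`StrPtr`) so that VB does not convert them to ANSI.

```vb
Option Explicit

Private Type RECT
    Left As Long
    Top As Long
    Right As Long
    Bottom As Long
End Type

#If UNICOWS Then
    ' Route the wide calls through the Microsoft Layer for Unicode.
    Private Declare Function TextOutW Lib "unicows" ( _
        ByVal hdc As Long, ByVal x As Long, ByVal y As Long, _
        ByVal lpString As Long, ByVal nCount As Long) As Long
    Private Declare Function DrawTextW Lib "unicows" ( _
        ByVal hdc As Long, ByVal lpStr As Long, ByVal nCount As Long, _
        lpRect As RECT, ByVal wFormat As Long) As Long
#Else
    ' Call the system libraries directly.
    Private Declare Function TextOutW Lib "gdi32" ( _
        ByVal hdc As Long, ByVal x As Long, ByVal y As Long, _
        ByVal lpString As Long, ByVal nCount As Long) As Long
    Private Declare Function DrawTextW Lib "user32" ( _
        ByVal hdc As Long, ByVal lpStr As Long, ByVal nCount As Long, _
        lpRect As RECT, ByVal wFormat As Long) As Long
#End If
```

Because both branches declare the same function names, the calling code stays identical whichever value of UNICOWS is set. When UNICOWS is non-zero, Unicows.dll has to be somewhere the loader can find it, for example next to the executable.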
Note:
You may not need MSLU at all if your program uses only DrawText or TextOut. In this case you can simply wrap the ANSI and Wide versions into a Sub. Do not expect to see Unicode on Win9x platforms even if you are using a Unicode font such as Arial Unicode MS:
    ' Module-level flag: are we running on NT?
    Private m_bIsNt As Boolean

    ' Put this in your startup code (Initialize, Sub Main, etc.)
    Dim lVer As Long
    lVer = GetVersion()
    m_bIsNt = ((lVer And &H80000000) = 0)

    ' Same pattern as pTextOut below: wide call on NT, ANSI call otherwise.
    Public Sub pDrawText(ByVal hdc As Long, ByVal s As String, tR As RECT, ByVal lFlags As Long)
        Dim lPtr As Long
        If (m_bIsNt) Then
            lPtr = StrPtr(s)
            If Not (lPtr = 0) Then
                DrawTextW hdc, lPtr, -1, tR, lFlags
            End If
        Else
            DrawTextA hdc, s, -1, tR, lFlags
        End If
    End Sub

    Public Sub pTextOut(ByVal lhDC As Long, ByVal x As Long, ByVal y As Long, ByVal sText As String)
        Dim lPtr As Long
        If (m_bIsNt) Then
            lPtr = StrPtr(sText)
            If Not (lPtr = 0) Then
                TextOutW lhDC, x, y, lPtr, Len(sText)
            End If
        Else
            TextOutA lhDC, x, y, sText, Len(sText)
        End If
    End Sub

    Public Sub pGetTextExtentPoint32(ByVal hdc As Long, ByVal s As String, lpSize As SIZEAPI)
        Dim lPtr As Long
        If (m_bIsNt) Then
            lPtr = StrPtr(s)
            If Not (lPtr = 0) Then
                GetTextExtentPoint32W hdc, lPtr, Len(s), lpSize
            End If
        Else
            GetTextExtentPoint32A hdc, s, Len(s), lpSize
        End If
    End Sub
|
11 |
Uniscribe Architecture

In 1999 Microsoft introduced Uniscribe, a Windows system-level component that could take advantage of OpenType fonts. Microsoft Windows 2000 and applications Internet Explorer 5 and Office 2000 were released with support for Uniscribe built in.
For Windows 2000 and later, Uniscribe supports the processing of complex scripts, that is, those scripts that need special processing to render properly. It includes a subset of the features found in GDI+ in Windows 2000 and Windows XP. The rules governing the shaping and positioning of glyphs are specified and catalogued in The Unicode Standard: Worldwide Character Encoding, Version 2.0, Addison-Wesley Publishing Company.
http://msdn.microsoft.com/library/en-us/mslu/winprog/other_existing_unicode_support.asp?frame=true
http://www.microsoft.com/msj/1198/multilang/multilang.aspx
A complex script has at least one of the following attributes:
You may wonder how WinXP displays Unicode correctly even when you haven't selected a font that supports all the required characters.
"Font fallback: this mechanism, made available through Uniscribe (see section on Complex Scripts Support), provides a fallback font (or a default font) when dealing with complex scripts. If the selected font face does not include any glyphs for the complex script that is about to be displayed, Uniscribe selects a default hardcoded font for the given script. For example, if you have Hindi text and the font is Courier, then Uniscribe will use the Mangal font. This technique is internal to Uniscribe and developers can not add additional fonts to the list of fallback fonts."
Note: Set flags to SSA_FALLBACK
Uniscribe is installed with Internet Explorer 5.0 or later, MS Office, Win2000, WinXP. Here are some versions of Uniscribe I found (including one found on Win98SE):
| usp10.dll FileVersion (Note: not redistributable) | Build label | Size (bytes) | TimeDateStamp (Internal): date | Time | Comments |
| 1.0163.1890.1 | | 268,288 | 22-Sep-1998 | 23:04:38 | Microsoft Systems Journal Nov 98. Download code here |
| 1.0325.2180.1 | | 315,152 | 30-Nov-1999 | 09:34:40 | Found on Win98SE \Windows\System. Download from DLL World or Microsoft |
| 1.0400.2411.1 | | | | | Installed with Internet Explorer 6 |
| 1.0405.2415.1 | (lab06_N.010104-1344) | 325,120 | 06-Jan-2001 | 05:14:26 | MS Office 10 common archives |
| 1.0409.2600.1106 | (xpsp1.020828-1920) | 339,456 | 09-Sep-2002 | 21:05:43 | XP-SP1 \Windows\System32 |
| 1.0420.2600.2180 | (xpsp_sp2_rtm.040803-2158) | 406,528 | 07-Dec-2005 | 14:38 | XP-SP2 \Windows\System32 |
| 1.0453.3665.0 | (private/Lab06_dev(paulnel). 020427-0653) | 397,312 | 06-Aug-2002 | 23:14:38 | |
| 1.0471.4030.0 | (main.030626-1414) | 413,184 | 27-Jun-2003 | 10:24:14 | Microsoft Office 2003 |
Best results in tests run on Win98SE have been with Uniscribe version 1.0405.2415.1. The Microsoft Office 2003 version has not been tested yet.
A Vb wrapper for this library can be found in Internationalization with Visual Basic by Michael S. Kaplan. It comes with a CD containing sample source code. The sample includes a Uniscribe-aware version of ExtTextOutW. More info here.
A more complex C++ example can be found at "Supporting Multilanguage Text Layout and Complex Scripts with Windows NT 5.0". Don't be misled by 'Windows NT 5.0' in the title, because this demo also works on Win98. More info here.
| Logical characters: | Display Plain Text and handle caret placement: | Display Formatted Text and handle caret placement: |
| ARA: العربية | العربية :ARA | |
This sample has been updated and can be found in the Microsoft® Platform SDK (August 2002 Edition, Windows XP SP1), if you have it installed, under C:\Program Files\Microsoft SDK\Samples\winui\globaldev\CSSamp. You may encounter problems compiling it due to missing or outdated files.
| File | Copy From | Copy To |
| Shlwapi.h 12-Jul-2002 60,270 bytes | Microsoft Visual Studio .NET 2003\Vc7\PlatformSDK\Include | Microsoft Visual Studio\VC98\Include |
| ShTypes.h 05-Aug-2002 6,622 bytes | Microsoft Visual Studio .NET 2003\Vc7\PlatformSDK\Include | Microsoft Visual Studio\VC98\Include |
| usp10.h 15-Aug-2002 81,839 bytes | Microsoft Visual Studio .NET 2003\Vc7\PlatformSDK\Include | Microsoft SDK\Samples\winui\globaldev\CSSamp |
To build an application that supports Unicode on all Platforms AND uses Uniscribe you could use something similar to this:
    Public Sub pDrawText(ByVal hdc As Long, ByVal s As String, tR As RECT, ByVal lFlags As Long)
        Dim lPtr As Long
        If (IsNt) Then
            lPtr = StrPtr(s)
            If Not (lPtr = 0) Then
                DrawTextW hdc, lPtr, -1, tR, lFlags
            End If
        Else
            If (IsUnicode(s)) Then
                If (HasUniscribe) Then
                    DrawTextU hdc, s, tR, lFlags    ' Uniscribe wrapper
                Else
                    DrawTextM hdc, s, tR, lFlags    ' MultiByte wrapper
                End If
            Else
                DrawTextA hdc, s, -1, tR, lFlags
            End If
        End If
    End Sub
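The routine above depends on a test for whether a string actually needs Unicode rendering. The tutorial does not show how its `IsUnicode` function is implemented, so the following is only one possible approach under a hypothetical name, `NeedsUnicode`: it checks whether the string survives a round trip through the system ANSI code page.

```vb
Option Explicit

' Hypothetical helper (not the tutorial's IsUnicode): True when s contains
' characters that cannot be represented in the system ANSI/DBCS code page.
Public Function NeedsUnicode(ByVal s As String) As Boolean
    Dim sRound As String

    ' Convert to ANSI/DBCS and back; unrepresentable characters are replaced
    ' or altered on the way, so the result no longer matches the original.
    sRound = StrConv(StrConv(s, vbFromUnicode), vbUnicode)
    NeedsUnicode = (StrComp(sRound, s, vbBinaryCompare) <> 0)
End Function
```

The result depends on the system locale: a Chinese string reports True on an English system but False on a system whose ANSI code page is Chinese, which is the behavior wanted when deciding whether the ANSI API call is sufficient.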
|
12 |
MLang provides services for applications on international issues, including conversion between code pages, font linking, code page "guessing", line breaking, and more. It is installed with Internet Explorer 5.5 or later.
http://msdn.microsoft.com/workshop/misc/mlang/mlang.asp
http://msdn.microsoft.com/workshop/misc/mlang/reference/objects/CMultiLanguage.asp
The only Vb wrapper for this library I can find is here.
|
13 |
The Rich Edit control provides a programming interface for formatting text. It can be used in lieu of the Fm20.Dll Unicode TextBox. It has no distribution issues and comes with source code.
| Rich Edit version | Unicode | New | DLL | XP - SP1 | XP | Me | 2000 | NT | 98 | 95 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | | Riched32.dll | Emulator | Emulator | Emulator | |||||
| 2.0 | Supports Unicode | Riched20.dll | May be installed | |||||||
| 3.0 | Expanded support for complex scripts, partly due to Uniscribe. | Riched20.dll | ||||||||
| 4.1 | | Hyphenation, page rotation, and Text Services Framework (TSF) support. | Msftedit.dll |
Links:
http://msdn.microsoft.com/library/psdk/winui/richedit_5a7n.htm
http://msdn.microsoft.com/library/en-us/shellcc/platform/commctls/richedit/richeditcontrols.asp
About Rich Edit Controls