Unicode Encoded

Tutorial
Using Unicode in
Visual Basic 6.0
Updated 25-August-2008 03:48

 

  • Copyright © 2003 - 2008 Unisuite/CyberActiveX
    Unicode and the Unicode Logo are trademarks of Unicode, Inc.
    Microsoft and Visual Basic are trademarks of Microsoft Corporation

  • This is a work in progress so expect to see updates often.
  • The latest version can always be found at www.unisuite.com
  • Download a self-contained Compiled HTML(CHM) for off-line reference at http://www.unisuite.com/download/UnicodeTutorial.chm
  • Send comments, suggestions, corrections, or contribute Unicode Vb source code to:
    webmaster@unisuite.com

 

 

Index

 

01 Introduction 16 Custom Control Example 31 Input Methods 46 Win98 61 Where's the Beef (Unicode) New
02 Flowchart 17 Unicode Aware Controls 32 Resource Files 47 MapString 62 DoubleUnicode New
03 Character Sets 18 Convert Utf8 - Utf16 33 Satellite DLLs 48 Byte Array 63 UniMsgBox New
04 SBCS-DBCS 19 Export UTF-8 Example 34 XP Themes 49 Vb Get/Put (UDTs) 64 Final
05 Test Strings(Unicode) 20 Export To File 35 Menus 50 IStream    
06 Platforms 21 ClipBoard 36 Database Updated 51 Calendars    
07 LocaleLCID 22 Unicode Blocks 37 Grid Controls 52 Sorting    
08 Codepage 23 DrawText Align Flags 38 Misc Source Code 53 Subclassing    
09 Fonts 24 RTL(RightToLeft, Mirroring) Updated 39 IsUnicode 54 FileNames Drop/Paste    
10 Uniscribe 25 Hints 40 IdeographicDescChar 55 Printing New    
11 Mlang 26 Why use a Type Library? 41 SurrogatePairs 56 References    
12 RichEdit 27 Chinese GB18030 Support 42 DateTime 57 Links    
13 MSLU 28 PropertyBag 43 FileIO 58 Troubleshooting    
14 GDI+ 29 PropertyBag-UTF8 44 VbAccelerator ListView/Treeview 59 Vb6 on non-English Machines New
15 User Controls Owner/Custom Draw 30 Property Pages 45 Registry 60 FarEast on English Machine (NJSTAR Communicator) New
   

 

 

1

Introduction

Although Visual Basic 6.0 stores strings internally as Unicode (UTF-16), it has several limitations:

  1. Ships with ANSI only controls (Label, Textbox, etc.).
  2. Properties Window in IDE is ANSI only. Unicode strings are displayed as '????'
  3. PropertyBag automatically converts Unicode strings to ANSI.
  4. Clipboard functions are ANSI only.
  5. Menus are ANSI only.

The purpose of this tutorial is to resolve these issues and provide working VB code solutions. The level of difficulty of these solutions varies, but in general they require intimate knowledge of ActiveX Controls and Classes. Subclassing and API programming are a must to gain functionality that Vb does not directly support.

The amount of information gathered during development of Unicode aware controls was so overwhelming that it made sense to organize it, and this Tutorial proved to be an ideal place to bring everything together under one roof.

Tutorial Development Tools
Microsoft® Frontpage® 2003  
Microsoft® Platform SDK, Feb 2007 Edition for Vista
Visual Basic 6.0 - Service Pack 6


 

Note: Dates in this tutorial are displayed as dd-mmm-yyyy (Example: 11-Mar-2004) to eliminate any ambiguities.

*  These issues are resolved in Vb.Net although you will have to go through a learning curve to get up to speed with the language.
    Review these tables to determine the minimum system requirements:

  1. System Requirements Visual Basic .NET 2003 Standard.
  2. System Requirements Visual Studio .NET 2003.

 

2

Flowchart

This flowchart shows basic program flow of:

  1. Send ANSI/DBCS text to ANSI Controls or ANSI API.
  2. Send Unicode text to Unicode Controls
  3. Send Unicode text to ANSI/Unicode API using MSLU.
  4. Convert UTF8 to Unicode.
  5. Convert ANSI/DBCS to Unicode.
  6. Get Charset/Codepage from LCID.

 

 

 

3

Character Sets

 

Character Set Range Codepage Byte Order Mark
OEM (DOS) 0..255 437 OEM - United States None
ANSI (Windows) 0..255 1252 ANSI - Latin I, etc. See SBCS-DBCS None
EBCDIC (Mainframe) 0..255 1047 IBM EBCDIC - Latin 1/Open System  
UTF-8 (see the byte lengths below) 65001 EF BB BF
UTF-16LE (little endian, low byte first, x86 Processor and Microsoft Windows) 0..65535(FFFF) - 2 bytes 1200 FF FE
UTF-16BE (big endian, high byte first, PowerPC Processor and Mac OS) 0..65535(FFFF) - 2 bytes 1201 FE FF
UTF-32LE (little endian, low byte first) 0..10FFFF - 4 bytes 12000 FF FE 00 00
UTF-32BE (big endian, high byte first) 0..10FFFF - 4 bytes 12001 00 00 FE FF
DBCS - Double-Byte Character Set 0..65535(FFFF) - 2 bytes (Chars 0-127 are 1 byte) See DBCS  

UTF-8 output bytes per Unicode range:

Unicode Hex From-To Unicode Dec From-To Output Bytes
0..7F 0..127 1
80..7FF 128..2047 2
800..FFFF 2048..65535 3
10000..10FFFF 65536..1114111 4

Note: You should have a utility on your system called CharMap.Exe which will allow you to browse and select Unicode characters.
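
The byte order marks in the table above can be used to guess the encoding of data that has been read into a byte array. The following is only a minimal sketch and is not taken from the tutorial's source code: the function name CodePageFromBOM is made up for this example, and it returns 0 when no mark is found. The UTF-32 marks are tested first because the UTF-32LE mark begins with the same two bytes as the UTF-16LE mark.

```vb
Option Explicit

' Hypothetical helper (not from the tutorial's downloads).
' Returns the codepage number from the Character Sets table
' that matches the byte order mark at the start of b(), or 0 if none.
Public Function CodePageFromBOM(b() As Byte) As Long
   Dim lo As Long
   Dim n As Long

   On Error Resume Next        ' an unallocated array raises error 9; n stays 0
   lo = LBound(b)
   n = UBound(b) - lo + 1
   On Error GoTo 0

   If n >= 4 Then
      If b(lo) = &HFF And b(lo + 1) = &HFE And b(lo + 2) = 0 And b(lo + 3) = 0 Then
         CodePageFromBOM = 12000   ' UTF-32LE
         Exit Function
      End If
      If b(lo) = 0 And b(lo + 1) = 0 And b(lo + 2) = &HFE And b(lo + 3) = &HFF Then
         CodePageFromBOM = 12001   ' UTF-32BE
         Exit Function
      End If
   End If
   If n >= 3 Then
      If b(lo) = &HEF And b(lo + 1) = &HBB And b(lo + 2) = &HBF Then
         CodePageFromBOM = 65001   ' UTF-8
         Exit Function
      End If
   End If
   If n >= 2 Then
      If b(lo) = &HFF And b(lo + 1) = &HFE Then
         CodePageFromBOM = 1200    ' UTF-16LE
      ElseIf b(lo) = &HFE And b(lo + 1) = &HFF Then
         CodePageFromBOM = 1201    ' UTF-16BE
      End If
   End If
End Function
```

Note that a mark is optional, so a result of 0 does not prove the data is ANSI. UTF-16LE data whose first character is a NUL would also be reported as UTF-32LE by this sketch.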
 

4

SBCS - DBCS

 

SBCS(Single-Byte) and DBCS(Double-Byte) Character Sets are different character sets from Unicode.
 

 

Character codes for "A" in ANSI, Unicode, and DBCS

ANSI character "A" &H41 A
Unicode character "A" &H41 &H00 A
DBCS character that represents a Japanese wide-width "A" &H82 &H60 Ａ
Unicode wide-width "A" &H21 &HFF Ａ

CharSet from Platform SDK wingdi.h

Font Character Sets

Number Name Info
0 ANSI_CHARSET West, Occidental(United States, Western Europe)
1 DEFAULT_CHARSET  
2 SYMBOL_CHARSET Standard symbol charset
77 MAC_CHARSET Macintosh
128 * SHIFTJIS_CHARSET Shift-JIS (Japanese Industry Standard)
129 * HANGEUL_CHARSET Korea (Wansung)
129 * HANGUL_CHARSET Korea (Wansung)
130 *  JOHAB_CHARSET Korea (Johab)
134 * GB2312_CHARSET Simplified Chinese - Mainland China(PRC) and Singapore
136 * CHINESEBIG5_CHARSET Traditional Chinese - Taiwan and Hong Kong
161 GREEK_CHARSET Greek
162 TURKISH_CHARSET Turkish
163 VIETNAMESE_CHARSET Vietnamese
177 HEBREW_CHARSET Hebrew
178 ARABIC_CHARSET Arabic
186 BALTIC_CHARSET Baltic
204 RUSSIAN_CHARSET Cyrillic - Russia, Belarus, Ukraine and some other slavic countries.
222 THAI_CHARSET Thai
238 EASTEUROPE_CHARSET  
255 OEM_CHARSET  

* DBCS - Double-Byte Character Set

DBCS is not actually the correct term for what Windows uses. Windows uses MBCS (Multi-Byte Character Set), where a character can be 1 or 2 bytes. To illustrate this, consider the following code, which takes a Unicode string of English and Chinese characters, converts it to a byte array of MBCS Chinese, dumps the byte array to the Immediate window, and finally converts it back to a Unicode string for display in a Unicode aware textbox. When converted using the Chinese (PRC) LCID 2052, the byte array contains single bytes for the English characters and double bytes for the Chinese characters. This shows that the encoding is MBCS and not DBCS:

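Option Explicit

Private Sub Form_Load()
   Dim sUni As String    ' Unicode (UTF-16) string
   Dim sMBCS As String   ' MBCS bytes packed into a String
   Dim b() As Byte       ' the same MBCS bytes as a byte array
   Dim i As Long

   ' English digits mixed with Chinese characters
   sUni = "2006" & ChrW$(&H6B22) & "9" & ChrW$(&H8FCE) & "12" & ChrW$(&H6B22) & " 8:04"
   UniTextBoxEx1 = sUni
   ' Convert to MBCS using the Chinese (PRC) locale, LCID 2052
   b = StrConv(sUni, vbFromUnicode, 2052)
   sMBCS = StrConv(sUni, vbFromUnicode, 2052)
   Debug.Print sUni, sMBCS
   Text1 = sMBCS
   ' Dump the bytes: 1 byte per English character, 2 bytes per Chinese character
   For i = 0 To UBound(b)
      Debug.Print b(i)
   Next
   ' Convert the bytes back to Unicode
   sUni = StrConv(b, vbUnicode, 2052)
   UniTextBoxEx2 = sUni
End Sub
UniTextBoxEx1, UniTextBoxEx2: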
2006欢9迎12欢 8:04
Debug Window:
50 ' 2
48 ' 0
48 ' 0
54 ' 6
187 '
182 ' 欢
57 ' 9
211 '
173 ' 迎
49 ' 1
50 ' 2
187 '
182 ' 欢
32 ' space
56 ' 8
58 ' :
48 ' 0
52 ' 4

 


 

The following demos show how to display Chinese on an English (U.S.) system without changing your Regional settings:

Example of Vb TextBox using DBCS Charset 134, GB2312_CHARSET

Example of Vb ListView using DBCS Charset 134, GB2312_CHARSET

See MSDN for more information about DBCS:
Issues Specific to the Double-Byte Character Set (DBCS)
ANSI, DBCS, and Unicode: Definitions
Calling Windows API Functions
DBCS Sort Order and String Comparison
DBCS String Manipulation Functions
DBCS-Enabled KeyPress Event
Designing an International-Aware User Interface
Font, Display and Print Considerations in a DBCS Environment
Identifiers in a DBCS Environment
Processing Files That Use Double-Byte Characters
 

DBCS String Conversion

StrConv Function

The global options of the StrConv function are converting uppercase to lowercase, and vice versa. In addition to those options, the function has several DBCS-specific options. For example, you can convert narrow letters to wide letters by specifying vbWide in the second argument of this function. You can convert one character type to another, such as hiragana to katakana in Japanese. StrConv enables you to specify a LocaleID for the string, if different than the system's LocaleID.

You can also use the StrConv function to convert Unicode characters to ANSI/DBCS characters, and vice versa. Usually, a string in Visual Basic consists of Unicode characters. When you need to handle strings in ANSI/DBCS (for example, to calculate the number of bytes in a string before writing the string into a file), you can use this functionality of the StrConv function.
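
As a small illustration of the byte counting mentioned above, the sketch below measures how many bytes a string occupies after conversion to ANSI/DBCS. The function name MbcsByteCount is invented for this example. It assumes that support for the requested locale is installed when an LCID is passed.

```vb
Option Explicit

' Hypothetical helper: number of bytes s needs in ANSI/DBCS form.
' lcid = 0 means "use the system locale"; otherwise pass a LocaleID such as 2052.
Public Function MbcsByteCount(ByVal s As String, Optional ByVal lcid As Long = 0) As Long
   If lcid = 0 Then
      MbcsByteCount = LenB(StrConv(s, vbFromUnicode))
   Else
      MbcsByteCount = LenB(StrConv(s, vbFromUnicode, lcid))
   End If
End Function

Public Sub DemoByteCount()
   Dim sUni As String
   ' Same 15-character test string as the MBCS example in this section
   sUni = "2006" & ChrW$(&H6B22) & "9" & ChrW$(&H8FCE) & "12" & ChrW$(&H6B22) & " 8:04"
   Debug.Print Len(sUni)                  ' characters
   Debug.Print MbcsByteCount(sUni, 2052)  ' bytes in Chinese (PRC) MBCS
End Sub
```

Len counts characters while the converted length counts bytes, so the two values differ whenever the string contains double-byte characters. The byte dump shown earlier in this section lists 18 bytes for this 15-character string.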

 

5

Test Strings(Unicode)

The easiest way to add Unicode test strings to your project is to make a resource file with a Unicode aware editor and compile it with RC.exe. That way you can test your controls without having to cut/paste the strings when you need them.
Use Notepad if you are on NT or later, WordPad or UltraEdit if using Win9x.
Download the complete resource file with source here.

"Welcome" in several languages

Resource ID "Welcome" UTF-16 Unicode Resource ID "Welcome" UTF-8 Encoded
101 "ARA: مـرحبــاً" 151 "ARA: مـرحبــاً"
102 "CHS: 欢迎" 152 "CHS: 欢迎"
103 "CHT: 歡迎" 153 "CHT: 歡迎"
104 "ENG: Welcome" 154 "ENG: Welcome"
105 "GEO: სასურველი" 155 "GEO: სáƒáƒ¡áƒ£áƒ áƒ•ელი"
106 "GRK: Καλώς ήλθατε" 156 "GRK: Καλώς ήλθατε"
107 "HEB: בִּרוּבִים חַבָּאִים" 157 "HEB: ברוכי×? הב×?×™×?"
108 "HIN: रवागत" 158 "HIN: रवागत"
109 "JPN: ようこそ" 159 "JPN: よã?†ã?“ã??"
110 "KOR: 여보세요" 160 "KOR: 여보세요"
111 "PAN: ਜੀ ਆਇਆਂ ਨੂੰ" 161 "PAN: ਜੀ ਆਇਆਂ ਨੂੰ"
112 "PTB: Bem-vindo" 162 "PTB: Bem-vindo"
113 "RUS: Добро пожаловать" 163 "RUS: Добро пожаловать"
114 "TAM: அங்கிகரி" 164 "TAM: à®…à®™à¯?கிகரி"
115 "THA: การต้อนรับ" 165 "THA: à¸à¸²à¸£à¸•้อนรับ"
116 "URD: स्वागत" 166 "URD: सà¥à¤µà¤¾à¤—त"
117 "VIE: tính từ" 167 "VIE: tính từ"

"Hello" in several languages
* Needs Code2000 Font to see this

Language "Hello" UTF-16 Unicode "Hello" UTF-8 Encoded
Arabic السلام عليكم السلام عليكم
Bengali (বাঙ্লা) ষাগতোম ষাগতোম
* Burmese မ္ရန္မာ (မ္ရန္မာ)
Cantonese (粵語,廣東話) 早晨, 你好  æ—©æ™¨, 你好
* Cherokee (á£áŽ³áŽ©)ᎣᏏᏲ ᎣᏏᏲ Ꭳáá²
Chinese (中文,普通话,汉语) 你好 你好
Czech (česky) Dobrý den Dobrý den
Danish (Dansk) Hej, Goddag Hej, Goddag
English Hello Hello
Esperanto Saluton Saluton
Estonian Tere, Tervist Tere, Tervist
Finnish (Suomi) Hei Hei
French (Français) Bonjour, Salut Bonjour, Salut
German (Deutsch Nord) Guten Tag Guten Tag
German (Deutsch Süd) Grüß Gott Grüß Gott
Georgian (ქართველი) გამარჯობა გáƒáƒ›áƒáƒ áƒ¯áƒáƒ‘áƒ
Gujarati (ગુજરાતિ) (ગà«àªœàª°àª¾àª¤àª¿)
Greek (Ελληνικά) Γειά σας Γειά σας
Hebrew שלום שלו×
Hindi नमस्ते, नमस्कार। नमसà¥à¤¤à¥‡, नमसà¥à¤•ार।
Italiano Ciao, Buon giorno Ciao, Buon giorno
Japanese (日本語) こんにちは, コンニチハ ã“ã‚“ã«ã¡ã¯, コï¾ï¾†ï¾ï¾Š
Korean (한글) 안녕하세요, 안녕하십니까 안녕하세요, 안녕하십니까
Maltese Ċaw, Saħħa ÄŠaw, Saħħa
Nederlands Vlaams Hallo, Dag Vlaams Hallo, Dag
Norwegian (Norsk) Hei, God dag Hei, God dag
Punjabi (ਪੁਂਜਾਬਿ) (ਪà©à¨‚ਜਾਬਿ)
Polish Dzień dobry, Hej DzieÅ„ dobry, Hej
Russian (Русский) Здравствуйте! ЗдравÑтвуйте!
Slovak Dobrý deň Dobrý deň
Spanish (Español) ‎¡Hola!‎ ‎¡Hola!‎
Swedish (Svenska) Hej, Goddag Hej, Goddag
Thai (ภาษาไทย) สวัสดีครับ, สวัสดีค่ะ สวัสดีครับ, สวัสดีค่ะ
Tamil (தமிழ்) வணக்கம் வணகà¯à®•à®®à¯
Turkish (Türkçe) Merhaba Merhaba
Vietnamese (Tiếng Việt) Xin Chào Xin Chào
Yiddish â€(ײַדישע) דאָס הײַזעלע ד×ָס הײַזעלע

Other methods for creating a string at Vb Runtime:

Sample Output Vb String
ARA: مـرحب "ARA: " & ChrW$(&H645) & ChrW$(&H640) & ChrW$(&H631) & ChrW$(&H62D) & ChrW$(&H628)
ARM: ԱԲԳԴԵԶԷԸԹ "ARM: " & ChrW$(&H531) & ChrW$(&H532) & ChrW$(&H533) & ChrW$(&H534) & ChrW$(&H535) & ChrW$(&H536) & ChrW$(&H537) & ChrW$(&H538) & ChrW$(&H539)
CHS: 欢迎 "CHS: " & ChrW$(&H6B22) & ChrW$(&H8FCE)
CHT: 歡迎 "CHT: " & ChrW$(&H6B61) & ChrW$(&H8FCE)
ENG: Welcome "ENG: Welcome"
GEO: სასურველი "GEO: " & ChrW$(&H10E1) & ChrW$(&H10D0) & ChrW$(&H10E1) & ChrW$(&H10E3) & ChrW$(&H10E0) & ChrW$(&H10D5) & ChrW$(&H10D4) & ChrW$(&H10DA) & ChrW$(&H10D8)
GRK: Καλώς ήλθατε "GRK: " & ChrW$(&H39A) & ChrW$(&H3B1) & ChrW$(&H3BB) & ChrW$(&H3CE) & ChrW$(&H3C2) & " " & ChrW$(&H3AE) & ChrW$(&H3BB) & ChrW$(&H3B8) & ChrW$(&H3B1) & ChrW$(&H3C4) & ChrW$(&H3B5)
HEB: ברוכים הבאים "HEB: " & ChrW$(&H5D1) & ChrW$(&H5E8) & ChrW$(&H5D5) & ChrW$(&H5DB) & ChrW$(&H5D9) & ChrW$(&H5DD) & " " & ChrW$(&H5D4) & ChrW$(&H5D1) & ChrW$(&H5D0) & ChrW$(&H5D9) & ChrW$(&H5DD)
HIN: रवागत "HIN: " & ChrW$(&H930) & ChrW$(&H935) & ChrW$(&H93E) & ChrW$(&H917) & ChrW$(&H924)
JPN: ようこそ "JPN: " & ChrW$(&H3088) & ChrW$(&H3046) & ChrW$(&H3053) & ChrW$(&H305D)
KOR: 여보세요 "KOR: " & ChrW$(&HC5EC) & ChrW$(&HBCF4) & ChrW$(&HC138) & ChrW$(&HC694)
PAN: ਜੀ ਆਇਆਂ ਨੂੰ "PAN: " & ChrW$(&HA1C) & ChrW$(&HA40) & " " & ChrW$(&HA06) & ChrW$(&HA07) & ChrW$(&HA06) & ChrW$(&HA02) & " " & ChrW$(&HA28) & ChrW$(&HA42) & ChrW$(&HA70)
PTB: Bem-vindo "PTB: Bem-vindo"
RUS: Добро пожаловать "RUS: " & ChrW$(&H414) & ChrW$(&H43E) & ChrW$(&H431) & ChrW$(&H440) & ChrW$(&H43E) & " " & ChrW$(&H43F) & ChrW$(&H43E) & ChrW$(&H436) & ChrW$(&H430) & ChrW$(&H43B) & ChrW$(&H43E) & ChrW$(&H432) & ChrW$(&H430) & ChrW$(&H442) & ChrW$(&H44C)
TAM: அங்கிகரி "TAM: " & ChrW$(&HB85) & ChrW$(&HB99) & ChrW$(&HBCD) & ChrW$(&HB95) & ChrW$(&HBBF) & ChrW$(&HB95) & ChrW$(&HBB0) & ChrW$(&HBBF)
THA: การต้อนรับ "THA: " & ChrW$(&HE01) & ChrW$(&HE32) & ChrW$(&HE23) & ChrW$(&HE15) & ChrW$(&HE49) & ChrW$(&HE2D) & ChrW$(&HE19) & ChrW$(&HE23) & ChrW$(&HE31) & ChrW$(&HE1A)
URD: स्वागत "URD: " & ChrW$(&H938) & ChrW$(&H94D) & ChrW$(&H935) & ChrW$(&H93E) & ChrW$(&H917) & ChrW$(&H924)
VIE: tính từ "VIE: tính t" & ChrW$(&H1EEB)

Note: Under the hood StrConv inserts a BOM (FEFF) before the CJK Unified Ideographs.

More stuff to play with:

Sample Font.Name Font.Charset String
1 English Tahoma ANSI_CHARSET "English"
2 româneşte " EASTEUROPE_CHARSET ChrW$(114) & ChrW$(111) & ChrW$(109) & ChrW$(226) & ChrW$(110) & ChrW$(101) & ChrW$(351) & ChrW$(116) & ChrW$(101)
3 ภาษาไทย " THAI_CHARSET ChrW$(3616) & ChrW$(3634) & ChrW$(3625) & ChrW$(3634) & ChrW$(3652) & ChrW$(3607) & ChrW$(3618)
4 Հայերեն Arial Unicode MS   ChrW$(1344) & ChrW$(1377) & ChrW$(1397) & ChrW$(1381) & ChrW$(1408) & ChrW$(1381) & ChrW$(1398)
5 Tiếng Việt " VIETNAMESE_CHARSET ChrW$(84) & ChrW$(105) & ChrW$(234) & ChrW$(769) & ChrW$(110) & ChrW$(103) & ChrW$(32) & ChrW$(86) & ChrW$(105) & ChrW$(234) & ChrW$(803) & ChrW$(116)
6 עברית " HEBREW_CHARSET ChrW$(1506) & ChrW$(1489) & ChrW$(1512) & ChrW$(1497) & ChrW$(1514)
7 मराठी Arial Unicode MS   ChrW$(2350) & ChrW$(2352) & ChrW$(2366) & ChrW$(2336) & ChrW$(2368)
8 中文 (台灣) PMingLiU CHINESEBIG5_CHARSET ChrW$(20013) & ChrW$(25991) & " (" & ChrW$(21488) & ChrW$(28771) & ")"
9 नेपाली Arial Unicode MS   ChrW$(2344) & ChrW$(2375) & ChrW$(2346) & ChrW$(2366) & ChrW$(2354) & ChrW$(2368)
10 Русский " RUSSIAN_CHARSET ChrW$(1056) & ChrW$(1091) & ChrW$(1089) & ChrW$(1089) & ChrW$(1082) & ChrW$(1080) & ChrW$(1081)
11 ირუკსაბ Arial Unicode MS   StrReverse(ChrW$(4305) & ChrW$(4304) & ChrW$(4321) & ChrW$(4313) & ChrW$(4323) & ChrW$(4320) & ChrW$(4312))
12 日本語 Arial Unicode MS SHIFTJIS_CHARSET ChrW$(26085) & ChrW$(26412) & ChrW$(-30050)
13 ଉଡିଯା Arial Unicode MS   ChrW$(2825) & ChrW$(2849) & ChrW$(2879) & ChrW$(2863) & ChrW$(2878)
14 Ελληνικά " GREEK_CHARSET ChrW$(917) & ChrW$(955) & ChrW$(955) & ChrW$(951) & ChrW$(957) & ChrW$(953) & ChrW$(954) & ChrW$(940)
15 हिन्दी Arial Unicode MS   ChrW$(2361) & ChrW$(2367) & ChrW$(2344) & ChrW$(2381) & ChrW$(2342) & ChrW$(2368)
16 한국어 GulimChe HANGEUL_CHARSET ChrW$(-10916) & ChrW$(-21139) & ChrW$(-14924)
17 తెలుగు Arial Unicode MS   ChrW$(3108) & ChrW$(3142) & ChrW$(3122) & ChrW$(3137) & ChrW$(3095) & ChrW$(3137)
18 Čeština " EASTEUROPE_CHARSET ChrW$(268) & ChrW$(101) & ChrW$(353) & ChrW$(116) & ChrW$(105) & ChrW$(110) & ChrW$(97)
19 ಕನ್ನಡ Arial Unicode MS   ChrW$(3221) & ChrW$(3240) & ChrW$(3277) & ChrW$(3240) & ChrW$(3233)
20 中文(中国) SimSun GB2312_CHARSET ChrW$(20013) & ChrW$(25991) & "(" & ChrW$(20013) & ChrW$(22269) & ")"
21 ગુજરાતી Arial Unicode MS   ChrW$(2711) & ChrW$(2753) & ChrW$(2716) & ChrW$(2736) & ChrW$(2750) & ChrW$(2724) & ChrW$(2752)
22 Türkçe " TURKISH_CHARSET ChrW$(84) & ChrW$(252) & ChrW$(114) & ChrW$(107) & ChrW$(231) & ChrW$(101)
23 தமிழ் Arial Unicode MS   ChrW$(2980) & ChrW$(2990) & ChrW$(3007) & ChrW$(2996) & ChrW$(3021)
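
Some rows in the table above pass negative numbers to ChrW$, for example ChrW$(-30050). This happens because AscW returns a signed 16-bit Integer, so characters above &H7FFF come back negative. Masking with &HFFFF& (or adding 65536) gives the unsigned value, here &H8A9E. The sketch below turns any string into a ChrW$ expression in the hex style used earlier in this section. The function name ToChrWExpression is made up for this example.

```vb
Option Explicit

' Hypothetical helper: builds the ChrW$(...) source text for a Unicode string
' so it can be pasted into Vb code.
Public Function ToChrWExpression(ByVal s As String) As String
   Dim i As Long
   Dim code As Long
   Dim result As String

   For i = 1 To Len(s)
      ' AscW returns -32768..32767; the mask converts it to 0..65535
      code = AscW(Mid$(s, i, 1)) And &HFFFF&
      If i > 1 Then result = result & " & "
      result = result & "ChrW$(&H" & Hex$(code) & ")"
   Next
   ToChrWExpression = result
End Function

Public Sub DemoChrWExpression()
   ' Prints: ChrW$(&H6B22) & ChrW$(&H8FCE)
   Debug.Print ToChrWExpression(ChrW$(&H6B22) & ChrW$(&H8FCE))
End Sub
```

Every UTF-16 code unit is emitted separately, so a character outside the Basic Multilingual Plane appears as two ChrW$ calls (a surrogate pair).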

 

6

Platforms

 

OS/Application: Vb5/6
Unicode Aware: Yes. Uses Unicode to store and manipulate strings.
Additional Requirements: Intrinsic controls, Properties Window (IDE), Clipboard, and PropertyBag are ANSI only.

OS/Application: NT/2000/XP/Vista
Unicode Aware: Yes. Uses Unicode to store and manipulate strings.
API: Uses Unicode: DrawTextW Lib "user32" - TextOutW Lib "gdi32"
Fonts: Installed. You may need to enable Far East language support via Control Panel, Regional Options, Languages if this was not done at install time.
Additional Requirements: None

OS/Application: 98/ME
Unicode Aware: No. Uses ANSI or * DBCS to store and manipulate strings.
API: Uses ANSI: DrawTextA Lib "user32" - TextOutA Lib "gdi32", or DrawTextW Lib "Unicows" - TextOutW Lib "Unicows"
Fonts: You need to install at least one Unicode font. Arial Unicode MS used to be a free (23Mb) download from Microsoft. It is installed automatically with Office XP Pro or Frontpage 2002.
Additional Requirements: Microsoft Layer for Unicode on Win9x Systems (MSLU). Unicows.DLL (269.7kb download) is available free from Microsoft. The current FileVersion is "1.0.4018.0", April 21, 2003.

OS/Application: Automation 95/98/ME & NT/2000/XP/Vista
Unicode Aware: Yes. Uses Unicode to pass the strings back and forth.

XP now supports a total of 136 locales, which includes the 126 locales supported by Windows 2000 and adds the following:

Other international features New to XP:

New Locales in Windows XP Service Pack 2

Windows XP Service Pack 2 introduces 25 additional locales and another 11 with Service Pack 2 Update:

Windows XP Service Pack 2 Locales

Bengali (India) Quechua (Bolivia) Sami, Northern (Sweden)
Bosnian (Latin, Bosnia and Herzegovina) Quechua (Ecuador) Sami, Skolt (Finland)
Croatian (Latin, Bosnia and Herzegovina) Quechua (Peru) Sami, Southern (Norway)
isiXhosa (South Africa) Sami, Inari (Finland) Sami, Southern (Sweden)
isiZulu (South Africa) Sami, Lule (Norway) Serbian (Cyrillic, Bosnia and Herzegovina)
Malayalam (India) Sami, Lule (Sweden) Serbian (Latin, Bosnia and Herzegovina)
Maltese (Malta) Sami, Northern (Finland) Sesotho sa Leboa (South Africa)
Maori (New Zealand) Sami, Northern (Norway) Setswana (South Africa)
Welsh (United Kingdom)

Windows XP Service Pack 2 Update Locales

Bosnian (Cyrillic, Bosnia and Herzegovina) Irish (Ireland) Nepali (Nepal)
Filipino (Philippines) Luxembourgish (Luxembourg) Pashto (Afghanistan)
Frisian (Netherlands) Mapudungun (Chile) Romansh (Switzerland)
Inuktitut (Latin, Canada) Mohawk (Mohawk)

 

Windows Vista New locales

Alsatian (France) Hausa (Latin, Nigeria) Spanish (United States)
Amharic (Ethiopia) Igbo (Nigeria) Tajik (Cyrillic, Tajikistan)
Assamese (India) Inuktitut (Syllabics, Canada) Tamazight (Latin, Algeria)
Bashkir (Russia) Khmer (Cambodia) Tibetan (PRC)

Bengali (Bangladesh) K'iche (Guatemala) Turkmen (Turkmenistan)
Breton (France) Kinyarwanda (Rwanda) Uighur (PRC)
Corsican (France) Lao (Lao P.D.R.) Upper Sorbian (Germany)
Dari (Afghanistan) Lower Sorbian (Germany) Wolof (Senegal)
English (India) Mongolian (Traditional Mongolian, PRC) Yakut (Russia)
English (Malaysia) Occitan (France) Yi (PRC)
English (Singapore) Oriya (India) Yoruba (Nigeria)
Greenlandic (Greenland) Sinhala (Sri Lanka)

 

These locales are automatically installed when you update Windows XP to SP2. You can select new locales in Regional and Language Options. They are not supported in Windows Server 2003.

 

7

LCID

International Locale Codes
 


Unicode Only LCIDs

Identifier Language Platform
0x042b  Armenian 2000/XP
0x0465  Divehi XP
0x0437  Georgian 2000/XP
0x0447  Gujarati XP
0x0439  Hindi 2000/XP
0x044b  Kannada XP
0x0457  Konkani 2000/XP
0x044e  Marathi 2000/XP
0x0446  Punjabi XP
0x044f  Sanskrit 2000/XP
0x045a  Syriac XP
0x0449  Tamil 2000/XP
0x044a  Telugu XP
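
The LCIDs listed above are called Unicode only because they have no ANSI code page of their own. One way to see this, and to get the codepage for any other LCID, is to ask GetLocaleInfo for LOCALE_IDEFAULTANSICODEPAGE. The sketch below is an illustration only. The function name CodePageFromLCID is made up, and the call fails (returning 0 here) if the locale is not installed.

```vb
Option Explicit

Private Declare Function GetLocaleInfoA Lib "kernel32" ( _
   ByVal Locale As Long, ByVal LCType As Long, _
   ByVal lpLCData As String, ByVal cchData As Long) As Long

Private Const LOCALE_IDEFAULTANSICODEPAGE As Long = &H1004

' Hypothetical helper: default ANSI codepage for an LCID.
' Returns 0 (CP_ACP) for Unicode only locales or when the call fails.
Public Function CodePageFromLCID(ByVal lcid As Long) As Long
   Dim buf As String
   Dim n As Long

   buf = String$(16, vbNullChar)
   ' n is the number of characters written, including the terminating null
   n = GetLocaleInfoA(lcid, LOCALE_IDEFAULTANSICODEPAGE, buf, Len(buf))
   If n > 1 Then
      CodePageFromLCID = Val(Left$(buf, n - 1))
   End If
End Function

Public Sub DemoCodePageFromLCID()
   Debug.Print CodePageFromLCID(2052)    ' Chinese (PRC)
   Debug.Print CodePageFromLCID(&H439)   ' Hindi, a Unicode only LCID
End Sub
```

Because a Unicode only locale reports codepage 0, text in those languages cannot be round-tripped through StrConv or the ANSI API. It has to stay in Unicode from end to end.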

 

 


 

8

Codepage

Also see Table above

Table of Known Code Pages

CP_ACP 0 WesternEuropean_Mac 10000 UserDefined 50000
CP_OEMCP 1 Japanese_Mac 10001 AutoSelect 50001
CP_MACCP 2 Arabic_Mac 10004 Japanese_JIS 50220
CP_THREAD_ACP 3 Greek_Mac 10006 Japanese_JIS_Allow1byteKana 50221
CP_SYMBOL 42 Cyrillic_Mac 10007 Japanese_JIS_Allow1byteKanaSOSI 50222
OEM_UnitedStates 437 Latin2_Mac 10029 Korean_ISO 50225
Arabic_ASMO708 708 Turkish_Mac 10081 Japanese_AutoSelect 50932

Arabic_DOS 720 Chinese_Traditional_CNS 20000 Chinese_Simplified_AutoSelect 50936
Greek_DOS 737 Chinese_Traditional_Eten 20002 Korean_AutoSelect 50949
Baltic_DOS 775 WesternEuropean_IA5 20105 Chinese_Traditional_Auto_Select 50950
WesternEuropean_DOS 850 German_IA5 20106 Cyrillic_Auto_Select 51251
Central_European_DOS 852 Swedish_IA5 20107 Greek_AutoSelect 51253
Icelandic_DOS 861 Norwegian_IA5 20108 Arabic_AutoSelect 51256
Hebrew_DOS 862 US_ASCII 20127 Japanese_EUC 51932
Cyrillic_DOS 866 Cyrillic_KOI8R 20866 Chinese_Simplified_EUC 51936
Greek_DOS_Modern 869 Cyrillic_KOI8U 21866 Korean_EUC 51949
Thai_Windows 874 WesternEuropean_ISO 28591 Chinese_Simplified_HZ 52936
IBM_EBCDIC_GreekModern 875 Central_European_ISO 28592 CP_UTF7 65000 *
Japanese_ShiftJIS 932 * Baltic_ISO 28594 CP_UTF8 65001 *
Chinese_Simplified_GB2312 936 * Cyrillic_ISO 28595    
Korean 949 * Arabic_ISO 28596    
Chinese_Traditional_Big5 950 * Greek_ISO 28597    
Unicode 1200 Latin3_ISO 28593    
Unicode_BigEndian 1201 Hebrew_ISO_Visual 28598
Central_European_Windows 1250 Turkish_ISO 28599    
Cyrillic_Windows 1251 Latin9_ISO 28605    
WesternEuropean_Windows 1252 Europa 29001    
Greek_Windows 1253 Hebrew_ISO_Logical 38598    
Turkish_Windows 1254        
Hebrew_Windows 1255        
Arabic_Windows 1256        
Baltic_Windows 1257        
Vietnamese_Windows 1258        
Korean_Johab 1361        

* DBCS - Double-Byte Character Set
* This is a pseudo codepage. There is no corresponding NLS file. This code page ID can only be used with WideCharToMultiByte() and MultiByteToWideChar() API calls.
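
Since CP_UTF8 works only through these two API calls, a UTF-8 byte array has to be converted with MultiByteToWideChar rather than StrConv. The following is a minimal sketch under that assumption. The function name Utf8ToString is made up for this example, and it expects the array to hold the raw UTF-8 bytes without a byte order mark.

```vb
Option Explicit

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
   ByVal CodePage As Long, ByVal dwFlags As Long, _
   ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
   ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Hypothetical helper: converts UTF-8 bytes to a Vb (UTF-16) string.
' Returns an empty string if the array is empty or the conversion fails.
Public Function Utf8ToString(b() As Byte) As String
   Dim lo As Long
   Dim cb As Long
   Dim cch As Long
   Dim sOut As String

   On Error Resume Next        ' an unallocated array raises error 9; cb stays 0
   lo = LBound(b)
   cb = UBound(b) - lo + 1
   On Error GoTo 0
   If cb <= 0 Then Exit Function

   ' First call: ask how many UTF-16 characters are needed
   cch = MultiByteToWideChar(CP_UTF8, 0, VarPtr(b(lo)), cb, 0, 0)
   If cch = 0 Then Exit Function

   ' Second call: convert into a string buffer of that size
   sOut = String$(cch, vbNullChar)
   MultiByteToWideChar CP_UTF8, 0, VarPtr(b(lo)), cb, StrPtr(sOut), cch
   Utf8ToString = sOut
End Function
```

The two-call pattern (measure, then convert) avoids guessing the output size, which matters because one UTF-8 character can take from 1 to 4 bytes.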

 

9

Fonts

Fonts used on Windows XP-SP1:

Method API Parameter Font Name API Face Name
GetStockFont GetStockObject SYSTEM_FONT MS Sans Serif GetTextFace System
GetStockFont GetStockObject DEFAULT_GUI_FONT MS Sans Serif GetTextFace MS Shell Dlg
GetSysFontA SystemParametersInfo cf_Caption Arial    
GetSysFontA SystemParametersInfo cf_Menu Tahoma    
GetSysFontA SystemParametersInfo cf_Message Tahoma    
GetSysFontA SystemParametersInfo cf_SmallCaption Tahoma    
GetSysFontA SystemParametersInfo cf_Status Tahoma    


Win2000/XP/Office2000/Office2003 should already have Arial Unicode MS installed.
Win98 users will need a Unicode font to render Unicode glyphs.

Font (not redistributable) Version Glyphs Size Comments
Arial Unicode MS 1.00 51,180 22.19 Mb Installed with Win2000/Office2000 or later.
Lucida Sans Unicode 2.00 1,776 316.4 kb  
Bitstream Cyberbit beta v2.0 29,934 13.04 Mb Complete - Download Font and DOC here.
Bitstream Cyberbase beta v2.0 1,249 302 kb No CJK - Download Font and DOC here.
Bitstream CyberCJK beta v2.0 28,686 12.74 Mb CJK only - Download Font and DOC here.
Code2000 1.13 34,810 3.01 Mb Shareware. US$5.00. Download here.
TITUS Cyberbit Basic 2000; 3.0 9,568 1.82 Mb Non-Commercial use only. UNICODE 4.0 compliant. Download here.
Doulos SIL 4.010.2004   674 kb DoulosSIL4.0.10.zip. Non-Commercial use only. License/DistributionRestrictions
Ezra SIL (Hebrew Unicode)     3.20 Mb EzrSIL20.zip. Non-Commercial use only. License/DistributionRestrictions

Also see Multilingual Unicode TrueType Fonts on the Internet.

Sample Font Support for several Languages
Test Platform WinXP-SP1

Sample; Arial Unicode MS; Code2000; * Microsoft Tahoma or Arial; * Microsoft Sans Serif; Bitstream Cyberbit; TITUS Cyberbit Basic
ARA: مـرحبــاً
CHS: 欢迎
CHT: 歡迎
GEO: სასურველი
GRK: Καλώς ήλθατε
HEB: בִּרוּבִים חַבָּאִים
HIN: रवागत
JPN: ようこそ
KOR: 여보세요
PAN: ਜੀ ਆਇਆਂ ਨੂੰ
RUS: Добро пожаловать
TAM: அங்கிகரி
THA: การต้อนรับ
URD: स्वागत
VIE: tính từ

* Tahoma and Arial do not support all of these languages, but the text still displays because of Uniscribe Font Fallback. Apparently MS Sans Serif does not use or support Font Fallback. Font Fallback is only available on Win2000 or later.

10

MSLU

 

Microsoft Layer for Unicode Technology(UNICOWS.DLL)

While you can make separate programs for specific platforms, it is often desirable to make one program that works on all platforms. By using Microsoft Layer for Unicode Technology (Unicows.DLL, 240kb), a single executable can run on both NT-based and Win9x Platforms. In this case you can use DrawTextW or TextOutW Lib "Unicows" for all platforms.

  1. The Unicows.Dll forwards calls to the system API if you are running on NT, 2000, or XP platforms.

  2. MSLU does not support the display of characters that the system cannot display. Therefore do not expect to see Chinese under Win98 English even though you have installed a font that supports Chinese(Arial Unicode MS for example). For more info see Newsgroup microsoft.public.platformsdk.mslayerforunicode and http://trigeminal.com/usenet/usenet035.asp.

Use this conditional compilation directive to test your code with and without Unicows:

Under Project/Properties/Make set conditional compilation arguments to UNICOWS = -1 or UNICOWS = 0.
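
With the UNICOWS argument set as described, the switch can be made in the Declare statements themselves. The declarations below are a sketch of that idea, not the tutorial's original listing. They assume a standard module and that Unicows.dll can be found by the application, for example in the application folder.

```vb
' Standard module (.bas)
Option Explicit

Public Type RECT
   Left As Long
   Top As Long
   Right As Long
   Bottom As Long
End Type

#If UNICOWS Then
   ' UNICOWS = -1: route the wide API through the Microsoft Layer for Unicode
   Public Declare Function DrawTextW Lib "unicows" ( _
      ByVal hdc As Long, ByVal lpStr As Long, ByVal nCount As Long, _
      lpRect As RECT, ByVal wFormat As Long) As Long
   Public Declare Function TextOutW Lib "unicows" ( _
      ByVal hdc As Long, ByVal x As Long, ByVal y As Long, _
      ByVal lpString As Long, ByVal nCount As Long) As Long
#Else
   ' UNICOWS = 0: call the system libraries directly
   Public Declare Function DrawTextW Lib "user32" ( _
      ByVal hdc As Long, ByVal lpStr As Long, ByVal nCount As Long, _
      lpRect As RECT, ByVal wFormat As Long) As Long
   Public Declare Function TextOutW Lib "gdi32" ( _
      ByVal hdc As Long, ByVal x As Long, ByVal y As Long, _
      ByVal lpString As Long, ByVal nCount As Long) As Long
#End If
```

Because both branches declare the same names with the same signatures, the calling code does not change between the two builds. The string is passed as a pointer, for example DrawTextW hdc, StrPtr(s), -1, tR, lFlags.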

Note:

You may not need MSLU at all if your program uses only DrawText or TextOut. In this case you can simply wrap the ANSI and Wide versions into a Sub. Do not expect to see Unicode on Win9x platforms even if you are using a Unicode font such as Arial Unicode MS:

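'Put this in the declarations section of a standard module (.bas)
Public Type RECT
   Left As Long
   Top As Long
   Right As Long
   Bottom As Long
End Type
Public Type SIZEAPI
   cx As Long
   cy As Long
End Type
Private Declare Function GetVersion Lib "kernel32" () As Long
Private Declare Function DrawTextA Lib "user32" (ByVal hdc As Long, ByVal lpStr As String, ByVal nCount As Long, lpRect As RECT, ByVal wFormat As Long) As Long
Private Declare Function DrawTextW Lib "user32" (ByVal hdc As Long, ByVal lpStr As Long, ByVal nCount As Long, lpRect As RECT, ByVal wFormat As Long) As Long
Private Declare Function TextOutA Lib "gdi32" (ByVal hdc As Long, ByVal x As Long, ByVal y As Long, ByVal lpString As String, ByVal nCount As Long) As Long
Private Declare Function TextOutW Lib "gdi32" (ByVal hdc As Long, ByVal x As Long, ByVal y As Long, ByVal lpString As Long, ByVal nCount As Long) As Long
Private Declare Function GetTextExtentPoint32A Lib "gdi32" (ByVal hdc As Long, ByVal lpsz As String, ByVal cbString As Long, lpSize As SIZEAPI) As Long
Private Declare Function GetTextExtentPoint32W Lib "gdi32" (ByVal hdc As Long, ByVal lpsz As Long, ByVal cbString As Long, lpSize As SIZEAPI) As Long
Private m_bIsNt As Boolean

'Put this in your startup (Initialise, Sub Main, etc.) in the same module
   ' Are we running NT? The high bit of GetVersion is set on Win9x only.
   Dim lVer As Long
   lVer = GetVersion()
   m_bIsNt = ((lVer And &H80000000) = 0)

Public Sub pDrawText(ByVal hdc As Long, ByVal s As String, tR As RECT, ByVal lFlags As Long)
   Dim lPtr As Long
   If (m_bIsNt) Then
      ' Pass a pointer so Vb does not convert the string to ANSI
      lPtr = StrPtr(s)
      If Not (lPtr = 0) Then
         DrawTextW hdc, lPtr, -1, tR, lFlags
      End If
   Else
      DrawTextA hdc, s, -1, tR, lFlags
   End If
End Sub

Public Sub pTextOut(ByVal lhDC As Long, ByVal x As Long, ByVal y As Long, ByVal sText As String)
   Dim lPtr As Long
   If (m_bIsNt) Then
      lPtr = StrPtr(sText)
      If Not (lPtr = 0) Then
         TextOutW lhDC, x, y, lPtr, Len(sText)
      End If
   Else
      ' The ANSI version counts bytes, not characters (they differ for DBCS text)
      TextOutA lhDC, x, y, sText, LenB(StrConv(sText, vbFromUnicode))
   End If
End Sub

Public Sub pGetTextExtentPoint32(ByVal hdc As Long, ByVal s As String, lpSize As SIZEAPI)
   Dim lPtr As Long
   If (m_bIsNt) Then
      lPtr = StrPtr(s)
      If Not (lPtr = 0) Then
         GetTextExtentPoint32W hdc, lPtr, Len(s), lpSize
      End If
   Else
      ' Byte count again for the ANSI version
      GetTextExtentPoint32A hdc, s, LenB(StrConv(s, vbFromUnicode)), lpSize
   End If
End Sub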

 

11

Uniscribe

 

Uniscribe Architecture

 

In 1999 Microsoft introduced Uniscribe, a Windows system-level component that could take advantage of OpenType fonts. Microsoft Windows 2000 and applications Internet Explorer 5 and Office 2000 were released with support for Uniscribe built in.

For Windows 2000 and later, Uniscribe supports the processing of complex scripts, that is, those scripts that need special processing to render them properly. It includes a subset of the features found in GDI+ in Windows 2000 and Windows XP. The rules governing the shaping and positioning of glyphs are specified and catalogued in The Unicode Standard: Worldwide Character Encoding, Version 2.0, Addison-Wesley Publishing Company.
http://msdn.microsoft.com/library/en-us/mslu/winprog/other_existing_unicode_support.asp?frame=true
http://www.microsoft.com/msj/1198/multilang/multilang.aspx

A complex script has at least one of the following attributes:

You may wonder how WinXP displays Unicode correctly even when you haven't selected a Font which supports all the required characters.
"Font fallback: this mechanism, made available through Uniscribe (see section on Complex Scripts Support), provides a fallback font (or a default font) when dealing with complex scripts. If the selected font face does not include any glyphs for the complex script that is about to be displayed, Uniscribe selects a default hardcoded font for the given script. For example, if you have Hindi text and the font is Courier, then Uniscribe will use the Mangal font. This technique is internal to Uniscribe and developers can not add additional fonts to the list of fallback fonts."
Note: Set flags to SSA_FALLBACK

Uniscribe is installed with Internet Explorer 5.0 or later, MS Office, Win2000, WinXP. Here are some versions of Uniscribe I found (including one found on Win98SE):

usp10.dll FileVersion (not redistributable) Size (bytes) TimeDateStamp (Internal) Comments
1.0163.1890.1   268,288 22-Sep-1998 23:04:38 Microsoft Systems Journal Nov 98. Download code here
1.0325.2180.1   315,152 30-Nov-1999 09:34:40 Found on Win98SE \Windows\System. Download from DLL World or Microsoft
1.0400.2411.1         Installed with Internet Explorer 6
1.0405.2415.1 (lab06_N.010104-1344) 325,120 06-Jan-2001 05:14:26 MS Office 10 common archives
1.0409.2600.1106 (xpsp1.020828-1920) 339,456 09-Sep-2002 21:05:43 XP-SP1 \Windows\System32
1.0420.2600.2180 (xpsp_sp2_rtm.040803-2158) 406,528 07-Dec-2005 14:38 XP-SP2 \Windows\System32
1.0453.3665.0 (private/Lab06_dev(paulnel).020427-0653) 397,312 06-Aug-2002 23:14:38 Microsoft VOLT users community
1.0471.4030.0 (main.030626-1414) 413,184 27-Jun-2003 10:24:14 Microsoft Office 2003

 

The best results in tests run on Win98SE have been with Uniscribe version 1.0405.2415.1. The Microsoft Office 2003 version has not been tested yet.

A Vb wrapper for this library can be found in Internationalization with Visual Basic by Michael S. Kaplan. It comes with a CD containing sample sourcecode. The sample includes a Uniscribe-aware version of ExtTextOutW. More info here.
 

A more complex C++ example can be found at "Supporting Multilanguage Text Layout and Complex Scripts with Windows NT 5.0". Don't be misled by 'Windows NT 5.0' in the title, because this demo also works on Win98. More info here.

Logical characters:

ARA: العربية

 العربية :ARA

  • ScriptGetCmap
  • ExtTextOut(ETO_GLYPHINDEX)

Display Plain Text and handle caret placement:

  • ScriptStringAnalyse - Calls ScriptItemise, Shape, Place etc.
  • ScriptStringGetLogicalWidths - Returns logical widths for the entire line
  • ScriptStringXtoCP - Pixel position to character index
  • ScriptStringCPtoX - Character index to pixel position
  • ScriptString_pSize - Gets pointer to SIZE structure for the line
  • ScriptStringOut - Render line to device
  • ScriptStringFree - All analyses must be freed

Display Formatted Text and handle caret placement:

  • ScriptItemize - Break string on script and direction boundaries
  • ScriptLayout - Bidi embedding level interpreter
  • ScriptShape - Unicode to glyph translation
  • ScriptPlace - Width and position generation
  • ScriptTextOut - Render to device
  • ScriptXtoCP - Pixel position to character index
  • ScriptCPtoX - Character index to pixel position
  • ScriptGetLogicalWidths - Generate widths in character order
  • ScriptBreak - Get line breaking flags

This sample has been updated and can be found in the Microsoft® Platform SDK (August 2002 Edition, Windows XP SP1), if you have it installed, under C:\Program Files\Microsoft SDK\Samples\winui\globaldev\CSSamp. You may encounter problems compiling it due to missing or outdated files.

File Copy From Copy To
Shlwapi.h (12-Jul-2002, 60,270 bytes) Microsoft Visual Studio .NET 2003\Vc7\PlatformSDK\Include Microsoft Visual Studio\VC98\Include
ShTypes.h (05-Aug-2002, 6,622 bytes) Microsoft Visual Studio .NET 2003\Vc7\PlatformSDK\Include Microsoft Visual Studio\VC98\Include
usp10.h (15-Aug-2002, 81,839 bytes) Microsoft Visual Studio .NET 2003\Vc7\PlatformSDK\Include Microsoft SDK\Samples\winui\globaldev\CSSamp

 

To build an application that supports Unicode on all Platforms AND uses Uniscribe you could use something similar to this:

Public Sub pDrawText(ByVal hdc As Long, ByVal s As String, tR As RECT, ByVal lFlags As Long)
   Dim lPtr As Long
   If (IsNt) Then
      lPtr = StrPtr(s)
      If Not (lPtr = 0) Then
         DrawTextW hdc, lPtr, -1, tR, lFlags
      End If
   Else
      If (IsUnicode(s)) Then
         If (HasUniscribe) Then
            DrawTextU hdc, s, tR, lFlags 'Uniscribe Wrapper
         Else
            DrawTextM hdc, s, tR, lFlags 'MultiByte Wrapper
         End If
      Else
         DrawTextA hdc, s, -1, tR, lFlags
      End If
   End If
End Sub
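
The routine above relies on helpers such as HasUniscribe whose source is not shown in this section. One possible way to write such a test, offered only as a sketch and not as the author's implementation, is to try loading usp10.dll:

```vb
Option Explicit

Private Declare Function LoadLibraryA Lib "kernel32" ( _
   ByVal lpLibFileName As String) As Long
Private Declare Function FreeLibrary Lib "kernel32" ( _
   ByVal hLibModule As Long) As Long

' Assumed implementation: True if usp10.dll can be loaded on this machine.
Public Function HasUniscribe() As Boolean
   Dim hMod As Long

   hMod = LoadLibraryA("usp10.dll")
   If hMod <> 0 Then
      HasUniscribe = True
      FreeLibrary hMod      ' only probing, so release the reference again
   End If
End Function
```

This only shows that the DLL is present. It says nothing about which of the versions listed earlier is installed, so a real control may want to cache the result and check the file version as well.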

 

12

Mlang

Provides services for applications on international issues, including conversion between code pages, font linking, code page "guessing", line breaking, and more. Installed with Internet Explorer 5.5 or later.
http://msdn.microsoft.com/workshop/misc/mlang/mlang.asp
http://msdn.microsoft.com/workshop/misc/mlang/reference/objects/CMultiLanguage.asp
 

The only Vb wrapper for this library I can find is here.

13

RichEdit

Provides a programming interface(control) for formatting text. This can be used in lieu of Fm20.Dll Unicode TextBox. No distribution issues and comes with source code.

Rich Edit version Unicode New DLL XP - SP1 XP Me 2000 NT 98 95
1.0   Riched32.dll Emulator Emulator Emulator
2.0 Supports Unicode Riched20.dll May be installed
3.0 Expanded support for complex scripts, partly due to Uniscribe. Riched20.dll
4.1 Hyphenation, page rotation, and Text Services Framework (TSF) support. Msftedit.dll
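
Each version in the table lives in its own DLL, and that DLL must be loaded before a window of its class can be created. The sketch below picks the newest one available. The window class names come from the Platform SDK header richedit.h rather than from this tutorial, and the function name RichEditClassName is made up for this example.

```vb
Option Explicit

Private Declare Function LoadLibraryA Lib "kernel32" ( _
   ByVal lpLibFileName As String) As Long

' Hypothetical helper: loads the newest Rich Edit DLL that is present and
' returns the window class it registers, or "" if none could be loaded.
Public Function RichEditClassName() As String
   If LoadLibraryA("msftedit.dll") <> 0 Then
      RichEditClassName = "RICHEDIT50W"   ' Rich Edit 4.1
   ElseIf LoadLibraryA("riched20.dll") <> 0 Then
      RichEditClassName = "RichEdit20W"   ' Rich Edit 2.0 / 3.0
   ElseIf LoadLibraryA("riched32.dll") <> 0 Then
      RichEditClassName = "RichEdit"      ' Rich Edit 1.0 (ANSI)
   End If
End Function
```

The library handle is deliberately not freed, because the window class is only available while the DLL stays loaded. Creating the window itself (CreateWindowEx with the returned class name) is left out of this sketch.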

Links:
http://msdn.microsoft.com/library/psdk/winui/richedit_5a7n.htm
http://msdn.microsoft.com/library/en-us/shellcc/platform/commctls/richedit/richeditcontrols.asp
About Rich Edit Controls