Mapping Tables for Neoview Character Sets

Legal Notice

Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license.

The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Export of the information contained in this publication may require authorization from the U.S. Department of Commerce.

Microsoft, Windows, and Windows NT are U.S. registered trademarks of Microsoft Corporation.

Intel, Pentium, and Celeron are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Java is a U.S. trademark of Sun Microsystems, Inc.

Motif, OSF/1, UNIX, X/Open, and the "X" device are registered trademarks, and IT DialTone and The Open Group are trademarks of The Open Group in the U.S. and other countries.

Open Software Foundation, OSF, the OSF logo, OSF/1, OSF/Motif, and Motif are trademarks of the Open Software Foundation, Inc. OSF MAKES NO WARRANTY OF ANY KIND WITH REGARD TO THE OSF MATERIAL PROVIDED HEREIN, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. OSF shall not be liable for errors contained herein or for incidental consequential damages in connection with the furnishing, performance, or use of this material.

© 1990, 1991, 1992, 1993 Open Software Foundation, Inc. The OSF documentation and the OSF software to which it relates are derived in part from materials supplied by the following: © 1987, 1988, 1989 Carnegie-Mellon University. © 1989, 1990, 1991 Digital Equipment Corporation. © 1985, 1988, 1989, 1990 Encore Computer Corporation. © 1988 Free Software Foundation, Inc. © 1987, 1988, 1989, 1990, 1991 Hewlett-Packard Company. © 1985, 1987, 1988, 1989, 1990, 1991, 1992 International Business Machines Corporation. © 1988, 1989 Massachusetts Institute of Technology. © 1988, 1989, 1990 Mentat Inc. © 1988 Microsoft Corporation. © 1987, 1988, 1989, 1990, 1991, 1992 SecureWare, Inc. © 1990, 1991 Siemens Nixdorf Informationssysteme AG. © 1986, 1989, 1996, 1997 Sun Microsystems, Inc. © 1989, 1990, 1991 Transarc Corporation. OSF software and documentation are based in part on the Fourth Berkeley Software Distribution under license from The Regents of the University of California. OSF acknowledges the following individuals and institutions for their role in its development: Kenneth C.R.C. Arnold, Gregory S. Couch, Conrad C. Huang, Ed James, Symmetric Computer Systems, Robert Elz. © 1980, 1981, 1982, 1983, 1985, 1986, 1987, 1988, 1989 Regents of the University of California.

April 2008


Table of Contents

About This Document
Publishing History
Using the Character Set Mapping Tables
Using Our Abbreviated Version of the GB18030 Mapping Tables
Characters That Are Accessible From the GB18030 Mapping Tables
Algorithm for Mapping Unlinked GB18030 Characters to Unicode Values
Links to the Character Set Mapping Tables

List of Tables

Links to the Character Set Mapping Tables

About This Document

Publishing History

Part Number    Product Version           Publication Date
544952–001     HP Neoview Release 2.3    April 2008

Using the Character Set Mapping Tables


This document provides links to the character set mapping tables used by the Neoview Character Set feature for Release 2.3.

Using Our Abbreviated Version of the GB18030 Mapping Tables

Characters That Are Accessible From the GB18030 Mapping Tables

GB18030 encodes characters in sequences of one, two, or four bytes and has 1,112,064 valid byte sequences (characters). Of these, Unicode.org currently defines glyphs for approximately 63,000.
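As an illustration of the one-, two-, and four-byte forms, the following Python sketch reports the length of the GB18030 sequence that starts at a given position. The byte ranges used here come from the published GB18030 encoding layout, not from this document, and the function name `gb18030_sequence_length` is hypothetical.

```python
def gb18030_sequence_length(data: bytes, pos: int = 0) -> int:
    """Return 1, 2, or 4: the length of the GB18030 sequence at data[pos].

    Assumed byte ranges (from the GB18030 encoding layout):
      one byte:   0x00-0x7F
      two bytes:  0x81-0xFE, then 0x40-0x7E or 0x80-0xFE
      four bytes: 0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39
    """
    first = data[pos]
    if first <= 0x7F:
        return 1
    if not 0x81 <= first <= 0xFE:
        raise ValueError(f"invalid lead byte 0x{first:02X}")
    if pos + 1 >= len(data):
        raise ValueError("truncated sequence")
    second = data[pos + 1]
    if 0x40 <= second <= 0xFE and second != 0x7F:
        return 2
    if 0x30 <= second <= 0x39:
        if pos + 3 >= len(data):
            raise ValueError("truncated sequence")
        third, fourth = data[pos + 2], data[pos + 3]
        if 0x81 <= third <= 0xFE and 0x30 <= fourth <= 0x39:
            return 4
    raise ValueError("invalid GB18030 byte sequence")


print(gb18030_sequence_length(b"A"))                         # one byte
print(gb18030_sequence_length(bytes.fromhex("D6D0")))        # two bytes
print(gb18030_sequence_length(bytes.fromhex("90308130")))    # four bytes
```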

Although this release of the Neoview platform supports the complete GB18030 character set, the GB18030 mapping tables you can access from this document provide active links to a much smaller subset of the GB18030 character set. Our abbreviated GB18030 mapping tables exclude all of the GB18030 characters (approximately 1,048,576) that are mapped in the surrogate pair range. The abbreviated mapping tables display the Unicode mappings for the remaining 63,000 characters, although not all of them have glyphs. For more information about our abbreviated GB18030 mapping tables, see “Links to the Character Set Mapping Tables”.

For information about an algorithm you can use to map unlinked GB18030 characters in the range 0x90308130 through 0xE3329A35 to their Unicode values, see “Algorithm for Mapping Unlinked GB18030 Characters to Unicode Values”.
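Not every 32-bit number between 0x90308130 and 0xE3329A35 is a well-formed four-byte GB18030 sequence, so a numeric range check alone is not enough. The Python sketch below combines the range check with a per-byte check. The per-byte ranges are taken from the GB18030 four-byte layout rather than from this document, and the name `is_unlinked_gb18030` is hypothetical.

```python
def is_unlinked_gb18030(gbc: int) -> bool:
    """True if gbc is a well-formed four-byte GB18030 code in the
    range 0x90308130 through 0xE3329A35 (the unlinked characters)."""
    if not 0x90308130 <= gbc <= 0xE3329A35:
        return False
    second = (gbc >> 16) & 0xFF
    third = (gbc >> 8) & 0xFF
    fourth = gbc & 0xFF
    # The first byte is already constrained to 0x90-0xE3 by the range check.
    return (0x30 <= second <= 0x39
            and 0x81 <= third <= 0xFE
            and 0x30 <= fourth <= 0x39)


print(is_unlinked_gb18030(0x90308130))  # first code in the range
print(is_unlinked_gb18030(0x90308140))  # fourth byte 0x40 is not 0x30-0x39
```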

Algorithm for Mapping Unlinked GB18030 Characters to Unicode Values

For the approximately 1,048,576 GB18030 characters in the range 0x90308130 through 0xE3329A35, the algorithm to map the characters to Unicode values is as follows:

Let GBC = some GB18030 character in the range 0x90308130 - 0xE3329A35.
Then the UCS4 value (also known as the UTF32 value) is given by the formula:
      UCS4val = 0x10000 + ( GBC % 0x10 ) +
                          ( ( ( ( GBC & 0x0000FF00 ) >> 8  ) - 0x81 ) * 10    ) +
                          ( ( ( ( GBC & 0x00FF0000 ) >> 16 ) - 0x30 ) * 1260  ) +
                          ( ( ( ( GBC & 0xFF000000 ) >> 24 ) - 0x90 ) * 12600 )
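The formula above can be sketched in Python as follows. The function name `gb18030_to_ucs4` is hypothetical, and the range check is an added safeguard that the formula itself does not state. As a sanity check, the result can be compared with Python's built-in `gb18030` codec, which is assumed here to use the same mapping for this range.

```python
def gb18030_to_ucs4(gbc: int) -> int:
    """Map a four-byte GB18030 code (as a 32-bit integer) in the range
    0x90308130-0xE3329A35 to its UCS4 (UTF32) value."""
    if not 0x90308130 <= gbc <= 0xE3329A35:
        raise ValueError(f"0x{gbc:08X} is outside 0x90308130-0xE3329A35")
    return (0x10000
            + (gbc % 0x10)                                   # fourth byte: 0x30-0x39
            + (((gbc & 0x0000FF00) >> 8) - 0x81) * 10        # third byte
            + (((gbc & 0x00FF0000) >> 16) - 0x30) * 1260     # second byte
            + (((gbc & 0xFF000000) >> 24) - 0x90) * 12600)   # first byte


print(hex(gb18030_to_ucs4(0x90308130)))  # 0x10000
print(hex(gb18030_to_ucs4(0xE3329A35)))  # 0x10ffff

# Cross-check one value against Python's gb18030 codec.
gbc = 0x9439FC36
assert chr(gb18030_to_ucs4(gbc)).encode("gb18030") == gbc.to_bytes(4, "big")
```

The multipliers follow from the byte ranges: the fourth byte has 10 possible values, the third byte has 126 (so 126 × 10 = 1260 codes per second-byte step), and the second byte has 10 (so 1260 × 10 = 12600 codes per first-byte step).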

The UTF16 value can then be calculated as follows:

  • The first 16-bit word of the two-word UTF16 value equals:

    0xD800  + ( ( UCS4val - 0x10000 )  / 1024 )
  • The second 16-bit word of the two-word UTF16 value equals:

    0xDC00 + ( ( UCS4val - 0x10000 ) % 1024 )
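A minimal Python sketch of the UTF16 calculation follows. The function name `ucs4_to_utf16` is hypothetical, and the division is written as integer division (`//`), which is what the formula requires.

```python
from typing import Tuple


def ucs4_to_utf16(ucs4val: int) -> Tuple[int, int]:
    """Return the two 16-bit words of the UTF16 value for a UCS4 value
    in the range 0x10000-0x10FFFF."""
    if not 0x10000 <= ucs4val <= 0x10FFFF:
        raise ValueError(f"0x{ucs4val:X} is outside 0x10000-0x10FFFF")
    offset = ucs4val - 0x10000
    first_word = 0xD800 + offset // 1024   # integer division
    second_word = 0xDC00 + offset % 1024
    return first_word, second_word


first, second = ucs4_to_utf16(0x1F600)
print(f"{first:04X} {second:04X}")  # D83D DE00
```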

The UTF8 value can be calculated as follows:

  • The first byte of the 4-byte UTF8 value equals:

    0xF0 + ( ( UCS4val  >> 18 ) % 8   )
  • The second byte of the 4-byte UTF8 value equals:

    0x80 + ( ( UCS4val  >> 12 ) % 64 )
  • The third byte of the 4-byte UTF8 value equals:

    0x80 + ( ( UCS4val  >> 6 ) % 64 )
  • The fourth byte of the 4-byte UTF8 value equals:

    0x80 + ( ( UCS4val  >>  0  ) % 64 )
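The four UTF8 bytes can be computed with a short Python sketch like the one below. The function name `ucs4_to_utf8` is hypothetical, and the range check is an added safeguard. The result is compared with Python's own UTF-8 encoder as a cross-check.

```python
def ucs4_to_utf8(ucs4val: int) -> bytes:
    """Return the 4-byte UTF8 value for a UCS4 value in the
    range 0x10000-0x10FFFF."""
    if not 0x10000 <= ucs4val <= 0x10FFFF:
        raise ValueError(f"0x{ucs4val:X} is outside 0x10000-0x10FFFF")
    return bytes([
        0xF0 + ((ucs4val >> 18) % 8),    # first byte
        0x80 + ((ucs4val >> 12) % 64),   # second byte
        0x80 + ((ucs4val >> 6) % 64),    # third byte
        0x80 + ((ucs4val >> 0) % 64),    # fourth byte
    ])


print(ucs4_to_utf8(0x1F600).hex().upper())  # F09F9880
assert ucs4_to_utf8(0x1F600) == chr(0x1F600).encode("utf-8")
```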

Where:

  • X / Y means X divided by Y, discarding any remainder (integer division)

  • X % Y means X modulo Y

  • X << Y means X left-shifted by Y bits

  • X >> Y means X right-shifted by Y bits

  • X & Y means X bit-wise ANDed with Y

Links to the Character Set Mapping Tables

Table 1 contains links to the main mapping table for each of the seven East Asian character sets supported by this release of the Neoview Character Sets feature.

Each mapping table provides this information about each character it contains:

  • What the character looks like (if possible)

  • The UTF8 encoding of the character

  • The Unicode (UTF16) value of the character
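For a single character, the three items listed above can be approximated with Python's standard codecs, as in the sketch below. This assumes that the Python codec names shown (`gbk`, `gb2312`, `gb18030`, `big5`) correspond to the character sets in this document. The Neoview mapping tables remain the authoritative source and may differ for individual characters. The function name `mapping_row` is hypothetical.

```python
def mapping_row(code_hex: str, codec: str) -> dict:
    """Decode one character from its hex code in the given character set
    and return its glyph, UTF8 encoding, and UTF16 value."""
    char = bytes.fromhex(code_hex).decode(codec)
    return {
        "glyph": char,
        "utf8": char.encode("utf-8").hex().upper(),
        "utf16": char.encode("utf-16-be").hex().upper(),
    }


print(mapping_row("D6D0", "gbk"))
# {'glyph': '中', 'utf8': 'E4B8AD', 'utf16': '4E2D'}
```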

Table 1 Links to the Character Set Mapping Tables

This link...    Takes you to the...
BIG5            Main mapping table page for the BIG5 character set
EUC-JP          Main mapping table page for the EUC-JP character set
GB2312          Main mapping table page for the GB2312 character set
GB18030         Main mapping table page for our abbreviated version of the GB18030 character set. Our GB18030 mapping tables contain a number of inactivated second-level mapping table links. From our main GB18030 mapping table, the active links 90 through E3 take you to second-level mapping tables where ten links in the top row (nn30 through nn39, where nn equals 90 through E3) have been inactivated.
GBK             Main mapping table page for the GBK character set
KSC-5601        Main mapping table page for the KSC-5601 character set
SJIS            Main mapping table page for the SJIS character set