In the original version of the Internationalized Domain Names in Applications (IDNA) protocol, any Unicode code points taken from user input were mapped into a set of Unicode code points that "made sense", and then encoded and passed to the domain name system (DNS). The IDNA2008 protocol (described in RFCs 5890, 5891, 5892, and 5893) presumes that the input to the protocol comes from a set of "permitted" code points, which it then encodes and passes to the DNS, but does not specify what to do with the result of user input. This document describes the actions that can be taken by an implementation between receiving user input and passing permitted code points to the new IDNA protocol.
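As a rough illustration of what such a pre-processing step might look like, the sketch below lowercases the input, maps full-width and half-width characters to their decomposition mappings, applies NFC, and treats the ideographic full stop as a label separator before encoding each label. The function names (`map_user_input`, `encode_label`, `to_ascii_domain`) are hypothetical, and the choice of steps is an assumption about one reasonable mapping, not a statement of what any implementation must do. The sketch does not check that the mapped code points are actually "permitted" under IDNA2008; a real implementation would need that validation as well.

```python
import unicodedata

IDEOGRAPHIC_FULL_STOP = "\u3002"


def map_user_input(domain: str) -> str:
    """Map raw user input toward a form suitable for IDNA processing (assumed steps)."""
    # Step 1: map upper case to lower case.
    lowered = domain.lower()

    # Step 2: replace characters whose decomposition type is <wide> or <narrow>
    # with their decomposition mapping.
    pieces = []
    for ch in lowered:
        decomposition = unicodedata.decomposition(ch)
        if decomposition.startswith(("<wide>", "<narrow>")):
            hex_values = decomposition.split()[1:]
            pieces.append("".join(chr(int(h, 16)) for h in hex_values))
        else:
            pieces.append(ch)

    # Step 3: normalize to NFC.
    normalized = unicodedata.normalize("NFC", "".join(pieces))

    # Step 4: treat the ideographic full stop as a label separator.
    return normalized.replace(IDEOGRAPHIC_FULL_STOP, ".")


def encode_label(label: str) -> str:
    """Encode one label with Punycode; no IDNA2008 validity checks are done here."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain: str) -> str:
    """Map the input, split it into labels, and encode each label."""
    return ".".join(encode_label(label) for label in map_user_input(domain).split("."))


print(to_ascii_domain("Bücher.example"))  # xn--bcher-kva.example
```

Keeping the mapping separate from the encoding mirrors the split described above: the mapping is an application-level choice made before the protocol runs, while the encoding acts only on code points that have already been mapped.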
Unicode 15.0 adds 4,489 characters, for a total of 149,186 characters. These additions include 2 new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs.
The new scripts and characters in Version 15.0 add support for lesser-used languages and unique writing requirements worldwide, including numerous symbol additions. Funds from the Adopt-a-Character program provided support for some of these additions. The new scripts and characters include:
Popular symbol additions:
Other symbol and notational additions include:
Support for other languages and scholarly work worldwide includes:
Updates to the CJK blocks add:
Support for CJK unified ideographs was enhanced in Version 15.0 by significant corrections and improvements to the Unihan database. Changes to the Unihan database include updated source lists, regular expressions, and new and updated fields. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.
Important chart font updates, including:
Unicode 15.1 adds 627 characters, for a total of 149,813 characters.
There are several significant themes for this release of the Unicode Standard.
Supersedes: Unicode 15.0.0
Unicode 16.0 adds 5,185 characters, for a total of 154,998 characters. The new additions include seven new scripts:
Other character additions include seven new emoji characters plus 3,995 additional Egyptian Hieroglyphs and over 700 symbols from legacy computing environments.
In addition to new characters, new “Moji Jōhō Kiban” (文字情報基盤) Japanese source references have been added for over 36,000 CJK unified ideographs. These are reflected in the code charts for virtually all CJK unified ideograph blocks by additional representative glyphs in the “J” column.
Supersedes: Unicode 15.1.0
Unicode 17.0 adds 4,803 characters, for a total of 159,801 characters. The new additions include 4 new scripts:
Supersedes: Unicode 16.0.0
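The character counts quoted for Versions 15.0 through 17.0 can be cross-checked against one another: each release's total should equal the previous total plus the number of characters added. The short sketch below does that arithmetic using only the figures given in the text; the names `RELEASES` and `check_totals` are illustrative.

```python
# (version, characters added, total characters), as quoted in the text.
RELEASES = [
    ("15.0", 4489, 149186),
    ("15.1", 627, 149813),
    ("16.0", 5185, 154998),
    ("17.0", 4803, 159801),
]


def check_totals(releases):
    """Return the versions whose total differs from previous total + added."""
    mismatches = []
    for (_, _, prev_total), (version, added, total) in zip(releases, releases[1:]):
        if prev_total + added != total:
            mismatches.append(version)
    return mismatches


print(check_totals(RELEASES))  # [] means the quoted figures are consistent
```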
What began in 1988 as a standard for character encoding has grown into a powerful portfolio of open source standards, code, tools, libraries, and products that ensure global language support, interoperability, and resiliency across billions of devices.
Behind what most take for granted on screens today is the Unicode Consortium, the nonprofit, open source, open standards body for the internationalization of software and services. Unicode is embedded in every major operating system and used on more than 20 billion devices worldwide. It may be the most widely deployed technology ever.
This document clarifies a number of the terms used to describe character encodings. It elaborates the Internet Architecture Board (IAB) three-layer “text stream” definitions from RFC 2130 into a four-layer structure more appropriate for explanation of the Unicode Standard.
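To make the layering concrete, the sketch below walks one character through four levels: the abstract character, its code point, its code units in an encoding form, and its serialized bytes in an encoding scheme. The layer names in the comments reflect a common reading of the Unicode character encoding model and are an assumption here, since the text above does not name the layers; `describe_layers` is a hypothetical helper.

```python
import struct
import unicodedata


def describe_layers(ch: str) -> dict:
    """Show one character at each level of a four-layer encoding model (assumed naming)."""
    utf16_be = ch.encode("utf-16-be")
    unit_count = len(utf16_be) // 2
    return {
        # Abstract character: identified here by its Unicode name.
        "abstract_character": unicodedata.name(ch),
        # Coded character set: the integer code point assigned to it.
        "code_point": ord(ch),
        # Encoding forms: sequences of fixed-width code units.
        "utf8_units": list(ch.encode("utf-8")),
        "utf16_units": list(struct.unpack(f">{unit_count}H", utf16_be)),
        # Encoding schemes: code units serialized to bytes in a given byte order.
        "utf16_be_bytes": utf16_be,
        "utf16_le_bytes": ch.encode("utf-16-le"),
    }


layers = describe_layers("€")
print(hex(layers["code_point"]))                 # 0x20ac
print([hex(u) for u in layers["utf8_units"]])    # ['0xe2', '0x82', '0xac']
print(layers["utf16_be_bytes"], layers["utf16_le_bytes"])
```

The same 16-bit code unit appears as two different byte sequences in the last two entries, which is the distinction between an encoding form and an encoding scheme.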