The purpose of this repository is for reviewing draft Unihan database changes, removals, and additions by experts.
Each provisional Unihan database property currently being worked on has its own data file. At the moment, these are:
- kCantonese.txt
- kDefinition.txt
- kMeyerWempe.txt
- kPhonetic.txt
- kSemanticVariant.txt
- kSimplifiedVariant.txt
- kSpecializedSemanticVariant.txt
- kSpoofingVariant.txt
- kTraditionalVariant.txt
- kZVariant.txt
Additional files included are:
- AlternateRadicals.txt
- CantoneseLookup.txt
AlternateRadicals.txt is a list of characters which could reasonably be looked up in a radical-stroke index such as Unicode's under multiple radical-stroke values. Excluded are instances where the radical is the same but stroke counts differ only slightly. For easier editing, the characters for the radicals are generally shown, e.g.
U+61D5 懕 ⼼ 61.14 ⼚ 27.16
In all cases, the first value should be considered the standard value as defined in UAX #38.
Simplified radicals are not indicated.
CantoneseLookup.txt is an aid to editors of the kCantonese
property, and includes ideographs for which a Cantonese reading is known to exist, but whose Cantonese reading has not been confirmed by an authoritative source.
Changes to properties that are not provisional require UTC approval. As such, the appropriate way to request changes to non-provisional properties is by preparing and submitting a proposal, or submitting feedback via the Contact Form, not by submitting a pull request, or creating a new issue in this repository.
The format for the data files in this repository is almost exactly as in the Unihan database, using tabs to delimit the three fields, but with the actual ideograph following the code point in the first column to ease review. For example:
U+4E95 井 kCantonese zeng2
Please use the #unihan
channel in the Unicode Consortium’s Slack organization for general discussions, or for requesting that other property data files be added to this repository.
Please use the #cantonese
channel in the Unicode Consortium’s Slack organization for discussions regarding kCantonese property values.
Copyright © 2021-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
A CLA is required to contribute to this project - please refer to the CONTRIBUTING.md file (or start a Pull Request) for more information.
The contents of this repository are governed by the Unicode Terms of Use and are released under LICENSE.