Preparsed UCD
What
A text file with preparsed UCD (Unicode Character Database) data.
- Preparser script: tools/unicode/py/preparseucd.py
- ppucd.txt output: icu4c/source/data/unidata/ppucd.txt (raw text version)
- Parser for ppucd.txt: icu4c/source/tools/toolutil/ppucd.h & .cpp
- genprops tool, rewritten to use this parser: tools/unicode/c/genprops
Syntax
# Preparsed UCD generated by ICU preparseucd.py
Comments are whole lines starting with #. Inline comments are not supported.
ucd;10.0.0
Data lines start with a type keyword. Data fields are semicolon-separated. The number of fields per line is highly variable.
The ucd line should be the first data line. It provides the Unicode version number.
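The line structure described so far can be sketched in a few lines of Python. This is only an illustration, not the code in preparseucd.py or ppucd.cpp: the function name parse_data_lines is hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions that the format description does not state.

```python
def parse_data_lines(text):
    """Yield (type_keyword, fields) for each data line.

    Assumptions (not from the format description): blank lines are
    skipped and surrounding whitespace is trimmed.
    """
    for line in text.splitlines():
        line = line.strip()
        # Only whole-line comments exist, so checking the first character is enough.
        if not line or line.startswith("#"):
            continue
        # The first field is the type keyword; the rest are the data fields.
        keyword, *fields = line.split(";")
        yield keyword, fields


sample = """\
# Preparsed UCD generated by ICU preparseucd.py
ucd;10.0.0
property;Binary;Alpha;Alphabetic
"""

lines = list(parse_data_lines(sample))

# The ucd line is expected to be the first data line.
first_keyword, first_fields = lines[0]
ucd_version = first_fields[0] if first_keyword == "ucd" else None
```

Because the number of fields varies from line to line, the sketch keeps the fields as a plain list and leaves their interpretation to code that knows the line type.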
property;Binary;Alpha;Alphabetic
property;Enumerated;bc;Bidi_Class
Property lines define properties with a type and two or more aliases.
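As an illustration, a property line could be split into its type and aliases as follows. The names PropertyDef and parse_property_line are hypothetical, and treating the first alias as the short name is an assumption based on the two sample lines above.

```python
from typing import NamedTuple, Tuple


class PropertyDef(NamedTuple):
    prop_type: str            # e.g. "Binary" or "Enumerated"
    aliases: Tuple[str, ...]  # two or more names; the first is assumed to be the short name


def parse_property_line(line: str) -> PropertyDef:
    fields = line.split(";")
    # Keyword + type + at least two aliases = at least four fields.
    if fields[0] != "property" or len(fields) < 4:
        raise ValueError(f"not a valid property line: {line!r}")
    return PropertyDef(prop_type=fields[1], aliases=tuple(fields[2:]))


alpha = parse_property_line("property;Binary;Alpha;Alphabetic")
bidi_class = parse_property_line("property;Enumerated;bc;Bidi_Class")
```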
binary;N;No;F;False
binary;Y;Yes;T;True
value;bc;ON;Other_Neutral
Property value lines define the values of enumerated and catalog properties, with the property short name and two or more aliases for each value.
There is only one shared definition of the values and aliases for binary properties.
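A minimal sketch of collecting value aliases is shown below. The function name collect_value_aliases and the choice to store the shared binary values under the key "binary" are hypothetical, made only for this example.

```python
def collect_value_aliases(lines):
    """Map a property short name to the list of alias tuples for its values.

    The shared binary values are stored once under the key "binary"
    (a convention chosen for this sketch only).
    """
    values = {}
    for line in lines:
        fields = line.split(";")
        if fields[0] == "binary":
            # binary;N;No;F;False -> all fields after the keyword are aliases.
            values.setdefault("binary", []).append(tuple(fields[1:]))
        elif fields[0] == "value":
            # value;bc;ON;Other_Neutral -> property short name, then the aliases.
            values.setdefault(fields[1], []).append(tuple(fields[2:]))
    return values


value_aliases = collect_value_aliases([
    "binary;N;No;F;False",
    "binary;Y;Yes;T;True",
    "value;bc;ON;Other_Neutral",
])
```

Keeping a single "binary" entry mirrors the format itself: binary properties do not repeat their value definitions per property.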
defaults;0000..10FFFF;age=NA;bc=L;blk=NB;bpt=n;cf=;dm=;dt=None;ea=N;FC_NFKC=;gc=Cn;GCB=XX;gcm=Cn;hst=NA;InPC=NA;InSC=Other;jg=No_Joining_Group;jt=U;lb=XX;lc=;NFC_QC=Y;NFD_QC=Y;NFKC_CF=;NFKC_QC=Y;NFKD_QC=Y;nt=None;SB=XX;sc=Zzzz;scf=;scx=