to install an old orthography Malayalam Unicode font which is required to read the posts below]
Modification suggestions for the Malayalam listing in the
AllKeys table, through examples:
-x is the symbol of vowel x
m_ is the anuswara
~ is the virama
H is the visarga
n_ = n ~ zw space = [1A2B.0020]
n~ = n ~ = [1A2B.0021]
nu~ = n ~ = [1A2B.0022]
na = n ~ -a = [1A2B.0021], [1A08.0020]
naa = n ~ -aa = [1A2B.0021], [1A0A.0020]
..
nau = n ~ -au = [1A2B.0021], [1A17.0020]
n~a = n ~ a = [1A2B.0021], [1A08.0021]
...
n~au = n ~ au = [1A2B.0021], [1A17.0021]
nka = n ~ k ~ -a = [1A2B.0021], [1A18.0021], [1A08.0020]
...
Together with anuswara
m_ka = ng ~ k ~ -a = [1A1C.0021], [1A18.0021], [1A08.0020]
...
m_ja = ny ~ j ~ -a = [1A21.0021], [1A1F.0021], [1A08.0020]
...
m_ta = n ~ t ~ -a = [1A2B.0021], [1A27.0021], [1A08.0020]
...
m_ya = m ~ y ~ -a = [1A30.0021], [1A31.0021], [1A08.0020]
...
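As a rough illustration of how the entries above would order strings, here is a minimal Python sketch. It assumes the two-field [primary.secondary] notation used in this post (real AllKeys entries carry more levels) and builds a key in the usual UCA shape: all primary weights, a zero separator, then all secondary weights. The dictionary keys are the transliterations from the examples, not real code points, and the names ELEMENTS and sort_key are made up for this sketch.

```python
# Collation elements copied from the examples above as (primary, secondary) pairs.
# Keys are this post's transliterations, used here only as labels.
ELEMENTS = {
    "n_":  [(0x1A2B, 0x0020)],
    "n~":  [(0x1A2B, 0x0021)],
    "nu~": [(0x1A2B, 0x0022)],
    "na":  [(0x1A2B, 0x0021), (0x1A08, 0x0020)],
    "naa": [(0x1A2B, 0x0021), (0x1A0A, 0x0020)],
    "n~a": [(0x1A2B, 0x0021), (0x1A08, 0x0021)],
    "nka": [(0x1A2B, 0x0021), (0x1A18, 0x0021), (0x1A08, 0x0020)],
}


def sort_key(spelling):
    """Two-level key: every primary weight, a 0 separator, then every secondary weight."""
    elements = ELEMENTS[spelling]
    primaries = [primary for primary, _ in elements]
    secondaries = [secondary for _, secondary in elements]
    return tuple(primaries + [0] + secondaries)


ordered = sorted(ELEMENTS, key=sort_key)
print(ordered)
# ['n_', 'n~', 'nu~', 'na', 'n~a', 'naa', 'nka']
```

The chillu, the virama form and 'nu~' share one primary weight and are separated only by their secondary weights, while 'na' and 'n~a' differ only in the secondary weight of the vowel.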
Notes:
- 'n' and '~' together form a contraction.
- 'na' is represented by an expansion. The symbol of 'a' is a fictitious entity that appears only in collation.
- n_, n~ and nu~ have the same value at the primary level, but they differ at the secondary level. One could just as well differentiate 'nu~' from 'n~' at the primary level too. This is just a usage example of a generic framework: splitting a consonant and a consonant + vowel sign as 'n ~ -x'.
Thoughts:
- Chillus and virama forms are diacritics of the base character. So the diacritics themselves have to be level-1 ignorable, but should have some weight at level 2. Also, a chillu can be thought of as a virama form (vowellessness) + a zero-width hidden whitespace. This can be achieved by keeping the secondary value of a chillu lower than that of the virama form.
- Similarly, the full form of a vowel differs from its sign only in its secondary value.
- The AllKeys file contains the Default Unicode Collation Element Table (DUCET), which does not currently handle Malayalam accurately. For example, ZWJ is ignorable by default, so NNA + VIRAMA + ZWJ and NNA + VIRAMA are treated as equal.
- Why is there an expansion for Malayalam digits? At which level should the digit zero of different scripts differ?
- If chillus are encoded, the following equivalence should not be used for tailoring:
0D7F = 0D15 0D4D 200D
...
The behavior of 0D15 0D4D 200D is different from chillu as explained in the solution to Ken's counter-challenge for chillu-challenge.
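The point about ZWJ being ignorable can be illustrated with a small Python sketch. It is a deliberately simplified model, not the UCA algorithm: it assumes that an ignorable character contributes nothing to the comparison, so it is simply dropped before comparing. The helper name strip_ignorables is invented for this sketch.

```python
ZWJ = "\u200D"     # ZERO WIDTH JOINER
NNA = "\u0D23"     # MALAYALAM LETTER NNA
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA


def strip_ignorables(text, ignorables=frozenset({ZWJ})):
    """Drop characters assumed to carry no collation weight (simplified model)."""
    return "".join(ch for ch in text if ch not in ignorables)


chillu_request = NNA + VIRAMA + ZWJ  # the sequence that asks for a chillu form
plain_virama = NNA + VIRAMA          # the ordinary virama form

# With ZWJ ignored, the two sequences compare equal.
same_when_ignored = strip_ignorables(chillu_request) == strip_ignorables(plain_virama)

# If ZWJ were given a weight (modelled here by ignoring nothing), they would differ.
no_ignorables = frozenset()
same_when_kept = (strip_ignorables(chillu_request, no_ignorables)
                  == strip_ignorables(plain_virama, no_ignorables))

print(same_when_ignored, same_when_kept)
```

Under this model the two sequences can be told apart only if ZWJ, or the sequence containing it, is given some weight of its own.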
[Thanks to Åke Persson for educating me on UTS#10]