to install an old orthography Malayalam Unicode font which is required to read the posts below]
Shaping behaviour of ZWJ and ZWNJ (let me call them collectively the ZW operator for the remainder of the document) in the context of Malayalam:
A ZW operator forces a shaping engine to make a shaping decision before the logical boundary is reached. Characters are read from left to right. A ZW operator can have one or two operands to its left.
If a ZW operator sees only one operand, the outcome is trivial: it is the original form of the letter.
If ZWJ sees two operands to its left, the output is their ligature form or a fallback. If they are consonant+virama, it will be the half form or chillu form of the consonant.
If ZWNJ sees two operands to its left, the output is their non-joined form.
The output of a ZW operation will not change shape in further shaping decisions. The result is treated as a single operand in those decisions.
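The ZW operator rules above can be sketched roughly as follows. This is only an illustration, not how any real shaping engine is written: the function name `apply_zw` and the symbolic result tuples are my own assumptions, and the caller is assumed to pass a consonant as the first of two operands.

```python
ZWJ = "\u200d"     # ZERO WIDTH JOINER
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA


def apply_zw(operands, op):
    """Return a symbolic, frozen shaping unit for the operands left of a ZW operator.

    `operands` is a list of one or two items. The returned tuple stands for a
    single operand whose shape must not change in later shaping decisions.
    """
    if op not in (ZWJ, ZWNJ):
        raise ValueError("op must be ZWJ or ZWNJ")
    if not 1 <= len(operands) <= 2:
        raise ValueError("a ZW operator takes one or two operands")

    if len(operands) == 1:
        # Trivial case: the original form of the letter.
        return ("original", operands[0])

    left, right = operands
    if op == ZWNJ:
        return ("non-joined", left, right)
    if right == VIRAMA:
        # consonant + virama + ZWJ: half form or chillu form of the consonant.
        return ("half-or-chillu", left)
    return ("ligature-or-fallback", left, right)


NA = "\u0d28"  # MALAYALAM LETTER NA
print(apply_zw([NA, VIRAMA], ZWJ))   # symbolic chillu/half form of NA
print(apply_zw([NA, VIRAMA], ZWNJ))  # symbolic non-joined NA + virama
```

Returning one tuple per decision mirrors the rule that the output of a ZW operation is treated as a single operand afterwards.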
When a shaping engine gets operand X + virama + operand Y, then it will output the single ligature for X and Y or use the fallbacks:
1. full form of X and sign or subscript form of Y
2. non-joined form with visible virama.
When a shaping engine gets virama + operand X, then the sign form or subscript form of operand X is output.
The above summary assumes that there is only one subscript and sign form for a letter. A similar assumption is made about the chillu and half forms of a letter.
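The virama rules above (X + virama + Y, and virama + X) can be sketched as a fallback chain. The names `shape_conjunct` and `shape_virama_then`, and the `ligatures` and `below_forms` sets standing in for what a font provides, are hypothetical; the sketch also relies on the same assumption that each letter has only one sign or subscript form.

```python
def shape_conjunct(x, y, ligatures, below_forms):
    """Shape X + virama + Y: single ligature first, then the two fallbacks."""
    if (x, y) in ligatures:
        return [("ligature", x, y)]
    if y in below_forms:
        # Fallback 1: full form of X and sign or subscript form of Y.
        return [("full", x), ("sign-or-subscript", y)]
    # Fallback 2: non-joined form with visible virama.
    return [("full", x), ("visible-virama",), ("full", y)]


def shape_virama_then(x):
    """Shape virama + X: the sign form or subscript form of X."""
    return [("sign-or-subscript", x)]


# Hypothetical font data, for illustration only.
ligatures = {("KA", "KA")}
below_forms = {"VA", "YA"}

print(shape_conjunct("KA", "KA", ligatures, below_forms))
print(shape_conjunct("KA", "VA", ligatures, below_forms))
print(shape_conjunct("KA", "TA", ligatures, below_forms))
```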
"Badly (that is the whole point of my critic against the use of ZWJ to mean semi-open ligature instead of more closed ligature). For example, the typical example for ZWNJ/ZWJ is Nagari K.KA क्क. The body of the standard explains how to force the overt virama form with क्क, or the linear form with क्क. But how do you "force" the stacked form (shown in the body of the standard),
particularly when one notes that in today's Devanagari, the linear form क्क is the "preferred", among other at small resolutions or for UI fonts like Mangal?"
There is at least one scenario where a letter has two different sign-subscript forms in Malayalam: VA in the context of LLLA (zha) and YA.
Finding a way to write them separately will require more than the modification/addition of a format character because of the same old argument - a spell checker ignores format characters.
Modification suggestions for the Malayalam listing in the AllKeys table
-x is the symbol of vowel x
m_ is the anuswara
~ is the virama
H is the visarga
n_ = n ~ zw space = [1A2B.0020]
n~ = n ~ = [1A2B.0021]
nu~ = n ~ = [1A2B.0022]
na = n ~ -a = [1A2B.0021], [1A08.0020]
naa = n ~ -aa = [1A2B.0021], [1A0A.0020]
nau = n ~ -au = [1A2B.0021], [1A17.0020]
n~a = n ~ a = [1A2B.0021], [1A08.0021]
n~au = n ~ au = [1A2B.0021], [1A17.0021]
nka = n ~ k ~ -a = [1A2B.0021], [1A18.0021], [1A08.0020]
Together with anuswara
m_ka = ng ~ k ~ -a = [1A1C.0021], [1A18.0021], [1A08.0020]
m_ja = ny ~ j ~ -a = [1A21.0021], [1A1F.0021], [1A08.0020]
m_ta = n ~ t ~ -a = [1A2B.0021], [1A27.0021], [1A08.0020]
m_ya = m ~ y ~ -a = [1A30.0021], [1A31.0021], [1A08.0020]
- 'n' and '~' together form a contraction.
- 'na' is represented by an expansion. The symbol of 'a' is a fictitious entity that appears only in collation.
- n_, n~ and nu~ have the same value at the primary level but differ at the secondary level. One could very well differentiate 'nu~' from 'n~' at the primary level too. This is just a usage example of a generic framework: splitting Consonant and Consonant+vowel-sign as 'n ~ -x'.
- Chillus and virama forms are diacritics of the base character. So the diacritics themselves have to be level-1 ignorable, but should carry some weight at level 2. Also, a chillu can be thought of as a virama form (vowellessness) + a zero-width hidden whitespace. This can be achieved by keeping the secondary value of a chillu less than that of the virama form.
- Similarly, the full form of a vowel differs from its sign only in its secondary value.
- The AllKeys file contains the Default Unicode Collation Element Table (DUCET), which does not currently handle Malayalam accurately. For example, ZWJ is ignorable by default, so NNA + VIRAMA + ZWJ and NNA + VIRAMA are treated as equal.
- Why is there an expansion for Malayalam digits? At which level should the digit zero of different scripts differ?
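To see how the suggested weights behave, here is a small sketch that builds two-level sort keys from a few rows of the listing above. The `TABLE` values are copied from that listing. The `sort_key` helper is my own simplification of a UTS #10 style key (primary weights, a zero separator, then secondary weights); it ignores the tertiary level and variable weighting.

```python
# (primary, secondary) collation elements, copied from the listing above.
TABLE = {
    "n_":  [(0x1A2B, 0x0020)],
    "n~":  [(0x1A2B, 0x0021)],
    "nu~": [(0x1A2B, 0x0022)],
    "na":  [(0x1A2B, 0x0021), (0x1A08, 0x0020)],
    "naa": [(0x1A2B, 0x0021), (0x1A0A, 0x0020)],
}


def sort_key(elements, levels=2):
    """Build a simplified sort key: primaries, a 0 separator, then secondaries."""
    primary = tuple(p for p, _ in elements if p)
    if levels == 1:
        return primary
    secondary = tuple(s for _, s in elements if s)
    return primary + (0,) + secondary


# Chillu, virama form and samvruthokaram are equal at the primary level...
print(sort_key(TABLE["n_"], 1) == sort_key(TABLE["n~"], 1) == sort_key(TABLE["nu~"], 1))
# ...and ordered chillu < virama form < nu~ once the secondary level is included.
print(sorted(TABLE, key=lambda name: sort_key(TABLE[name])))
```

With these weights a primary-only comparison (a loose search, for example) treats n_, n~ and nu~ alike, while a full sort still keeps the chillu ahead of the virama form.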
If Chillus are encoded, the following equivalence should not be used for tailoring:
0D7F = 0D15 0D4D 200D
The behavior of 0D15 0D4D 200D is different from chillu as explained in the solution to Ken's counter-challenge for chillu-challenge.
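As a side check, normalization does not relate the two sequences either. The sketch below uses Python's standard `unicodedata` module; the helper `canonically_equivalent` is my own name, and the result depends on the Unicode data bundled with the Python build (U+0D7F must be known to it).

```python
import unicodedata

CHILLU_K = "\u0d7f"                # MALAYALAM LETTER CHILLU K
KA_VIRAMA_ZWJ = "\u0d15\u0d4d\u200d"  # KA + VIRAMA + ZWJ


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The chillu has no decomposition, so the two sequences stay distinct.
print(canonically_equivalent(CHILLU_K, KA_VIRAMA_ZWJ))
```

So any equality between 0D7F and 0D15 0D4D 200D would have to come from a collation tailoring, which is exactly what is being advised against here.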
[Thanks to Åke Persson for educating me on UTS#10