to install an old orthography Malayalam Unicode font which is required to read the posts below]
Abstract
- Chillu-C1 + C2 is not equivalent to C1 + Virama + C2, contrary to Rachana's claims.
- The chillu issue exists independently of the Samvruthokaram confusion.
Details
The claim is proved through a counter-example.
If Chillu-C1 + C2 is equivalent to C1 + Virama + C2, provide different Unicode representations for the following two Malayalam words:
- /pin_nilaavum/
- /pinnilaavum/
If there is no solution, that would indicate that Chillu-C1 + C2 is not equivalent to C1 + Virama + C2 and that the current chillu representation in Malayalam Unicode is flawed, which makes section 3 of the
Rachana document incorrect.
In other words, the half-form (the notation of vowellessness) of C1 is not the same as chillu-C1.
Why should they have different encodings?
The words /pin_nilaavum/ and /pinnilaavum/ differ in all 3 essential attributes of a word:
- Meaning. /pin_nilaavum/ means 'and shadow of moonlight'. /pinnilaavum/ means 'will be behind'
- Pronunciation. The second 'na' of /pinnilaavum/ is alveolar, while that of /pin_nilaavum/ is dental.
- Orthography. The first 'na' of /pin_nilaavum/ is a chillu, while /pinnilaavum/ has the conjunct double of 'na'.
So these two words should have two different Unicode encodings. The argument is exactly the same as why 'apple' and 'banana' should have two different Unicode encodings.
The difference should not rest on joiner or non-joiner characters
See pages 389 to 391 in chapter 15 of Unicode 4.0.0:
"ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER are format control characters. Like other such characters, they should be ignored by processes that analyze text content. For example, a spelling-checker or find/replace operation should filter them out. (See Section 2.11, Special Characters and Noncharacters, for a general discussion of format control characters.)"
(thanks to Mahesh Pai)
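The effect of that rule on the two challenge words can be sketched in a few lines of Python. This is only an illustration, and it assumes that chillu-n is written as NA + VIRAMA + ZERO WIDTH JOINER (the joiner-based representation under discussion); the helper name `strip_joiners` is made up for this example. Only the middle cluster of each word is modelled, since the rest of the two words is identical.

```python
NA = "\u0D28"      # MALAYALAM LETTER NA
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def strip_joiners(text):
    """Hypothetical helper: drop ZWJ/ZWNJ, as a spell-checker or
    find/replace operation is told to do in the quoted passage."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


# Middle cluster of /pinnilaavum/: conjunct double 'nna'.
conjunct_nn = NA + VIRAMA + NA

# Middle cluster of /pin_nilaavum/: chillu-n followed by NA.
# ASSUMPTION: chillu-n is encoded as NA + VIRAMA + ZWJ.
chillu_n_na = NA + VIRAMA + ZWJ + NA

# As raw code point sequences the two clusters differ ...
print(conjunct_nn == chillu_n_na)

# ... but after the joiners are filtered out they compare equal.
print(strip_joiners(conjunct_nn) == strip_joiners(chillu_n_na))
```

Under that assumption, a process that filters out joiners can no longer tell the two words apart, even though they differ in meaning, pronunciation and orthography.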
More examples
/van_yavanika/ meaning 'big curtain'
/vanyavanika/ meaning 'wild forest'
/kaN_valayam/ meaning 'eye boundary'
/kaNvalayam/ meaning 'peace of Kanvan - the mythical character'.
/than_vinayam/ meaning 'his/her modesty'
/thanvinayam/ meaning 'policy of a woman'
/man_vikshObham/ meaning 'explosion of mind'
/manvikshObham/ meaning 'fury of a lady'
The general patterns behind these examples are (a character capable of forming a chillu, in either of its forms) + (a semi-vowel) and (ന in chillu or pure form) + ന. The examples above are just a few of the vast number that these patterns allow.
Counter-challenge from Kenneth Whistler
If separate characters are encoded for Malayalam Chillus, so that the "challenge" distinction were to be encoded as:
"nn" is U+0D28, U+0D4D, U+0D28
"n_n" is U+0DXX, U+0D4D, U+0D28
implementers are then faced with determining what to do with the following sequence:
"???" is U+0D28, U+0D4D, U+200D, U+0D28
That sequence, of course, exists now, and would be a legitimate and possible sequence even if a Chillu-n is encoded. So how would a rendering engine render that sequence, and how would
it be distinguished, by an end user or a text process such as a search engine, from the proposed U+0DXX, U+0D4D, U+0D28 sequence for "n_n"?
That counter-challenge needs a "solution" for the encoding of Chillu characters to make sense for Malayalam. For if there is no solution forthcoming, addition of Chillu characters would potentially be *increasing* the ambiguity potential for the Unicode representation of Malayalam text, rather than decreasing it.
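To make the three sequences in the counter-challenge concrete, here is a small Python sketch that spells them out code point by code point. The code point U+0DXX is left unspecified in the text, so the sketch uses a private-use code point (U+E000) purely as a hypothetical stand-in; the `describe` helper is likewise invented for this illustration.

```python
import unicodedata

# ASSUMPTION: U+E000 (private use) stands in for the unspecified
# chillu-n code point written as U+0DXX in the text.
HYPOTHETICAL_CHILLU_N = "\uE000"

sequences = {
    "nn": "\u0D28\u0D4D\u0D28",
    "n_n": HYPOTHETICAL_CHILLU_N + "\u0D4D\u0D28",
    "???": "\u0D28\u0D4D\u200D\u0D28",
}


def describe(text):
    """Hypothetical helper: list each code point with its Unicode name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"
        for ch in text
    ]


for label, text in sequences.items():
    print(label, describe(text))

# All three are distinct as code point sequences; the open question
# in the challenge is how "???" renders and how a user or a search
# engine would tell it apart from "n_n".
print(len(set(sequences.values())))
```

The sketch only shows that the three sequences differ at the encoding level. It says nothing about how a rendering engine would display the third one, which is exactly the point the counter-challenge raises.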
Solution to Ken's challenge
The half-form of NA (ന) is not a chillu. This is described in detail here.
Now we can apply the rules in the
ZERO WIDTH JOINER in Indic Scripts standard to see how the challenge sequence behaves. It forms the conjunct double ന്ന /nna/, as per the second bullet in section 7 (proposal) of the pr#37 document mentioned above.