to install an old orthography Malayalam Unicode font which is required to read the posts below]
In section 4, Rachana establishes that chillu-ല is not chillu-ത. However, the real issue is not with chillu-ല. It is with chillu-ര and chillu-ള, because they also serve as chillu-റ and chillu-ഴ, respectively.
This fact is clearly established in the foremost grammar book of Malayalam: Keralapaanineeyam by A. R. Rajaraja Varma. See the relevant scan from the section peethika: 4.varnnavikaarangal below:
Possible solutions and their implications
I am considering only chillu-ര/റ right now. These thoughts are applicable to chillu-ള/ഴ as well.
- Encode this chillu as chillu-ര only. That is, only RA + VIRAMA + ZWJ forms the chillu. This would give the wrong collation order for words with chillu-റ. For instance, /kaarr_mEgham/ from the above example would be sorted into the wrong place.
- Let both RA + VIRAMA + ZWJ and RRA + VIRAMA + ZWJ form chillu-ര/റ. This runs into the uniqueness rule warning: "if this scheme is allowed, a document (eg: a wiktionary.org document) written by multiple people using various inputting tools can quite possibly have different 'spellings' for a word, without reader or writer being aware of it. This can cause many problems including ineffective searches and inconsistent collation".
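The uniqueness problem in the second option can be seen in a few lines of Python. This is only a minimal sketch: the helper names (`chillu`, `same_spelling`) are made up for illustration, and the test string is just KA + vowel sign AA followed by the chillu, not a full word. It shows that the two sequences stay distinct even after Unicode normalization, so a plain search for one spelling does not find the other.

```python
import unicodedata

RA, RRA = "\u0D30", "\u0D31"      # MALAYALAM LETTER RA (ര) and RRA (റ)
VIRAMA, ZWJ = "\u0D4D", "\u200D"  # MALAYALAM SIGN VIRAMA, ZERO WIDTH JOINER


def chillu(base):
    """Hypothetical helper: spell a chillu as base letter + VIRAMA + ZWJ."""
    return base + VIRAMA + ZWJ


def same_spelling(a, b):
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Illustrative fragment only: KA + vowel sign AA, then the chillu.
prefix = "\u0D15\u0D3E"
word_by_writer_a = prefix + chillu(RA)   # typed with an RA-based input tool
word_by_writer_b = prefix + chillu(RRA)  # typed with an RRA-based input tool

# Normalization does not unify the two spellings.
print(same_spelling(word_by_writer_a, word_by_writer_b))  # False

# A plain substring search for one spelling misses the other.
document = "... " + word_by_writer_a + " ..."
print(word_by_writer_b in document)  # False
```

If both sequences render as the same glyph, as the second option proposes, neither writer would notice the mismatch on screen.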
This is an example where codepoints and the characters of a language need to differ. A human can identify the character from its word context, which is not available to, or usable by, a computer working with codepoints alone.
Thus, collating chillus correctly according to their underlying letter identity is impossible in codepoint space without the help of sophisticated text processing at higher levels. This in turn means that collation correctness is not a valid argument until a new solution or perspective is proposed. Until then, both choices, encoding the chillu with a control/format character or giving it an independent codepoint, have to be evaluated on the rest of their merits.
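As a rough illustration of what "text processing at higher levels" could mean, the Python sketch below contrasts a codepoint-only sort key with one that consults a word list. Everything here is an assumption made for illustration: the chillu is taken to be encoded from RA only, `RRA_WORDS` is a hypothetical one-entry lexicon, and the spelling of /kaarr_mEgham/ is my own transcription into codepoints.

```python
RA, RRA = "\u0D30", "\u0D31"      # MALAYALAM LETTER RA (ര) and RRA (റ)
VIRAMA, ZWJ = "\u0D4D", "\u200D"  # MALAYALAM SIGN VIRAMA, ZERO WIDTH JOINER

# Assumption: the chillu is always encoded from RA.
CHILLU_R = RA + VIRAMA + ZWJ

# /kaarr_mEgham/: KA, sign AA, chillu, MA, sign EE, GHA, anusvara
# (assumed transcription).
KAAR_MEGHAM = "\u0D15\u0D3E" + CHILLU_R + "\u0D2E\u0D47\u0D18\u0D02"

# Hypothetical lexicon of words whose chillu really stands for RRA.
RRA_WORDS = {KAAR_MEGHAM}


def naive_key(word):
    """Sort key built from codepoints alone: the chillu always counts as RA."""
    return [ord(ch) for ch in word]


def lexicon_key(word):
    """Sort key that restores RRA when the word-level lexicon says so.

    Simplification: every chillu in a listed word is treated as RRA.
    """
    if word in RRA_WORDS:
        word = word.replace(CHILLU_R, RRA + VIRAMA + ZWJ)
    return [ord(ch) for ch in word]


# The third element of each key is the letter underlying the chillu.
print(hex(naive_key(KAAR_MEGHAM)[2]))    # 0xd30 (RA)
print(hex(lexicon_key(KAAR_MEGHAM)[2]))  # 0xd31 (RRA)
```

The correction depends entirely on knowing the word, which is exactly the context a codepoint sequence by itself does not carry.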