Problem with some input sequences in Indic SMP scripts

Namaste. I am using Keyman 12.0.55 with LibreOffice 6.3.0 on Windows 10 and am facing a problem with a keyboard I am creating. As a minimal test case, I provide the following code:

+ "k" > $ka $virama
+ "a" > $vowelAOut
$virama + "a" > nul

$vowelAOut + any(aBasedVowelsIn) > index(aBasedVowelsIndepOut, 2)
$ka + any(aBasedVowelsIn) > context index(aBasedVowelsDepOut, 2)

If it is fed Devanagari characters as seen below:

store(ka) U+0915
store(virama) U+094D
store(vowelAOut) U+0905

store(aBasedVowelsIn)       "aiu"
store(aBasedVowelsIndepOut) U+0906 U+0910 U+0914
store(aBasedVowelsDepOut)   U+093E U+0948 U+094C

… it works, and I am able to get आ ऐ औ का कै कौ by typing “aa” “ai” “au” “kaa” “kai” “kau”.

However the same code does not work if it is fed Grantha characters:

store(ka) U+11315
store(virama) U+1134D
store(vowelAOut) U+11305

store(aBasedVowelsIn)       "aiu"
store(aBasedVowelsIndepOut) U+11306 U+11310 U+11314
store(aBasedVowelsDepOut)   U+1133E U+11348 U+11357

… and while I get 𑌆 𑌐 𑌔 for “aa” “ai” “au”, I do not get 𑌕𑌾 𑌕𑍈 𑌕𑍗 for “kaa” “kai” “kau”. Instead, the last input vowel fails to combine and the output is: 𑌕𑌅 𑌕i 𑌕u.

I also tested Bengali, Tamil and Brahmi, and found that BMP scripts work fine whereas SMP scripts do not. I wonder if this has something to do with the UTF-16 representation of the characters involved?
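To make the UTF-16 difference concrete, here is a small Python sketch (it is not part of the keyboard, and `utf16_units` is just a hypothetical helper name). It prints the UTF-16 code units for the Devanagari and Grantha viramas from the test case. The BMP character occupies one code unit, while the SMP character needs a surrogate pair. Whether that is what actually trips up the rule is only a guess at this point.

```python
def utf16_units(ch):
    """Return the UTF-16 code units of a string as 4-digit hex strings."""
    data = ch.encode("utf-16-be")  # big-endian, no BOM
    return [
        format(int.from_bytes(data[i:i + 2], "big"), "04X")
        for i in range(0, len(data), 2)
    ]

# Devanagari virama U+094D lies in the BMP: a single code unit.
print(utf16_units("\u094D"))        # ['094D']

# Grantha virama U+1134D lies in the SMP: a surrogate pair.
print(utf16_units("\U0001134D"))    # ['D804', 'DF4D']
```

So anything that counts or deletes characters by UTF-16 code unit would see the Grantha virama as two units rather than one.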

I also found that this happens only if I output the virama for the basic consonant keystroke and remove it with “a”. If I instead do not use a virama to be cancelled by “a”, and replace the first three lines of my code with just:

+ "k" > $ka
+ "a" > $vowelAOut

… then the sequences “ka” “ki” “ku” output 𑌕𑌾 𑌕𑍈 𑌕𑍗 correctly as expected. So the problem arises only when the virama is output and then cancelled, and even then only with SMP scripts, as mentioned above.

I also wonder whether the following passage in the named constants reference (apparently I can’t post a link to it, even with the http:// removed, because I am a “new user” on this forum) is relevant here, because I am using named constants (they may not seem necessary in the above minimal example, but they make the full code more readable):

Named constants are supported for characters above plane 0.

Seeing the above behaviour (and considering why such a statement would be necessary at all), I wonder if it should actually read “are not supported”!

Any help would be appreciated. Thanks!

Hi @jamadagni and welcome to the forum :slight_smile:

The joys of internet forums … I’ve made it possible for you to post a link now.

Um yes, that definitely seems backwards to me. The named constant support is pretty old (try 15+ years) and could use some love. I’ve added an issue to add support for named constants for SMP chars, as it is a compile-only change and thus should be reasonably straightforward, at least until I look at the code in question and shudder at its age.

For now, you’ll have to use outs(foo) rather than $foo and define the constants at the top of your .kmn file. Sorry :frowning:

I’ve managed to do some testing here now and reproduced the issue. After some more investigation I found that the issue is not with named character codes but rather with the nul statement in some applications (specifically, on Windows, applications that do not use Text Services Framework, including LibreOffice). Thus, I’ve closed the original issue and opened #2398 in its place, and am now investigating the issue there. (We’ll have to undo the ‘update’ we made to the help :wink:)

We’ve patched this issue and the fix should be available in a new version in the next day or so.


𑌨𑌿𑌤𑌰𑌾𑌮𑍍 𑌉𑌪𑌕𑍃𑌤𑌾 𑌵𑌯𑌮𑍍। 𑌧𑌨𑍍𑌯𑌵𑌾𑌦𑌾𑌃! :wink: