Advanced text replacement - is this possible?

I have two advanced text replacement rules I’d like to implement in a touch keyboard layout.

The first is that I have a deadkey for acute accent, which types a visible standalone accent character. If it is followed by a vowel, the accent is replaced by an acute-accented vowel; if it is followed by any other keystroke (including multi-character keys), the accent is deleted and the key types as normal.



I would like it so that if I tap acute accent followed by [T_NG], then the acute accent character would disappear and I would just get ‘nd’. This would be something like
[T_ACUTE] + any(vowels) > index(acutevowels, 2)
[T_ACUTE] + any(keys) > keystroke

Where “keystroke” is like “context”, but outputs the normal output for whatever virtual key caused it. As far as I can tell this doesn’t exist, but I am looking for a potential workaround. (If nothing else, I can fall back on either making the deadkey not give visual feedback, or to have the accent key typed after the vowel, not before)

Second:

I would like to have a key that can apply accents retroactively to vowels in a word, after it has been typed. So for example:

nihruaa [T_ACUTE] [T_GRAVE][T_NOTONE][T_ACUTE] becomes níhrùaá

I can hardcode this behavior using a lot of complex rules, but I am looking for a more concise solution.

In particular, I’m wondering if it’s possible to make an optional context element. The orthographic words in Itunyoso Triqui have a relatively constrained template of:
((C)(C)V) ((C)(C)V) ((C)(C)V) ((C)(C)(C)V(V)(n)(jh))
(Up to three (C)(C)V syllables, and a final syllable with up to three onset C’s, optional double vowel in the final syllable, an optional ‘n’, and an optional ‘j’ or ‘h’ after the last vowel)

which could be implemented easily if I could have a context store that matches either any letter or nothing (akin to a “?” in regular expressions).
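Without an optional-context element, each optional slot roughly doubles the rule count. A sketch (untested, with hypothetical stores and key names) for just the (C)(C)V part of the template:

```kmn
c Hypothetical stores; each optional C slot needs its own rule variant.
store(C) 'bcdfghjklmnpqrstvwxyz'
store(V) 'aeiou'
store(acuteV) 'áéíóú'

c CCV, CV, and bare V each need a separate rule:
any(C) any(C) any(V) + [T_ACUTE] > index(C, 1) index(C, 2) index(acuteV, 3)
any(C) any(V)        + [T_ACUTE] > index(C, 1) index(acuteV, 2)
any(V)               + [T_ACUTE] > index(acuteV, 1)
```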

Is this sort of thing possible? The brute-force alternative would be to create a large number of rules, one for each possible configuration, so a more concise option would be very helpful!

If I am understanding your description correctly, some of your earlier rules are already doing exactly this:

'´' dk(acute) + [NCAPS K_A] > 'á'

So you would need something like this:

+ [T_ACUTE] > '´' dk(acute)
'´' dk(acute) + [T_NG] > 'nd'

(My first thought was to add [T_NG] to the vowels store, perhaps renaming it to vowelKeys for clarity, but that won’t work here: store entries correspond 1:1 to single characters, and [T_NG] emits two characters.)

Second question: no, in .kmn we don’t currently support optional matches, so you’d probably need multiple rules to do this.

However, see the upcoming LDML keyboard standard (warning: draft link), which we will be supporting in Keyman 17.0 for desktop platforms and for touch in Keyman 18.0. It does support complex regex matches and is already working in the 17.0 beta for desktop keyboards. I’d love to see a proof of concept of your ideas, which would help us validate the need for the complex transforms as we have implemented them.

Marc’s answer is great, but I did take a somewhat “brute force” approach using character classes. This may be relevant to your first case of disappearing characters.

From my keyboard in the keymanapp/keyboards repository on GitHub: keyboards/release/sil/sil_cameroon_qwerty/source/sil_cameroon_qwerty.kmn

store(word) " aæαɛbɓcdɗeəfgʼꞌhiɨjklmnŋoɔøœpqrstuʉvwẅxyƴzAÆⱭƐBƁCDƊEƏFGꞋHIƗJKLMNŊOƆØŒPQRSTUɄVWẄXYƳZ◌-'" U+0300 U+0304 U+0301 U+030C U+0302 U+0303 U+0308 U+0327 U+030D U+0330
store(letter) "aæαɛbɓcdɗeəfgʼꞌhiɨjklmnŋoɔøœpqrstuʉvwẅxyƴzAÆⱭƐBƁCDƊEƏFGꞋHIƗJKLMNŊOƆØŒPQRSTUɄVWẄXYƳZ◌"
store(lc) "aæαɛbɓcdɗeəfgʼꞌhiɨjklmnŋoɔøœpqrstuʉvwẅxyƴz◌"
store(uc) "AÆⱭƐBƁCDƊEƏFGʼꞋHIƗJKLMNŊOƆØŒPQRSTUɄVWẄXYƳZ◌"
store(nfpunct) ",:;"
store(final) ".!?"
store(num) "0123456789"
store(diablock) " 0123456789?!;:'-_=<©®>.,[]{}\|/@°#$%^&*()«»‹›‘“’”€¥£…†ʼꞌꞋ" U+0022
store(dia) U+0300 U+0304 U+0301 U+030C U+0302 U+0303 U+0308 U+0327 U+030D U+0330 

My keyboard is for a national orthography rather than a single specific language, so I don’t know which letters can take diacritics; instead, the diablock and dia stores let me write exception rules that keep diacritics off punctuation and numbers…

Specific Rule:

any(diablock) + [T_0300] > context
c Don't output a U+0300 on diacritic blockers

General:

any(diablock) + any(dia) > context 
c Don't output any diacritics on any diacritic blockers.

This could be adjusted to your first case.
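For instance, the blocking pattern might be adapted to the disappearing-accent case roughly like this (untested; [T_NG] and its ‘nd’ output are from the question above, and the other names are hypothetical):

```kmn
c Sketch: if the visible accent is followed by a key that shouldn't
c take it, consume the accent and output that key's normal text.
store(accent) '´'
any(accent) + [T_NG]    > 'nd'   c accent disappears, key types as normal
any(accent) + [K_SPACE] > ' '    c likewise for an ordinary key
```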

For the second case, you may be able to use a vowel store, a consonant store, and maybe even a semivowel store to build the multiple rules you will need (the example below is neither complete nor tested). This will at least be better than building rules for each letter.

store(C) "rstln"
store(V) "aeiou"
store(S) "jh"
any(C) any(C) any(V) any(S) + any(dia) > index(C, 1) index(C, 2) index(V, 3) index(dia, 5) index(S, 4)
any(C) any(C) any(V) + any(dia) > index(C, 1) index(C, 2) index(V, 3) index(dia, 4)

With all that said, regex will be much nicer.

Thank you for the answer! This makes sense, I hadn’t considered I could just make a rule for each key.

This is a great workaround - the number of possible word shapes means I probably can’t implement it for all of them, but it would be feasible for one and two syllable words.

Update: I found a workaround:

store(a_z) 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
    dk(ng) dk(ch) dk(kw) dk(chr)

store(plainvowels) 'aeiouAEIOU'
store(acutevowels) 'áéíóúÁÉÍÓÚ'

any(plainvowels) any(a_z) any(a_z) any(a_z) any(a_z) + [T_H_BACK] > index(acutevowels, 1) context(2) context(3) context(4) context(5)

any(plainvowels) any(a_z) any(a_z) any(a_z) + [T_H_BACK] > index(acutevowels, 1) context(2) context(3) context(4)

any(plainvowels) any(a_z) any(a_z) + [T_H_BACK] > index(acutevowels, 1) context(2) context(3)

any(plainvowels) any(a_z) + [T_H_BACK] > index(acutevowels, 1) context(2)

any(plainvowels) + [T_H_BACK] > index(acutevowels, 1)

any(plainvowels) any(a_z) any(a_z) any(a_z) any(a_z) + [T_NOTONE] > index(plainvowels, 1) dk(vowel) context(2) context(3) context(4) context(5)

any(plainvowels) any(a_z) any(a_z) any(a_z) + [T_NOTONE] > index(plainvowels, 1) dk(vowel) context(2) context(3) context(4)

any(plainvowels) any(a_z) any(a_z) + [T_NOTONE] > index(plainvowels, 1) dk(vowel) context(2) context(3)

any(plainvowels) any(a_z) + [T_NOTONE] > index(plainvowels, 1) dk(vowel) context(2)

any(plainvowels) + [T_NOTONE] > index(plainvowels, 1) dk(vowel)

Because rules with the longest matching context are applied first, this simply finds the vowel before the longest uninterrupted chain of letters in the context (plus deadkeys from some other rules I have that aren’t intended to interrupt this process); no spaces means it stays within the same word. I only included rules up to 5-letter words in this post, but I actually go up to 15 characters of context, enough to match almost any theoretical word.

It then either turns the letter into its acute accent version, or leaves it plain but inserts a dk(vowel) deadkey after the vowel. Since neither acute accented vowels nor dk(vowel) are included in store(a_z), it essentially “marks off” that vowel as having been dealt with, and the next longest context match will be for the next vowel in the word.
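To make the left-to-right marking concrete, here is a worked trace (as comments) of the earlier example, assuming [T_GRAVE] has rules analogous to the [T_H_BACK] ones above but outputting grave vowels:

```kmn
c "nihruaa" + [T_H_BACK]: longest match is 'i' + 'hruaa' (vowel + 5
c   letters), so 'i' becomes 'í'                        -> níhruaa
c "níhruaa" + [T_GRAVE]: 'í' is no longer in plainvowels or a_z, so
c   the longest match is now 'u' + 'aa'                 -> níhrùaa
c "níhrùaa" + [T_NOTONE]: matches the first 'a' + 'a' and inserts
c   dk(vowel) after it, marking it off                  -> níhrùa dk(vowel) a
c + [T_H_BACK]: only the final 'a' can still match      -> níhrùaá
```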