Creating a Proper Applicable Lexical Model

Hello everyone,

I am currently working on a keyboard IME for Nsibidi, an ideographic writing system from West Africa. I was able to make the keyboard, but now I am having trouble applying my custom lexical model so that typing Latin letters offers the possible matching characters, similar to the Chinese pinyin method, like so:

mu → 木,目,暮,姆

Latin characters → ideographic logograms

This is something I’ve been thinking about in the background, and I do have some ideas on how we could support something like this. Unfortunately, I don’t believe we have anything already in place tailored to IME-style input via the lexical-model system.

If you double-check, you may note that the picker keyboard you reference isn’t designed for use on touch devices; I’d be very surprised if it’s actually operable there. I did do some experiments in the past toward converting our IMEs over to the predictive-text system, but never got terribly far with it.

I’m not sure when we’ll be able to address it well given our current roadmap. It wouldn’t be a small amount of work, though it’s definitely possible to implement given appropriate time and resources.

Our currently-implemented model design admittedly assumes that text in should “roughly equal” text out. The underlying design, however, makes no such assumption.

There are a couple of layers to consider here:

  • Accuracy of the input ‘pinyin’-like (Latin) script
  • Mapping the pinyin-like input to the target script

Our predictive-text engine can handle that first point fairly simply, even when repurposed as a picker IME. There may be a need to ‘tweak’ how heavily ‘correction’ effects contribute to the final picker list, of course, and that’s something I could help with.
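To illustrate what "tweaking how heavily correction contributes" could mean, here is a minimal sketch that scores known romanization keys against the typed input using plain Levenshtein edit distance. The names `rankKeys`, `correctionPenalty`, and `maxEdits`, and the exponential decay, are all assumptions for illustration; Keyman's engine has its own correction search, which this does not reproduce.

```typescript
// Hypothetical sketch: rank known romanization keys against typed input,
// with a tunable penalty controlling how strongly non-exact matches are down-weighted.

/** Classic Levenshtein edit distance (insertions, deletions, substitutions). */
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 chars of a and first j chars of b.
  const prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = prev[0]; // distance for (i-1, j-1) before it is overwritten
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = prev[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      prev[j] = Math.min(above + 1, prev[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return prev[b.length];
}

interface ScoredKey {
  key: string;
  weight: number; // relative, unnormalized score
}

/**
 * Keep keys within `maxEdits` of the input and weight each by
 * exp(-correctionPenalty * distance). A penalty of 0 weights every kept key
 * equally; larger values favour exact matches more strongly.
 */
function rankKeys(
  input: string,
  keys: string[],
  correctionPenalty: number,
  maxEdits = 1
): ScoredKey[] {
  const scored: ScoredKey[] = [];
  for (const key of keys) {
    const d = editDistance(input, key);
    if (d <= maxEdits) {
      scored.push({ key, weight: Math.exp(-correctionPenalty * d) });
    }
  }
  // Highest weight first; ties keep their original order (stable sort).
  return scored.sort((x, y) => y.weight - x.weight);
}

// Usage with hypothetical romanization keys:
const keys = ["mu", "ma", "mi", "nu", "mua"];
const ranked = rankKeys("mu", keys, 2.0);
console.log(ranked.map((s) => `${s.key}:${s.weight.toFixed(3)}`).join(" "));
// prints "mu:1.000 ma:0.135 mi:0.135 nu:0.135 mua:0.135"
```

Raising `correctionPenalty` pushes corrected keys further down the picker list without removing them, which is one simple way to express the kind of tuning described above.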

The second point, to my knowledge, is probably best handled via table or database lookup. I imagine this could be done via a large JSON file for languages like Chinese, where the same input Latin characters could result in multiple different Chinese-script characters. Something more efficient might be possible for languages where each input maps to only a single result.
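A minimal sketch of the JSON table approach, assuming a simple shape where each romanization key maps to an ordered list of candidate characters. The table contents (the `mu` entry from the example above plus two illustrative pinyin entries) and the helper names `lookup` and `lookupPrefix` are hypothetical; a real Nsibidi table would need its own data and format.

```typescript
// Hypothetical sketch: romanization → candidate characters, loaded from JSON.
// In practice the JSON string would come from a (possibly large) .json file.
const tableJson = `{
  "mu": ["木", "目", "暮", "姆"],
  "mo": ["莫", "墨"],
  "ma": ["马", "妈"]
}`;

// A Map avoids accidental hits on inherited object properties such as "constructor".
const table = new Map<string, string[]>(
  Object.entries(JSON.parse(tableJson) as Record<string, string[]>)
);

/** Exact lookup: every candidate for a complete romanization, in table order. */
function lookup(romanization: string): string[] {
  return table.get(romanization) ?? [];
}

/** Prefix lookup: candidates for every key starting with what has been typed so far. */
function lookupPrefix(prefix: string): string[] {
  const out: string[] = [];
  for (const [key, chars] of table) {
    if (key.startsWith(prefix)) out.push(...chars);
  }
  return out;
}

// For a script where each romanization maps to exactly one character,
// a simpler single-valued map is enough.
const oneToOne = new Map<string, string>([["mu", "木"]]);

console.log(lookup("mu").join(","));   // prints "木,目,暮,姆"
console.log(lookupPrefix("m").length); // prints 8
console.log(oneToOne.get("mu"));       // prints "木"
```

The prefix lookup is a linear scan for clarity; a large table would likely want a trie or sorted keys instead.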

There’s no existing predictive-text code to facilitate this, but the structures and internal API are in place so that a custom solution (or a future implementation by us) could be linked in.
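As a rough idea of the shape such a custom solution might take, the sketch below turns table candidates into picker suggestions whose transform deletes the typed Latin letters and inserts an ideograph, so the text out need not resemble the text in. The `Transform` and `Suggestion` types are simplified stand-ins defined locally; they are not Keyman's actual type definitions, and `predictIdeographs`, `applyTransform`, and the rank-based probabilities are assumptions for illustration only.

```typescript
// Simplified stand-in types (NOT Keyman's real definitions).
interface Transform {
  insert: string;     // text inserted at the caret
  deleteLeft: number; // characters deleted left of the caret before inserting
}

interface Suggestion {
  transform: Transform;
  displayAs: string; // what the picker shows
  p: number;         // normalized probability
}

// Hypothetical candidate data (from the "mu" example); a real model would load a table.
const candidates = new Map<string, string[]>([["mu", ["木", "目", "暮", "姆"]]]);

/**
 * Hypothetical picker-style predict: take the trailing run of Latin letters
 * left of the caret, look it up, and offer suggestions that replace that run.
 */
function predictIdeographs(leftContext: string): Suggestion[] {
  const match = /[a-z]+$/i.exec(leftContext);
  if (!match) return [];
  const typed = match[0];
  const chars = candidates.get(typed.toLowerCase()) ?? [];
  // Assumed ranking: earlier table entries are more likely (weights 1, 1/2, 1/3, ...).
  const weights = chars.map((_, i) => 1 / (i + 1));
  const total = weights.reduce((a, b) => a + b, 0);
  return chars.map((ch, i) => ({
    transform: { insert: ch, deleteLeft: typed.length },
    displayAs: ch,
    p: weights[i] / total,
  }));
}

/** Apply a transform to the left context, as happens when a suggestion is chosen. */
function applyTransform(leftContext: string, t: Transform): string {
  return leftContext.slice(0, leftContext.length - t.deleteLeft) + t.insert;
}

const suggestions = predictIdeographs("hello mu");
console.log(suggestions.map((s) => s.displayAs).join(","));        // prints "木,目,暮,姆"
console.log(applyTransform("hello mu", suggestions[0].transform)); // prints "hello 木"
```

How a function like this would actually be registered as a custom lexical model is a separate question; the real interface should be checked against Keyman's lexical-model documentation.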

@TheJujuman Please reply to this topic if you need further assistance; otherwise it’ll be closed in three weeks.