Creating a Proper Applicable Lexical Model

Hello everyone,

I am currently working on a keyboard IME for Nsibidi, an ideographic system from West Africa. I was able to build the keyboard, but I am having trouble applying my custom lexical model to it. I want typing Latin letters to bring up a list of possible characters, similar to the Chinese pinyin method, like so:

mu → 木,目,暮,姆

Latin characters to ideographic logograms

This is something I’ve been thinking about in the background, and I do have some ideas on how we can support something like this. Unfortunately, I don’t believe we have anything already in-place tailored to IME-style input via the lexical-model system.

If you double-check, you may note that the picker keyboard you reference isn’t designed for use on touch devices; I’d be very surprised if it’s actually operable there. I did do some experiments in the past toward converting our IMEs over to the predictive-text system, but never got terribly far with it.

I’m not sure when we’ll be able to address it well given our current roadmap. It wouldn’t be a small amount of work, though it’s definitely possible to implement given appropriate time and resources.

Our currently-implemented model design admittedly assumes that text in should “roughly equal” text out. The underlying design, however, makes no such assumption.

There are a couple of layers to consider here:

  • Accuracy of the input ‘pinyin’-like (Latin) script
  • Mapping the pinyin-like input to the target script

Our predictive-text engine can handle that first point pretty simply, even when repurposed toward use as a picker IME. There may be a need to ‘tweak’ how heavily ‘correction’ effects contribute to the final picker list, of course, and that’s something I could help with.
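To make the 'tweak' idea concrete, here is a rough sketch of one way correction strength could be dialed up or down. It is not how our engine does it internally: `editDistance`, `rankKeys`, and the `correctionPenalty` knob are all hypothetical names, and plain Levenshtein distance stands in for whatever correction model is actually used.

```typescript
// Hypothetical sketch: none of these names come from the Keyman API.

/** Plain Levenshtein edit distance, counted in UTF-16 code units. */
function editDistance(a: string, b: string): number {
  // prev[j] holds the distance between the first i-1 units of a and the first j units of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, substitution));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface ScoredKey {
  key: string;   // a Latin input key known to the model
  score: number; // 1 for an exact match, shrinking as more edits are needed
}

/**
 * Ranks known Latin keys against what was typed.
 * A larger correctionPenalty (assumed tuning knob) pushes corrected
 * keys further down the picker list; 0 would treat all keys equally.
 */
function rankKeys(typed: string, knownKeys: string[], correctionPenalty: number): ScoredKey[] {
  return knownKeys
    .map((key) => ({ key, score: Math.exp(-correctionPenalty * editDistance(typed, key)) }))
    .sort((x, y) => y.score - x.score);
}
```

For a picker, the penalty would probably want to be fairly high, so that exact matches for the typed Latin text always come first and corrections only fill in the remaining slots.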

The second point, to my knowledge, is probably best handled via table or database lookup. I imagine this could be done via a large JSON file for languages like Chinese, where the same input Latin characters could result in multiple different Chinese-script characters. Something more efficient might be possible for languages that only ever map to a single result.
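As a minimal sketch of the table-lookup idea (again, not existing code on our side), the mapping could be a JSON object keyed by the Latin input, with each value listing the candidates in display order. The `CandidateTable` type and `lookupCandidates` function are hypothetical, and the only row shown is the `mu` example from the original post; an Nsibidi table would have the same shape.

```typescript
// Hypothetical sketch: the table layout and function names are assumptions.

/** Latin input key -> candidate characters, in the order the picker should show them. */
type CandidateTable = { [latinKey: string]: string[] };

// In practice this would be loaded from a (possibly large) JSON file.
const tableJson = '{ "mu": ["木", "目", "暮", "姆"] }';
const candidateTable: CandidateTable = JSON.parse(tableJson);

/**
 * Returns the candidates for a Latin input, or an empty list if the
 * input is not in the table. Lookup is case-insensitive.
 */
function lookupCandidates(latinInput: string, table: CandidateTable): string[] {
  const key = latinInput.toLowerCase();
  // hasOwnProperty guards against inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(table, key)) {
    return [];
  }
  // Return a copy so callers cannot mutate the table.
  return table[key].slice();
}
```

For a script where each Latin key maps to exactly one result, the values could simply be strings instead of arrays, which keeps the file smaller.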

There’s no existing predictive-text code to facilitate this, but the structures and internal API are in place, so a custom solution (or a future implementation by us) could be linked in.

@TheJujuman Please reply to this topic if you need further assistance, otherwise it’ll be closed in three weeks.
