Improve suggestions/corrections based on context

When suggestions are provided for non-Latin scripts, the lexical model doesn’t know which letters are commonly mistyped for one another, so it can’t offer the correct word.

Here is an example using the hunspell “REP” flag for Kannada.

In the example below, hunspell lets the dictionary specify how to search for words: each REP entry names an alternate letter that hunspell will substitute when searching the wordlist.

This is a very important feature for corrections and also for suggesting words.

We need to be able to pass this list in the dictionary, and the lexical model should be aware of letters that are likely to be mistyped.

SET UTF-8
FLAG num
TRY ಅ
REP 54
REP ಕ ಖ
REP ಖ ಕ
REP ಚ ಛ
REP ಛ ಚ
REP ಟ ಠ
REP ಠ ಟ
REP ಡ ಢ
REP ಢ ಡ
REP ತ ಥ
REP ಥ ತ
REP ದ ಧ
REP ಧ ದ
REP ಿ ೀ
REP ೀ ಿ
REP ು ೂ
REP ೂ ು
REP ೆ ೇ
REP ೇ ೆ
REP ೊ ೋ
REP ೋ ೊ

Hi @mayura, thank you for the suggestion. I believe we have something like this on our roadmap; not sure if it will be in 14.0 or a future release.

Line number 452: const PARTIAL_NFD_LOOKUP = {…}

Can we use this constant to achieve the above?

No, that object is there for normalization, not correction.

Hello Marc,

Could we revisit this suggestion, please?

We are doing considerable work in 14.0 on improving suggestions and corrections, but I think the functionality you are requesting will have to wait for a future release.

You should be able to achieve your goal by using existing functionality. Granted, it could probably be better documented, or at least better highlighted.

From @keymanapp/models-types, index.d.ts:

/**
 * Indicates a mapping function used by the model to simplify lookup operations
 * within the lexicon. This is expected to result in a many-to-one mapping, transforming
 * the input text into a common, simplified ‘index’/‘key’ form shared by all
 * text forms that a person might reasonably interpret as “the same”.
 *
 * Example usages:
 * - converting any upper-case characters into lowercase.
 *   - For English, ‘CAT’ and ‘Cat’ might be keyed as ‘cat’, since users expect all
 *     three to be treated as the same word.
 * - removing accent marks that may be difficult to type on standard keyboard layouts
 *   - For French, users may wish to type “jeune” instead of “jeûne” when lazy or if
 *     accent marks cannot be easily input.
 *
 * Providing a function targetted for your language can greatly improve a user’s experience
 * using your dictionary.
 * @param text The original input text.
 * @returns The ‘keyed’ form of that text.
 */
toKey?(text: USVString): USVString;

A bit later:

 * … When possible,
 * it is recommended to accomplish this by defining a toKey (searchTermToKey in model
 * source) instead.

If each of your REP entries above indicates that the two characters should be able to freely replace each other, you’ll want to define a custom method called searchTermToKey in your model’s source that matches the toKey type signature found above.

You can find our default implementations here: https://github.com/keymanapp/keyman/blob/master/developer/js/source/lexical-model-compiler/model-defaults.ts.

Append extra replace statements in your custom implementation in order to accomplish your goals. Turn lines like this:

REP ಛ ಚ

into something like this:

.replace(/ಛ/g, 'ಚ')

Only pick one entry of each pair of lines for this.

This will ensure that use of either letter will look up words that use either letter, even if it’s the opposite of the pair.
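As a rough illustration of the advice above, here is a minimal sketch of such a function covering the REP pairs listed in this thread. It folds each pair onto one member (aspirated consonant to unaspirated, long vowel sign to short); the direction is an arbitrary choice made for this example. It shows only the extra replace statements, not the default implementation's own steps. It also assumes the text uses the precomposed vowel signs exactly as they appear in the REP list. If you combine it with a Unicode normalization step, the vowel-sign patterns may need to match the normalized form instead.

```typescript
// Sketch only: maps both members of each REP pair to the same key character,
// so either spelling looks up the same lexicon entries.
// Assumption: vowel signs arrive precomposed, as written in the REP list.
function searchTermToKey(term: string): string {
  return term
    // Consonants: aspirated -> unaspirated
    .replace(/ಖ/g, 'ಕ')
    .replace(/ಛ/g, 'ಚ')
    .replace(/ಠ/g, 'ಟ')
    .replace(/ಢ/g, 'ಡ')
    .replace(/ಥ/g, 'ತ')
    .replace(/ಧ/g, 'ದ')
    // Vowel signs: long -> short
    .replace(/ೀ/g, 'ಿ')
    .replace(/ೂ/g, 'ು')
    .replace(/ೇ/g, 'ೆ')
    .replace(/ೋ/g, 'ೊ');
}

// Both members of a pair now produce the same key.
console.log(searchTermToKey('ಛ') === searchTermToKey('ಚ')); // true
```

In a real model you would add these replace calls to whatever your searchTermToKey already does, rather than use this function on its own.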

Thank you. I will try this.