When suggestions are provided for non-Latin languages, the lexical model doesn’t know how to suggest corrections for likely mistypings.
Here is an example using the hunspell “REP” flag for Kannada.
In the example below, the hunspell affix file specifies how to search for words in the dictionary: each REP line names a pair of letters that hunspell will try substituting when it searches the wordlist.
This is a very important feature for corrections, and also for suggesting words.
We need to be able to pass this list in with the dictionary, so that the lexical model is aware of letters that are likely to be typed wrongly.
SET UTF-8
FLAG num
TRY ಅ
REP 54
REP ಕ ಖ
REP ಖ ಕ
REP ಚ ಛ
REP ಛ ಚ
REP ಟ ಠ
REP ಠ ಟ
REP ಡ ಢ
REP ಢ ಡ
REP ತ ಥ
REP ಥ ತ
REP ದ ಧ
REP ಧ ದ
REP ಿ ೀ
REP ೀ ಿ
REP ು ೂ
REP ೂ ು
REP ೆ ೇ
REP ೇ ೆ
REP ೊ ೋ
REP ೋ ೊ
We are doing considerable work in 14.0 on improving suggestions and corrections but I think the functionality you are requesting will have to wait for a future release.
You should be able to achieve your goal by using existing functionality. Granted, it could probably be better documented, or at least better highlighted.
/**
 * Indicates a mapping function used by the model to simplify lookup operations
 * within the lexicon. This is expected to result in a many-to-one mapping, transforming
 * the input text into a common, simplified ‘index’/‘key’ form shared by all
 * text forms that a person might reasonably interpret as “the same”.
 *
 * Example usages:
 * - converting any upper-case characters into lowercase.
 *   - For English, ‘CAT’ and ‘Cat’ might be keyed as ‘cat’, since users expect all three to be treated as the same word.
 * - removing accent marks that may be difficult to type on standard keyboard layouts
 *   - For French, users may wish to type “jeune” instead of “jeûne” when lazy or if accent marks cannot be easily input.
 *
 * Providing a function targeted for your language can greatly improve a user’s experience
 * using your dictionary.
 *
 * @param text The original input text.
 * @returns The ‘keyed’ form of that text.
 */
toKey?(text: USVString): USVString;
A bit later:
… When possible, it is recommended to accomplish this by defining a toKey (searchTermToKey in model source) instead.
If each of your REP entries above indicates that the two characters should be able to freely replace each other, you’ll want to define a custom method called searchTermToKey in your model’s source that matches the toKey type signature found above.
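As a rough illustration of that idea, here is a minimal sketch of such a function for the ten letter pairs visible in the REP list above. The names REP_PAIRS and CANONICAL are made up for this example, and picking the first letter of each pair as the shared key is an arbitrary assumption; any consistent choice would do. How the function is attached to the model's source is not shown here.

```typescript
// Hypothetical table: the ten interchangeable pairs shown in the REP list.
// Each pair is [key letter, alternate letter]; the choice of key is arbitrary.
const REP_PAIRS: ReadonlyArray<[string, string]> = [
  ["ಕ", "ಖ"],
  ["ಚ", "ಛ"],
  ["ಟ", "ಠ"],
  ["ಡ", "ಢ"],
  ["ತ", "ಥ"],
  ["ದ", "ಧ"],
  ["ಿ", "ೀ"],
  ["ು", "ೂ"],
  ["ೆ", "ೇ"],
  ["ೊ", "ೋ"],
];

// Lookup from each alternate letter to the letter used as its key.
const CANONICAL = new Map<string, string>();
for (const [keyLetter, alternate] of REP_PAIRS) {
  CANONICAL.set(alternate, keyLetter);
}

// Same shape as the toKey signature quoted above: text in, keyed text out.
// Both members of a pair end up as the same letter, so words that differ
// only by one of these substitutions share a key.
function searchTermToKey(text: string): string {
  // Normalize first so that composed and decomposed spellings of the same
  // vowel sign are compared in a single form.
  return Array.from(text.normalize("NFC"))
    .map((ch) => CANONICAL.get(ch) ?? ch)
    .join("");
}
```

With this in place, a word typed with ಖ and the same word typed with ಕ produce the same key, which is the many-to-one behaviour the toKey documentation describes. Letters outside the table pass through unchanged.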