Text suggestion not working in some cases

Hi All,
Recently I submitted an Armenian lexical model for text prediction. I have tested it on Keyman 15.0.266 for Android with the Armenian Mnemonic keyboard. There are several issues I would like to mention.

  1. The lexical model contains about 160 000 words and it loads in 10-12 seconds.
  2. When I type a word starting with a quotation mark (e.g. «, “, „ ), it does not suggest words correctly.
  3. In Armenian there are words with a hyphen (e.g. ինչ-որ), and when I type the prefix ինչ- incorrect words are suggested.
  4. When I type a word starting with a capital letter, the suggested words should also start with a capital letter.
  5. When I type a word in all capital letters, the suggested words should also be in all capital letters.
  6. When there is no room for suggested long words it would be convenient to see those words with a smaller font size.

I think these issues do not depend on the language and should be handled by the Keyman app.

There are quite a few issues you’ve noted here, so I hope you don’t mind me addressing this one piece at a time.

  1. The lexical model contains about 160 000 words and it loads in 10-12 seconds.

First off, that load time is clearly not ideal and is quite significant. While I don’t recall any similar load-time reports, that is a rather large wordlist model you’re working with. Unfortunately, that’s presently a limitation of our engine’s architecture; such a large dictionary will require a bit of time to parse. We can look into mitigations for this in the future, but note that our current planned roadmap puts this no sooner than version 18.0 - it’ll be a while. This would be quite complex to tackle and would require significant amounts of dedicated time.

  2. When I type a word starting with a quotation mark (e.g. «, “, „ ), it does not suggest words correctly.
  3. In Armenian there are words with a hyphen (e.g. ինչ-որ), and when I type the prefix ինչ- incorrect words are suggested.
  6. When there is no room for suggested long words it would be convenient to see those words with a smaller font size.

Points 2, 3, and 6 are all issues we’ve known about for a little while. While I don’t believe we have a formal issue written up yet for point 3, you can find threads for the other two here:

For the “smaller font size” idea, we actually considered that at one point in the past, but we ran into a different problem when words can get too long. To take a different example from the one seen at that link - here, from the classic film “Mary Poppins”: supercalifragilisticexpialidocious. On any phone-form device, this would become unreadably small… so we’d need to consider how to handle cases like this well at the same time.

One proposal we’ve received in this direction: feat(common/models): Continuous suggestions #2220. If we combine the two ideas, we could simply give the space for two shorter suggestions to a single very-long suggestion if/when needed.

In the meantime, while not immediately obvious, you can ‘drag’ the suggestion and scroll it left and right to see the end of words that aren’t fully shown. Less ideal, to be sure, but it at least handles all cases ‘decently’ (though not ideally) for now.

  4. When I type a word starting with a capital letter, the suggested words should also start with a capital letter.
  5. When I type a word with all letters capital, the suggested words should have all letters capitalized.

For points 4 and 5, we rely on the keyboard to tell us whether or not a language supports capitalization. If there is no ‘caps’-layer functionality in the keyboard, we will not model capitalization in this way - there are languages that don’t want capital forms of letters that other languages for the same script support! While I see that your keyboard does have caps-based rules, the ‘caps’ layer defined within it doesn’t quite match the form we introduced in 15.0 as documented here:
https://help.keyman.com/developer/language/guide/casing-support

Fixing that up within the keyboard should fully address those two points.
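
To make the behavior requested in points 4 and 5 concrete, here is a minimal sketch of the idea: detect the casing of what has been typed so far, then apply the same casing to each suggestion. This is only an illustration and not Keyman's actual implementation; the names `CasingForm`, `detectCasing`, `applyCasing`, and `recaseSuggestions` are hypothetical.

```typescript
// Illustrative only: hypothetical helpers, not Keyman's engine code.
type CasingForm = 'lower' | 'initial' | 'upper';

// Classify the typed prefix as lowercase, initial-capital, or all-caps.
function detectCasing(typed: string): CasingForm {
  const lower = typed.toLowerCase();
  const upper = typed.toUpperCase();
  if (lower === upper) { return 'lower'; }   // no cased letters at all
  // A single capital letter is ambiguous, so treat it as 'initial'.
  if (typed === upper && typed.length > 1) { return 'upper'; }
  const first = typed.charAt(0);
  return first !== first.toLowerCase() ? 'initial' : 'lower';
}

// Apply one casing form to a word.
function applyCasing(form: CasingForm, text: string): string {
  switch (form) {
    case 'upper':   return text.toUpperCase();
    case 'initial': return text.charAt(0).toUpperCase() + text.slice(1);
    default:        return text.toLowerCase();
  }
}

// Re-case suggestions to match the typed prefix; lowercase input
// leaves the suggestions exactly as the model produced them.
function recaseSuggestions(typed: string, suggestions: string[]): string[] {
  const form = detectCasing(typed);
  if (form === 'lower') { return suggestions.slice(); }
  return suggestions.map((s) => applyCasing(form, s));
}
```

For example, `recaseSuggestions('He', ['hello', 'help'])` yields `['Hello', 'Help']`, and `recaseSuggestions('HE', ['hello'])` yields `['HELLO']`. The same built-in case mappings cover Armenian letters.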

For the others… to revisit a point made earlier, much of this is stuff we hope to address directly in the 18.0 release cycle.

Some thoughts about mitigating performance issues:

  • Of the 166,835 words in the wordlist, 80,922 occur only once.
  • Another 23,892 words are found only twice.

It is perhaps not ideal to remove rare words but this should significantly improve performance – the downside is that it is harder for users to know how to spell them, but given many of them are quite unlikely to ever appear (as more frequently used words will take priority), there may not really be a noticeable difference to end users. (You could keep the rarer words in a separate file in the pull request for potential future inclusion.)
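
The pruning described above could be sketched roughly as follows. It assumes the wordlist is tab-separated text with the word in the first column and its count in the second, and that lines starting with `#` are comments; the function name `pruneWordlist` and the `minCount` threshold are hypothetical. Rare entries are returned separately so they can be kept in their own file for later.

```typescript
// Sketch only: splits a TSV wordlist into frequent and rare entries.
interface PruneResult {
  kept: string[];   // lines whose count is >= minCount
  rare: string[];   // lines below the threshold, preserved for later use
}

function pruneWordlist(tsv: string, minCount: number): PruneResult {
  const kept: string[] = [];
  const rare: string[] = [];
  const lines = tsv.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // Skip blank lines and comment lines (assumed to start with '#').
    if (line.trim() === '' || line.charAt(0) === '#') { continue; }
    const fields = line.split('\t');
    const parsed = parseInt(fields[1] || '', 10);
    // Assumption: an entry without a parseable count is treated as count 1.
    const count = isNaN(parsed) ? 1 : parsed;
    if (count >= minCount) {
      kept.push(line);
    } else {
      rare.push(line);
    }
  }
  return { kept: kept, rare: rare };
}
```

With `minCount` set to 3, words seen only once or twice land in `rare`, which matches the two groups counted above.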

Two more minor items:

  • The third most common word in the wordlist seems to be invalid: �է 61253
  • There are 173 two-letter words and 31 one-letter words – these could also be removed (there is minimal benefit for these though.)

(For point of comparison, our English wordlist currently has fewer than 25,000 words, which I think is too few, but it’s still pretty useful.)

For the 1st point, I have removed the rare words from the wordlist. Its size became about 85 000, which significantly improved the performance.

For the 3rd point, I think the hyphen is an orthographic mark (e.g. sign-in, follow-up, mother-in-law), so it should not be removed during the processing of the search term.
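
As a rough illustration of that idea (and of the leading quotation marks in point 2), a search-term normalizer could strip quotation marks at the edges of a term while leaving internal hyphens alone. This is a standalone sketch with hypothetical names (`EDGE_QUOTES`, `toSearchKey`); the thread does not establish whether the actual problem lies in key generation or in word breaking.

```typescript
// Sketch only: hypothetical normalizer, not Keyman's default behavior.
// Leading: « “ „ " '   Trailing: » ” " '
const EDGE_QUOTES = /^[\u00AB\u201C\u201E"']+|[\u00BB\u201D"']+$/g;

// Strip quotation marks at the edges and lowercase the rest,
// keeping hyphens so that "ինչ-" still matches "ինչ-որ".
function toSearchKey(term: string): string {
  return term.replace(EDGE_QUOTES, '').toLowerCase();
}

// Prefix lookup against a small in-memory list, for demonstration.
function suggest(prefix: string, words: string[]): string[] {
  const key = toSearchKey(prefix);
  return words.filter((w) => toSearchKey(w).indexOf(key) === 0);
}
```

Here `suggest('“Sign-', ['sign-in', 'signal', 'sign'])` returns only `['sign-in']`, because the hyphen remains part of the key.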

For the 6th point, I found the drag of the suggested word very useful, I didn’t know about it. But one can define a minimum readable font size for suggested words.
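
A minimum readable size could be combined with the existing drag-to-scroll behavior along these lines. This is a sketch under the simplifying assumption that rendered text width scales linearly with font size; `fitFontSize` and its parameters are hypothetical and not part of Keyman.

```typescript
// Sketch only: shrink a suggestion to fit its slot, but never below
// a minimum readable size; past that point, fall back to scrolling.
interface FitResult {
  size: number;          // font size to use
  needsScroll: boolean;  // true when the text still overflows at minSize
}

function fitFontSize(
  baseSize: number,        // normal font size
  textWidthAtBase: number, // measured text width at baseSize
  slotWidth: number,       // width available for the suggestion
  minSize: number          // smallest size considered readable
): FitResult {
  if (textWidthAtBase <= slotWidth) {
    return { size: baseSize, needsScroll: false };
  }
  // Assumes width is proportional to font size.
  const scaled = baseSize * (slotWidth / textWidthAtBase);
  if (scaled >= minSize) {
    return { size: scaled, needsScroll: false };
  }
  return { size: minSize, needsScroll: true };
}
```

For a 16-unit base font, a 200-unit-wide word, a 100-unit slot, and a minimum of 10, the scaled size would be 8, so the function returns size 10 with `needsScroll` set to true.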

For the 4th and 5th points, I have renamed the layer from capslock to caps, but I couldn’t test it on my Android smartphone because of a crash.

Excuse me, I was working on an old workspace with a lot of modifications. Please, disregard that crash.
I finally managed to rename the layer from capslock to caps and test it on my Android phone. The result is the same: no capitalized words are suggested.
I could look at the source files if you could share a sample keyboard where capitalized suggestions work.

From a quick test in the Keyman Developer browser emulation, the “obolo_chwerty” keyboard (which has a “caps” layer) and the “katelem_ann-latn_getat” lexical model seem to work together. When on the “caps” layer, suggested words are in all capital letters.

See: keyboards/release/o/obolo_chwerty at master · keymanapp/keyboards · GitHub
and: lexical-models/release/katelem/katelem.ann-latn.getat at master · keymanapp/lexical-models · GitHub

Please let us know if you need more information.

sil_euro_latin and nrc.en.mtnt should also work.

Something I forgot to mention in my previous comment: for a lexical model to have caps-related behavior, you need to add one flag to your .model.ts file:

languageUsesCasing: true

You can see it in use with nrc.en.mtnt, our default English lexical model here: lexical-models/nrc.en.mtnt.model.ts at master · keymanapp/lexical-models · GitHub

That’s actually more important than the keyboard-related notes.
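
For orientation, a model definition with the flag set might look roughly like the sketch below. The file name `wordlist.tsv` is a placeholder, and the small interface is a stand-in so the snippet type-checks on its own. In a real project, the toolchain provides the model source type, and the other fields should match whatever your existing .model.ts already contains.

```typescript
// Sketch of a .model.ts file. ModelSourceSketch is a hypothetical,
// minimal stand-in for the type that the Keyman tooling supplies.
interface ModelSourceSketch {
  format: 'trie-1.0';
  wordBreaker: 'default';
  sources: string[];
  languageUsesCasing?: boolean;
}

const source: ModelSourceSketch = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  sources: ['wordlist.tsv'],  // placeholder wordlist file name
  // Enables caps-related suggestion behavior for the model.
  languageUsesCasing: true
};

export default source;
```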

Okay, I’ve also looked at the keyboard + model.

Once you follow the suggestion that @joshua_horton gave above regarding languageUsesCasing, then the capitalizing of suggestions works correctly.

The keyboard’s shift and caps layers could benefit from Keyman 15’s new casing support. I suggest that you follow the notes at Casing support to get solid start-of-sentence and casing detection going in the armenian_mnemonic keyboard.

Not really related, but I also note that there were several rules which had problems with an if() test (lines 28-38), which need the 1 to be changed to '1' (this caused an internal error in the built keyboard; I have noted this as an improvement to make to the compiler):

c Toggle variable for key T
if(option_key_t_is_tyun != '1') U+221A + [NCAPS K_T] > set(option_key_t_is_tyun = '1') save(option_key_t_is_tyun)
if(option_key_t_is_tyun != '1') U+221A + [CAPS SHIFT K_T] > set(option_key_t_is_tyun = '1') save(option_key_t_is_tyun)
if(option_key_t_is_tyun = '1') U+221A + [NCAPS K_T] > set(option_key_t_is_tyun = '0') save(option_key_t_is_tyun)
if(option_key_t_is_tyun = '1') U+221A + [CAPS SHIFT K_T] > set(option_key_t_is_tyun = '0') save(option_key_t_is_tyun)

c Toggle variable for key P
if(option_key_p_is_peh != '1') U+221A + [NCAPS K_P] > set(option_key_p_is_peh = '1') save(option_key_p_is_peh)
if(option_key_p_is_peh != '1') U+221A + [CAPS SHIFT K_P] > set(option_key_p_is_peh = '1') save(option_key_p_is_peh)
if(option_key_p_is_peh = '1') U+221A + [NCAPS K_P] > set(option_key_p_is_peh = '0') save(option_key_p_is_peh)
if(option_key_p_is_peh = '1') U+221A + [CAPS SHIFT K_P] > set(option_key_p_is_peh = '0') save(option_key_p_is_peh)

I’m generating a few issues / pull requests to follow up relating to our interactions here:

Thanks for finding the cause of the crash. I will keep this subtlety in mind when using the if statement in future.
I have made the following changes you advised, and capitalized suggestions now work.

  1. Rename capslock layer to caps.
  2. Add languageUsesCasing: true to the .model.ts file.

Thanks to everyone.