Predictive text

I want to develop predictive text for Biblical Hebrew. (I’m developing a keyboard to make it easier to type Biblical and Liturgical Hebrew.) I can find plain-text source files, which is great. My question is whether you have any advice on whether it’s better to use text with accents (as well as base consonants and combining vowels) or with just consonants and vowels, without accents. Including the accents would eliminate the need to go back and add them if one wants them, but (a) that might split the frequency counts for a single word across its accented variants, and (b) in many cases the accents are unnecessary.
Thanks!

That’s a really good question and a hard one to answer. It really depends on your users, I think!

Two things to think about:

  1. Look into setting up a custom filter on search terms, so that users can type unaccented words and still have the accented words predicted. This is a bit more advanced, but it can work really, really well. See the searchTermToKey function in the SENĆOŦEN keyboard at https://github.com/keymanapp/lexical-models/blob/master/release/nrc/nrc.str.sencoten/source/nrc.str.sencoten.model.ts#L5 for an example of how to ignore certain characters while searching.

  2. Perhaps you should consider two wordlists: one with accents and one without. That way your users can decide.
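To illustrate point 1, here is a minimal sketch of what such a filter might look like for Hebrew, following the same pattern as the SENĆOŦEN searchTermToKey linked above. It strips the cantillation accents (U+0591–U+05AF) out of the search key while keeping consonants and vowel points, so an unaccented search term can still match accented wordlist entries. The exact range of characters to ignore is my assumption; adjust it to taste (for example, widen it if you also want to ignore the vowels).

```typescript
// Sketch only: a searchTermToKey for a Keyman lexical model that
// ignores Hebrew cantillation accents when matching search terms.
function searchTermToKey(term: string): string {
  return term
    // NFD decomposes the text and puts combining marks in a
    // predictable canonical order before we filter them.
    .normalize('NFD')
    // Drop cantillation accents (U+0591–U+05AF). Niqqud vowel points
    // (U+05B0 and above) are kept, so typed vowels still narrow matches.
    .replace(/[\u0591-\u05AF]/g, '')
    .normalize('NFC');
}
```

With this, an accented word and its unaccented form reduce to the same key, so a single accented wordlist can serve users who type either way.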

This might be a case for a variant language tag, although nothing appropriate is currently registered. Perhaps a private-use subtag: he-x-accents for the accented text, and plain he for the unaccented?

I suspect it’s easier just to offer two predictive models to the user – they can choose their preferred model.

Respectfully, due to the inactivity of the conversation, this topic is now closed to further discussion.
Please feel free to create a new topic if you have further questions or the issue persists.