The lexical model for the touch keyboard we are currently testing was built from 4 text files: the first from our first large database of texts, the second from another database of texts, the third containing just the lexemes of the dictionary (each with frequency 1), and the fourth from a database of words from example sentences in the same dictionary.
We have noticed that words frequently typed by testers, like “goodmorning” (one word in the language), do not necessarily score high in any of the 4 text databases, and the same goes for several other frequent greetings. So when typing “good…”, it is not “goodmorning” that is proposed but other words with higher frequencies in the databases, say “goodafternoon”. Now let us say “goodafternoon” has a frequency of 500 in file 1, 300 in file 2, 1 in file 3 and 20 in file 4, so 821 in all.
To remedy the situation, I tried adding a 5th text file that contains only such problem words. In that fifth file, I added “goodmorning” with a frequency of 822.
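To make the arithmetic behind this attempt concrete, here is a minimal Python sketch of what I expected to happen. It assumes the model simply sums each word's counts across all source files and ranks suggestions by that total; I do not know whether the model really merges files this way. The names `merge_counts` and `suggest` are made up for illustration and are not part of any keyboard tooling, and “goodmorning” is listed only in the fifth file to keep the example small.

```python
from collections import Counter

# Counts per source file. The goodafternoon figures are the example numbers
# from this post; goodmorning appears only in the fifth file (simplification).
files = [
    {"goodafternoon": 500},  # file 1: first text database
    {"goodafternoon": 300},  # file 2: second text database
    {"goodafternoon": 1},    # file 3: dictionary lexemes (frequency 1)
    {"goodafternoon": 20},   # file 4: example sentences
    {"goodmorning": 822},    # file 5: problem words
]


def merge_counts(wordlists):
    """Sum each word's count across all wordlists (ASSUMED merge rule)."""
    total = Counter()
    for wordlist in wordlists:
        total.update(wordlist)  # Counter.update adds counts, it does not replace
    return total


def suggest(prefix, counts):
    """Return the words starting with prefix, highest total count first."""
    matches = [word for word in counts if word.startswith(prefix)]
    return sorted(matches, key=lambda word: (-counts[word], word))


merged = merge_counts(files)
print(merged["goodafternoon"])  # 821
print(merged["goodmorning"])    # 822
print(suggest("good", merged))  # ['goodmorning', 'goodafternoon']
```

Under that assumption, “goodmorning” would narrowly come first for the prefix “good”.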
This change does not influence the predictive model, though: “goodafternoon” is still proposed first. How can I remedy this in the model?