I would like the Keyman lexical model to support three columns.
One column is the text used for detection, another is the output, and the last is the frequency.
With that, a prototype IME would be possible.
If you cannot implement this function, I can still write Keyman rules that delete part of the text to get the same effect.
However, I hope to see this function in the next update.
If you can add this function, allow 30 candidates, and provide a setting to ignore tone differences, I can do more research on IMEs.
What I mean is:
Building a cross-platform system would be very costly, and it would be hard to find a starting point.
But you could start with just Android and iOS,
because most people today type on mobile devices.
Developing a picker system for touch devices should be easier, since you already have experience with lexical models.
After developing the IME for phones, you can gradually work out how to develop it for Windows and other systems.
On this part: the “text for detection” is what the searchTermToKey part of the lexical model specification is for. In theory, you could write a (probably large) function that returns the “detection” form of the word for each “output”. Of course… it would be preferable to have our model compiler do the heavy lifting for you, which I completely understand.
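As a rough sketch of that "function that returns the detection form for each output" idea: the three-column rows, the `Entry` type, and the `parseRows` helper below are all hypothetical and not part of Keyman's wordlist format or API. The sample rows are made up for illustration. I have also not checked whether the model compiler would carry a lookup table defined outside the function, so in practice the table might need to live inside the function body.

```typescript
// Hypothetical three-column rows: detection form, output form, frequency.
// This is NOT the format Keyman's wordlist.tsv uses today.
const rows = [
  "ngay\tNgày\t120",
  "ha noi\tHà Nội\t95",
  "viet nam\tViệt Nam\t300",
].join("\n");

interface Entry {
  detection: string;
  output: string;
  frequency: number;
}

// Split the TSV text into entries, skipping malformed lines.
function parseRows(tsv: string): Entry[] {
  return tsv
    .split("\n")
    .map(line => line.split("\t"))
    .filter(cols => cols.length === 3)
    .map(([detection, output, freq]) => ({
      detection,
      output,
      frequency: Number(freq),
    }));
}

// Lookup table from output form to detection form.
const detectionByOutput = new Map<string, string>(
  parseRows(rows).map(e => [e.output, e.detection] as [string, string])
);

// Candidate body for searchTermToKey: known outputs map to their
// detection form; anything else (such as typed input) is lowercased.
function searchTermToKey(term: string): string {
  return detectionByOutput.get(term) ?? term.toLowerCase();
}
```

The fallback matters because the same function is applied to what the user types: typing `ngay` misses the table, is lowercased to `ngay`, and so lands on the same key as the entry `Ngày`.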
Given your other issue (regarding diacritic sensitivity / “undiffer tonation”), a major replacement of that method with something custom may be able to get you “close” to such a “prototype”. You’d probably want a custom tool written to assist with that function, though, and thus someone knowledgeable enough about code to make it work.
It’d still only be a “prototype” and lacking the full set of features we’d want for a proper IME implementation… but if that’s fine by you, hopefully this can provide some good leads.
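For the tone-insensitive part, one possible shape for such a custom function is sketched below. It assumes that the tone marks decompose into Unicode combining marks (true for Vietnamese; I am assuming it also holds for the Taiwanese orthography in question). The name `toneInsensitiveKey` is made up for this example. Note that it strips every combining mark in the U+0300 to U+036F range, so vowel-quality marks such as the circumflex and horn disappear along with the tones.

```typescript
// Sketch of a tone-insensitive search key. Not taken from the model
// compiler; the function name is hypothetical.
function toneInsensitiveKey(term: string): string {
  return term
    .normalize("NFD")                 // split letters from combining marks
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/đ/g, "d");              // đ has no decomposition; map it by hand
}

console.log(toneInsensitiveKey("Việt Nam")); // viet nam
console.log(toneInsensitiveKey("Đài"));      // dai
```

In a model, this would be the body of `searchTermToKey`, so that the wordlist entry `Việt` and the typed text `viet` end up with the same key.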
Here is an example of Vietnamese text. Taiwanese will be similar.
Ngày 3-7, tại Hà Nội, Đài Tiếng nói Việt Nam phối hợp với UBND tỉnh Thanh Hóa họp báo giới thiệu Liên hoan Phát thanh toàn quốc lần thứ XVI, năm 2024. Đây là hoạt động nghiệp vụ của ngành phát thanh Việt Nam, được tổ chức định kỳ 2 năm một lần, nhằm phát hiện, tôn vinh những tác giả, tác phẩm xuất sắc của những người làm báo phát thanh cả nước.
I have tried my best to change the code and it works a little bit.
const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: {
    use: 'default', // we want to use the default word breaker, BUT!
    // CUSTOMIZE THIS:
    // join words at these characters (the original example listed only the hyphen)
    joinWordsAt: ['-', ' ', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'a '],
  },
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
  searchTermToKey: function(term, applyCasing) {
    // lowercase each character, then join back into a single string key
    return Array.from(term)
      .map(function(c) { return applyCasing('lower', c); })
      .join('');
  }
};
export default source;
However, it seems that the lexical model still does not treat the space as a “not word breaker”, even though it does treat the hyphen as one. Why is that?
By the way, I found that if I add a space before the first letter of a word in the lexical model, the model deletes that space automatically. Do you know how I can solve this?