Lexical model does not differentiate tones, which makes suggestions chaotic

The lexical model does not differentiate tone marks, and this makes its suggestions chaotic when I type Vietnamese and Taiwanese.
How can I turn this behavior off?

Hello @Cathaylab,

Keyman does not support this feature yet, but you could work around it as described below. This section of the guide should answer your question.

Context from the guide above:

…However, there are cases, such as with SENĆOŦEN, where some characters do not decompose into a base letter and a diacritic. In this case, it is necessary to write your own key function.

Can you give me some examples?
Could you write the code for me, please?
I normally need an example to understand everything.
None of the lexical models on the existing Keyman lexical model page is built to my requirements.
I cannot copy any earlier work, so I find it hard to write the code.
I am sorry about that.

If I only want to keep the upper and lower case feature and remove all the other features, how should I write the code?

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  joinWordsAt: ['-', ' '],
  sources: ['wordlist.tsv'],
  languageUsesCasing: false,
  wordBreaker: function(str: string) {
    return str.split(/\s|\u200b/).map(function(token) {
      return {
        left: str.indexOf(token),
        start: str.indexOf(token),
        right: str.indexOf(token) + token.length,
        end: str.indexOf(token) + token.length,
        text: token
      }
    });
  },
  punctuation: {
    insertAfterWord: ""
  }
};
export default source;

Here is my lexical model source file. How should I change it?

Hello @Cathaylab,

We regret that implementing this functionality has proven challenging for you. I am unable to conduct additional study on this at the moment. Please allow us some time to review your feature request and determine what we can do.

This feature has been requested on Keyman’s repo on GitHub.

Could you give me some example cases with actual Vietnamese and/or Taiwanese text, so I can be clear about what your request is?

If I had to guess, it sounds like you do not wish for text like “cafe” to match “café” - that is, you do not wish to have e and é treated as if they’re the same letter when making suggestions. So, for English, “cafe” could suggest “cafeteria”, but “café” could not - as there’s no é in “cafeteria”.
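To make that guess concrete, here is a minimal sketch of diacritic-insensitive matching. The names `strippingKey`, `wordlist`, and `suggest` are made up for this illustration and are not part of Keyman; the key function only resembles what a default key function might do, and the prefix search is a stand-in for the real trie lookup.

```typescript
// Hypothetical key function: reduces a word to a diacritic-free, lower-case key.
// Assumption: this only approximates default behavior; it is not Keyman's code.
function strippingKey(term: string): string {
  return term
    .normalize('NFD')                 // split é into e + combining acute (U+0301)
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase();
}

// Hypothetical three-word list, for illustration only.
const wordlist: string[] = ['café', 'cafeteria', 'cat'];

// Suggest every word whose key starts with the key of the typed text.
function suggest(typed: string): string[] {
  const prefix = strippingKey(typed);
  return wordlist.filter(function (word) {
    return strippingKey(word).startsWith(prefix);
  });
}

// Both calls list 'café' and 'cafeteria', because é and e share the key 'e'.
console.log(suggest('cafe'));
console.log(suggest('café'));
```

With a key function that keeps the diacritic, the second call would no longer reach "cafeteria".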


If my guess is correct and this is the issue you’re raising, then…

Here’s the full code we use, for your current case, to differentiate what you type when it comes to making predictions:

This is the part causing you problems:

That’s what is responsible for stripping the diacritic off of your letters and making e and é act the same. You’d probably prefer something like this:

const source: LexicalModelSource = {
  format: 'trie-1.0',
  wordBreaker: 'default',
  joinWordsAt: ['-', ' '],
  sources: ['wordlist.tsv'],
  languageUsesCasing: true,
  searchTermToKey: function(term, applyCasing) {
    // Lower-case each character but leave diacritics (tone marks) untouched
    return Array.from(term)
      .map(function(c) { return applyCasing('lower', c); })
      .join('')
      // Replace directional quotation marks with plain apostrophes
      .replace(/[‘’]/g, "'")
      // Also double-quote marks.
      .replace(/[“”]/g, '"');
  }
};

export default source;

This will make e and é (and all other similar cases) act as completely different letters, differentiating them thoroughly.
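If you want to check that behavior outside Keyman Developer, here is a small standalone sketch. `applyCasing` below is a hypothetical stand-in for the callback Keyman passes in (its real signature may differ), and `caseOnlyKey` is an illustrative name for the same lower-casing logic without the quote handling.

```typescript
// Hypothetical stand-in for the casing callback; only 'lower' is used here.
function applyCasing(form: 'lower' | 'upper', text: string): string {
  return form === 'lower' ? text.toLowerCase() : text.toUpperCase();
}

// Lower-cases each character and leaves diacritics (tone marks) in place.
function caseOnlyKey(term: string): string {
  return Array.from(term)
    .map(function (c) { return applyCasing('lower', c); })
    .join('');
}

// Case is folded: "Má" and "má" share a key.
console.log(caseOnlyKey('M\u00e1') === caseOnlyKey('m\u00e1')); // true

// Tone marks are kept: "má" and "ma" get different keys.
console.log(caseOnlyKey('m\u00e1') === caseOnlyKey('ma')); // false

// Caveat: a precomposed é and e + combining acute are different strings.
const precomposed = '\u00e9';
const decomposed = 'e\u0301';
console.log(caseOnlyKey(precomposed) === caseOnlyKey(decomposed)); // false

// Normalizing both to NFC first makes the keys agree again.
console.log(
  caseOnlyKey(precomposed.normalize('NFC')) ===
  caseOnlyKey(decomposed.normalize('NFC'))
); // true
```

The last two checks suggest one thing to watch for: if the keyboard and the wordlist encode the same accented letter differently, their keys will not match. Adding a `normalize('NFC')` step to the key function might help in that case, but that is an assumption I have not tested against a real model.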

Also, I removed the later wordBreaker definition here. I seriously doubt that it’s needed for Vietnamese. Taiwanese, on the other hand, may well need it - I’m personally less familiar with that.

If you need something more nuanced than this, I will need more detail in order to better assess those nuances.


It shows this error:
Model source does have a default export. Did you remember to write export default source;?

It also shows this error:
wordlist.tsv:24676 - fatal KM03001: Unexpected exception: ReferenceError: wordform is not defined

What I mean is that writing code is almost impossible for us.
We can only copy and paste ready-made code.

I did leave that line off as well; I’ll edit the original code block to include it.

And just fixed this bit too. My apologies for the trouble.

Can you send me the fixed version, from beginning to end?
Then I can copy it into my file.

The code has not been updated to a working version yet; there are still multiple issues.
Please test the code in the newest Keyman Developer and then send it to me.
Best regards,
Cathaylab

The conversation continues here.
