Lexical model not suggesting words with an apostrophe mid-word

MarkSkinner · December 3, 2021, 4:19pm

In one of our projects the apostrophe (U+0027) is used to mark elision, but words containing the apostrophe never seem to be suggested. Is the lexical model treating the apostrophe as a word-breaking character? If so, how do I make it treat the apostrophe as word-forming? Or am I just missing something obvious?

jheath · December 4, 2021, 9:59am

The bad news is that the apostrophe (U+0027) is not a word-building character. Its Unicode category is “Punctuation, Other [Po]” (see Unicode Character 'APOSTROPHE' (U+0027)). That means that any software that treats Unicode characters properly should not treat it as part of a word. I think the best solution is to convert data in this language so the apostrophe becomes U+02BC Modifier Letter Apostrophe. Its Unicode category is “Letter, Modifier [Lm]” (see Unicode Character 'MODIFIER LETTER APOSTROPHE' (U+02BC)), so it would be considered part of the word. (Another possible character to use is the Saltillo (U+A78C), which has upper and lowercase versions, but that seems to be used more in Central America…)
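To check the categories for yourself, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `is_letter` is only for illustration, and treating every `L*` category as word-forming is a simplification of what real word breakers do.

```python
import unicodedata

APOSTROPHE = "\u0027"           # ' typewriter apostrophe
MODIFIER_APOSTROPHE = "\u02BC"  # ʼ modifier letter apostrophe


def is_letter(ch: str) -> bool:
    """Return True if the character's general category is a letter (L*).

    Simplifying assumption: only letter categories count as word-forming.
    """
    return unicodedata.category(ch).startswith("L")


for ch in (APOSTROPHE, MODIFIER_APOSTROPHE):
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"category={unicodedata.category(ch)}, letter={is_letter(ch)}"
    )
```

U+0027 reports category `Po` and U+02BC reports `Lm`, which is why category-aware software keeps the second one inside a word.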

Note that this is not necessarily an easy solution. In addition to modifying existing data, you may need to change Keyman keyboards to be able to type this character, and then there’s the whole process of convincing people who type in this language to use that character, even though it looks very much like an apostrophe.
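For the data conversion step, something along these lines might be a starting point. It is only a sketch under assumptions: it converts U+0027 only when there is a letter on both sides (the mid-word case from the original question), so quotation marks and word-initial or word-final apostrophes are left alone and would need their own handling. The function name is made up for this example.

```python
import re

APOSTROPHE = "\u0027"           # ' typewriter apostrophe
MODIFIER_APOSTROPHE = "\u02BC"  # ʼ modifier letter apostrophe

# An apostrophe with a letter immediately before and after it.
# [^\W\d_] means "a word character that is not a digit or underscore".
_MID_WORD_APOSTROPHE = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def convert_mid_word_apostrophes(text: str) -> str:
    """Replace mid-word U+0027 with U+02BC; leave all other apostrophes."""
    return _MID_WORD_APOSTROPHE.sub(MODIFIER_APOSTROPHE, text)


sample = "He said 'n'ta' twice."
converted = convert_mid_word_apostrophes(sample)
print(converted)
# Python's \w treats U+02BC as a word character, so the word stays whole.
print(re.findall(r"\w+", converted))
```

Run it over a copy of the word list first and review the differences before touching the real data, since some apostrophes between letters may be something other than elision.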

Maybe the Keyman folks have a work-around that will let you use the apostrophe, but I’m afraid you will continue to run across this sort of problem with this language in the future.

Marc · December 5, 2021, 9:25am

@jheath is correct. While you can configure the lexical model to recognise the apostrophe as a word-forming character, you’re still going to run into trouble with it in many other contexts. If possible, you should look at using U+02BC, which is reasonably widely used for this purpose in various orthographies.
