Handling punctuation marks in Armenian

In Armenian script, some punctuation marks (question mark, exclamation mark, etc.) are placed directly on the vowel of the last syllable of the word being emphasized. When typing with Keyman on mobile, I'd rather type the punctuation after the word and have it placed properly, because typing it mid-word conflicts with word suggestions. In addition, I'd like to remove the whitespace before punctuation marks.

I have looked through the available documentation but found no immediate hints on how to implement this in Keyman Developer. Any suggestions on which direction to go?

Hi Aram. I’m not sure I understand the question. I’m seeing 4 Armenian keyboards in our release section (there are some in legacy too).
The ones in release are based on Windows keyboards:
https://keyman.com/keyboards/basic_kbdarme
https://keyman.com/keyboards/basic_kbdarmph
https://keyman.com/keyboards/basic_kbdarmty
https://keyman.com/keyboards/basic_kbdarmw

For the first keyboard I see a rule like this in the code:
+ [SHIFT K_BKSLASH] > U+055e
Basically, that means on a US keyboard I would type the | key and I would get the Armenian question mark. There is no extra white space added before the question mark.

Also, I don’t see any reordering happening in that keyboard so I’m not sure I understand the situation well enough to help.

There are any, index, and context statements you can use when writing rules for reordering. Here’s a starting place: index statement

If this doesn’t help, is there a specific keyboard you are having an issue with?

Hi Lorna,

Thank you for your reply, I really appreciate it. Sorry for being unclear. I have designed my own layout, as the basic ones do not include specific layouts for mobile devices and seem to be autogenerated. They are not very useful.

My problem comes from the fact that I have also added a lexical model based on a wordlist dictionary to the layout. In Armenian the question mark is put in the middle of the word, like this:

ինչո՞ւ

The curl above the letter «ո» (it’s a vowel, pronounced “o”) is the question mark. The thing is that if I type it in place as I go, the word suggestions in the bar above the keyboard stop working. So I’d rather type the whole word first, possibly using a suggestion to make it easier (Armenian words are pretty long sometimes).

As to the space: when I select a suggestion from the bar, it is inserted with a white space after the word. So every time I select a suggestion but intend to add a punctuation mark, I have to remove the white space. My question was whether I can define a rule in Keyman Developer (possibly in TypeScript code) to remove the white space when a punctuation mark is typed.

Thanks for the explanation Aram. I think someone who understands the lexical models might need to respond to at least the first part of your question.
Regarding the space before punctuation, you could experiment by adding something like this:

store(punct) ".,!"
U+0020 + any(punct) > index(punct,2)

(In the store(punct) you would put your Armenian punctuation.)
Then, I think the rule would get rid of the space.
I haven’t tried it myself, but I think it would work.
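
To make the intent of that rule concrete, here is a minimal TypeScript sketch of the same logic outside Keyman. The function name `applyKeystroke` and the `PUNCT` set are hypothetical and only for illustration. This is not Keyman API, and it says nothing about how the Keyman compiler treats the rule.

```typescript
// Hypothetical stand-in for store(punct); replace with the real punctuation set.
const PUNCT = new Set([".", ",", "!"]);

// Mimics the intent of `U+0020 + any(punct) > index(punct,2)`:
// if the context ends in a space and the typed key is punctuation,
// the space is replaced by the punctuation mark.
function applyKeystroke(context: string, key: string): string {
  if (PUNCT.has(key) && context.endsWith(" ")) {
    return context.slice(0, -1) + key;
  }
  // Any other keystroke is simply appended.
  return context + key;
}
```

With this sketch, typing "!" after "hello " replaces the trailing space, while typing a letter after "hello " leaves the space alone.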

I think this can be solved entirely within the lexical model.

The simplest way to solve this is to add a function that ignores the question mark when it is found in the text, using the searchTermToKey function. searchTermToKey allows us to massage the text to make it easier to find matching strings in the lexical model wordlist. This function works both ways: it works on input text, but also on the words in the wordlist. It does not modify the output or displayed text: it just simplifies the search key.

With this approach, you would type the question mark as normal – immediately after the «ո» letter – and the lexical model will simply ignore it.

The default lexical model searchTermToKey function ignores diacritic marks. So we can start with that default function, and modify it to meet our needs.

I’ve used a Wikipedia article to select marks which we will ignore, just for the purposes of example. You will of course know better which ones are important!

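The following Armenian punctuation marks are placed above and slightly to the right of the vowel whose tone is modified, in order to reflect intonation:

  • [ ՜ ] The Armenian exclamation mark.
  • [ ՛ ] The Armenian emphasis mark.
  • [ ՞ ] The Armenian question mark.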

Armenian punctuation marks used inside a word:

  • [ ֊ ] The yent’amna is used as the ordinary Armenian hyphen.
  • [ ՟ ] The pativ was used as an Armenian abbreviation mark, and was placed on top of an abbreviated word to indicate that it was abbreviated. It is now obsolete.
  • [ ՚ ] The apat’arts is used as a spacing apostrophe (which looks either like a vertical stick or wedge pointing down, or as an elevated 9-shaped comma, or as a small superscript left-to-right closing parenthesis or half ring), only in Western Armenian, to indicate elision of a vowel, usually /ə/.

The function I have made here is only lightly modified from the default. You may find that you can skip the normalization step – I have not assessed the Armenian Unicode block for its normalization rules. This function should be placed into your .model.ts file; see the documentation for more detail.

searchTermToKey: function (term: string): string {
  // Use this pattern to remove various Armenian marks.
  // See: https://www.compart.com/en/unicode/block/U+0530
  const ARMENIAN_MARKS = /[՜՛՞֊՟՚]/g;

  // Lowercase each letter in the string INDIVIDUALLY.
  // Why individually? Some languages have context-sensitive lowercasing
  // rules (e.g., Greek), which we would like to avoid.
  // So we convert the string into an array of code points (Array.from(term)),
  // convert each individual code point to lowercase (.map(c => c.toLowerCase())),
  // and join the pieces back together again (.join(''))
  let lowercasedTerm = Array.from(term).map(c => c.toLowerCase()).join('');

  // Once it's lowercased, we convert it to NFKD normalization form
  // This does many things, such as:
  //
  //  - separating characters from their accents/diacritics
  //      e.g., "ï" -> "i" + "◌̈"
  //  - converting lookalike characters to a canonical ("regular") form
  //      e.g., "\u037E" (Greek question mark) -> ";" (semicolon); they look identical but are different characters
  //  - converting "compatible" characters to their canonical ("regular") form
  //      e.g., "𝔥𝔢𝔩𝔩𝔬" -> "hello"
  let normalizedTerm = lowercasedTerm.normalize('NFKD');

  // Now, using the pattern defined above, replace each mark with the
  // empty string. This effectively removes all the marks!
  //
  // e.g.,  "ինչո՞ւ" -> "ինչու"
  let termWithoutMarks = normalizedTerm.replace(ARMENIAN_MARKS, '');

  // The resultant key is lowercased, and has no additional marks.
  return termWithoutMarks.normalize('NFC');
},
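
To see the effect, here is a condensed standalone copy of the function that can be run outside a model file. It is only a sketch for experimenting: the marks are written as escapes for the same six characters as above, and the sample words are the ones from this thread.

```typescript
// Same six marks as ARMENIAN_MARKS above: ՜ ՛ ՞ ֊ ՟ ՚
const ARMENIAN_MARKS = /[\u055C\u055B\u055E\u058A\u055F\u055A]/g;

// Standalone version of the model's searchTermToKey, for experimenting.
function searchTermToKey(term: string): string {
  const lowercased = Array.from(term).map(c => c.toLowerCase()).join('');
  return lowercased
    .normalize('NFKD')
    .replace(ARMENIAN_MARKS, '')
    .normalize('NFC');
}

// "Ինչո՞ւ" (capitalized, with question mark) and "ինչու" (plain)
const withMark = searchTermToKey('\u053B\u0576\u0579\u0578\u055E\u0582');
const plain = searchTermToKey('\u056B\u0576\u0579\u0578\u0582');
console.log(withMark === plain); // true
```

Both spellings map to the same search key, which is why the wordlist lookup keeps working when the mark is typed inside the word.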

As you read this function and the corresponding documentation, you may well think of other ways in which you want to play with the search keys!

You may also find the discussions on word breakers and punctuation in the same Advanced Lexical Model Topics area helpful.

Lorna, thank you so much!

I have tried the rule, but the compiler gives a warning that the rule will never be matched because its key code is never fired.

Marc, thank you!

That’s very interesting! I will dive into the code and the documentation to figure out how to use the search keys best.

Marc,

I have tried your approach. The code does let me strip all the punctuation marks from the word, and the dictionary search works. But when I choose the word from the suggestions, the punctuation mark is lost, which makes the approach of little value, as I then have to go back and put the mark in its place. So every time I type a word I have to choose one of three options: type the punctuation mark and risk losing it if I select a suggestion; avoid using suggestions; or type without the marks, knowing that a suggestion would drop the mark anyway. The last option would actually be the best, if only I could then type the mark and have it placed inside the word automagically.

The nice thing about Armenian is that punctuation such as the exclamation or question mark is always placed on the stressed syllable, which is always the last one in the word. So there is an easy algorithm for placing the mark that works almost always, and the exceptions are easy to take into account, as they are few.

So the ultimate solution to my problem lies in the proper placement of the punctuation marks more than in filtering them in the search key. But I am not sure if that is even possible with Keyman at the moment.

I have also tried setting the “insertAfterWord” key in the “punctuation” member of the LexicalModelSource class to the empty string. It does eliminate the space, but that is not convenient: a space is wanted most of the time, and is only unnecessary when a punctuation mark comes right after the word (as in English, actually). I have tried the approach suggested by Lorna, but as I mentioned, it did not work. I will look into rules to find out what can be done.
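
The placement rule described in this post (the mark goes on the vowel of the last syllable) could be sketched in TypeScript roughly as follows. The helper name `placeMark` and the vowel list are assumptions for illustration. The exceptions mentioned above are not handled, and this is not tied to any Keyman API.

```typescript
// Assumption: single-letter Armenian vowels ա ե է ը ի ո օ.
// The digraph "ու" works implicitly: "ւ" is not listed, so the mark lands after "ո".
const VOWELS = new Set(Array.from('\u0561\u0565\u0567\u0568\u056B\u0578\u0585'));

// Hypothetical helper: insert `mark` right after the last vowel of `word`.
function placeMark(word: string, mark: string): string {
  const chars = Array.from(word);
  for (let i = chars.length - 1; i >= 0; i--) {
    if (VOWELS.has(chars[i].toLowerCase())) {
      chars.splice(i + 1, 0, mark);
      return chars.join('');
    }
  }
  // No vowel found: fall back to appending the mark.
  return word + mark;
}

// "ինչու" + question mark -> "ինչո՞ւ"
console.log(placeMark('\u056B\u0576\u0579\u0578\u0582', '\u055E'));
```

Scanning from the end of the word keeps the sketch simple; a real implementation would need a list of the exceptional words.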

Yes, you are right. I am sorry for the misdirection. This should still be possible to solve with some additional code in your lexical model – but not trivial. @eddieantonio and @joshua_horton may have some ideas; certainly it’s an interesting challenge!

In terms of the algorithm to insert marks after accepting a word, you could possibly do that with .kmn. Lorna starts things off in the right direction, but perhaps a secondary processing group would be cleaner.

I’ve put some conceptual code here; you’d need to adjust to fit the real algorithm! At the end of your main group, add:

match > use(adjust_marks)

group(adjust_marks)

store(vowel) ...  c add your vowels here
store(cons) ...  c add your consonants here
store(mark) '՜՛՞֊՟՚'

c Basic situation, typed vowel, consonant, mark, end-of-word, reorder
any(vowel) any(cons) any(mark) ' ' > context(1) context(3) context(2) ' '

c With predictive text, typing the mark after accepting a word, reorder:
any(vowel) any(cons) ' ' any(mark) > context(1) context(4) context(2) ' '

Note how the adjust_marks group does not have a using keys clause. This means it processes only the context, which will already have been updated by the rule matched in the main group. What I am trying to do here is allow the mark to be typed after the word, and then have it dynamically placed after the vowel when the user presses the space bar. Other end-of-word keys, such as Enter, may need to be handled as well.
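
For readers who find the two-stage flow hard to picture, here is a rough TypeScript model of it. Everything in it is an assumption for illustration: the function name `typeKey`, the deliberately tiny consonant set, and the use of regular expressions to stand in for the two context rules. It only mirrors the conceptual rules above and is not how Keyman evaluates groups internally.

```typescript
// Hypothetical stand-ins for store(vowel), store(cons) and store(mark).
const VOWEL = '\u0561\u0565\u0567\u0568\u056B\u0578\u0585'; // ա ե է ը ի ո օ
const CONS = '\u0576\u0579\u0582';                          // ն չ ւ only, for the demo
const MARK = '\u055C\u055B\u055E\u058A\u055F\u055A';        // ՜ ՛ ՞ ֊ ՟ ՚

// Rule 1: vowel cons mark ' '  ->  vowel mark cons ' '
const RULE1 = new RegExp(`([${VOWEL}])([${CONS}])([${MARK}]) $`);
// Rule 2: vowel cons ' ' mark  ->  vowel mark cons ' '
const RULE2 = new RegExp(`([${VOWEL}])([${CONS}]) ([${MARK}])$`);

// Stage 1 (main group): the key is added to the context.
// Stage 2 (context-only group): the tail of the context is rewritten.
function typeKey(context: string, key: string): string {
  const afterMain = context + key;
  if (RULE1.test(afterMain)) return afterMain.replace(RULE1, '$1$3$2 ');
  if (RULE2.test(afterMain)) return afterMain.replace(RULE2, '$1$3$2 ');
  return afterMain;
}
```

In this model, typing the question mark after an accepted suggestion ("ինչու" followed by a space) and typing a space after "ինչու՞" both end up as "ինչո՞ւ" followed by a space.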

Marc, thanks a lot!

I will dive into the technology, as groups and their logic are something I have to learn about and practice. The code you have suggested seems very advanced to me at the moment; I hope that once I understand the basics better, it will become easier to grasp. When I get things working as expected, I will report back for sure.
