Order of typing Hebrew characters

How does the keyboard know to start processing a string? If I type letter vowel vowel vowel letter, how does it know that I intend the first 2 vowels to go with the preceding letter and the third to go with the following letter?
I think we need to say that everything else can be entered in any order, but the letter must precede everything else. Then when the next letter is typed, we can process the entire string. Would that work?
I’ll need to get more details on the constraints. As a rule of thumb, two marks of the same class on a single letter are not uncommon, but my guess is that three or more are rare.

How it works depends on the keyboard developer.

For some scripts it is safe to assume that all diacritics combine with the base character that precedes them, although for some language-specific keyboards I deliberately do not do that. For example, a sequence like ää can be typed a + diaeresis + a + diaeresis, but I also allow a + a + diaeresis to generate ää as a typing shortcut, since the sequence aä is meaningless and cannot exist in the orthography the keyboard was designed for.
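A minimal sketch of that typing shortcut, written as a Python model rather than Keyman source. The function name `type_key` and the use of the combining diaeresis U+0308 are assumptions made for illustration; the real keyboard's rules are not shown in this thread.

```python
DIAERESIS = "\u0308"  # COMBINING DIAERESIS (assumed output encoding)

def type_key(context: str, key: str) -> str:
    """Return the new context after one keystroke (hypothetical model)."""
    # Shortcut: a + a + diaeresis marks both vowels, because a bare "a"
    # followed by "ä" is assumed not to occur in the target orthography.
    if key == DIAERESIS and context.endswith("aa"):
        return context[:-2] + "a" + DIAERESIS + "a" + DIAERESIS
    # Default: the diacritic attaches to the preceding base character.
    return context + key

def type_keys(keys: str) -> str:
    context = ""
    for key in keys:
        context = type_key(context, key)
    return context
```

Both the long form (a, diaeresis, a, diaeresis) and the shortcut (a, a, diaeresis) end up as the same stored sequence in this model.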

Alternatively, some scripts have combining characters that appear visually before the base character but are stored after it. For such scripts there is a distinction between visual and logical keyboards: on a visually ordered keyboard, certain classes of combining characters are typed before the base but are reordered by keyboard rules to an appropriate position after the base character.
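As a rough illustration of visual-order input, here is a small Python model. The set `PREBASE_MARKS` and the function `visual_to_logical` are hypothetical; the Devanagari vowel sign I (U+093F), which is displayed before its consonant but stored after it, is used only as a familiar example and is not part of the Hebrew keyboard discussed here.

```python
# Marks displayed before the base but stored after it (illustrative only).
PREBASE_MARKS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I

def visual_to_logical(keystrokes: str) -> str:
    """Move marks typed before a base to the position after that base."""
    out = []
    pending = []  # pre-base marks waiting for their base character
    for ch in keystrokes:
        if ch in PREBASE_MARKS:
            pending.append(ch)
        else:
            # Simplification: every other character is treated as a base.
            out.append(ch)
            out.extend(pending)
            pending.clear()
    out.extend(pending)  # a trailing mark with no base stays as typed
    return "".join(out)
```

A logical keyboard would skip this step and require the user to type the base first.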

So, as I said, it’s really up to the keyboard developer. But care needs to be taken as the complexity of the rules increases.

Do I understand correctly that the keyboard must be either visual or logical? I.e., on one the user must enter the combining character before the base character and in the other the user must enter the combining character after the base character?

Since I’m aiming for a comprehensive keyboard to allow entry of all Unicode Hebrew characters, I’d like to make it possible to enter the precomposed characters with the keyboard, but control that with an option, so that entering letter + dagesh is output as letter + dagesh, but if they enter the precomposed (letter+dagesh),

  • if opt1, precomposed(letter+dagesh) > precomposed(letter+dagesh)
  • else, precomposed(letter+dagesh) > letter + dagesh

would that work?
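The option described above could be modelled like this in Python. This is a sketch only: the function `emit` and the flag `keep_precomposed` are hypothetical names, and it relies on the fact that the Hebrew presentation forms such as U+FB31 (bet with dagesh) have canonical decompositions, so NFD normalization splits them into letter + dagesh.

```python
import unicodedata

def emit(ch: str, keep_precomposed: bool) -> str:
    """Output for a key that produces a precomposed Hebrew letter."""
    if keep_precomposed:
        # opt1: pass the precomposed character through unchanged
        return ch
    # otherwise: decompose into base letter + combining mark(s)
    return unicodedata.normalize("NFD", ch)

BET_WITH_DAGESH = "\uFB31"
print([hex(ord(c)) for c in emit(BET_WITH_DAGESH, keep_precomposed=False)])
```

In an actual keyboard the decomposition would be written out as explicit rules rather than computed, so this only shows the intended mapping.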

You really do need to choose a paradigm for a keyboard – either logical, or visual. Otherwise, there will inevitably be irreconcilable ambiguity. For example, if I type base combining base, should the combining mark be attached to the first or second base character?

Selecting a paradigm like this also helps your keyboard users to know how to get started. There are plenty of rules they already know for writing their language and the idea of having structure won’t be a problem. The problem comes when there are somewhat arbitrary and complex ordering requirements based on technical limitations, which is the challenge you’ve already been trying to solve with the diacritic order :).

I would urge you to consider leaving the precomposed letters out of a generalised keyboard. They should only be used in very exceptional circumstances, and increasingly less often. If you really do want to support the precomposed forms, then I think that putting them in a separate keyboard, and directing users away from it with fairly strong phrasing, will create less confusion. For example, “unless you know you need these precomposed letters, you should instead be using my generalised Hebrew keyboard <here>” or something of the like!

This should also simplify the logic of the keyboard because the models for character encoding are so different.

✔ I like that! Thanks.

Exactly. I want to always attach combining characters to the preceding base character. It’s only the various combining characters that should be reordered as necessary.

When working with combining characters, a keyboard developer has two options:

  • Reorder combining characters as necessary. This makes the keyboard rules much more complex, but generally makes input easier for the user.
  • Constrain or force the user to type in the order required by Unicode. This approach generally requires more detailed knowledge from the user and has a much steeper learning curve.

Personally I prefer the first approach, although more and more I am leaning towards developing orthographically responsive keyboard layouts.

On a string like “dcdc”, does a rule like this handle each pair in the string? That is, the first two letters are processed to give cddc, then the second pair, which leaves cddc, then the third pair, which gives cdcd (and then after match it does it again recursively)?

No, those rules would only impact the final two characters in the string.
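To make the difference concrete, here is a small Python model of a rule that is matched only against the end of the context. The helper `apply_at_tail` is hypothetical, and the rule "d c becomes c d" is an assumed stand-in for the rule being asked about.

```python
def apply_at_tail(context: str, lhs: str, rhs: str) -> str:
    """Apply a rewrite rule only if lhs matches the END of the context."""
    if context.endswith(lhs):
        return context[: -len(lhs)] + rhs
    return context

# Only the final "dc" is swapped; the earlier "dc" pair is left untouched.
print(apply_at_tail("dcdc", "dc", "cd"))  # prints "dccd"
```

So under this model the earlier pairs are only ever fixed at the moment they are the last characters typed, which is why the rules need to run after every keystroke.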

I understood the rules better after I translated them into Excel.

I generated 624 strings of 1-4 characters from (a, b, c, d) with repetitions, of which 191 (31%) required no reordering.

I then applied the rules as shown in the attached chart. After Rules 1 and 2 were applied in the 3rd iteration, all strings were in the correct order.

Keyman - Excel chart.zip (25.7 KB)

Nice work 🙂 I am guessing some of these combinations are vanishingly rare, but it’s nice to be comprehensive!

From a programming point of view, is there any difference between these two?

 + [K_A] > 'ש'
 + [K_A] > U+05E9

There is no difference; those rules are identical once compiled.
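The same equivalence can be checked outside Keyman. This short Python check only confirms that the literal character and the code point notation name the same character; it says nothing about how the Keyman compiler works internally.

```python
import unicodedata

literal = "ש"              # the character typed directly
by_code_point = "\u05E9"   # the same character written by code point

print(literal == by_code_point)   # True
print(unicodedata.name(literal))  # HEBREW LETTER SHIN
```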

OK. I got my keyboard to compile with all its rules. However, it is not reordering anything when I type, either in Test Keyboard or in Word. Did I leave out a magic word?

hebnew01a.kmn (44.6 KB)

I’ve been away on leave for the last 3 weeks – am back on deck now. Can you give some examples of what you expect (e.g. key sequence, expected output) and then I can test the keyboard and give you feedback.

Welcome back!

See the attached screenshot of a Word document for an example.
Below is what I was hoping the reordering rules would do.

Let me know if this is what you need.
Thanks.

Keyed

U+05D1 U+0596 U+05B8 U+05BC
ב ֖ ָ ּ

Expected Reordering

[520] any(low_vowel) any(before_low_vowel) > context(2) context(1)
U+05D1 U+0596 U+05BC U+05B8
ב ֖ ּ ָ

[530] any(after_dagraph) any(dagraph) any(incl_dagraph) > context(2) context(1) context(3)
U+05D1 U+05BC U+0596 U+05B8
ב ּ ֖ ָ

[519] any(low) any(before_low) > context(2) context(1)
U+05D1 U+05BC U+05B8 U+0596
ב ּ ָ ֖
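The three steps above can be traced with a small Python model. This is a sketch under assumptions: the real store contents in hebnew01a.kmn are not shown in the thread, so each store here holds only the one character needed for this example, and the `reorder` loop is a simplified stand-in for repeatedly matching rules against the end of the context.

```python
BET, TIPEHA, QAMATS, DAGESH = "\u05D1", "\u0596", "\u05B8", "\u05BC"

# (stores to match at the end of the context, output order as in context(n)).
# Store contents are assumed; the real stores are much larger.
RULES = [
    ([{QAMATS}, {DAGESH}], (2, 1)),               # [520]
    ([{TIPEHA}, {DAGESH}, {QAMATS}], (2, 1, 3)),  # [530]
    ([{TIPEHA}, {QAMATS}], (2, 1)),               # [519]
]

def reorder(context: str) -> str:
    """Apply the first matching rule at the tail until no rule matches."""
    changed = True
    while changed:
        changed = False
        for stores, order in RULES:
            n = len(stores)
            tail = context[-n:]
            if len(tail) == n and all(c in s for c, s in zip(tail, stores)):
                # context(i) is 1-based, so tail[i - 1] picks the i-th match
                context = context[:-n] + "".join(tail[i - 1] for i in order)
                changed = True
                break
    return context

keyed = BET + TIPEHA + QAMATS + DAGESH
print(" ".join(f"U+{ord(c):04X}" for c in reorder(keyed)))
# prints "U+05D1 U+05BC U+05B8 U+0596"
```

In real typing the rules would fire after each keystroke rather than on the finished string, so the intermediate states can differ from this trace even when the final order is the same.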

It appears you are just missing a single line in your source file. In group(main), you need a match > use(reorder) rule to trigger the reordering. You already have the recursive match > use(reorder) in group(reorder), but you never reach the reorder group because of the missing rule 🙂

I added that rule on my copy of the keyboard (anywhere in the main group is fine), and then the reordering worked as you would expect.

thanks!
issue IGH6205


what did I do?