A few questions about TECkit (Mapping Editor) + FLEx integration

Dear honored community,

I have the following problem:
I need to convert a Latin-based script (LTR) into the Rohingya Hanifi script (Unicode block U+10D00 to U+10D3F) and into an Arabic script (plus additional letters from U+0800 to U+08FF).

There are several challenges I'd like your advice on before I start sinking days and hours into TECkit:

  1. Can it handle Unicode 13 and the Unicode block U+10D00 to U+10D3F? The Mapping Editor only gives me access to U+0000 to U+FFFF. Will TECkit accept the map if I edit it by hand to include U+10D00 to U+10D3F?
  2. Will FLEx be able to use that TECkit file, or will it crash (as happens sometimes)?
  3. Sadly, the orthographies I need to convert between are inconsistent: each uses letters or combinations the others don't, and substitutes them (or drops them completely). Is there a way to make TECkit understand that a Latin “a” at the beginning of a word has to become Alif + a-vowel in the Arabic/Rohingya script, whereas an “a” within a word goes with the preceding consonant (Arabic) or is an individual character (Rohingya Hanifi)? How can I put rules like that into TECkit?
  4. In Arabic Unicode, the byte order (e.g. first the vowel, then the duplication sign) can also be reversed. Both orders render the same way but are actually different in byte order. Do I have to teach TECkit every possible combination? (Or is there a faster way to do this?)

Best,
Michael

If a character is specified with a numeric code, TECkit can handle any codepoint, even those outside of the basic multilingual plane (BMP). So U+10D00 to U+10D3F should not be a problem.
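The following Python sketch shows why an editor limited to U+0000 to U+FFFF runs into trouble here. It uses only the standard library and is an illustration, not part of TECkit or SIL Converters. It inspects U+10D00, the first character of the Hanifi Rohingya block. A character above U+FFFF does not fit in a single 16-bit unit: UTF-16 stores it as a surrogate pair, and UTF-8 needs four bytes. The `describe` helper is hypothetical and exists only for this example.

```python
import unicodedata

def describe(cp: int) -> dict:
    """Hypothetical helper: summarise how one code point (e.g. 0x10D00) is stored."""
    ch = chr(cp)
    utf16 = ch.encode("utf-16-be")
    return {
        "code_point": f"U+{cp:04X}",
        # Anything above U+FFFF lies outside the Basic Multilingual Plane.
        "outside_bmp": cp > 0xFFFF,
        # Name lookup depends on the Unicode version bundled with Python
        # (Hanifi Rohingya needs Unicode 11.0 or later).
        "name": unicodedata.name(ch, "<unknown to this Unicode database>"),
        # UTF-16 code units: one for BMP characters, a surrogate pair otherwise.
        "utf16_units": [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)],
        "utf8_bytes": ch.encode("utf-8").hex(" ").upper(),
    }

for cp in (0x0627, 0x10D00):  # ARABIC LETTER ALEF, HANIFI ROHINGYA LETTER A
    print(describe(cp))
```

The name lookup also mirrors the point about character names. It succeeds only when the Unicode database in use is new enough, just as TECkit's name support depends on which TECkit version a tool ships.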

If you are using names (such as hanifi_rohingya_letter_a) to specify the character, then there might be an issue. TECkit itself has been updated to understand all the character names through Unicode v13. Since you mention the Mapping Editor, I suspect you are using SIL Converters (am I correct?). The released version of SIL Converters has not been updated in quite a while, so it ships an older version of TECkit that does not know about the character names recently added to Unicode. Test releases of SIL Converters (with TECkit updated for Unicode 13) are being built; let me know if you would like to help with the testing.

If FLEx crashes on a TECkit map with characters outside of the BMP, please report that as a bug to the FLEx developers.

TECkit has contextual rules that should be able to do what you want. The TECkit package includes documentation on these rules, and you can also look at a collection of existing TECkit maps. Let me know if you need additional help.

I am not that familiar with Arabic, so I don't know which character the duplication sign is. Are the two orders in your examples equivalent under Unicode normalisation (NFC/NFD)? If so, you only need to output one of the orders; TECkit (or another process) can apply Unicode normalisation to convert to the other form if needed.

Hi Michael,

I assume you mean “character order” and not “byte order” ?

In future, it would be beneficial to tell us Unicode character codes. My guess:
I'll guess that by duplication sign you are referring to U+0651 ARABIC SHADDA, and that by vowel you are referring to the set of vowels that are Unicode combining marks, such as U+064E ARABIC FATHA and U+0650 ARABIC KASRA (and not characters such as U+0627 ARABIC LETTER ALEF).

If I have the right characters, then you are correct that <shadda, fatha> and <fatha, shadda> mean the same thing, and as Bobby mentioned you can ask TECkit to perform Unicode normalization which will put these in a fixed order.
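The following Python sketch shows what normalization does with the two orders, assuming the characters guessed above (U+0651 SHADDA and U+064E FATHA). It uses Python's standard `unicodedata` module only to illustrate the Unicode rule itself. TECkit applies normalization through its own settings, which are described in its documentation. Canonical reordering sorts adjacent combining marks by their combining class, so both input orders come out as <fatha, shadda>.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH (any base letter would do)
FATHA = "\u064E"   # ARABIC FATHA, canonical combining class 30
SHADDA = "\u0651"  # ARABIC SHADDA, canonical combining class 33

typed_one_way = BEH + SHADDA + FATHA
typed_other_way = BEH + FATHA + SHADDA

# The raw strings differ even though they render identically.
print(typed_one_way == typed_other_way)  # False

for form in ("NFC", "NFD"):
    a = unicodedata.normalize(form, typed_one_way)
    b = unicodedata.normalize(form, typed_other_way)
    # Marks are reordered by combining class, so fatha (30) ends up
    # before shadda (33) in both strings and the results compare equal.
    print(form, a == b, [f"U+{ord(c):04X}" for c in a])
```

Because both orders collapse to one sequence, the mapping rules only need to cover that one order, provided the text is normalized before the rules are applied.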

The TECkit syntax does allow you to look at the context, so that you can treat ‘a’ at the beginning of a word differently than ‘a’ that follows another letter.
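The Python sketch below is a rough model of the word-initial "a" rule from question 3. The consonant table is hypothetical and deliberately tiny, and the sketch covers only the Arabic side. In a TECkit map the same decision would be written declaratively as a rule with a preceding context, not as a loop. Treat this only as an illustration of the logic.

```python
# Hypothetical, simplified Latin -> Arabic table for illustration only;
# a real orthography needs many more letters and rules.
CONSONANTS = {
    "b": "\u0628",  # ARABIC LETTER BEH
    "t": "\u062A",  # ARABIC LETTER TEH
    "k": "\u0643",  # ARABIC LETTER KAF
    "m": "\u0645",  # ARABIC LETTER MEEM
    "n": "\u0646",  # ARABIC LETTER NOON
}
ALEF = "\u0627"   # ARABIC LETTER ALEF
FATHA = "\u064E"  # ARABIC FATHA

def is_word_initial(text: str, i: int) -> bool:
    """True when position i starts a word (start of text or after a non-letter)."""
    return i == 0 or not text[i - 1].isalpha()

def latin_to_arabic(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch == "a":
            if is_word_initial(text, i):
                # Word-initial 'a': alef as the seat letter, carrying the vowel.
                out.append(ALEF + FATHA)
            else:
                # Word-internal 'a': only the vowel mark, on the preceding consonant.
                out.append(FATHA)
        else:
            # Consonants map one-to-one; anything else passes through unchanged.
            out.append(CONSONANTS.get(ch, ch))
    return "".join(out)

print(latin_to_arabic("ab ba"))
```

Here `latin_to_arabic("ab ba")` produces alef + fatha + beh for the first word and beh + fatha for the second. Cases such as "aa" or an "a" after a non-consonant letter are not handled and would need rules of their own.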

Another issue is whether you want to make your mapping work in both directions, that is from Latin script to Arabic script, but also in reverse from Arabic script to Latin script. This does make the mapping more complex, but is often easier than maintaining two separate mappings.
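A two-way mapping is more complex because a forward table can only be reversed mechanically when no two inputs share an output. The Python sketch below illustrates this with hypothetical helpers (`invert_table`, `convert`) and a made-up three-entry table. It builds the reverse table, refuses ambiguous tables, and round-trips a string using greedy longest-match lookup. It only models the bookkeeping involved and does not show how TECkit itself processes a bidirectional map.

```python
from __future__ import annotations

def invert_table(forward: dict[str, str]) -> dict[str, str]:
    """Build the reverse (Arabic -> Latin) table from a forward (Latin -> Arabic) one.

    Raises ValueError if two inputs map to the same output, because the
    reverse direction would then be ambiguous and needs an explicit decision.
    """
    reverse: dict[str, str] = {}
    for src, dst in forward.items():
        if dst in reverse:
            raise ValueError(f"{src!r} and {reverse[dst]!r} both map to {dst!r}")
        reverse[dst] = src
    return reverse

def convert(text: str, table: dict[str, str]) -> str:
    """Greedy longest-match replacement; unmapped characters pass through."""
    longest = max(map(len, table), default=1)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No key matched at this position: copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

# Hypothetical table: "th" -> THEH is an illustrative choice, not a standard.
FORWARD = {"b": "\u0628", "t": "\u062A", "th": "\u062B"}
REVERSE = invert_table(FORWARD)
arabic = convert("bth", FORWARD)
print(convert(arabic, REVERSE))  # bth

try:
    invert_table({"s": "\u0633", "c": "\u0633"})  # both map to SEEN
except ValueError as err:
    print("ambiguous:", err)
```

The `ValueError` marks the spot where a two-way mapping forces a choice that two separate one-way mappings could leave unstated.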

See my reply on the FLEx list for a sample RS to AS mapping file that handles the vowel context and much more. It is a good starter mapping file and includes numerous RS to AS tricks.

https://groups.google.com/g/flex-list/c/Ws_tUsk07rI/m/Z5CCYrO9BgAJ