Problem typing Vietnamese tone markers and diacritics

Yes, that is a specialized use case. Custom keyboards are probably appropriate for that use case (along with a special toggle to turn off any engine-provided normalization, so we still need that dialog!)

In this conversation, the term NFC seems to be used as a synonym for "composed". For Vietnamese, that may hold, but in other cases the two are different. For example, in Lithuanian you can have the sequence a, ogonek, acute. The example from Unicode has

  • U+00E1 LATIN SMALL LETTER A WITH ACUTE
  • U+0328 COMBINING OGONEK

The NFC form is

  • U+0105 LATIN SMALL LETTER A WITH OGONEK
  • U+0301 COMBINING ACUTE ACCENT

I don’t know what the composed form would be. Also, U+0958 DEVANAGARI LETTER QA has the NFC (and NFD) form of

  • U+0915 DEVANAGARI LETTER KA
  • U+093C DEVANAGARI SIGN NUKTA
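Both examples above can be checked with Python's standard `unicodedata` module. This is only a minimal sketch; the `codepoints` helper is a hypothetical name introduced here for display purposes.

```python
import unicodedata

def codepoints(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Lithuanian example: a-acute followed by combining ogonek
lithuanian = "\u00E1\u0328"
print(codepoints(unicodedata.normalize("NFC", lithuanian)))  # U+0105 U+0301

# Devanagari QA is excluded from composition, so NFC leaves it decomposed
qa = "\u0958"
print(codepoints(unicodedata.normalize("NFC", qa)))  # U+0915 U+093C
print(codepoints(unicodedata.normalize("NFD", qa)))  # U+0915 U+093C
```

In both cases the NFC result still contains a combining mark, so "NFC" and "fully precomposed" are not the same thing.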

Normalization Form C: Canonical Composition.

Composition, as in Composed.

Keep in mind that Unicode’s NFC and NFD forms are pretty much the industry standard and/or reference point among programmers for these discussions.

@bobby

NFC (Unicode Normalisation Form C) is a sequence of operations: all characters are first decomposed, then canonically ordered, and then, where possible, all or some of the decomposed sequences are recomposed. For Vietnamese, all base character + combining diacritic combinations have precomposed forms. This is not true for all languages. For instance, in a wide range of African languages NFC normalised text will still contain combining diacritics, since not all sequences have precomposed forms.
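As a rough illustration of that difference, the sketch below checks whether any combining marks survive NFC. The helper name `has_combining_after_nfc` is made up for this example, and the open e with acute (U+025B U+0301) is just one sequence I am confident has no precomposed form; it is written in several West African orthographies.

```python
import unicodedata

def has_combining_after_nfc(text):
    """True if the NFC form of text still contains a combining mark."""
    nfc = unicodedata.normalize("NFC", text)
    return any(unicodedata.combining(ch) != 0 for ch in nfc)

# Vietnamese: e + circumflex + dot below recomposes fully to U+1EC7
vietnamese = "e\u0302\u0323"
print(has_combining_after_nfc(vietnamese))  # False

# Open e + acute: no precomposed character exists, so the mark stays
open_e_acute = "\u025B\u0301"
print(has_combining_after_nfc(open_e_acute))  # True
```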

For Vietnamese vowel + tone combinations, there can be up to five canonically equivalent forms, more if you include deprecated characters.
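To make the five forms concrete, here is a small sketch using ệ (e with circumflex and dot below) as the example. The choice of this particular letter is mine; other vowel + tone combinations may have fewer equivalent spellings.

```python
import unicodedata

# Five canonically equivalent spellings of the same letter
forms = [
    "\u1EC7",         # fully precomposed
    "\u1EB9\u0302",   # e-dot-below + combining circumflex
    "\u00EA\u0323",   # e-circumflex + combining dot below
    "e\u0323\u0302",  # base + dot below + circumflex (NFD order)
    "e\u0302\u0323",  # base + circumflex + dot below
]

distinct_raw = len(set(forms))
distinct_nfc = len({unicodedata.normalize("NFC", f) for f in forms})
distinct_nfd = len({unicodedata.normalize("NFD", f) for f in forms})
print(distinct_raw, distinct_nfc, distinct_nfd)  # 5 1 1
```

A plain string comparison treats these as five different strings, which is why normalizing before comparing or searching matters for Vietnamese text.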

In real-life Vietnamese data I see NFC, NFD, and the Windows form (based on how the Windows-1258 code page worked), where vowels are precomposed and tones are combining diacritics. The other form I often work with is the one used in MARC-21 bibliographic data, which is NFD except for u-horn and o-horn, which are precomposed.
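The two mixed forms can be approximated with a small helper that starts from NFD and recomposes only selected marks. This is a sketch under my own assumptions: `partially_compose` is a hypothetical function, and treating circumflex, breve and horn as the "vowel" marks is my reading of the description above, not a reference implementation of either convention.

```python
import unicodedata

CIRCUMFLEX, BREVE, HORN = "\u0302", "\u0306", "\u031B"

def partially_compose(text, keep_composed):
    """Compose only the marks in keep_composed onto their base letter;
    leave every other combining mark decomposed, in NFD order."""
    out = []
    base, composed_marks, other_marks = "", [], []

    def flush():
        # Emit the pending cluster: composed base first, then loose marks
        if base or composed_marks or other_marks:
            out.append(unicodedata.normalize("NFC", base + "".join(composed_marks)))
            out.extend(other_marks)

    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0:
            flush()  # a new base character starts a new cluster
            base, composed_marks, other_marks = ch, [], []
        elif ch in keep_composed:
            composed_marks.append(ch)
        else:
            other_marks.append(ch)
    flush()
    return "".join(out)

# Assumed "Windows form": vowel quality precomposed, tone marks combining
windows_form = partially_compose("Vi\u1EC7t", {CIRCUMFLEX, BREVE, HORN})
# Assumed "MARC-21 form": NFD except that horn letters stay precomposed
marc_form = partially_compose("\u1EED", {HORN})
```

Here `windows_form` comes out as `Vi` + U+00EA + U+0323 + `t`, and `marc_form` as U+01B0 + U+0309. Both still normalize back to the same NFC text as the input.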

Please reply to this topic if you need further clarification and/or assistance, otherwise it’ll be closed in three weeks.
