Unsupported combining diacritic


#1

Curious, I noticed today there is a character U+0341, Combining Acute Tone Mark, that is not supported by the Charis SIL font. Apparently it is different than U+0301, in that it does not go over the middle of the preceding character but sits in the space between the preceding character and the following character.

Was it deliberate to not include it in Charis SIL?


#2

0340 and 0341 aren’t for general use - they were meant only for Vietnamese - and their use is discouraged. The Unicode Standard says “Vietnamese-specific accent placement should be handled instead by specialized rendering of 0300 and 0301. Use of 0340 and 0341 is discouraged.” So we discourage it by not providing them. If you are encoding Vietnamese you should use 0300 and 0301.
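If you already have text that contains the discouraged marks, swapping them for the preferred ones is a simple code point substitution. Here is a minimal Python sketch; the function name and mapping table are made up for illustration and are not part of any SIL tool.

```python
# Hypothetical helper: map the discouraged tone marks to the preferred accents.
DISCOURAGED_TO_PREFERRED = {
    0x0340: 0x0300,  # combining grave tone mark -> combining grave accent
    0x0341: 0x0301,  # combining acute tone mark -> combining acute accent
}


def replace_discouraged_tone_marks(text: str) -> str:
    """Return text with U+0340/U+0341 replaced by U+0300/U+0301."""
    # str.translate accepts a dict of code point -> code point.
    return text.translate(DISCOURAGED_TO_PREFERRED)


if __name__ == "__main__":
    sample = "a\u0341"  # 'a' followed by the discouraged acute tone mark
    fixed = replace_discouraged_tone_marks(sample)
    print([f"U+{ord(c):04X}" for c in fixed])
```

All other characters pass through unchanged, so this can be run over existing data without also normalizing the rest of the text.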


#3

If you look in the Unicode charts you will see that the use of U+0340 and U+0341 is discouraged. The charts also say that U+0300 and U+0301 are preferred. So we chose not to include them in our fonts as a way to discourage their usage.


#4

Thanks for the explanation. I’m used to looking up info on characters by typing the code into the search bar, and none of those sources say this character is not recommended. So apparently they don’t have all the information.

What pops up in my search:
fileformat.info
compart.com
graphemica.com
charbase.com


#5

You may do better by using ScriptSource. We created that site because other sites were incomplete, misleading, or just plain wrong.


#6

Most sites that present Unicode character information will list only what is in the character properties database. (In particular, such sites rarely inform about any prose within the Unicode standard that may be relevant.)

The clue in those properties that suggests there is more to the story about U+0340 and U+0341 is that they have singleton canonical decompositions to U+0300 and U+0301, respectively. Such decompositions mean there is no behavior difference between, for example, U+0341 and U+0301. That raises the question of why they are in the standard at all, which means there must be more info about them somewhere. Time to look in the prose.
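You can check the decomposition property yourself with Python's standard `unicodedata` module. This is only a rough sketch: the helper name `singleton_target` is made up for illustration, and the results reflect whichever Unicode version your Python build ships with.

```python
import unicodedata
from typing import Optional


def singleton_target(ch: str) -> Optional[str]:
    """Return the character ch canonically decomposes to if the
    decomposition is a singleton (exactly one code point), else None."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility decompositions start with a tag such as "<compat>";
    # canonical ones are just hex code points.
    if len(fields) == 1 and not fields[0].startswith("<"):
        return chr(int(fields[0], 16))
    return None


if __name__ == "__main__":
    for cp in (0x0340, 0x0341):
        target = singleton_target(chr(cp))
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
        if target is not None:
            print(f"  singleton decomposition to U+{ord(target):04X} {unicodedata.name(target)}")
        # Normalization applies the decomposition, so the original
        # code point does not survive NFC or NFD.
        nfc = unicodedata.normalize("NFC", chr(cp))
        print("  NFC:", [f"U+{ord(c):04X}" for c in nfc])
```

Because a singleton is never recomposed, any normalized text ends up with U+0300 or U+0301 anyway, which is the hint that the distinction lives only in the prose of the standard.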

FYI, the excerpt Victor quoted can be found in https://www.unicode.org/charts/nameslist/n_0300.html or the charts at https://www.unicode.org/charts/PDF/U0300.pdf


#7

And yes: Kudos to ScriptSource for picking up the block annotations – most sites don’t!


#8

Why doesn’t ScriptSource come up in Google searches as highly as the three listed by the OP? I use those three sites as well because they do come up in Google searches at the top of the list.


#9

Now that I know that ScriptSource has character info, I’ll start looking there.