Vernacular form showing in reversal not consistent

jheath · November 9, 2023, 7:49am

I have a LIFT file where the vernacular comes in both Roman and Arabic scripts. In both the French and English reversal indexes, most of the vernacular forms shown (under the French/English in the reversal entry) are Roman script, but some are shown in Arabic script. E.g. the French index has:

But also:

There are no reversal entries in the LIFT file (well, almost none), so these reversals are being created automatically by DAB. But why is it creating some with RS and some with AS?

I’m inserting below the LIFT entries for these two words, with a few irrelevant fields removed. They are identical in form. So why would DAB treat them differently? And how can I force the reversal entries to all be RS?

Thanks,
Jeff

<entry dateCreated="2020-12-19T12:21:57Z" dateModified="2021-05-16T15:55:03Z" id="nahal_7a0512d7-4bb0-4219-a71e-d0aed13eb72b" guid="7a0512d7-4bb0-4219-a71e-d0aed13eb72b">
<lexical-unit>
<form lang="shu-Arab"><text>نَحَلْ</text></form>
<form lang="shu-Latn"><text>nahal</text></form>
</lexical-unit>
<trait name="morph-type" value="stem" />
<sense id="1179d5a7-0868-4035-84fc-ba5da0019046">
<grammatical-info value="n.coll.">
<trait name="inflection-feature" value="{sbj}[num:coll]" />
</grammatical-info>
<gloss lang="en"><text>bees</text></gloss>
<gloss lang="fr"><text>abeilles</text></gloss>
<note type="grammar">
<form lang="en"><text>n.coll</text></form>
</note>
</sense>
</entry>

<entry dateCreated="2020-12-19T12:22:07Z" dateModified="2021-01-02T14:32:09Z" id="tchuuri_7a1ee9f4-dc83-4d0b-b0bb-f0160fb9460b" guid="7a1ee9f4-dc83-4d0b-b0bb-f0160fb9460b">
<lexical-unit>
<form lang="shu-Arab"><text>تُشورِي</text></form>
<form lang="shu-Latn"><text>tchuuri</text></form>
</lexical-unit>
<trait name="morph-type" value="stem" />
<sense id="b6014d02-ee29-4b49-bd24-01131576c7eb">
<grammatical-info value="n.m.">
<trait name="inflection-feature" value="{sbj}[gen:m]" />
</grammatical-info>
<gloss lang="en"><text>a shelter (of millet stalks)</text></gloss>
<gloss lang="fr"><text>abri</text></gloss>
<note type="grammar">
<form lang="en"><text>n.m</text></form>
</note>
</sense>
</entry>

jheath · November 9, 2023, 8:43am

You would think that if the Roman script is listed first in the Languages pane in DAB, it would be the default used for all of the reversal vernacular forms. But that doesn’t seem to be the case.

I didn’t have this problem in the last version of the app, but at that time the “shu-Latn” WS was simply called “shu”. (There was a change required since then in our FLEx database, due to language defaults.) My guess is that since “shu” appears before “shu-Arab” alphabetically, it becomes the default for reversal glosses? And since “shu-Latn” appears after “shu-Arab” alphabetically, maybe that’s why it uses the AS form? But if so, why doesn’t it do so consistently for all of the reversal entries? I tried to analyze all of the entries that displayed the AS form, but couldn’t find any sort of pattern.

So here’s an ugly hack, that seemed to work in this case: I edited the LIFT file to change all references to “shu-Latn” to “shu”, and in the writing-systems definition in the .appDef file for the project, I also changed “shu-Latn” to “shu”. Now all of the reversal vernacular forms appear in RS. Note that I wouldn’t recommend this solution to the casual user… but it seemed to work for me.

Jeff

jheath · November 11, 2023, 2:05pm

Well, not quite… When I opened a reversal index, I didn’t see any AS vernacular forms used in the index, so it looked good… UNTIL I tapped on an entry to open it. If I then went back to the reversal index, it would show the AS vernacular form for that entry and for one or two other entries around it!

My guess is that when the reversal index is first built, DAB makes an attempt to use the vernacular form that comes first in the language list. (Although there were some exceptions to that - see the first post in this thread…) But then when you are running the app, whenever it updates entries in the reversal index (why would it need to do this? I don’t know…), it just uses whatever vernacular form comes first in the lexical-unit.
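One way to check that guess against the data is to count which writing system comes first in each entry's lexical-unit. This is only a rough sketch in Python: the function name `first_form_langs` is made up for illustration, and it assumes the LIFT file has no XML namespace on its elements (as in the entries quoted in this thread).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def first_form_langs(root):
    """Count entries by the lang of the first <form> in their <lexical-unit>."""
    counts = Counter()
    for entry in root.iter("entry"):
        # find() returns the first match in document order, or None.
        first = entry.find("lexical-unit/form")
        counts[first.get("lang") if first is not None else None] += 1
    return counts
```

Calling `first_form_langs(ET.parse(path).getroot())` on a LIFT file (where `path` is your own file) gives a tally per writing system. If every entry starts with the same writing system, form order alone cannot explain why only some reversal entries show AS.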

This is how the lexical unit appears in the LIFT file I’m working with:

<lexical-unit>
<form lang="shu-Arab"><text>حَكَّ</text></form>
<form lang="shu-Latn"><text>hakka</text></form>
</lexical-unit>

It appears that the Writing Systems are written out into the LIFT file in alphabetical order, so even though Latn is the principal WS in FLEx, the LIFT file is written out Arab then Latn. My work-around in the post above was to simply rename “shu-Latn” to “shu”, which made it first alphabetically, but not first in the lexical-unit list. That seemed to help fix the problem. But with the additional problem of the AS forms showing up after opening the entry (and the reversal being “updated”), I wondered if maybe the order of the forms in the lexical-unit is also important. So instead of renaming “shu-Latn” to “shu” and leaving it in second position, I left it as “shu-Latn” but just moved it into the first position in the lexical-unit list. So like this:

<lexical-unit>
<form lang="shu-Latn"><text>hakka</text></form>
<form lang="shu-Arab"><text>حَكَّ</text></form>
</lexical-unit>

And THAT seems to work! Now the entries in the reversal index all have vernacular forms in RS, even if you open up an entry (which “updates” the reversal) and then go back to the reversal index.
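For anyone who would rather not hand-edit the file, the same reordering could be scripted with an XML parser. The following is a minimal sketch, not a tested tool: the names `move_preferred_form_first` and `reorder_lift_file` are hypothetical, and it assumes a LIFT file without XML namespaces. Note that Python's ElementTree drops XML comments when parsing and may change attribute quoting on output, so work on a copy.

```python
import xml.etree.ElementTree as ET

PREFERRED_LANG = "shu-Latn"  # assumption: the writing system wanted in reversals


def move_preferred_form_first(root, preferred=PREFERRED_LANG):
    """Move the preferred-lang <form> to the front of every <lexical-unit>.

    Returns the number of lexical units that were changed.
    """
    changed = 0
    for unit in root.iter("lexical-unit"):
        children = list(unit)
        # Stable sort: the preferred form sorts first, all else keeps its order.
        reordered = sorted(
            children,
            key=lambda c: not (c.tag == "form" and c.get("lang") == preferred),
        )
        if reordered == children:
            continue
        # Keep the whitespace layout: tails stay with the position, not the element.
        tails = [c.tail for c in children]
        for child, tail in zip(reordered, tails):
            child.tail = tail
        unit[:] = reordered
        changed += 1
    return changed


def reorder_lift_file(src, dst):
    """Read a LIFT file, reorder its lexical-unit forms, write the result to dst."""
    tree = ET.parse(src)
    count = move_preferred_form_first(tree.getroot())
    tree.write(dst, encoding="utf-8", xml_declaration=True)
    return count
```

Only `<lexical-unit>` is touched here; forms inside notes, examples and other fields keep their original order.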

Wondering what Regular Expression you might be able to use to make this change in a LIFT file? Try this:

Find: (<form lang="shu-Arab"><text>.+?</form>)\r\n(<form lang="shu-Latn"><text>.+?</form>)
Replace: \2\r\n\1
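Note that the Find pattern above expects Windows line endings (\r\n) and no indentation before the second form. If your LIFT file uses plain \n or indents its elements, something like the following Python sketch may be more forgiving. The names `SWAP_FORMS` and `swap_forms` are invented for this example, and the pattern still assumes each form sits on a single line.

```python
import re

# Group 1: the Arabic-script form; group 2: the line break plus any indentation;
# group 3: the Roman-script form. ".+?" does not cross line breaks here.
SWAP_FORMS = re.compile(
    r'(<form lang="shu-Arab"><text>.+?</form>)'
    r'(\r?\n[ \t]*)'
    r'(<form lang="shu-Latn"><text>.+?</form>)'
)


def swap_forms(text):
    """Put the shu-Latn form before the shu-Arab form wherever they are adjacent."""
    return SWAP_FORMS.sub(r"\3\2\1", text)
```

When reading and writing the file from Python, opening it with `newline=""` and `encoding="utf-8"` keeps the original line endings intact. Like the original regex, this swaps every adjacent Arab/Latn pair of forms, not only those inside a lexical-unit.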

So this seems to be a bug in DAB, which I assume can affect LIFT databases which have more than one WS in vernacular forms, and probably only ones where the default vernacular doesn’t show up first alphabetically. It’s not too surprising to have a bug like this, since this is not the normal case (having two vernacular WSs). I’ll probably eventually try to do some debugging to try to fix this, but don’t hold your breath…