Regex for field replacement

Bart_Eenkhoorn · June 10, 2022, 12:50pm

This may not be the exact place to ask this question, but I am hoping that one of you knows how to do this …

I am looking for a find / replace regex to do the following changes:

In the example MDF database record below there are \dn id. lines indicating “idem”, i.e. the same as the previous field of the same language. The “id.” needs to be replaced with the previous \dn value, but only within the boundaries of each record, delimited by \lx and \dt. (The end-of-line marker is the \r\n combination, as you are probably aware.)

\lx kpii
\ps n.
\sn 1
\de homme
\re homme
\dn cɛ
\rn cɛ
\dr man
\rr man
\sn 2
\de mâle
\re mâle
\dn id.
\rn *
\dr male
\rr male
\sn 3
\de masculin
\re masculin
\dn cɛya
\rn cɛya
\dr masculin
\rr masculin
\sn 4
\de bravour
\re bravour
\dn id.
\rn *
\dr bravory
\rr bravory
\dt 01/01/2022

Sense numbers can go up to 8, with two to three valid \dn values, each followed by \dn id. in the next sense as shown in the example.

Thanks in advance for your help,

Bart. (P.S. I am using the regex engine in Notepad++.)

Rhina · June 10, 2022, 8:20pm

That’s an interesting one! I can’t help, although I doubt regex will take you far with this one; Python or Linux code would be much simpler, I would say (as an amateur). But I would love to hear how you solve it in the end.

drowe · June 10, 2022, 10:26pm

Hi Bart,
In the past we did stuff like this with the Consistent Changes (CC) program. I’ll send you more offline and you can see if it works for you.
Thanks,
David

jheath · June 11, 2022, 3:54pm

I think RegEx might be able to do it, but could you clarify that the \dn id. will just be replaced by the \dn value from the previous sense? So in your example, \sn 2 will get \dn cɛ and \sn 4 will get \dn cɛya? It would be fine if there are several consecutive senses that have \dn id.. You would just need to run the Regex several times to copy it to the following senses.

jheath · June 11, 2022, 4:12pm

If my assumption in the previous post is correct, then you should be able to use these Find/Replace expressions in Notepad++ (making sure to check the Regular Expressions option):

Find: (\\dn (.*)\r\n(\\(rn|dr|rr) .*\r\n)+\\sn .*\r\n(\\(de|re) .*\r\n)+\\dn )id\.
Replace: \1\2

This is assuming that your data is very regular, as the example above seems to be, always with English fields first, then national language, then regional language. If you want it to be more robust, then expand those two sections that list SFMs: (rn|dr|rr) and (de|re) to both include ALL of the possible SFMs located within a sense. So you could, for example, make them both into: (de|re|rn|dr|rr). But those lists of SFMs shouldn’t include dn, as that’s the special marker that we are tracking and copying.

If you want an explanation of the expression, here’s a basic one:
Find a \dn (and capture its data), followed by some other sense fields, followed by a \sn field, followed by some other sense fields, followed by a \dn id.. When you find that, replace the id. at the end with the \dn data that was captured at the beginning.

If that doesn’t make sense, let me know and I could try again…
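
For anyone who would rather script this than run it by hand in Notepad++, here is a minimal Python sketch of the same Find/Replace idea. It is not the original expression verbatim: the names fill_idem, SENSE_FIELDS and PATTERN are made up for illustration, both SFM lists are merged into one (as suggested above), and two guards are added as assumptions: the captured \dn value may not itself be "id.", and the target "id." must be the whole field. The loop simply repeats the replacement until nothing changes, which covers several consecutive \dn id. senses.

```python
import re

# Hypothetical list of sense-level markers; extend it as needed, but never
# add "dn", "lx" or "dt" (the pattern must not cross those fields).
SENSE_FIELDS = r"(?:de|re|rn|dr|rr)"

PATTERN = re.compile(
    r"(\\dn ((?!id\.\r\n)[^\r\n]*)\r\n"          # a real \dn value (group 2)
    r"(?:\\" + SENSE_FIELDS + r" [^\r\n]*\r\n)+"  # remaining fields of that sense
    r"\\sn [^\r\n]*\r\n"                          # start of the next sense
    r"(?:\\" + SENSE_FIELDS + r" [^\r\n]*\r\n)+"  # fields before the next \dn
    r"\\dn )id\.(?=\r\n)"                         # the "id." to be replaced
)

def fill_idem(text):
    """Replace '\\dn id.' with the previous sense's \\dn value, repeatedly."""
    while True:
        # Keep everything matched before "id." and append the captured value.
        text, count = PATTERN.subn(lambda m: m.group(1) + m.group(2), text)
        if count == 0:
            return text
```

Because the pattern only allows the listed sense fields and exactly one \sn between the two \dn lines, a match cannot reach across a \lx or \dt line, so a record that has no valid \dn keeps its \dn id. untouched. The text must be read with its \r\n line endings preserved (for example with open(path, newline="")) for the pattern to match.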

Bart_Eenkhoorn · June 14, 2022, 8:15am

Thank you David and Jeff for your help already. Now I have two possibilities, CC and regex… The database is indeed (almost) always e \ n \ r in that order. The reason why I wanted the regex to limit its changes to a \lx \dt context is that there could be a record that has no valid \dn but does have a \dn id., so without taking the \lx \dt context into account it would copy from the previous record. In any case, I will give both CC and regex a try and will let you know.

Bart_Eenkhoorn · June 21, 2022, 3:14pm

Thanks again for your help Jeff. It put me on the trail of the tool VS Code, which allows multiline find/replace. So I did use some of your suggestion.
In the end I went for the CC script that David sent me (see below). Regexes for these (to me) complicated operations require a lot of deep thought, which is hard to come by these days.
In any case, thanks for your help.

begin > use(main)

group(main)
"\lx " > dup
store(dn) endstore c clear out dn store at beginning of record
"\dn " > dup
store(dn) use(dn) c capture contents of \dn field
"\dn id." > "\dn " out(dn) c replace id. with contents of latest \dn field

group(dn)
nl > endstore c finish capturing contents of \dn field
out(dn) c put contents of \dn field back into file
dup c replace nl
use(main)
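
For readers who do not have CC at hand, the same bookkeeping can be sketched in Python: remember the latest \dn value, forget it at every \lx, and substitute it whenever a \dn id. line comes along. This is only an illustration under stated assumptions, not a port of the script above: the function name fill_idem_by_record is made up, the input is assumed to be a list of lines without their line endings, and a \dn id. with no earlier \dn in the same record is left unchanged (a choice made here so that such records are easy to find afterwards).

```python
def fill_idem_by_record(lines):
    """Return a copy of lines with '\\dn id.' filled in from the latest \\dn."""
    last_dn = None          # latest real \dn value seen in the current record
    result = []
    for line in lines:
        if line.startswith("\\lx "):
            last_dn = None  # new record: forget the previous record's \dn
        if line.startswith("\\dn "):
            value = line[len("\\dn "):].strip()
            if value == "id.":
                if last_dn is not None:
                    line = "\\dn " + last_dn   # copy the remembered value
            else:
                last_dn = value                # remember this real value
        result.append(line)
    return result
```

A file could be processed by splitting its text on "\r\n", passing the list through fill_idem_by_record, and joining the result with "\r\n" again. Unlike the regex approach, a single pass is enough even when several consecutive senses carry \dn id., because the remembered value stays the same until a new \dn value or a new \lx appears.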