Regex for field replacement

This may not be the exact place to ask this question, but I am hoping that one of you knows how to do this …

I am looking for a find / replace regex to do the following changes:

In the example MDF database record below there are \dn id. lines indicating “idem”, i.e. the same as the previous field of the same language. The “id.” needs to be replaced with the previous \dn value, but only within the boundaries of each record, which is delimited by \lx and \dt. (The end-of-line marker is the \r\n combination, as you are probably aware.)

\lx kpii
\ps n.
\sn 1
\de homme
\re homme
\dn cɛ
\rn cɛ
\dr man
\rr man
\sn 2
\de mâle
\re mâle
\dn id.
\rn *
\dr male
\rr male
\sn 3
\de masculin
\re masculin
\dn cɛya
\rn cɛya
\dr masculin
\rr masculin
\sn 4
\de bravour
\re bravour
\dn id.
\rn *
\dr bravory
\rr bravory
\dt 01/01/2022

Sense numbers can go up to 8, with two to three valid \dn values, each followed by \dn id. in the next sense, as shown in the example.

Thanks in advance for your help,

Bart (P.S. I am using the regex engine in Notepad++.)

That’s an interesting one! I can’t help, although I doubt regex will take you far with this one; Python or Linux code would be much simpler, I would say (as an amateur). But I would love to hear how you solve it eventually :thinking:

Hi Bart,
In the past we did stuff like this with the Consistent Changes (CC) program. I’ll send you more offline and you can see if it works for you.
Thanks,
David

I think RegEx might be able to do it, but could you confirm that the \dn id. will just be replaced by the \dn value from the previous sense? So in your example, \sn 2 will get \dn cɛ and \sn 4 will get \dn cɛya? It would be fine if there are several consecutive senses that have \dn id.. You would just need to run the Regex several times to copy the value to each following sense in turn.

If my assumption in the previous post is correct, then you should be able to use these Find/Replace expressions in Notepad++ (making sure to check the Regular Expressions option):

Find: (\\dn (.*)\r\n(\\(rn|dr|rr) .*\r\n)+\\sn .*\r\n(\\(de|re) .*\r\n)+\\dn )id\.
Replace: \1\2

This is assuming that your data is very regular, as the example above seems to be, always with English fields first, then national language, then regional language. If you want it to be more robust, then expand those two sections that list SFMs: (rn|dr|rr) and (de|re) to both include ALL of the possible SFMs located within a sense. So you could, for example, make them both into: (de|re|rn|dr|rr). But those lists of SFMs shouldn’t include dn, as that’s the special marker that we are tracking and copying.

If you want an explanation of the expression, here’s a basic one:
Find a \dn (and capture its data), followed by some other sense fields, followed by a \sn field, followed by some other sense fields, followed by a \dn id.. When you find that, replace the id. at the end with the \dn data that was captured at the beginning.
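For anyone who wants to try the same expression outside Notepad++, here is a minimal Python sketch of it. The function name `fill_idem` and the shortened sample record are assumptions made for illustration, not part of the original data. Two details differ from the Notepad++ version: `.*` is written as `[^\r\n]*`, because Python's `.` also matches `\r`, and the substitution is repeated until nothing changes, which plays the role of running the Find/Replace several times for consecutive \dn id. senses.

```python
import re

# Same structure as the Find expression above. [^\r\n]* replaces .* because
# Python's "." matches "\r"; \r?\n accepts both \r\n and \n line endings.
PATTERN = re.compile(
    r"(\\dn ([^\r\n]*)\r?\n"              # \dn field, data captured as group 2
    r"(?:\\(?:rn|dr|rr) [^\r\n]*\r?\n)+"  # remaining fields of that sense
    r"\\sn [^\r\n]*\r?\n"                 # start of the next sense
    r"(?:\\(?:de|re) [^\r\n]*\r?\n)+"     # fields before the next \dn
    r"\\dn )id\."                         # the \dn whose value is "id."
)

def fill_idem(text: str) -> str:
    """Repeat the replacement until the text stops changing."""
    while True:
        new_text = PATTERN.sub(r"\g<1>\g<2>", text)
        if new_text == text:  # also stops when "id." was copied from "id."
            return text
        text = new_text

# Hypothetical sample adapted from the record above: senses 2 and 3 are both "id."
sample = "\r\n".join([
    r"\lx kpii", r"\ps n.",
    r"\sn 1", r"\de homme", r"\re homme", r"\dn cɛ", r"\rn cɛ", r"\dr man", r"\rr man",
    r"\sn 2", r"\de mâle", r"\re mâle", r"\dn id.", r"\rn *", r"\dr male", r"\rr male",
    r"\sn 3", r"\de masculin", r"\re masculin", r"\dn id.", r"\rn *",
    r"\dr masculin", r"\rr masculin",
    r"\dt 01/01/2022", "",
])

if __name__ == "__main__":
    for line in fill_idem(sample).splitlines():
        if line.startswith("\\dn "):
            print(line)
```

The first pass fills sense 2 from sense 1, and the second pass fills sense 3 from the now-completed sense 2, so the loop mirrors the "run it several times" advice.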

If that doesn’t make sense, let me know and I could try again… :slight_smile:

Thank you, David and Jeff, for your help already. Now I have two possibilities, CC and regex… The database is indeed (almost) always e \ n \ r in that order. The reason I wanted the regex to limit its changes to a \lx \dt context is that there could be a record that has no valid \dn but does have a \dn id., so without taking the \lx \dt context into account it would copy from the previous record. In any case, I will give both CC and regex a try and will let you know.
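One way to see how often that situation occurs before changing anything is to list the records whose first \dn field is already id.. The following is only a sketch under assumptions: the helper names `records` and `orphan_idem_records` are made up for illustration, and a record is assumed to start at each \lx line.

```python
def records(lines):
    """Yield lists of lines, starting a new record at each \\lx line."""
    current = []
    for line in lines:
        if line.startswith("\\lx ") and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current

def orphan_idem_records(text):
    """Return the first line of every record whose first \\dn value is 'id.'."""
    orphans = []
    for rec in records(text.splitlines()):
        dn_values = [l[4:].strip() for l in rec if l.startswith("\\dn ")]
        if dn_values and dn_values[0] == "id.":
            orphans.append(rec[0])
    return orphans

if __name__ == "__main__":
    # Hypothetical two-record file: the second record has no valid \dn.
    text = "\r\n".join([
        r"\lx a", r"\sn 1", r"\dn x", r"\sn 2", r"\dn id.", r"\dt 01/01/2022",
        "",
        r"\lx b", r"\sn 1", r"\dn id.", r"\dt 01/01/2022", "",
    ])
    print(orphan_idem_records(text))
```

Records reported by this check are the ones where any replacement would have nothing valid to copy from within the record.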

Thanks again for your help, Jeff. It put me on the trail of VS Code, which allows multiline find and replace, so I did use some of your suggestion.
In the end I went for the CC script that David sent me (see below). Regexes for these (to me) complicated operations require a lot of deep thought, which is hard to come by these days :slight_smile:
In any case, thanks for your help.

begin > use(main)

group(main)
"\lx " > dup
         store(dn) endstore     c clear out dn store at beginning of record
"\dn " > dup
         store(dn) use(dn)      c capture contents of \dn field
"\dn id." > "\dn " out(dn)      c replace id. with contents of latest \dn field

group(dn)
nl > endstore                   c finish capturing contents of \dn field
     out(dn)                    c put contents of \dn field back into file
     dup                        c replace nl
     use(main)
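For readers without CC, the same logic can be sketched in Python. This is an approximation of what the script above appears to do, not a translation checked against CC itself; the function name `fill_idem_by_record` is hypothetical. The variable `last_dn` plays the role of the `dn` store: it is cleared at each \lx, updated at each real \dn field, and written out in place of id..

```python
def fill_idem_by_record(text: str) -> str:
    """Replace '\\dn id.' with the latest real \\dn value of the same record."""
    last_dn = ""  # stands in for the CC store "dn"
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]          # keep the original line ending
        if body.startswith("\\lx "):
            last_dn = ""                # new record: forget the previous \dn
        elif body.startswith("\\dn id."):
            # keep anything that follows "id." on the line, as the CC rule does
            body = "\\dn " + last_dn + body[len("\\dn id."):]
        elif body.startswith("\\dn "):
            last_dn = body[4:]          # remember the contents of this \dn field
        out.append(body + eol)
    return "".join(out)

if __name__ == "__main__":
    # Hypothetical two-record sample; the second record has no valid \dn.
    sample = "\r\n".join([
        r"\lx a", r"\sn 1", r"\dn x", r"\sn 2", r"\dn id.", r"\dt 01/01/2022",
        r"\lx b", r"\sn 1", r"\dn id.", r"\dt 01/01/2022", "",
    ])
    print(fill_idem_by_record(sample))
```

Because `last_dn` is not updated on an id. line, several consecutive id. senses all receive the latest real value in a single pass. A record with no valid \dn ends up with an empty \dn field rather than a value copied from the previous record.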