To reply to Juha’s statement about the locking level in LD (the smallest edit unit that is pulled/pushed and gets lost in case of a conflict), when it comes to Language Depot and FLEx data, the answer is the same. For the linguistic data that Language Depot and FLEx use, the smallest edit unit that can conflict is a single field in a single entry (and even smaller than that in certain circumstances). In Juha’s example where user A pulled a dictionary entry X at version 101, and B pulled entry X at version 102, and then B edited and pushed (creating version 103) before A pushed his own edits, here’s how that would go:
Scenario 1: A and B edited different fields of the entry
If B edited the Grammatical Notes field, and A edited the Definition field, there would be no conflict. When A pushed, the Send/Receive system would notice that his version of entry X was at version 101, and the most recent version was 103, including an edit by B to the Grammatical Notes field. Since both edits were in different fields, there’s no chance of the edits conflicting. So Language Depot would create version 104 as a merge version, that merges versions 101 and 103. Version 104 would have both A’s Definition edits and B’s Grammatical Notes edits included.
Scenario 2: A and B edited the same field, but different parts of the field
Most fields in FLEx and Language Forge are multi-part fields. The most common kind of multi-part field is a field where you can have different text for each writing system in your project. E.g., in the Lexeme field you’ll often have two different writing systems, one for IPA and one for the actual alphabet of the language. If A and B both edited the Lexeme field, but B edited the IPA part of the field while A edited the local-alphabet part, then again there would be no conflict. The Send/Receive system would see that A had seen version 101 where the local-alphabet lexeme field contained “cat”, and since version 103 still has “cat” in the local-alphabet lexeme, it would allow A’s edit (where he changed “cat” to “kitty”) to be merged. Meanwhile, B’s edit changing the IPA field from “kʰæt” to “kʰɪɾi” would be preserved, so version 104 would be created with the Lexeme field containing “kʰɪɾi” in the IPA writing system, and “kitty” in the the local-alphabet writing system. Again, no conflict.
Scenario 3: A and B edited the same part of the same field, in exactly the same way
If A and B had both edited the Lexeme field, and made the exact same edit, again there would be no conflict. (I think; this particular scenario is one that I haven’t seen happen in practice, so I’m not 100% sure if I’m right about how the Send/Receive system would handle it). Here, the Send/Receive system would use the version history graph to figure out that there was no conflict. FieldWorks and Language Forge don’t show you the version history graph, but if you use advanced tools for working with Mercurial repositories (most of which are designed for software developers), it’s possible to get it to show you a graph of which version was based on which other version, which version was a merge between two previous versions, and so on. Usually the graph will look like a straight line, more or less, but in situations like the one Juha mentioned it’s possible you’ll see a place where there’s a “fork” in the graph, which hopefully gets resolved again soon. In this case, the “fork” would happen at version 101. On one side of the fork would be version 102 (based on 101), then version 103 that B created, and on the other side of the fork would be version 101, and nothing after that until A pushes his change and the two sides of the fork get resolved. (If you’re a more visual kind of person, there’s a diagram below that shows more or less what that would look like, though since I copied this graph from someone else’s example scenario, the ID numbers are different from the ones in Juha’s example).
Now, as I said, there’s a version history graph, and the Send/Receive system can trace the “parent” of each version, and when it needs to merge two commits it can find the “ancestor” version of each commit. And it can compare what the data looked like in the “ancestor” commit to what it looked like after A was done editing it, and compare what the data looked like in the “ancestor” commit to what it looked like after B was done editing it. So it would see that at version 101, the Lexeme field contained “cat”. B changed “cat” to “kitty” in version 103, and A is now trying to change “cat” to “kitty”. Since both edits are the same, there’s no conflict, and it will create a version 104 that merges versions 101 and 103, and which has “kitty” in the Lexeme field.
Scenario 4: A and B edited the same part of the same field, in different ways
This is the one that will cause conflicts, and as Chris mentioned, the general resolution strategy when the Send/Receive system finds a conflict is the “They Win” strategy. In other words, when A does a Send/Receive, if his edit conflicts with B’s then B’s edit will win, and A will be notified of the conflict. (If B had done the Send/Receive, then A’s edit would have won). In Language Forge, the conflict notification will show up as a comment on the relevant lexical entry. In FLEx, the conflict notification shows up when you’re done with the Send/Receive process, and it shows you all the conflicts it found. It also creates an annotation on the lexical entry, which shows up as a little icon of an exclamation mark in a triangle (or is it in a circle?); clicking on the icon will show you the conflict data, which should look something like “A edited the field to say “domesticated cat”, while B edited the field to say “kitty”. Send/Receive kept the edit made by B.” So while A’s change has been “lost” in the sense that it’s no longer in the data, it’s been preserved in the notification message. The idea is that A will see the conflict message, and talk to B about which one it should be. Then if they agree that A’s change was correct and it really should be “domesticated cat”, then A will edit the field again, creating another version where “kitty” is changed to “domesticated cat”. (If the change was really long, then the fact that it’s still available in the notification log will be helpful. as A can just copy and paste the text to re-create the change). This time when A sends that commit to Language Depot, it will be based on the most recent version of the data so it will “win” and the final version of the data will have “domesticated cat” in that entry.
Scenario 5: A and B both add something to the same list
This is by far the most common edit conflict, and it’s not a serious one. If there was a list somewhere (such as a list of lexical senses in an entry), and both A and B added a new item to the list, then the Send/Receive system won’t know which order the items should go in, and it will just choose an order arbitrarily, and create an edit-conflict notification about it. For example, if version 101 of the data had just one definition for the word “bank” (“a business that keeps money and pays interest”), maybe A would add the definition “the side of a river”, while B would add the definition “to lean to one side while turning, like an airplane”. Both A and B will then submit a version that has two senses, sense 1 being the “money” sense, and sense 2 being the one they just added (“river” or “turn” respectively). When A submits his change, the Send/Receive system will notice that he only had the "money sense in his data, while the Language Depot data had two senses, “money” and “turn”. The “turn” sense and the “river” sense can’t both be sense 2, so the Send/Receive system will arbitrarily pick one of them (the change made by B, in this case) to be sense 2, and will make A’s change become sense 3. It will then add a conflict notification, so that A and B can talk about the sense order in case it matters, and rearrange it in a later version of the data if they decide that the “river” sense should come before the “turn” sense.
There are other scenarios, such as one where B deleted a lexical entry while A made changes to it, but Chris has already mentioned what happens there (the edits always win over the deletes, so that data isn’t lost; if the entry really should have been deleted, then you can always delete it again after pulling in the latest version of the data, and as long as what you’re deleting is truly the latest version then the delete will happen). As you can see, conflicts are really rare, because even when people have different versions of the data, most of the time they’re editing different lexical entries or different parts of the same entry, and the Send/Receive system is able to preserve both changes with no conflict. And even when a conflict does arise, the most common reason is because two people added data in the same place, and the Send/Receive system can easily preserve both of the things that were added, and it only needs human intervention to verify which order was desired (if it matters). So conflicts where A’s change is really “lost” (it’s never totally lost because it gets preserved in the conflict notification) are really quite rare.
I hope this glimpse “behind the curtain” helps you understand the Send/Receive system a little bit better.
- Robin Munn, Language Technology developer