I’m looking at the possibilities of AI and setting it in the context of earlier developments. My understanding is that SIL’s first major use of computers was back in 1963 with the Concordance project in partnership with the University of Oklahoma. Typesetters were using or at least exploring software by 1973, and field linguists began to get their own computers in the early 1980s.
I’ve seen several articles on the developing use of computers and specialist software over the years, but I wonder if anyone can point me to a simple timeline of some of the key developments?
Incidentally, one article in SIL’s Intercom in 1981 spoke of the development of a Computer Steering Committee to:
- Help the ‘expectant’ group to understand the corporate wide priorities for computer development.
- Reduce the fears of the ‘fearful’ group. Help them appreciate how the computer is going to help them personally, and also help the corporation as a whole.
- Help appropriate people, including our leaders, grasp the magnitude of some of the larger projects so that expectations will be more realistic.
It sounds like those needs in 1981 may be similar to some of the needs around AI in 2023.
Some milestones I remember: in 1984 the Sharp PC-5000 made having one’s own computer possible even if you did not have electric power. Some software from that time: Ed (an editor), and Design and Shapes, which made it possible to design special characters to replace existing characters, or to use a sequence of characters for a special character. JAARS distributed Direct Translator Support (DTS) software that included these different programs and their documentation. Scripture and dictionaries were done in SFM files with codes entered manually.
~1988: Shoebox for DOS became available and was a big hit for the assistance it gave in entering and using SFM codes. It was used mostly for dictionaries but could also interlinearize texts.
1990s: Windows computers became more common than DOS computers, and SIL distributed the Encore Fonts system, which permitted users to create “hacked” fonts by picking glyphs for each character position.
Late 1990s: Paratext became known and used for Scripture translation.
Early 2000s: adoption of Unicode by SIL software and SIL members.
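For those who never worked with them, the hand-entered SFM codes mentioned above were backslash markers at the start of each field. A Scripture fragment and a dictionary entry might have looked roughly like this (marker names follow the common conventions; the exact set varied by project, and the dictionary entry here is a made-up example):

```
\id MRK
\c 1
\v 1 The beginning of the gospel of Jesus Christ...

\lx kapu
\ps n
\ge basket
```

Everything downstream — sorting, concordancing, checking, typesetting — keyed off those markers.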
AI isn’t anything magical, and it really is nothing new. Greater computing power is more widely available, and for majority languages, more data is available to use for training.
As with any decision-making process, good information allows for a good decision. The age-old principle still applies: “garbage in, garbage out.” Only good information allows the computer to help us solve our problems and challenges in a helpful manner.
SIL began working with AI many years ago with tools like CARLA (Computer Assisted Related Language Adaptation). We were seeking to have the computer help us do Bible translation more quickly by adapting text from a similar language to make a draft. All the Word (formerly TBTA) has been doing a similar thing for 20+ years. Paratext’s guessing is also rudimentary AI at work. It takes a knowledgeable person to verify and validate the guesses made by computers.
As data scientists use more computing power and new algorithms to process massive amounts of data, we are still challenged to leverage those tools and algorithms for minority languages because we lack sufficient quantities of data. Our challenge will be to identify sufficient data, and knowledgeable people who can accurately prepare the training data.
Where Scripture data has been spell-checked and validated for accurate grammar, we have clean data for input. If we have a good back translation, we have another layer of data to map. If key terms have been linked to Greek and Hebrew, there is another set of data that can be used to train AI algorithms to help us.
These examples are important because they help us understand that there is a significant human cost to preparing training data. What we don’t generally recognize is that English, for example, has massive amounts of data that data scientists have been collecting for years to train AI applications for that language. Millions of people enter new data online every day, which Google and others scan and use to train AI systems.
We don’t have that same robust amount of data available for training in the languages SIL works with. That doesn’t mean nothing can be done. It does mean that we must change our expectations to consider how lean data can be used and how data quality can be improved to get the best results.
AI isn’t a magic bullet. It is a useful tool, and we should consider how to leverage it.
Don’t forget Gary Simons’ first portable field computer, hand-built in the late 1970s (?) with Joe Grimes on an NSF grant.
I’ve not found details of Gary Simons’ hand-built computer, but Poh San of Wycliffe Singapore began using a computer in Ukarumpa in 1977 and shares early experiences here: https://www.wycliffe.sg/post/early-days-of-computers-in-bible-translation
Here’s some of the software developed and used by SIL in the 1980s.
Direct Translator Support Software (as available in 1984) included:
Text Analysis (TA) has several specific functions which allow it to align interlinear text, analyze word segments, list text references, produce word lists and concordances, and print text files.
Record Formatter (RF) is a text-oriented base program that is used to sort text, produce subfiles, sequence text for use in primers, and reverse listings. The alphabet and collating sequence can be defined by the user in this and the Text Analysis program.
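The user-defined alphabet and collating sequence that RF and TA supported can be sketched in modern terms. This is a toy illustration, not the actual DTS implementation, and the alphabet shown is hypothetical:

```python
# Toy sketch of sorting by a user-defined alphabet and collating
# sequence, in the spirit of the Record Formatter and Text Analysis
# programs. The alphabet below is a hypothetical example.

ALPHABET = "aábcdefghiíjklmnŋoóprstuúwy"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def collation_key(word: str) -> list[int]:
    # Characters not in the alphabet sort after everything in it.
    return [RANK.get(ch, len(ALPHABET)) for ch in word]

words = ["ŋana", "ába", "ama"]
print(sorted(words, key=collation_key))  # → ['ama', 'ába', 'ŋana']
```

The point is that alphabetical order for a vernacular language rarely matches the machine’s byte order, so the user had to be able to supply their own.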
Consistent Changes (CC) is a powerful program that changes a file of text according to specifications in a table. This is used to revise or correct words, phrases, and sequences of letters. CC has several flexible options that allow for making changes or rearranging parts of text based on certain conditions set by the user.
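The core idea of CC — rewriting text by an ordered table of match/replacement rules — can be sketched in a few lines. This is a toy illustration, not CC’s actual table syntax, which also supported stores, groups, and conditional switches:

```python
# Toy sketch of a Consistent Changes-style pass: scan a text left to
# right, and at each position try the rules in table order, so the
# first (usually longest) matching rule wins.

def apply_changes(text: str, table: list[tuple[str, str]]) -> str:
    out = []
    i = 0
    while i < len(text):
        for match, repl in table:
            if text.startswith(match, i):
                out.append(repl)
                i += len(match)
                break
        else:
            # No rule matched here; copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)

# Example: map a digraph to a single special character.
print(apply_changes("tangan", [("ng", "ŋ")]))  # → taŋan
```

That same table-driven idea is why CC was used for everything from orthography changes to preparing text for typesetting.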
Manuscripter (MS) prints text in almost any format, from early draft copies to final typeset manuscripts. This general-purpose formatting program is especially suited for large documents: technical papers, entire New Testaments, and reference manuals. With the font design capabilities of the Big Print program and Consistent Changes tables, users can define special characters, enlarged characters for primers, and unique orthographies for output on printers.
Chapter-Verse Check verifies that scripture text has the correct number of verses in a given chapter, and the correct number of chapters in a book. It also checks for proper chapter and verse sequencing.
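The sequencing part of that check is simple to sketch: walk the \c and \v markers and flag any number that doesn’t follow its predecessor. This is a simplified illustration; the real program also compared against the expected chapter and verse counts for each book:

```python
# Simplified chapter/verse sequence check over SFM-style markers.
# Flags any chapter or verse number that does not follow its
# predecessor by exactly one.

def check_sequence(lines: list[str]) -> list[str]:
    problems = []
    chapter = 0
    verse = 0
    for line in lines:
        if line.startswith("\\c "):
            n = int(line.split()[1])
            if n != chapter + 1:
                problems.append(f"chapter {n} after chapter {chapter}")
            chapter = n
            verse = 0  # verse numbering restarts each chapter
        elif line.startswith("\\v "):
            n = int(line.split()[1])
            if n != verse + 1:
                problems.append(f"verse {n} after verse {verse} in chapter {chapter}")
            verse = n
    return problems

text = ["\\c 1", "\\v 1 ...", "\\v 2 ...", "\\v 4 ...", "\\c 2", "\\v 1 ..."]
print(check_sequence(text))  # → ['verse 4 after verse 2 in chapter 1']
```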
Lexicon Merge adds new lexical (or dictionary) entries from a sorted file into an existing lexical file. The alphabet and collating sequence may be defined by the user.
Programmable Text Processor (PTP) is a programming language designed for automatic text processing. Some of the functions accomplished with the other DTS programs can be done with PTP, but this method requires programming skills and more user interaction. Because PTP is a language, it can be configured to do unique tasks.
The DTS software was originally developed to run on computers from Digital Equipment Corporation (DEC). There are about 200 of these systems now on the field. As smaller, lower priced computers have come into use, parts of the DTS software have been converted to run on other systems.
Source: Intercom, Aug/Sep 1984
Some things haven’t changed. Here’s the final word in the article…
Computers can be a real timesaver when properly applied to a translation project, or they can be a frustrating experience in a world of strange jargon and blinking lights. We hope that the services of your branch and international computing entities will provide the key to many more teams increasing their productivity in language projects.
Thanks to all of you who develop software, and to the community of developers and users who make it more helpful and less frustrating.