Auto AI Transcription

I was making a music video for the minority language project we work with. My video editor (CyberLink) offered AI subtitling as a new feature (but only into English). I tried it, and it broke the subtitles at the natural locations in the text. If I thought carefully about the gobbledygook that came out as the subtitles, I could determine which verse of the song it had picked up and adjust the text easily. Several subscription-based online video editors have a similar feature. They offer Indonesian and Tagalog, which are more closely related to our language. In some ways using these was better, since it could give me some exact words; in other ways it was worse, because it would try to change words toward the wrong language. I wonder if SIL would consider setting up AI subtitling or transcription that tries to match the input to a language’s phonology and also works against a word list and word corpus, or maybe even other transcribed texts, to attempt an automatic transcription. I don’t imagine this will get done by the time I need it, but I know I will need this function in the future as well. It seems to tie in well with the function of SayMore, but maybe I’m missing a more appropriate piece of software.
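
To make the "works against a word list" idea a bit more concrete, here is a minimal sketch of one possible post-processing step: snapping each word of a rough automatic transcript to the closest entry in a project word list. It is not how CyberLink, SayMore, or any existing transcription service works. The function name `snap_to_wordlist`, the `cutoff` value, and the sample words are all hypothetical, and it uses plain string similarity from Python's standard `difflib` module rather than anything phonology-aware.

```python
import difflib


def snap_to_wordlist(tokens, wordlist, cutoff=0.6):
    """Map each raw token to its closest word-list entry.

    Returns a list of (raw_token, suggested_word, matched) tuples.
    Tokens with no entry at or above `cutoff` similarity are kept
    unchanged and marked matched=False so a person can review them.
    """
    results = []
    for token in tokens:
        # get_close_matches ranks candidates by difflib's similarity ratio
        matches = difflib.get_close_matches(
            token.lower(), wordlist, n=1, cutoff=cutoff
        )
        if matches:
            results.append((token, matches[0], True))
        else:
            results.append((token, token, False))
    return results


# Hypothetical word list and hypothetical recognizer output
wordlist = ["salamat", "kaibigan", "bahay", "umaga"]
raw_tokens = ["salamut", "kaibigen", "xyz"]

for raw, suggestion, matched in snap_to_wordlist(raw_tokens, wordlist):
    note = "" if matched else "  (no close match, needs review)"
    print(f"{raw} -> {suggestion}{note}")
```

A real tool would probably want to compare sounds rather than spellings and to use corpus frequencies to choose between near-ties, but even this simple version shows how a word list can pull a transcript back toward the right language.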

AI transcription would be a great feature!

An option to do transcription into IPA would be good, and I have heard that there are people working on this possibility.

Lots of interesting things are happening on the AI front. As things develop, I will keep this in mind. I think IPA transcription is probably nearly within reach. @Michael_Martin, do you have any thoughts on what it might take to attempt this? @Greg_Trihus, if there are already plans to try to roll this out in APM, please let me know how it goes and feed me any relevant information as to how I might try to hook it up in SayMore.

I would love to play with this. I think it would be fascinating to see what an IPA auto-transcription can do, though my expectations would be very, very low. It would be most helpful if there were a way of training the model a little, or at least of exposing the uncertainties any generative AI model will have.
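
As a rough illustration of what "exposing the uncertainties" could look like, here is a small sketch. It assumes a recognizer that can return several candidate phones with probabilities for each position in the audio; that input format, the function names, the threshold, and the sample data are all made up for illustration and do not describe any particular tool's output.

```python
def flag_uncertain(candidates, threshold=0.6):
    """Pick the best phone at each position and flag weak guesses.

    `candidates` is a list with one entry per position; each entry is a
    list of (phone, probability) pairs (a hypothetical format).
    Returns a list of (phone, probability, is_uncertain) tuples.
    """
    flagged = []
    for options in candidates:
        best_phone, best_prob = max(options, key=lambda pair: pair[1])
        flagged.append((best_phone, best_prob, best_prob < threshold))
    return flagged


def render(flagged):
    """Show the transcript with uncertain phones marked as [x?]."""
    return " ".join(
        f"[{phone}?]" if uncertain else phone
        for phone, _, uncertain in flagged
    )


# Hypothetical recognizer output for three positions
candidates = [
    [("s", 0.90), ("z", 0.05)],
    [("a", 0.50), ("ə", 0.40)],
    [("t", 0.80), ("d", 0.10)],
]

print(render(flag_uncertain(candidates)))  # prints: s [a?] t
```

Marking the weak spots like this would let a transcriber jump straight to the places that most need a human ear instead of rechecking everything.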


One of the questions is whether we want a solution that works offline, one that does the IPA transcription through an online service over an Internet connection, or whether we (SIL LSDev) want a solution that works both online and offline. A well-designed online system may also carry a small cost, so we should think through whether that would be charged to a particular project, ltops, or general direct expenses.


Offline would be ideal, assuming it didn’t involve enormous installer bloat. But I think that for SayMore’s purposes, online-only would be acceptable, especially if it meant faster speed or steadily improving performance. Of course, if there were a cost to that, then we’d need to take it into consideration. Since many SayMore users may be motivated and funded by concerns that do not align closely with those of SIL’s primary funders, I’d have no problem passing the costs along to users, if we had a good way to do that.

Looping in @Joshua_Nemeck.

@Luke / @rowbory To see this happen, we would probably need a viable funding model. Do you have any thoughts as to how we might get this funded? Would you (or anyone) be able to write up a convincing story that shows how automated transcription would make a real difference in our work and that would appeal to the values of a potential funder? Note that SayMore is not generally thought of as a central part of the toolchain for the tasks that appeal to most of our major funders. However, I know that it is sometimes used in ways that go beyond what it was designed to do. Of course, two additional funding streams are possible:

  1. A university or research foundation that would be willing to fund some additional development

  2. A pay-to-play funding model where individual users would pay for the transcription service. (This model would not cover initial development costs upfront.)

I haven’t had to do much with funding, so I might not be of great help, but my idea would be that most funders have the objective that people be able to read higher-level texts. This is why we got into literacy initiatives in the first place. It isn’t enough for people to know how to read their language; they also need comprehension. The problem is that not many books are produced in native languages, and to develop reading and writing comprehension in many native languages you need well over 100 books. That doesn’t mean that there aren’t lots of stories. In fieldwork we get to record them, but the task of getting them into writing and eventually into a publishable form is arduous. Not many have the time to do that along with their other responsibilities. The gathering of these stories and their transcription usually happens at a time when language learning is the primary focus of a sent worker. The world’s majority languages are already easily recognized by transcription services. A tool like this would give the same opportunity to minority languages. I have done dictionary work in the past, and having a good corpus is helpful in understanding how a word works, which in turn would help in the translation of ALL texts as terms are better understood. Ideally this tool could help workers create a large, searchable corpus.

I don’t know if any of that is helpful.

IPA transcription exists outside SayMore. The original tool, which I was involved in, is called Allosaurus. I was recently working with a couple of others (including @Joshua_Nemeck) to retrain it with better and wider data, but we haven’t moved very far yet.

In case anyone comes across this and wants to try it out, here’s a link to Allosaurus. It sounds like it is maybe still somewhat experimental or likely to need some additional tuning to work well for some languages, but it is certainly worth trying. Ideally, for use in SayMore it would be nice either to have it packaged up as a stand-alone executable (not requiring Python to be installed) or to test it out with IronPython to be sure it works that way. Also, it looks like there is a fair bit of tweaking and configuration possible, so it would be nice to wrap that with some usable UI inside of SayMore. But maybe we’d want to start with it as an experimental feature and make sure it’s the way forward before doing all that.

Thanks, Luke. I think that’s a great start toward describing the need in terms of desired results. I will do what I can to ensure that this gets in front of the people who can maybe push this forward.