Importing text and images from Word to RAB, audio synchronization

I have experience successfully creating apps with SAB, but I am only now learning to use RAB, and transferring those skills has not been easy.

The RAB documentation says that text and images can be imported from Word documents, with a page break marking each new page (each page becomes a “chapter” in RAB). So I created a 20-page picture storybook in Word, with an image and text on each page, separated by page breaks (Ctrl+Enter). It’s a relatively short book, so I chose to use one MP3 audio file for the whole book, as the documentation allows: “If you have a picture story book, you can have a single audio file for the whole book or one audio file per page.”

  1. After importing the DOCX into RAB, the images did not come in; I had expected to see them under Images/Illustrations. The DOCX file shows correctly under the Books tab, but no images were imported.
  2. Do I need to avoid special characters in the “book name”?
  3. I was surprised to find that I can’t use special characters in the app name. When I try to sync the audio using aeneas, I get this error and no timing file is created:

[ERRO] Unable to create file ‘D:\andyw\Documents\App Builder\Reading Apps\App Projects\Suu?le k?d?m?\Suu?le k?d?m?_data\timings\C01-01-B001-01-timing.txt’
[ERRO] Make sure the file path is written/escaped correctly and that you have write permission on it
Those question marks should be showing up as these special characters: \App Projects\Suúle kʊdɔḿ\Suúle kʊdɔḿ_data\timings
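The question marks can be reproduced outside RAB with a short Python sketch. It assumes, from the doubled “u” in Suu?le, that the folder name is stored in decomposed form (ú as u plus the combining acute accent U+0301, ḿ as m plus U+0301), and that something in the Aeneas call converts the path to the legacy console code page 437, which is what Ian suggests further down. Encoding with replacement then turns every character that code page 437 lacks into ?:

```python
# Sketch: reproduce the "?" substitutions seen in the Aeneas error.
# Assumption: the name is stored decomposed (NFD): ú = "u" + U+0301 and
# ḿ = "m" + U+0301; ʊ (U+028A) and ɔ (U+0254) have no decomposition.
import unicodedata

name = unicodedata.normalize("NFD", "Su\u00fale k\u028ad\u0254\u1e3f")

# Code page 437 has no combining accents, no ʊ and no ɔ, so each becomes "?".
as_cp437 = name.encode("cp437", errors="replace").decode("cp437")
print(as_cp437)  # Suu?le k?d?m?
```

The output matches the folder name in the error message. Even in precomposed form, ʊ and ɔ do not exist in code page 437, so the name would still be damaged.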

  4. When I started over with a new app name (no special characters), Aeneas did work, but only partly. A timing file was created only for page 1 of the DOCX file (chapter 1 in RAB), with the whole 8-minute audio spread over the few lines of that page, and pages 2-20 were not synchronized.

This looks similar to the errors I used to get in SAB projects where my introduction files didn’t meet the specifications for phrase files. Is there something similar to phrase files in RAB? Any other ideas why this is happening?

I suppose I could split the 8-minute audio file into 20 separate files, one for each Word page (each “chapter” in RAB), but that would be more work.
Do I really need a separate audio file for each page in Word?

I’m looking for more detailed documentation about this so that I can in turn train others to do it.

Thanks!

FYI I’m using RAB 9.2.1 release 72 on a Windows 10 machine.

From Leo:
Download “Bibleview” from the Google Play Store: The Birth of Jesus and Revelation, with sound, and lastly either the Old Testament or the New Testament. All of them were made using Word .docx files. Then I can share how I built the apps in RAB and also learn from you. We can both learn from others as well.

max sound bites is under 100Bites which is small.

@Ian_McQuay do you know who might be able to help me find answers to these questions?

You can use SAB instead of RAB as long as your package name is not like org.sil.???

  1. I am not getting images with DOCX either. I’ll write that up. I always prefer SFM for my sources so I don’t get problems like this.

  2. I have projects with names in Thai and Arabic script.

  3. Special characters in book names are okay in SAB. The issue you had with Aeneas is probably caused by the default code page of your Command Prompt. Start a Command Prompt, type chcp and press Enter. What do you get?

  • If it is 65001, you are using a Unicode (UTF-8) code page and Aeneas should work okay.

  • If not, that will cause problems. (I am assuming your special characters are in Unicode.)

    To set the code page for every Command Prompt, open Regedit and navigate to
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Command Processor
    There, add a value named Autorun with the Data chcp 65001 (or, if Autorun already exists, change its Data to chcp 65001).

You may also want to change the Command Prompt font to a good Unicode font that is monospaced, such as Consolas or Lucida Console.

  4. If you want to send me your project and audio file via Google Drive, I’ll look at them (either one or both). Storybooks often have only one audio file.

Leo wrote:
Andy, my replies are in […]

  1. Ian wrote: I am not getting images either with DOCX. I’ll write that up.
    [I use one image with DOCX; see my app “Bibleview” in the Google Play Store.]

  2. Ian wrote: I always prefer SFM for my sources so I don’t get problems like this.
    [I think SFM is the way to go; in the long run you end up with more options. I would like help converting my DOCX files to SFM to see what happens to my apps. One of my apps is in Thai.]

When I type chcp in the Command Prompt I get “Active code page: 437”, so is that where you expected to see 65001 if it were my default? See the image below, taken after I made the RegEdit changes. All of our language data is in Unicode, by the way.

I opened RegEdit and there was no Autorun listed, so I added a new “string value” like this:

I’m a wannabe power user (read: potential danger to self), so I felt comfortable opening RegEdit, but I had to feel my way around to add the Autorun entry. Eventually I right-clicked on the empty space, selected New, and clicked String Value. OK, honestly, I did have to get some tips from this Super User (Stack Exchange) post.
After making this RegEdit change, when I open a new CMD window the top line shows Active code page: 65001, so I hope that is the expected result. (On this forum I’m currently limited to one image per post, so I can’t include more screenshots here.)

But I don’t think making adjustments in RegEdit is something that we should recommend to most users.
In the Super User post linked above there are several proposals for dealing with the UTF-8 issue. I felt a bit uneasy when I read one comment with 14 upvotes: “globally changing console code page to 65001 is an extraordinarily bad idea”. I’m also unsure why some users suggest @chcp 65001>nul rather than simply chcp 65001. One alternative suggestion is to go to the system’s Language settings, select Administrative language settings, click Change system locale... and check Beta: Use Unicode UTF-8 for worldwide language support, then restart. This sounds simpler than directing users to RegEdit.

But I think it would be even better if SAB and RAB noticed when a book or app name contains Unicode characters beyond basic ASCII and, for those projects, used the appropriate encoding for Aeneas and any other command-line calls they need.
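The kind of check suggested here could look roughly like the following Python sketch. The helper names are hypothetical, not part of SAB or RAB, and mapping the two chcp values from this thread to Python’s cp437 and utf-8 codecs is an assumption for illustration: a name that cannot be encoded in code page 437 is one that needs a UTF-8 console.

```python
# Hypothetical helpers (not part of SAB/RAB): would a name survive the console code page?
# Assumption: chcp 437 -> Python "cp437" codec, chcp 65001 -> "utf-8".
CODE_PAGES = {437: "cp437", 65001: "utf-8"}


def fits_code_page(name: str, code_page: int) -> bool:
    """Return True if every character in `name` can be encoded in `code_page`."""
    try:
        name.encode(CODE_PAGES[code_page])
    except UnicodeEncodeError:
        return False
    return True


def needs_utf8_console(name: str) -> bool:
    """True when a name cannot pass through code page 437 unchanged."""
    return not fits_code_page(name, 437)


# "Suule kudom" is an invented ASCII-only name used for comparison.
for project in ["Suule kudom", "Su\u00fale k\u028ad\u0254\u1e3f"]:
    verdict = "UTF-8 needed" if needs_utf8_console(project) else "OK in code page 437"
    # ascii() keeps the printout safe on consoles that cannot show these letters.
    print(ascii(project), "->", verdict)
```

A tool could run such a check once per project and switch to UTF-8 only for names that need it, instead of asking every user to change the system-wide code page.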

Re DOCX vs SFM
In the documentation for RAB it says:

Since we envision various literacy workers creating more literacy apps with RAB simply by making Word documents that contain images and text, I don’t think it’s realistic to expect most users to use SFM, at least on the RAB platform. Personally, I’m trying to get a local team to create at least one new RAB app each month, and I like the simplicity of having people draft the apps in Word.

For any other users who, like me, didn’t know how to change the Command Prompt font, here is how I did it.
First type CMD in the search box and open the Command Prompt, then right-click its title bar and select Properties at the end of the list. From there you can see the different tabs, including Font. I used Consolas at 16 point because of my 4K screen.

I hope this is helpful to any others who have faced the same issues.

Even after changing the registry with RegEdit, I’m still getting errors. In the screenshot you can see that the code page is set to 65001 at the top of the window, so it is UTF-8. The font in this screenshot is Consolas.

@jheath, is this why you suggested that we keep special characters out of the “Project Name”? Presumably because the project name is what is used to name the app folders…

@Ian_McQuay, when you say you use Thai and Arabic scripts in the name, do you mean the Project Name or the App Name?

@Ian_McQuay, I’ve changed the CMD from Active code page: 437 to 65001 (so that the Command Prompt always uses Unicode UTF-8).

However, I'm still getting the same errors for any project that uses special characters in the Project Name. When I open the folder using File Explorer, the folder names are correctly showing in Unicode.

Probably we need to avoid special characters in the RAB/SAB Project Name, and reserve the special characters for the App Name, as @jheath suggested in the other thread.
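Following that advice, one way to get a safe Project Name from a Unicode App Name is to strip it down to ASCII. This Python sketch is a hypothetical helper, not an SAB or RAB feature; the set of kept characters (ASCII letters, digits, spaces, hyphens, underscores) is an assumption, since the thread does not say exactly which characters are safe.

```python
# Hypothetical helper: derive an ASCII-only Project Name from a Unicode App Name.
# Assumption: ASCII letters, digits, spaces, hyphens and underscores are "safe".
import re
import unicodedata


def suggest_project_name(app_name: str) -> str:
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", app_name)
    # Dropping non-ASCII removes the marks, and also letters like ʊ and ɔ
    # that have no ASCII base letter at all.
    ascii_only = decomposed.encode("ascii", errors="ignore").decode("ascii")
    # Remove remaining punctuation, then collapse runs of whitespace.
    kept = re.sub(r"[^A-Za-z0-9 _-]", "", ascii_only)
    return " ".join(kept.split())


print(suggest_project_name("Su\u00fale k\u028ad\u0254\u1e3f"))  # Suule kdm
```

Letters with no ASCII base, such as ʊ and ɔ, are dropped entirely, so treat the result as a starting point to edit by hand; the full Unicode name still goes in the App Name.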

Thanks for testing that and letting me know it failed…

@jheath’s advice is the best practice.