Might there be a way to have Bloom automatically create a unique word list (based on the text of a particular book) when one uses the spreadsheet export/import tool?
It could be added as a separate tab to the spreadsheet or dumped in the accompanying folder as a text file.
We are finding that the Spreadsheet Tool will likely be very useful in the QA process. As part of this process, having a unique word list would be helpful for checking for spelling errors. An alphabetized wordlist can be gone through very quickly. If errors are found, one can then search for them in the spreadsheet and fix them before importing the spreadsheet back into the book.
You can check here; they may already have what you are looking for, since they cover many languages. Be aware that the lists were generated from movie subtitles, so they also contain a lot of misspelled and foreign words.
You could also get in touch with the developer and ask how he created these lists. From what I understand, you want to do the same thing: get the unique words out of a large body of text.
Otherwise this should be easy to do in a programming language. Simply declare the large text as a string and remove all the punctuation marks. You need to watch that you don’t remove the ’ in don’t, for example. This can be handled programmatically as well: when there is a letter before the ’ and a letter after it, it is part of a word and not a quotation mark. Other languages might have similar cases.
Then simply split this cleaned string on whitespace.
And then filter the resulting array so that only unique words are left.
This should be fairly easy to create.
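To make those three steps concrete, here is a minimal Python sketch. The function name `unique_words` and the sample sentence are made up for illustration, and this is not how Bloom does or would do it. It assumes a word is a run of letters or digits, and that an apostrophe (straight ' or curly ’) counts as part of a word only when it has a letter or digit on both sides. Languages that write words with combining marks or other special characters would probably need a different rule.

```python
import re


def unique_words(text):
    """Return the distinct words in text, lowercased and sorted alphabetically."""
    # Step 1a: replace every character that is not a letter, digit, whitespace,
    # or apostrophe (straight ' or curly ’) with a space. Underscores go too.
    cleaned = re.sub(r"[^\w\s'’]|_", " ", text.lower())
    # Step 1b: drop apostrophes that do not have a letter or digit on BOTH
    # sides, so the one in "don’t" survives but closing quotes do not.
    cleaned = re.sub(r"(?<!\w)['’]|['’](?!\w)", " ", cleaned)
    # Step 2: split the cleaned string on whitespace.
    words = cleaned.split()
    # Step 3: keep only unique words, sorted for quick proofreading.
    return sorted(set(words))


if __name__ == "__main__":
    sample = "“Don’t go,” she said. ‘Go home!’ Don’t wait."
    for word in unique_words(sample):
        print(word)
```

Lowercasing is a judgment call: it merges "Go" and "go" into one entry, which keeps the list short but hides capitalization errors. Drop the `.lower()` call if those matter for your QA pass.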
However, if you want to do this in a spreadsheet (sorry, I don’t know what Bloom is), the same logic should work there as well. It might be slow, depending on how much text you are working with, though one book shouldn’t be a problem.
Thanks. Yes, there are a number of tools and ways to create unique word lists from text. Bloom is software created by SIL for creating simple books, particularly in non-dominant languages. My request was specific to Bloom and its Spreadsheet Import/Export function.
Yeah, we should be able to do this, @xofweber . Thanks for the request.
Thanks! That is great news.