Export dictionary to TSV to support creating a predictive text keyboard in Keyman

Hi all! :wave:t4:

This is a feature request! I was wondering if you could add an export option to save all words to a TSV file, such that it is trivial to create a predictive text smartphone keyboard with the upcoming Keyman Developer 12.

At bare minimum, we’d like the ability to export a UTF-8 encoded plain text file with one word per line. Nothing else is required! An optional frequency count for each word may be provided after a tab character.
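To make the requested format concrete, here is a minimal sketch in Python of what such an export could produce. The function names (`format_wordlist`, `write_wordlist`) and the input shape (a mapping from word to an optional count) are made up for illustration; this is not how WeSay or Keyman Developer implements anything.

```python
from typing import Mapping, Optional


def format_wordlist(words: Mapping[str, Optional[int]]) -> str:
    """Build word-list text: one word per line, optional tab + count.

    `words` maps each word to a frequency count, or to None when no
    count is known (hypothetical input shape, chosen for this sketch).
    """
    lines = []
    for word, count in words.items():
        if count is None:
            lines.append(word)  # bare word, no frequency column
        else:
            lines.append(f"{word}\t{count}")  # word, tab, count
    return "".join(line + "\n" for line in lines)


def write_wordlist(path: str, words: Mapping[str, Optional[int]]) -> None:
    """Write the word list as UTF-8 plain text with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(format_wordlist(words))
```

For example, `format_wordlist({"hello": 3, "world": None})` gives a line with `hello`, a tab, and `3`, followed by a line with just `world`.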

I’m part of the team that’s developing predictive text in Keyman 12, and it would be really nice if we could just have users press a button in WeSay to produce the appropriate file to dump into Keyman Developer to produce a predictive text keyboard—just like that! :wink:


I also face the same problem: I couldn’t create a TSV file from WeSay for a keyboard I developed myself.

Since there is no tool within WeSay 1.6.10.0 to export a file as TSV directly, we can try a workaround. It may take a while, but…

Within WeSay, export the dictionary as an SFM and then get all lines beginning with \lx. Here is my workaround:

  1. Export the dictionary to SFM.

  2. The exported file should open once the export completes.

  3. Select all and copy it.

  4. Open Google Sheets or MS Excel. I’ll go with Google Sheets here (I don’t have MS Office).

  5. Paste what you copied in (3) into the spreadsheet.

  6. Let’s filter the column for "\lx " to get all headwords

  7. Now select all again, paste the text into a new sheet, and then find and replace "\lx " with nothing.

  8. At this point, you should be able to download the sheet as a TSV file from within Google Sheets.

There are many ways to skin a cat, and this is one of them. :slight_smile:
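For anyone comfortable running a script, the same steps (keep the lines beginning with "\lx ", strip the marker, save one word per line) could be sketched in Python roughly like this. It assumes the SFM export is UTF-8 text and that each headword sits alone on its `\lx` line; the function names and file paths are hypothetical, and I haven't checked it against every WeSay export.

```python
from typing import List

MARKER = "\\lx "  # SFM headword marker followed by a space


def headwords_from_sfm(sfm_text: str) -> List[str]:
    """Return the headwords from all lines that start with the \\lx marker."""
    words = []
    for line in sfm_text.splitlines():
        if line.startswith(MARKER):
            word = line[len(MARKER):].strip()
            if word:  # skip empty \lx fields
                words.append(word)
    return words


def sfm_to_wordlist(sfm_path: str, out_path: str) -> None:
    """Read an SFM export and write one headword per line as UTF-8."""
    # "utf-8-sig" also accepts plain UTF-8 and drops a leading BOM if present.
    with open(sfm_path, encoding="utf-8-sig") as src:
        words = headwords_from_sfm(src.read())
    with open(out_path, "w", encoding="utf-8", newline="\n") as dst:
        for word in words:
            dst.write(word + "\n")
```

Calling something like `sfm_to_wordlist("dictionary.sfm", "wordlist.tsv")` (both file names are placeholders) would then produce the file without the spreadsheet round trip. It writes words only, with no frequency counts, since the SFM export described here does not carry them.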

@Dave_Seth

At bare minimum, we’d like the ability to export a UTF-8 encoded plain text file of one word, per line. Nothing else is required

The thing is, we don’t have any developers on WeSay at this time. I expect your teammate E.B. would be able to do this easily, if this is a Keyman need.

Any reason not to use Keyman 16?

FWIW, @Dave_Seth was a spam post – it was just a copy of @eddieantonio’s OP content with a link to a spam domain.

I am amazed by this. Is @dave_seth a real person account (account now deleted!) or was this entire account and post created as spam by a bot? Seems like a pretty sophisticated effort for a bot to achieve.

I reckon it was just a bot – there are numerous spam bots targeting Discourse, and Discourse does a good job of handling most of it, but inevitably some things slip through.

The text in question is a pretty common pattern – copying text wholesale out of another post and then appending a seemingly related link. AI means this is going to get more sophisticated and harder to detect but the end pattern will always be some sort of link going somewhere spammy.