Hi all. I came across this published paper by SIL and Hugging Face, in which they released a large portion of the Bloom Library data to the public:

I’m interested in doing text readability research on the Bikol languages (Sorsoganon and Masbate). These datasets are currently available in the Bloom Library but are not in the snapshot of the data covered by the published paper. Can I extract and preprocess these datasets from Bloom, with attribution to the source, if I want to publish the results? I know the datasets have CC BY 4.0 licenses, but I can’t find any other explicit rules on research and publications on the website.

Any help would be nice. Thank you :slight_smile:

Hi Joseph,
I’ll send you an email to connect you to our researcher who created that original dataset.


Hi John,

Thank you very much :slight_smile: