Hi, I’m interested in the Khb language. How can I download the text of a book? For example, this one: Bloom Library: Home

The book is written in the Talu script, but no EPUB is available, and the available PDF is in the Hani script (Mandarin, I suppose).

Also, based on the terms of use (Bloom Library: Home), scraping and crawling are not permitted. Can you guide me?

Is there any way to download content for research on low-resource languages? Is there an API or something similar?

Hi Amir,
We’re happy to help researchers get access to book text when we can. Please start by looking at sil-ai/bloom-lm · Datasets at Hugging Face and see whether it gives you the access you need. If not, we have an API based on the OPDS epub-reader standard, which isn’t that convenient but has been used by others.
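
For anyone who ends up on the OPDS route, here is a rough sketch of what reading such a feed might look like. OPDS 1.x catalogs are Atom XML, and download links carry a `rel` value starting with `http://opds-spec.org/acquisition`. Everything else here is an assumption for illustration: the sample feed, the `example.org` URLs, the `acquisition_links` helper, and the use of `dcterms:language` to carry the language code are all hypothetical and not taken from the Bloom API, so check the real feed before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Atom-based OPDS 1.x catalogs.
ATOM = "{http://www.w3.org/2005/Atom}"
DCTERMS = "{http://purl.org/dc/terms/}"
# Acquisition link relations all begin with this prefix in OPDS 1.x.
ACQUISITION_PREFIX = "http://opds-spec.org/acquisition"

# Hypothetical feed for illustration only; not real Bloom Library output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dcterms="http://purl.org/dc/terms/">
  <title>Example catalog</title>
  <entry>
    <title>Example book</title>
    <dcterms:language>khb</dcterms:language>
    <link rel="http://opds-spec.org/acquisition/open-access"
          type="application/epub+zip"
          href="https://example.org/books/example.epub"/>
    <link rel="http://opds-spec.org/image"
          type="image/png"
          href="https://example.org/books/example.png"/>
  </entry>
</feed>
"""


def acquisition_links(feed_xml, language=None):
    """Return (title, media type, href) for each acquisition link.

    If `language` is given, keep only entries whose dcterms:language
    matches it (assumes the feed exposes the language that way).
    """
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.iter(ATOM + "entry"):
        entry_language = entry.findtext(DCTERMS + "language")
        if language is not None and entry_language != language:
            continue
        title = entry.findtext(ATOM + "title", default="")
        for link in entry.findall(ATOM + "link"):
            # Skip cover images and other non-download links.
            if link.get("rel", "").startswith(ACQUISITION_PREFIX):
                results.append((title, link.get("type"), link.get("href")))
    return results


if __name__ == "__main__":
    for title, media_type, href in acquisition_links(SAMPLE_FEED, "khb"):
        print(title, media_type, href)
```

With a real catalog you would fetch the feed XML first and pass it to the same function. Given the terms of use mentioned above, it seems safest to do that only through the API and with whatever access the Bloom team grants.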


Thanks, @JohnHatton. The dataset on Hugging Face is great. Thanks for your help.