The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 100 hours of transcribed audio from indigenous speakers of the language.

| Column | Description |
|---|---|
| IMAGE_URL | Relative path to the image in the images folder |
| IMAGE_SRC_URL | Source path to the original image online |
| AUDIO_URL | Relative path to the audio file in the Local Audio folder |
| ORG_NAME | Institution coordinating the audio collection |
| PROJECT_NAME | Name of the project |
| SPEAKER_ID | ID number of the individual describing the image |
| LOCALE | IETF BCP 47 language tag of the local language used in the audio file |
| GENDER | Gender of the individual providing the audio description |
| AGE | Age of the individual providing the audio description |
| DEVICE | Device on which the audio was recorded |
| ENVIRONMENT | Space in which the audio was recorded |
| YEAR | Year in which the audio was recorded |
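
The columns above can be read with Python's standard `csv` module. The following is a minimal sketch that assumes the metadata is distributed as a CSV file with a header row matching the column names; the file layout, the helper names `read_metadata` and `rows_for_locale`, and the sample rows are all hypothetical and used for illustration only.

```python
import csv
import io

# Column names as listed in the table above.
COLUMNS = [
    "IMAGE_URL", "IMAGE_SRC_URL", "AUDIO_URL", "ORG_NAME", "PROJECT_NAME",
    "SPEAKER_ID", "LOCALE", "GENDER", "AGE", "DEVICE", "ENVIRONMENT", "YEAR",
]


def read_metadata(handle):
    """Read metadata rows from an open CSV handle into a list of dicts.

    Assumes (hypothetically) that the first row is a header with the
    column names above.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def rows_for_locale(rows, locale):
    """Return only the rows whose LOCALE matches the given locale ID."""
    wanted = locale.strip().lower()
    return [r for r in rows if r["LOCALE"].strip().lower() == wanted]


# Invented sample rows, NOT taken from the dataset.
SAMPLE = (
    ",".join(COLUMNS) + "\n"
    "images/0001.jpg,https://example.org/0001.jpg,audio/0001.wav,"
    "ExampleOrg,ExampleProject,spk001,ee_gh,Female,34,Phone,Indoor,2023\n"
    "images/0002.jpg,https://example.org/0002.jpg,audio/0002.wav,"
    "ExampleOrg,ExampleProject,spk002,ak_gh,Male,27,Phone,Outdoor,2023\n"
)

rows = read_metadata(io.StringIO(SAMPLE))
ewe = rows_for_locale(rows, "ee_gh")
print(len(ewe), ewe[0]["AUDIO_URL"])  # prints: 1 audio/0001.wav
```

For a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the metadata file is stored locally.
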

| Locale ID | Name |
|---|---|
| ak_gh | Akan |
| dga_gh | Dagbani |
| dag_gh | Dagaare |
| ee_gh | Ewe |
| kpo_gh | Ikposo |

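
The locale IDs can be mapped to language names in code. The sketch below copies the mapping from the table as-is; the helpers `language_name` and `to_bcp47` are hypothetical names. The conversion to a hyphenated tag assumes the IDs follow a `language_region` pattern and applies the conventional BCP 47 casing (lowercase language, uppercase region).

```python
# Locale IDs and names exactly as listed in the table above.
LOCALE_NAMES = {
    "ak_gh": "Akan",
    "dga_gh": "Dagbani",
    "dag_gh": "Dagaare",
    "ee_gh": "Ewe",
    "kpo_gh": "Ikposo",
}


def language_name(locale_id):
    """Look up the language name for a locale ID (case-insensitive)."""
    return LOCALE_NAMES[locale_id.strip().lower()]


def to_bcp47(locale_id):
    """Convert an ID such as 'ak_gh' to conventional BCP 47 form, 'ak-GH'.

    Assumes the ID has exactly two parts separated by an underscore.
    """
    language, region = locale_id.strip().split("_")
    return f"{language.lower()}-{region.upper()}"


print(language_name("ee_gh"))  # prints: Ewe
print(to_bcp47("ak_gh"))       # prints: ak-GH
```

An unknown locale ID raises `KeyError`, which makes unexpected values in the `LOCALE` column visible early rather than silently skipping them.
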
This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
Wiafe, I., Abdulai, J.-D., Ekpezu, A. O., Helegah, R. D., Atsakpo, E. D., Nutrokpor, C., Winful, F. B. P., & Solaga, K. K. (2023). UGSpeechData. Science Data Bank. https://doi.org/10.57760/sciencedb.22298