speech_data_ghana_ug

UGSpeechData - Audio speech dataset of 5 Ghanaian languages - Akan, Ewe, Dagbani, Dagaare, and Ikposo

The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 100 hours of transcribed audio speech from indigenous speakers of the language.

Link(s) to Data Assets

Local Audios + AUDIO_ID.csv

AUDIO_ID.csv Description

Column	Description
`IMAGE_URL`	Provides the relative path to the images in the folder
`IMAGE_SRC_URL`	Provides the source path to the actual image online
`AUDIO_URL`	Provides the relative path to the local-language audio file in the Local Audio folder
`ORG_NAME`	Identifies the institution coordinating the audio collection
`PROJECT_NAME`	Provides the name of the project
`SPEAKER_ID`	Provides the ID number of the individual describing the image
`LOCALE`	Provides the IETF BCP 47 language tag of the local language spoken in the audio file
`GENDER`	Provides the gender of the individual giving the audio description
`AGE`	Provides the age of the individual giving the audio description
`DEVICE`	Identifies the device on which the audio was recorded
`ENVIRONMENT`	Identifies the space within which the audio was recorded
`YEAR`	The year in which the audio was recorded
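
The columns above can be read with Python's standard `csv` module. The following is a minimal sketch, not the project's own tooling: it assumes the file is comma-delimited with a header row matching the column names in the table, and every value in `SAMPLE_CSV` (paths, IDs, organisation and project names) is invented for illustration. The helper names `load_metadata` and `audio_paths_for_locale` are hypothetical.

```python
import csv
import io
from collections import Counter
from pathlib import Path

# Hypothetical sample rows; only the column names come from the table above.
SAMPLE_CSV = """\
IMAGE_URL,IMAGE_SRC_URL,AUDIO_URL,ORG_NAME,PROJECT_NAME,SPEAKER_ID,LOCALE,GENDER,AGE,DEVICE,ENVIRONMENT,YEAR
images/img_001.jpg,https://example.org/img_001.jpg,audios/ak/clip_001.wav,Example Org,Example Project,101,ak_gh,female,27,phone,indoor,2023
images/img_002.jpg,https://example.org/img_002.jpg,audios/ak/clip_002.wav,Example Org,Example Project,102,ak_gh,male,34,phone,outdoor,2023
images/img_001.jpg,https://example.org/img_001.jpg,audios/ee/clip_003.wav,Example Org,Example Project,201,ee_gh,female,41,tablet,indoor,2023
"""


def load_metadata(handle):
    """Read the metadata CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def audio_paths_for_locale(rows, locale, root):
    """Resolve AUDIO_URL against a root folder for every row in one locale."""
    return [Path(root) / row["AUDIO_URL"] for row in rows if row["LOCALE"] == locale]


# In practice, replace the StringIO with open("AUDIO_ID.csv", newline="", encoding="utf-8").
rows = load_metadata(io.StringIO(SAMPLE_CSV))

# Count how many recordings each locale contributes.
clips_per_locale = Counter(row["LOCALE"] for row in rows)

# Collect the Akan clips, assuming the audio sits under a folder named "Local Audios".
akan_paths = audio_paths_for_locale(rows, "ak_gh", "Local Audios")
```

If the real file uses a different delimiter or encoding, pass the matching `delimiter` argument to `csv.DictReader` and adjust the `open` call accordingly.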

Note: Locale IDs

Locale ID	Name
`ak_gh`	Akan
`dag_gh`	Dagbani
`dga_gh`	Dagaare
`ee_gh`	Ewe
`kpo_gh`	Ikposo
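
The locale IDs in this table are written in a lowercase `language_region` form, while BCP 47 tags are conventionally written with a hyphen and an uppercase region subtag (for example `ak-GH`). If a downstream tool expects the conventional form, a conversion along these lines may help. This is a sketch under that assumption only; `to_bcp47` is a hypothetical helper and not part of the dataset.

```python
def to_bcp47(locale_id: str) -> str:
    """Convert a dataset locale ID such as 'ak_gh' to the form 'ak-GH'."""
    language, _, region = locale_id.partition("_")
    if not language or not region:
        raise ValueError(f"expected 'language_region', got {locale_id!r}")
    # Language subtags are conventionally lowercase, region subtags uppercase.
    return f"{language.lower()}-{region.upper()}"


tags = [to_bcp47(locale_id) for locale_id in ["ak_gh", "ee_gh", "kpo_gh"]]
```

BCP 47 tags are case-insensitive, so the casing here only follows the recommended convention; the underscore-to-hyphen change is the part that strict parsers are likely to care about.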

License

This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.

Citation

Wiafe, I., Abdulai, J.-D., Ekpezu, A. O., Helegah, R. D., Atsakpo, E. D., Nutrokpor, C., Winful, F. B. P., & Solaga, K. K. (2023). UGSpeechData. Science Data Bank. https://doi.org/10.57760/sciencedb.22298
