speech_data_ghana_ug

UGSpeechData - Audio speech dataset of 5 Ghanaian languages - Akan, Ewe, Dagbani, Dagaare, and Ikposo

The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1,000 hours of audio speech from indigenous speakers of the language and transcriptions covering 100 hours of audio.

Link(s) to Data Assets

AUDIO_ID.csv Description

Column         Description
IMAGE_URL      Provides the relative path to the image in the folder
IMAGE_SRC_URL  Provides the source path to the actual image online
AUDIO_URL      Provides the relative path to the local-language audio file in the Local Audio folder
ORG_NAME       Identifies the institution coordinating the audio collection
PROJECT_NAME   Provides the name of the project
SPEAKER_ID     Provides the ID number of the individual describing the image
LOCALE         Provides the IETF BCP 47 language tag of the local language in the audio file
GENDER         Provides the gender of the individual providing the audio description
AGE            Provides the age of the individual providing the audio description
DEVICE         Identifies the device on which the audio was recorded
ENVIRONMENT    Identifies the space within which the audio was recorded
YEAR           Provides the year in which the audio was recorded
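
As a rough illustration of how the columns described above might be used, the following Python sketch reads the metadata and keeps only the rows for one locale. It assumes the CSV is comma-delimited with a header row that matches the column names in the table; the helper names read_metadata and filter_by_locale are hypothetical, and the two sample rows are placeholders made up for the example, not records from the dataset.

```python
import csv
import io

# Column names as listed in the table above.
# Assumption: the CSV has a header row using exactly these names.
EXPECTED_COLUMNS = [
    "IMAGE_URL", "IMAGE_SRC_URL", "AUDIO_URL", "ORG_NAME", "PROJECT_NAME",
    "SPEAKER_ID", "LOCALE", "GENDER", "AGE", "DEVICE", "ENVIRONMENT", "YEAR",
]


def read_metadata(handle):
    """Read all metadata rows from an open text file and check the header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def filter_by_locale(rows, locale):
    """Keep only the rows whose LOCALE value equals the given locale ID."""
    return [row for row in rows if row["LOCALE"] == locale]


# Placeholder rows for illustration only; these values are invented.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "img/0001.jpg,https://example.com/0001.jpg,audio/0001.wav,"
    "ExampleOrg,ExampleProject,S001,ak_gh,female,30,phone,indoor,2023\n"
    "img/0002.jpg,https://example.com/0002.jpg,audio/0002.wav,"
    "ExampleOrg,ExampleProject,S002,ee_gh,male,41,phone,outdoor,2023\n"
)

rows = read_metadata(sample)
akan_rows = filter_by_locale(rows, "ak_gh")
print(len(rows), "rows read;", len(akan_rows), "with locale ak_gh")
```

To run this against the real file, replace the in-memory sample with something like open("AUDIO_ID.csv", newline="", encoding="utf-8"), adjusting the path and encoding to match the downloaded data.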

Note: Locale IDs

Locale ID Name
ak_gh Akan
dga_gh Dagbani
dag_gh Dagaare
ee_gh Ewe
kpo_gh Ikposo
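
The locale IDs above can be turned into a small lookup table when processing the LOCALE column. The sketch below copies the ID-to-name pairs exactly as listed in the note; the function name language_name is hypothetical, and the case-insensitive matching is an assumption made for convenience, so it is worth checking against the actual LOCALE values in the CSV.

```python
# Locale IDs and language names, copied as listed in the note above.
LOCALE_NAMES = {
    "ak_gh": "Akan",
    "dga_gh": "Dagbani",
    "dag_gh": "Dagaare",
    "ee_gh": "Ewe",
    "kpo_gh": "Ikposo",
}


def language_name(locale_id):
    """Return the language name for a locale ID.

    Matching ignores case and surrounding whitespace (an assumption,
    not something the dataset description specifies).
    """
    key = locale_id.strip().lower()
    if key not in LOCALE_NAMES:
        raise KeyError(f"unknown locale ID: {locale_id!r}")
    return LOCALE_NAMES[key]


print(language_name("ee_gh"))
```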

CITATION

Wiafe, I., Abdulai, J., Ekpezu, A. O., Dodzi, R., Atsakpo, E. D., Nutrokpor, C., Winful, F. B. P., & Solaga, K. K. (2023). UGSPEECHDATA (Version 1.0.0) [Data set]. https://github.com/isaacwiafe/speech_data_ug