Audio recordings and transcripts of regional African American English varieties for ASR.
CORAAL is a collection of audio files paired with orthographic transcripts documenting regional varieties of African American Language.
The dataset is intended for training and evaluating speech recognition systems on non-mainstream English dialects.
Fine-tune Whisper or Wav2Vec2 on the audio and its time-aligned transcripts to improve recognition accuracy for African American English across regions.
Build speech-to-text features for customer service or accessibility tools that better handle regional varieties of African American English.
Analyze phonetic and lexical patterns from multi-region recordings paired with aligned text for academic studies on language variation.
from datasets import load_dataset
ds = load_dataset("bezzam/coraal")

CORAAL provides speech recordings from African American speakers in multiple U.S. regions with time-aligned transcripts, formatted for ASR via preparation scripts.
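Since the dataset is meant for evaluating speech recognition across regions, a common first measurement is word error rate (WER) broken down by region. The following is a minimal sketch in plain Python, not part of the dataset's own tooling. The function names `word_errors` and `wer_by_group`, the region labels, and the example sentences are all hypothetical; a real evaluation would feed in reference transcripts from the dataset and hypotheses from a model, and would likely apply text normalization first.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, reference length in words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Row-by-row dynamic programming for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(rows):
    """rows: iterable of (group, reference, hypothesis). Returns {group: WER}."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in rows:
        e, n = word_errors(ref, hyp)
        errors[group] += e
        words[group] += n
    # Pool errors and words per group so long utterances weigh more.
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical rows: labels and sentences are made up for illustration.
rows = [
    ("region_a", "he been working there", "he been working there"),
    ("region_a", "they was going home", "they were going home"),
    ("region_b", "she stay busy", "she stays busy now"),
]
print(wer_by_group(rows))
```

Pooling errors and word counts per group, rather than averaging per-utterance WER, keeps short utterances from dominating the score. Comparing the per-region values before and after fine-tuning is one way to check whether gains are spread evenly across regions.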