The Arabic Speech Corpus for Isolated Words

The Arabic speech corpus for isolated words contains 9992 utterances of 20 words spoken by 50 native male Arabic speakers. It was recorded at a 44100 Hz sampling rate with 16-bit resolution. The corpus is free for noncommercial use in raw format (.wav files); other formats (e.g. MFCCs) are available on request.
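
As a quick sanity check after downloading, the stated recording format (44100 Hz, 16-bit) can be compared against a file's WAV header. The following is a minimal sketch using only Python's standard wave module; the function name check_wav_format and the returned dictionary keys are illustrative and are not part of the corpus.

```python
import wave

# Recording format stated in the corpus description.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit resolution


def check_wav_format(path):
    """Read a WAV header and compare it with the stated corpus format.

    `check_wav_format` is a hypothetical helper name, not a corpus tool.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": rate,
        "bits_per_sample": width * 8,
        "duration_s": frames / rate,
        "matches_spec": rate == EXPECTED_RATE_HZ
        and width == EXPECTED_SAMPLE_WIDTH_BYTES,
    }
```

Calling check_wav_format on the path of an extracted .wav file returns the header values together with a matches_spec flag; a False value would suggest the file was resampled or converted after download.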

Both a sample dataset and the complete dataset are now available. Each contains a file called README.txt that describes the dataset, and a file word_list.pdf that shows a table of the words in the dataset.

Sample (contains 20 utterances) (zip file 1.82 MB)

Complete Dataset (zip file 845.1 MB)

Label Coding

Each file is labeled using the following coding system: S(Number of Speaker).(Number of Repetition).(Number of Word). For example, S01.01.01 represents the first of the 50 speakers, the first of 10 recordings, and the first word in the list of 20 words.
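
The coding system above can be decoded programmatically. The sketch below is one possible parser in Python, not an official corpus tool: the names Label and parse_label are illustrative, and the optional ".wav" suffix is an assumption based on the corpus being distributed as .wav files. The range limits (50 speakers, 10 repetitions, 20 words) come from the description above.

```python
import re
from typing import NamedTuple


class Label(NamedTuple):
    """Decoded file label (hypothetical helper type, not part of the corpus)."""
    speaker: int      # 1..50
    repetition: int   # 1..10
    word: int         # 1..20, index into the word list


# "S", then three dot-separated numbers; a trailing ".wav" is assumed optional.
_LABEL_RE = re.compile(r"^S(\d+)\.(\d+)\.(\d+)(?:\.wav)?$")


def parse_label(name):
    """Parse a label such as 'S01.01.01' into speaker, repetition and word."""
    match = _LABEL_RE.match(name)
    if match is None:
        raise ValueError(f"not a corpus label: {name!r}")
    speaker, repetition, word = (int(group) for group in match.groups())
    if not (1 <= speaker <= 50 and 1 <= repetition <= 10 and 1 <= word <= 20):
        raise ValueError(f"label out of range: {name!r}")
    return Label(speaker, repetition, word)
```

For example, parse_label("S01.01.01") yields speaker 1, repetition 1, word 1. Note that 50 speakers x 10 repetitions x 20 words would give 10000 labels, while the corpus contains 9992 utterances, so code that iterates over all combinations should probably tolerate a few missing files.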

Cite

Abdulrahman Alalshekmubarak and Leslie S. Smith. On Improving the Classification Capability of Reservoir Computing for Arabic Speech Recognition. In: Wermter, S., Weber, C., Duch, W., Honkela, T., Koprinkova-Hristova, P., Magg, S., Palm, G., Villa, A.E.P. (Eds.), Artificial Neural Networks and Machine Learning - ICANN 2014, 24th International Conference on Artificial Neural Networks, Lecture Notes in Computer Science 8681, Springer Heidelberg, 2014, pages 225-232.

Contact Details

Please don't hesitate to contact us if you have any queries about the corpus.

Computing Science and Mathematics
School of Natural Sciences
University of Stirling
Stirling FK9 4LA
Scotland, UK
Cottrell Building, Room 4X1
Tel: +44 1786 46 7421
aal[at]cs.stir.ac.uk