RIYADH: The King Salman Global Academy for Arabic Language has launched the Saudi Voices Blog project, aiming to complete its first phase by the end of 2024.
The project features spoken and transcribed Arabic, including both formal Arabic and colloquial local dialects. According to the academy, the blog draws from various Arabic sources within Saudi Arabia, adhering to the latest scientific standards.
It aims to promote research in Arabic audio blogging, gather audio data on Saudi dialects, and build an audio blog using modern methodologies.
The Saudi Voices Blog represents different societal classes, documents their dialects phonetically, and uses modern technologies to provide phonetic data for the scientific community.
It also offers machine-readable audio material with morphological, syntactic, lexical and semantic analysis for AI models.
Saudi Voices Blog aims to engage lexicon authors, AI researchers and those studying comparative linguistic phenomena, age-related linguistic differences and Arabic language policy.
It uses the latest international standards such as CODA and TEI for structuring and managing audio language data. The blog encourages participation across age groups to accurately represent Saudi dialects and their diversity.
The blog targets Saudi dialects from more than 40 locations within the Kingdom. A designated recorder at each location will capture voices from various participant categories, including children, young adults and the elderly, both men and women. The recordings will be uploaded to the Falak platform, which covers topics such as storytelling, places, foods, customs, traditions, holidays, daily situations, and quotes.
Once completed, the Saudi Voices Blog will be available to researchers and stakeholders for studies, application development, and adding new geographical points.
It will also help AI developers to overcome the lack of data needed to study Arabic dialects, societal linguistic differences, and automatic voice identification or transcription.
The project aims to strengthen the global standing of Arabic, raise awareness of the language, and facilitate its teaching and learning inside and outside Saudi Arabia.
Dr. Ibrahim Abanmi, deputy secretary-general of the academy, emphasized the importance of Arabic audio blogs in enhancing the academy’s role as a reference for developing such blogs.
Abanmi highlighted the impact of Arabic audio blogs in supporting scientific research and preserving the heritage of Saudi dialects across different social classes.
Abanmi said that the blog was an unprecedented addition to phonetics and language research by providing audio material representing various Saudi dialects.
Dr. Abdullah Al-Fifi, head of the linguistic computing department at the academy, said that the first phase began with 50 individuals collecting data on 50 Saudi dialects.
He said that about 250 people, representing various age groups and both genders in Saudi Arabia, were participating in recording 2,500 audio hours for the blog.
After completing the high-quality recording and transcription, the academy will implement a three-stage plan. The first stage involves labeling the audio data to enhance the blog’s richness and usefulness.
The second stage will add new Saudi dialects not covered in the first phase, and the third will expand the blog’s geographical scope to include other countries.
Hajar Al-Shammari, a linguistic researcher in Saudi history, said that the Saudi Voices Blog was of international standing and offered many correlative products that stimulated research and studies, enriching the linguistic sector and its dialects.
The blog reflected the intellectual and cognitive richness of a region with diverse, intersecting dialects rooted in Arabic, a historical focal point connecting ancient civilizations in Asia, Africa and Europe, she added.
The blog allows linguistic and historical researchers to conduct specialized and interactive studies, contributing to significant outputs not only in the region but globally, given the Arabian Peninsula’s geopolitical and historical importance, Al-Shammari said.