How Different Is Arabic from Other Languages? The Relationship Between Word Frequency and Lexical Coverage

Ahmed Masrai, James Milton

Abstract


This study examines Zipf’s law as a predictor of the relationship between word frequency and lexical coverage in Arabic. Zipf’s law has been applied to a number of languages, such as English, French and Greek, and has yielded useful information. However, word derivation processes are far more regular and extensive in Arabic than in English, and it is suspected that the way words are defined may significantly affect the outcome of this kind of analysis. The concept of the lemma as applied to English could, entirely credibly, be redrawn for Arabic. In this study, lemmatised Arabic frequency lists generated from a large Web-based corpus were used to calculate coverage. The results show that Zipf’s law does apply in Arabic: the most frequent 9,000 lemmatised words provide approximately 95% coverage, and 14,000 words give nearly 98% coverage. These results suggest that the relationship between word frequency and coverage in Arabic is comparable, to a certain degree, to that in English and Greek, but not to that in French. However, the definition of the lemma used in this study is probably more relevant to European languages than to Arabic, and changing it would significantly change the results.
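
As an illustration of how coverage can be derived from a ranked frequency list, the following is a minimal sketch, not the authors' procedure. It assumes coverage means the share of all corpus tokens accounted for by the N most frequent lemmas. The function names `coverage_by_rank` and `words_needed` and the toy frequency list are hypothetical and are not taken from the study.

```python
from itertools import accumulate


def coverage_by_rank(frequencies):
    """Return cumulative coverage (0.0 to 1.0) after each rank,
    counting from the most frequent item downwards."""
    ranked = sorted(frequencies, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("frequency list contains no tokens")
    return [running / total for running in accumulate(ranked)]


def words_needed(frequencies, target):
    """Return the smallest number of top-ranked items whose cumulative
    coverage reaches `target`, or None if the target is never reached."""
    for rank, cov in enumerate(coverage_by_rank(frequencies), start=1):
        if cov >= target:
            return rank
    return None


# Hypothetical toy list shaped like an idealised Zipf curve
# (frequency roughly proportional to 1/rank). NOT the study's data.
toy = [1000 // r for r in range(1, 11)]

if __name__ == "__main__":
    for rank, cov in enumerate(coverage_by_rank(toy), start=1):
        print(f"top {rank:2d} items cover {cov:.1%} of tokens")
    print("items needed for 50% coverage:", words_needed(toy, 0.5))
```

Applied to a real lemmatised frequency list, the same calculation would give the number of lemmas needed to reach a chosen coverage level such as 95% or 98%. The result depends directly on how lemmas are defined, since merging or splitting word forms changes both the ranks and the counts.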


Keywords


Arabic corpus, lexical coverage, word frequency, vocabulary, Zipf’s law



Copyright (c) 2016 Journal of Applied Linguistics and Language Research