==============================================================================
Qalam: A Convention for Morphological Arabic-Latin-Arabic Transliteration
---
Abdelsalam Heddaya <`abdu elsalaam Heddaayah> (heddaya@cs.bu.edu)
with contributions from
Walid Hamdy (hamdy@lids.mit.edu)
M. Hashem Sherif (mhs@homxa.att.com)
---
Created:  1985.11
Modified: 1986-1989 often
Modified: 1990.01
Modified: 1990.12.21
Modified: 1990.12.31; accepted LAiLA upper case convention, added punctuation, , and
Modified: 1991.01.23; added a couple of sentences.
Modified: 1991.01.31; decided for
Modified: 1991.08.22; cleaned up acknowledgements
Modified: 1992.01.13; changed back to <~aa>
---
DRAFT---DRAFT---DRAFT
---

0. Introduction
---------------

Qalam is an Arabic-Latin-Arabic transliteration system that maps between the
Arabic script and the Latin script as embodied in the ASCII (American Standard
Code for Information Interchange) character set. The goal of the Qalam system
is to transliterate Arabic script for computer communication by those literate
in the language. The main consideration in the design of Qalam is suitability
for transliteration, as well as reverse transliteration, both manually by
humans and automatically by computers. Qalam also includes several Arabic
script letters used to transliterate other languages *into* Arabic script.
Finally, Qalam aims to serve all Arabic script languages, such as Farsi, Urdu,
and Ottoman.

Qalam is a morphological system in the sense that Arabic script words are
transliterated based on spelling and diacritics (the marks that represent
vowels in Arabic), rather than on phonetics. This makes it easy to deduce the
Arabic script word from its transliteration (i.e., to transliterate the word
back into Arabic script). The pronunciation of words, however, can still be
deduced from the transliteration, because the (optional) inclusion of
diacritic marks makes the transliterated word more pronounceable.
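
The letter-by-letter correspondence described above lends itself to mechanical
processing. The following Python sketch is an illustration only and is not
part of Qalam: the names QALAM_CODES and tokenize are hypothetical, the table
is a subset of Sections 1.1 and 1.3, and greedy longest-match scanning is an
assumed strategy for splitting a transliterated word into its codes.

```python
# Illustrative sketch only; not part of the Qalam convention itself.
# QALAM_CODES is a hypothetical name holding a subset of the tables in
# Sections 1.1 and 1.3, keyed by ASCII code, with the letter or diacritic
# name as the value.
QALAM_CODES = {
    "'": "hamza", "aa": "'alef", "b": "baa'", "t": "taa'", "th": "thaa'",
    "j": "jym", "H": "Haa'", "kh": "khaa'", "d": "daal", "dh": "dhaal",
    "r": "raa'", "z": "zayn", "s": "syn", "sh": "shyn", "S": "Saad",
    "D": "Daad", "T": "Taa'", "Z": "Zaa'", "`": "`ayn", "gh": "ghayn",
    "f": "faa'", "q": "qaaf", "k": "kaaf", "l": "laam", "m": "mym",
    "n": "nuwn", "h": "haa'", "w": "waaw", "y": "yaa'",
    "ae": "'alef maqSuwrah", "e": "hamzat alwaSl",
    "a": "fatHah", "i": "kasrah", "u": "Dammah", "-": "sukuwn", "N": "tanwyn",
}


def tokenize(word):
    """Split a transliterated word into (code, name) pairs.

    Assumption: at each position the longest matching code wins, so a
    two-character code such as "sh" is preferred over "s" followed by "h".
    The shaddah, the maddah, and taa' marbuwTah are not handled here.
    """
    tokens = []
    pos = 0
    while pos < len(word):
        for width in (2, 1):
            code = word[pos:pos + width]
            if code in QALAM_CODES:
                tokens.append((code, QALAM_CODES[code]))
                pos += len(code)
                break
        else:
            # Neither a two-character nor a one-character code matched.
            raise ValueError(f"no code in the table matches at {word[pos:]!r}")
    return tokens


print([code for code, _ in tokenize("maaliki")])
# ['m', 'aa', 'l', 'i', 'k', 'i']
```

Under this assumed rule, "sh" is read as a single shyn, whereas "s-h" is read
as syn, sukuwn, haa'. An intervening diacritic is therefore what keeps two
adjacent letters from being mistaken for a two-character code.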

We describe Qalam's mapping from Arabic letters and diacritics to ASCII
characters. Each Arabic letter or diacritic maps into (and back from) one or
two ASCII characters, apart from the maddah (~aa) and the shaddah, which
doubles the previous letter. The choice is made in order to approximate, as
much as possible, the Arabic pronunciation, while maintaining the one-to-one
morphological correspondence needed for unambiguous reverse transliteration
into Arabic script. Arabic script letters that do not correspond to Latin
sounds are represented with upper-case letters or with two-character
sequences. Thus, Qalam uses upper-case ASCII characters to denote Arabic
letters that are different from those denoted by the corresponding lower-case
characters. This convention deviates from the common practice of inserting a
dot beneath the letter or a dash above it.

We give below the list of transliterations for Arabic letters and diacritics,
followed by an example and a description of the rules of transliteration.

1. Character Mappings:
----------------------

1.1. Letters:
-------------

hamza   '
'alef   aa        zayn    z         qaaf    q
baa'    b         syn     s         kaaf    k
taa'    t         shyn    sh        laam    l
thaa'   th        Saad    S         mym     m
jym     j         Daad    D         nuwn    n
Haa'    H         Taa'    T         haa'    h
khaa'   kh        Zaa'    Z         waaw    w
daal    d         `ayn    `         yaa'    y
dhaal   dh        ghayn   gh
raa'    r         faa'    f

taa' marbuwTah    t or h
haa' marbuwTah    h
'alef maqSuwrah   ae
hamzat alwaSl     e

1.2. Transliteration Letters:
-----------------------------

These are characters used in the Arabic script to represent or transliterate
letters from other languages such as English, French, German, etc.

Egyptian sound      g    (= Arabic script with bar or dots, pronounced or )
English "v" sound   v    (= Arabic script with three dots)
English "p" sound   p    (= Arabic script with three dots)

1.3. Diacritics:
----------------

fatHah    a
kasrah    i
Dammah    u
shaddah   double previous letter
maddah    ~aa
sukuwn    -
tanwyn    N

1.4. Punctuation:
-----------------

question mark    ?
double quotes    << >>
single quotes    < >
                 ,

2. Examples:
------------

The Qalam transliteration of the first in the , called goes as follows:

bismi ellaahi elraHmaani elraHym

'alHamdu lillaahi rabbi el`aalamyn *
alraHmaani elraHym *
maaliki yawmi eldyn *
'iyaaka na`budu wa'iyaaka nasta`yn *
'ihdinaa elSiraaTa elmustaqym *
SiraaTa alladhyna 'an`amta `alayhim *
ghayri elmaghDuwbi `alayhim *
walaa alDaalyn *

3. Qalam Rules and Conventions:
-------------------------------

Transliterate a word by following its Arabic script spelling letter by letter,
as well as any available diacritics (i.e., or ), and substituting the
specified Latin script. The only frequent exception is the <'alef> in the
definite article (i.e., ), which is better written as if it were a , or (, or )
as the case may be.

Diacritics are optional unless they are necessary to disambiguate the original
Arabic script spelling. For example, the verb may be written , because the
ambiguity does not affect the original Arabic script spelling. On the other
hand, may stand for a as in the word or for a followed by a as in , in which
case the between the and the is necessary.

The <'alif> with a transliterates to <'a> if the is above, and to <'i> if it
is below. That is, it is treated as if it were simply a with a or .

The definite article (equivalent to "the" in English) should not be separated
from the rest of the word by a hyphen; e.g. , meaning "the sun." Write the
even if it is silent--. This is a case where literal transliteration is given
precedence over phonetic transliteration to make reverse transliteration easy.

Observe word boundaries in the original Arabic, e.g. <`abdalsalaam> is wrong,
but <`abd alsalaam> is right.

Arabic has no capitalization, and hence Arabic script transliterated by Qalam
uses capitals to stand for letters that are different from those denoted by
the corresponding lower-case character.

As a convention, we quote transliterated Arabic script text embedded in
another script with Arabic script quotation marks and vice versa.

4. Technical Discussion:
------------------------

We would like to argue that Qalam is a superior code for communicating Arabic
script text over data networks between heterogeneous computers. Qalam
possesses the characteristics required of a good communication code:
unambiguity, compactness, and simplicity of coding/decoding.

(((Compatibility, Human readability, Code efficiency. Existing codes.)))

Qalam's goals include supporting automatic transliteration by computers, as
well as manual transliteration for typing in Arabic script using the Latin
script available on ASCII terminals. This permits computers that support the
Arabic script directly to hide the transliterated text from the user. Thus, a
personal computer user, for example, should be able to type a message in
Arabic script and have the machine transliterate it for submission to
soc.culture.lebanon. Conversely, when this user receives an Arabic script
message from soc.culture.lebanon, the computer would transliterate it back
into Arabic script for display. The above scenario should hold equally true
for text that mixes Latin and Arabic scripts.

5. Bugs:
--------

The , should be distinct from the and both must differ from the .

Qalam doesn't provide for transliterating the <'alif> written as a
vertical-bar-shaped diacritic, as in archaic spellings of the .

The only way to distinguish digraphs such as from the identically
transliterated followed by , is to force the inclusion of a diacritic vowel
between the two letters. Qalam needs a method to do so without including the
vowel, since it is not always available in the original Arabic script text.

6. Acknowledgements:
--------------------

Nayel el-Shafei provided the initial impetus for this work by researching the
various transliteration systems in use in the US, and publishing the results
on egypt-net in July 1985.

C.I. Browne (cib%a@lanl.gov) provided, in August 1988, useful comments about
the placement of "." (no longer in use by Qalam) and pointed out that was
missing in an earlier draft of Qalam.

Ali Mili, of the University of Tunis, commented on an early version of Qalam.

Stavros Macrakis pointed out the absence of a convention for and the old form
of <'alef> that appears as a vertical bar diacritic (e.g., in the ). The first
problem has been corrected, but the second remains.

In winter 1990/91, a debate surfaced on USENET about transliterating Arabic
text; one particular proposal, called LAiLA, convinced us to use upper-case
Latin letters instead of special characters.

References:
-----------

@article{Becker87,
    AUTHOR  = "J.D. Becker",
    TITLE   = "Arabic word processing",
    JOURNAL = "Communications of the ACM",
    VOLUME  = "30",
    NUMBER  = "7",
    PAGES   = "600--611",
    MONTH   = "July",
    YEAR    = "1987"}

@article{Becker84,
    AUTHOR  = "J.D. Becker",
    TITLE   = "Multilingual word processing",
    JOURNAL = "Scientific American",
    VOLUME  = "251",
    NUMBER  = "1",
    PAGES   = "",
    MONTH   = "July",
    YEAR    = "1984"}

==============================================================================