Requirements For an Ethiopian Character Set

In our efforts to develop a robust and coherent standard for the Ethiopian script, we have established a set of governing principles that any standard for the comprehensive Ethiopian character set must meet.

These requirements are:

  1. The set that defines the Ethiopian domain must include the characters in both common and infrequent use by present-day publishers.
  2. Every member of the domain requires a unique address.
  3. Characters of a common class must be addressed contiguously.
  4. The standard must include a mechanism to facilitate future expansion of the domain.
Adherence to these principles ensures that the standard is unambiguous, consistent, and complete.


Proposed Encoding Standard For Ethiopian Script

The table of Appendix A presents all characters known to the committee that satisfy the first condition. The proposed table meets condition two intrinsically. Characters are classified as punctuation, numeral, letter, or other. Members of the punctuation and numeral regions are ordered according to their function (described below).

Letters follow their traditional ordering. Recent extensions to the fidel for the syllabic series of Qe, ve, De, and Ge follow the characters from whose glyphs they are derived. Consonant classes with 12 forms have their 5 extensions treated as a new but intimately related consonant series. The base sound of the new consonant class is always the labiovelar form of the consonant from which it descends (e.g. ``gW'' from ``g'' is treated in the address structure as a new, but related, consonant). All labiovelar forms of the consonants are considered qualified members of the syllabic series of their consonant. This treatment of the labiovelar extensions to the fidel implements the system originally devised by Desta Tekle Wold, and it defines the contiguous grouping of the table called for by requirement 3.

Twenty-four addresses (3 rows) are reserved at the end of the predefined region for Ethiopian characters not treated in this definition of the domain. This allocation is believed sufficient to meet the fourth and final condition.

Encoding Principle

1. The Ethiopian Script features Ethiopian punctuation marks, numerals, and the syllabary. We believe that a unique 16-bit representation must be used for each syllable of the Ethiopian script. Attached is a hard copy of the script with its 16-bit hexadecimal code assignments, which we believe to be complete (see Appendix A).

2. The Encoding Order Structure

U+1200 TO U+120A         Ethiopian Punctuation
U+120B TO U+121F         Ethiopian Numbers
U+1220 TO U+1376         Ethiopian Letters
U+1377 TO U+138F         Ethiopian Extended Letters
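
Because the regions in the table are contiguous, classifying a code point is a matter of range lookup. The sketch below copies the inclusive bounds from the table above; the function names are hypothetical, and the code points are this proposal's draft assignments, which differ from the Unicode Ethiopic block as eventually published.

```python
from typing import Dict, Optional

# Inclusive bounds copied from the Encoding Order Structure table.
# Draft assignments from this proposal, used here for illustration only.
REGIONS = [
    (0x1200, 0x120A, "Ethiopian Punctuation"),
    (0x120B, 0x121F, "Ethiopian Numbers"),
    (0x1220, 0x1376, "Ethiopian Letters"),
    (0x1377, 0x138F, "Ethiopian Extended Letters"),
]


def region_of(code_point: int) -> Optional[str]:
    """Return the draft region containing code_point, or None if outside all of them."""
    for start, end, name in REGIONS:
        if start <= code_point <= end:
            return name
    return None


def region_sizes() -> Dict[str, int]:
    """Number of addresses in each region, counting both endpoints."""
    return {name: end - start + 1 for start, end, name in REGIONS}


def regions_are_contiguous() -> bool:
    """True if each region begins immediately after the previous one ends."""
    return all(prev[1] + 1 == cur[0] for prev, cur in zip(REGIONS, REGIONS[1:]))


print(region_of(0x1201))  # Ethiopian Punctuation
print(region_sizes())
```

Counted inclusively, the four ranges are contiguous and together cover 400 addresses (U+1200 through U+138F); the Extended Letters range as printed accounts for 25 of them.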

3. Note on ETHIOPIAN SPACE and ETHIOPIAN WORDSPACE

The traditional word separator, U+1201, remains in modern use alongside white space and, in Eritrean usage, has taken on the role of a comma when white space is present. In modern documents, WORDSPACE is set with more left- and right-side ``white space buffering'' (x-offset and kerning) than in older documents. ETHIOPIAN SPACE is intended to be at least as wide as the traditional word separator and at most as wide as the widest printed character in the code block (commonly U+12E0 or U+1348). The exact width of U+1200 is not otherwise specified by the Ethiopian script standard.
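
A font or layout engine could respect these bounds on ETHIOPIAN SPACE with a simple clamp. The sketch below assumes the widths are already known in a common unit (for example, font units); the function name and the sample widths are hypothetical, and the standard fixes only the minimum and maximum, not how a renderer chooses a value between them.

```python
def ethiopian_space_width(requested: float, separator_width: float,
                          widest_glyph_width: float) -> float:
    """Clamp a requested ETHIOPIAN SPACE advance width to the allowed range.

    separator_width: advance width of the traditional word separator (U+1201).
    widest_glyph_width: advance width of the widest glyph in the block
    (commonly U+12E0 or U+1348).
    """
    if separator_width > widest_glyph_width:
        raise ValueError("separator width cannot exceed the widest glyph width")
    # Raise too-narrow requests to the minimum, lower too-wide ones to the maximum.
    return max(separator_width, min(requested, widest_glyph_width))


# Hypothetical widths in font units: separator 300, widest glyph 1000.
for requested in (250, 600, 1200):
    print(requested, "->", ethiopian_space_width(requested, 300, 1000))
```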

4. Note on ETHIOPIAN QUESTION MARK and ETHIOPIAN NON-SPACING GEMINATION MARK

U+1207, the ETHIOPIAN QUESTION MARK, is no longer found in modern use but would be vital to the faithful republication of historic texts; texts by Dehne and Dawkins provide examples. U+1209, the ETHIOPIAN NON-SPACING GEMINATION MARK, is used today by linguists to denote doubling of the consonant component of syllabic members of the Ethiopic writing system. The scholars Marcel Cohen and C.H. Dawkins authored two of the most important studies of Amharic, and their works both depend on and provide examples of the gemination mark. The non-circular shape of the gemination marks is considered both the identifying and the definitive aspect of the marks' function.
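
Because the gemination mark is non-spacing, software handling these texts would treat it as part of the preceding syllable rather than as a position of its own. The sketch below groups each character with any gemination marks that immediately follow it. It assumes this proposal's draft code point U+1209 for the mark (not its meaning in the Unicode Ethiopic block as published), and the function name is hypothetical.

```python
from typing import List

# Draft code point for ETHIOPIAN NON-SPACING GEMINATION MARK in this proposal
# (an assumption for illustration; the published Ethiopic block differs).
GEMINATION_MARK = "\u1209"


def spacing_clusters(text: str) -> List[str]:
    """Group each character with any gemination marks that immediately follow it."""
    clusters: List[str] = []
    for ch in text:
        if ch == GEMINATION_MARK and clusters:
            # Non-spacing: the mark occupies no position of its own,
            # so it joins the syllable before it.
            clusters[-1] += ch
        else:
            # Spacing characters, and a mark with nothing before it,
            # start a new cluster.
            clusters.append(ch)
    return clusters


# Two draft letter-region code points, the first one geminated.
word = "\u1230" + GEMINATION_MARK + "\u1231"
print(len(spacing_clusters(word)))  # 2
```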

5. Note on Character Name Alternatives

Character name alternatives are given for most Ethiopian punctuation in U+1200 TO U+120A. Names are given in both Amharic and Tigrigna, the primary languages of Ethiopia and Eritrea, for whose speakers the encoding is most important. Reference 7 provides a listing of the character names in Amharic. The Tigrigna names were provided by native speakers on the EDIN computer network.

Name alternatives are given for homophonous Ethiopian letters, following the practice in Ethiopia. Names are given in Amharic; Tigrigna alternatives may exist but were not known at the time of writing (some Fidel standards for Tigrigna have dropped the redundant homophonous series). When a character is known by more than one common name (for instance, ``weha-he'' and ``bizuhan-he'' for U+12A0), the name used in Reference 7 is given, since that reference is likely the one most available to developers.


Ethiopian Script Sorting Order

1. The sorting order follows, with refinement, the ASCII model of Punctuation->Numerals->Letters.

a ) The punctuation order (excepting quotation and gemination marks) follows the length of the pause each character introduces into the flow of a sentence when read.

b ) The numeral order follows the value of the numeral.

c ) The order of the readable letters is adopted from the Desta ordering of the traditional Ethiopian layout of the syllables. Each consonant with all of its consonant/vowel variations precedes the next consonant.

2. The sorting order, therefore, adheres to the encoding order of the Ethiopian script set. If the need arises to adopt a different sorting order, it can be done at the implementation level without breaching the encoding order.
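
Because the sorting order coincides with the encoding order, a default sort can compare words code point by code point, which yields Punctuation->Numerals->Letters automatically. The sketch below assumes text stored in this proposal's draft code points. Python's built-in string comparison already works this way, so the explicit key only spells it out; `custom_order_key` is a hypothetical illustration of an implementation-level override, not a mechanism defined by the standard.

```python
from typing import Callable, Dict, List, Tuple

Key = List[Tuple[int, int]]


def encoding_order_key(word: str) -> List[int]:
    """Default key: compare words code point by code point (encoding order)."""
    return [ord(ch) for ch in word]


def custom_order_key(weights: Dict[str, int]) -> Callable[[str], Key]:
    """Hypothetical implementation-level override of the sorting order.

    Characters listed in `weights` sort by their weight; all others sort by
    their code point. The code point is kept as a tie-breaker so the order
    stays deterministic when a weight equals another character's code point.
    """
    def key(word: str) -> Key:
        return [(weights.get(ch, ord(ch)), ord(ch)) for ch in word]
    return key


# Draft code points: U+1201 punctuation, U+120B numeral, U+1230 letter.
words = ["\u1230", "\u120B", "\u1201"]
print([hex(ord(w)) for w in sorted(words, key=encoding_order_key)])
# ['0x1201', '0x120b', '0x1230']
```

Leaving the encoding untouched and changing only the key is what allows a different order to be adopted without altering address assignments.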


Ethiopian Script Future Extension

It is neither the purpose nor a necessity of the proposed standard for Ethiopian script to encode every offspring potentially born of the traditional writing system. Nor is it the purpose of the standard to arrest or stymie the growth of the writing system. It is a requirement of the standard that, at a minimum, it provide present-day publishers with the characters and symbols on which they will depend in a standard for Ethiopian script.

Characters added beyond this minimal requirement may include those important for the republication of historical documents, as well as letters devised to meet the needs of minority languages just beginning their literary histories.

Characters will not be submitted to the Unicode Consortium until sufficient information is available to make address assignments responsibly. This proposal assigns addresses to all characters for which the drafting committee had sufficient information at the time of submittal. Characters added in a future version of the Ethiopian script standard will follow the base set of the previous standard, and the area following the base set will be known as the extension region. Section 2 of Appendix A discusses characters awaiting assignment in the extension region.
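
The rule that new characters follow the previous base set can be pictured as a sequential allocator. In the sketch below the class name, method name, and character names are hypothetical, and the region bounds are passed in rather than fixed by the standard; the example borrows the Extended Letters bounds from the Encoding Order Structure table. Assigning a whole group at once keeps a new character class contiguous, in the spirit of requirement 3, and the duplicate check reflects requirement 2.

```python
from typing import Dict, List


class ExtensionRegion:
    """Hypothetical allocator for the extension region that follows the base set.

    Addresses are handed out in ascending order, so a group of related
    characters assigned together occupies consecutive addresses.
    """

    def __init__(self, first: int, last: int) -> None:
        if first > last:
            raise ValueError("region must contain at least one address")
        self.next_free = first
        self.last = last
        self.assigned: Dict[str, int] = {}

    def assign_group(self, names: List[str]) -> List[int]:
        """Give each name in `names` the next consecutive free address."""
        # Every member of the domain needs exactly one unique address.
        if len(set(names)) != len(names) or any(n in self.assigned for n in names):
            raise ValueError("duplicate character name")
        if self.next_free + len(names) - 1 > self.last:
            raise ValueError("not enough free addresses left in the extension region")
        addresses = list(range(self.next_free, self.next_free + len(names)))
        self.assigned.update(zip(names, addresses))
        self.next_free += len(names)
        return addresses


# Placeholder names; bounds taken from the Extended Letters row of the table.
region = ExtensionRegion(0x1377, 0x138F)
print([hex(a) for a in region.assign_group(["NEW-1", "NEW-2", "NEW-3"])])
# ['0x1377', '0x1378', '0x1379']
```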

When there is truly no alternative to introducing a new character class into the writing system, the set of rules presented in Appendix C should be followed.