
Requirements For an Ethiopic Character Set

In our efforts to develop a robust and coherent standard for the Ethiopic script, we have established a set of governing principles that any standard for the comprehensive Ethiopic character set must meet.

These requirements are:

  1. The set that defines the Ethiopic domain must include the characters in both common and infrequent use by present-day publishers.
  2. Every member of the domain requires a unique address.
  3. Characters of a common class must be addressed contiguously.
  4. The standard must include a mechanism to facilitate future expansion of the domain.

Adherence to these principles ensures that the standard is unambiguous, consistent, and complete.


Proposed Encoding Standard For Ethiopic Script

The table of Appendix A presents all characters known to the committee that satisfy the first condition. The proposed table meets condition two intrinsically, since each character occupies its own address in the table. Characters are classified as punctuation, numeral, letter, or other. Members of the punctuation and numeral regions are ordered by function (described below).

Letters follow their traditional ordering. Recent extensions to the fidel for the syllabic series of QHA, VA, DDA, and GGA follow the characters from whose glyphs they are derived. Consonant classes with 12 forms have their 5 extensions treated as a new, but intimately related, consonant series. The base sound of the new consonant class is always the labiovelar formation of the consonant from which it descends (e.g. ``GWAA'' from ``G'' is treated in the address structure as a new, but related, consonant). All labiovelar forms of the consonants are considered qualified members of the syllabic series of the consonant. This treatment of the labiovelar extensions to the fidel implements the system originally devised by Desta Tekle Wold. This defines the contiguous grouping of the table for requirement 3.

Sixty-four addresses (4 rows) are reserved at the end of the predefined region for Ethiopic characters not treated in this definition of the domain. This allocation is believed sufficient to meet the fourth and final condition.

Encoding Principle

1. The Ethiopic script comprises Ethiopic punctuation marks, numerals, and the syllabary. We believe that a unique 16-bit representation must be used for each syllable of the Ethiopic script. Further, a 16-bit representation is required for each element of the writing system that serves a unique purpose and function in writing. Thus members that have a single function but various written (or printed) forms do not require additional code assignments for their polymorphic siblings. A hard copy of the script with its 16-bit hexadecimal code assignments, which we believe is complete, is attached (see Appendix A).

2. The Encoding Structure

U+1200 TO U+135F         Ethiopic Letters
U+1360 TO U+1368         Ethiopic Punctuation
U+1369 TO U+137C         Ethiopic Numbers
U+1380 TO U+13BF         Ethiopic Extended Letters
U+FDF0 TO U+FDFF         Ethiopic Private Use
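
The regions listed above can be expressed as a simple lookup. The following is a minimal sketch in Python, assuming only the ranges proposed in this document (they may differ from any later published standard); the names `ETHIOPIC_REGIONS`, `classify`, and `region_size` are hypothetical and used for illustration only.

```python
# Address ranges copied from the proposed encoding structure above.
# The region labels and function names are illustrative assumptions.
ETHIOPIC_REGIONS = [
    (0x1200, 0x135F, "letter"),
    (0x1360, 0x1368, "punctuation"),
    (0x1369, 0x137C, "numeral"),
    (0x1380, 0x13BF, "extended letter"),
    (0xFDF0, 0xFDFF, "private use"),
]


def classify(code_point):
    """Return the proposed region name for a code point, or None if it is outside all regions."""
    for first, last, name in ETHIOPIC_REGIONS:
        if first <= code_point <= last:
            return name
    return None


def region_size(name):
    """Return the number of addresses in the named region (bounds are inclusive)."""
    for first, last, region in ETHIOPIC_REGIONS:
        if region == name:
            return last - first + 1
    raise KeyError(name)
```

With these ranges, `region_size("extended letter")` evaluates to 64, which matches the sixty-four addresses (4 rows of 16) reserved for future expansion.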
3. Note on ETHIOPIC SPACE and ETHIOPIC WORDSPACE

The traditional word separator, U+1361, remains in modern use alongside white space and, in Eritrean use, has taken on the role of a comma when white space is present. In modern documents WORDSPACE is set with greater left and right side ``white space buffering'' (x-offset and kerning) than in older documents. Ethiopic space is intended to be at least as wide as the traditional word separator and at most as wide as the widest printed character in the code block (commonly KXWA, CEE, or MWA). The width of U+1360 is not specified by the Ethiopic script standard.
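
The intended width bounds for Ethiopic space amount to a clamp between two font-dependent measurements. The sketch below is illustrative only: the function name `ethiopic_space_width` and its parameters are hypothetical, the widths are in whatever units the font uses, and the standard itself leaves the width of U+1360 unspecified.

```python
def ethiopic_space_width(preferred, separator_width, widest_glyph_width):
    """Clamp a preferred space width to the intended bounds.

    separator_width:    width of the traditional word separator in the font (lower bound)
    widest_glyph_width: width of the widest printed character in the font (upper bound)
    All three values are assumed to be in the same font units.
    """
    if separator_width > widest_glyph_width:
        raise ValueError("lower bound exceeds upper bound")
    return max(separator_width, min(preferred, widest_glyph_width))
```

A font designer would supply the two bounds from the font's own metrics; any preferred width inside the bounds is returned unchanged.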

4. Note on ETHIOPIC QUESTION MARK and ETHIOPIC NON-SPACING GEMINATION MARK

With the exception of some Eritrean press, U+1367 is no longer found in modern use but remains vital to the faithful republication of historic texts. Texts by Dehne and Dawkins provide examples. Hadas Eritrea is a weekly newspaper, with international distribution, demonstrating modern use.

The Ethiopic non-spacing gemination mark is described by Fulass:

``In fact, Ethiopian scholars do not speak of "gemination", but rather of "tightened" or "maintained" consonants (t'bk' fidel - "tightened letter"). Though the Amharic syllabary does not indicate gemination (this is one of its main faults) some scholars make use of two dots placed over a consonant to show gemination. The two dots do not indicate "doubling", but are an abbreviated form of the letter THA of (THA)(BE)(QE). The two dot notation has been taken over for use in several Amharic grammars.''

Scholars Marcel Cohen and C.H. Dawkins authored two of the most important studies on Amharic. Their works depend on, and provide examples of, the gemination mark. The noncircular shape of the gemination marks is considered both the identifying and the defining aspect of the marks' function.

The Unicode Consortium has indicated that the COMBINING DIAERESIS, U+0308, should be sufficient to indicate gemination and that a second gemination mark for Ethiopic is therefore unnecessary. For those wishing to use an Ethiopic stylized (if not functionally distinct) alternative, we recommend that the private use address U+FDFA be used.
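
In text, marking gemination this way means placing the mark directly after the consonant it modifies, following the usual Unicode convention for combining marks. The following Python sketch assumes that convention; the function `geminate` and the constant names are hypothetical, and U+FDFA is treated as a private use address only because this proposal recommends it as one.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"     # mark indicated by the Unicode Consortium
PRIVATE_USE_GEMINATION = "\uFDFA"  # stylized alternative, per this proposal only


def geminate(base, stylized=False):
    """Return the base character followed by a gemination mark."""
    mark = PRIVATE_USE_GEMINATION if stylized else COMBINING_DIAERESIS
    return base + mark


def is_nonspacing_mark(ch):
    """True if the Unicode database classifies ch as a non-spacing mark (category Mn)."""
    return unicodedata.category(ch) == "Mn"
```

For example, `geminate("\u1200")` yields a two-character string: the letter at address U+1200 followed by U+0308.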

5. Note on Character Name Alternatives

Character name alternatives are given for most Ethiopic punctuation in U+1361 to U+1368. Names are given in both Amharic and Tigrigna, the primary languages of Ethiopia and Eritrea, for whose speakers the encoding is most important. Reference 7 provides a listing of the character names in Amharic. Tigrigna language names have been provided by native speakers from the EDIN computer network.

Name alternatives are given for homophonous Ethiopic letters following the practice in Ethiopia. Names are given in Amharic; Tigrigna alternatives may exist but are not known at the time of this writing (redundant homophonic series have been dropped by some Fidel standards for Tigrigna). Where a character is known by more than one common name (``weha-he'' and ``bizuhan-he'' for U+1280, for instance), the name used in reference 7 is offered, as that reference is likely the one most available to developers.


Ethiopic Script Sorting Order

1. The sorting order follows the standard Unicode model of Letters -> Punctuation -> Numerals.

a ) The order of the readable letters is adopted from the Desta ordering of the traditional Ethiopic layout of the syllables. Each consonant with all of its consonant/vowel variations precedes the next consonant.

b ) The punctuation order follows traditional schemes with the archaic members at the end.

c ) The numeral order follows the value of the numeral.

2. The sorting order, therefore, adheres to the encoding order of the Ethiopic script set. If the need arises to adopt a different sorting order, it can be done at the implementation level without breaching the encoding order.
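
Because the default sorting order is the encoding order, a plain comparison of addresses is enough to sort Ethiopic text, and a different order can be layered on top without touching the encoding. The Python sketch below illustrates both cases; the function names and the `rank` table are hypothetical, and the sample strings use addresses from the letter region purely as placeholders.

```python
def encoding_order_key(text):
    """Sort key that compares strings address by address (the encoding order)."""
    return [ord(ch) for ch in text]


def custom_order_key(text, rank):
    """Sort key for an implementation-level order.

    rank maps a character to its position in the alternative order.
    Characters missing from rank sort after all ranked characters,
    falling back to their encoding order.
    """
    return [(0, rank[ch]) if ch in rank else (1, ord(ch)) for ch in text]


words = ["\u1208\u1200", "\u1200\u1208", "\u1200"]

# Default: encoding order.
by_encoding = sorted(words, key=encoding_order_key)

# Alternative: an application decides U+1208 should sort before U+1200.
rank = {"\u1208": 0, "\u1200": 1}
by_custom = sorted(words, key=lambda w: custom_order_key(w, rank))
```

Here `by_encoding` is `["\u1200", "\u1200\u1208", "\u1208\u1200"]`, while `by_custom` is `["\u1208\u1200", "\u1200", "\u1200\u1208"]`; the stored addresses are identical in both cases.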


Ethiopic Script Future Extension

It is neither the purpose nor the necessity of the proposed standard for Ethiopic script to encode every offspring potentially born of the traditional writing system. Nor is it the purpose of the standard to arrest or stymie the growth of the writing system. It is a requirement of the standard that, at a minimum, it provide publishers of the present day with the characters and symbols on whose presence they will depend in a standard for Ethiopic script.

Characters added beyond this minimal requirement may be those important for the republication of historical documents, and letters devised to meet the needs of minority languages just beginning their literary histories.

Characters will not be submitted to the Unicode Consortium until sufficient information is available to make address assignments responsibly. This proposal gives address assignments to all characters for which the drafting committee has sufficient information at the time of submittal. Characters to be added in a future version of the Ethiopic script standard will follow the base set of the previous standard. The area following the base set is then to be known as the extension region. Section 2 of Appendix A discusses characters awaiting assignment in the extension region.

Should it prove that there is absolutely no alternative but to introduce a new character class into the writing system, a set of rules to adhere to is presented in Appendix C.