Requirements For an Ethiopic Character Set
In our efforts to develop a robust and coherent standard for the
Ethiopic script, we have established a set of governing principles that
any standard for the comprehensive Ethiopic character set must meet.
These requirements are:
1. The members of the set that defines the Ethiopic domain must include
those characters in both common and infrequent use by publishers of the present day.
2. Every member of the domain requires a unique address.
3. Characters of a common class must be addressed contiguously.
4. The standard must include a mechanism to facilitate future expansion of the domain.
Adherence to these principles ensures that the standard is unambiguous,
consistent, and complete.
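As a rough illustration, requirements two and three lend themselves to a mechanical check. The sketch below is not part of the proposal: the function name `check_table`, the tuple layout, and the sample entries are hypothetical, and "contiguous" is assumed to mean that no member of another class falls inside the address span of a class (unassigned gaps are allowed).

```python
from collections import defaultdict


def check_table(table):
    """Return a list of problems found in a code table.

    `table` is a list of (address, name, char_class) tuples. This layout
    is a hypothetical one chosen for the sketch, not the proposal's format.
    """
    problems = []
    seen = {}
    by_class = defaultdict(list)

    # Requirement 2: every member of the domain has a unique address.
    for address, name, char_class in table:
        if address in seen:
            problems.append(
                f"address {address:04X} assigned to both {seen[address]} and {name}"
            )
        else:
            seen[address] = name
        by_class[char_class].append(address)

    # Requirement 3 (as assumed above): no character of another class
    # may sit between the first and last address of a class.
    for char_class, addresses in by_class.items():
        first, last = min(addresses), max(addresses)
        intruders = [
            a for a, _, c in table if first <= a <= last and c != char_class
        ]
        if intruders:
            problems.append(
                f"class {char_class} is interrupted at "
                + ", ".join(f"{a:04X}" for a in intruders)
            )
    return problems
```

An empty result means the table passes both checks; the completeness and expansion requirements still need human review.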
Proposed Encoding Standard For Ethiopic Script
The table of Appendix A presents all characters
known to the committee that satisfy the first condition. The
proposed table meets condition two intrinsically. Characters are classified
as punctuation, numeral, letter, or other. Members of the punctuation
and numeral regions are ordered by function (described in the
following).
Letters follow their traditional ordering. Recent extensions to the fidel for the syllabic series of QHA, VA, DDA, and GGA follow the characters from whose glyphs they are derived. Consonant classes with 12 forms have their 5 extensions treated as a new, but intimately related, consonant series. The base sound of the new consonant class is always the labiovelar formation of the consonant from which it descends (e.g. ``GWAA'' from ``G'' is treated in the address structure as a new, but related, consonant). All labiovelar forms of the consonants are considered qualified members of the syllabic series of the consonant. This treatment of the labiovelar extensions to the fidel implements the system originally devised by Desta Tekle Wold. This defines the contiguous grouping of the table for requirement 3.
Sixty-four addresses (4 rows) are reserved at the end of the predefined region for Ethiopic characters not treated in this definition of the domain. This allocation is believed sufficient to meet the fourth and final condition.
2. The Encoding Structure
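The ranges listed in this section can be held in a small lookup table. The following is a minimal sketch rather than a reference implementation: the names `ETHIOPIC_BLOCKS` and `block_of` are hypothetical, and the ranges are copied from this proposal, not from any published Unicode chart.

```python
# Inclusive address ranges as listed in this proposal's encoding structure.
ETHIOPIC_BLOCKS = [
    (0x1200, 0x135F, "Ethiopic Letters"),
    (0x1360, 0x1368, "Ethiopic Punctuation"),
    (0x1369, 0x137C, "Ethiopic Numbers"),
    (0x1380, 0x13BF, "Ethiopic Extended Letters"),
    (0xFDF0, 0xFDFF, "Ethiopic Private Use"),
]


def block_of(code_point):
    """Return the proposal's block name for a code point, or None."""
    for first, last, name in ETHIOPIC_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


# The extended-letters range spans 64 addresses, i.e. 4 rows of 16.
EXTENDED_SIZE = 0x13BF - 0x1380 + 1
```

Addresses that fall between the listed ranges, such as U+137D, return `None` in this sketch because the proposal does not assign them to a block.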
U+1200 TO U+135F   Ethiopic Letters
U+1360 TO U+1368   Ethiopic Punctuation
U+1369 TO U+137C   Ethiopic Numbers
U+1380 TO U+13BF   Ethiopic Extended Letters
U+FDF0 TO U+FDFF   Ethiopic Private Use

3. Note on ETHIOPIC SPACE and ETHIOPIC WORDSPACE
The traditional word separator, U+1361, remains in modern use alongside white space and has taken on the role of a comma in Eritrean use when white space is present. In modern documents, WORDSPACE appears with greater left and right side ``white space buffering'' (x-offset and kerning) than in older documents. Ethiopic space is intended to have, at a minimum, the width of the traditional word separator and, at a maximum, the width of the widest printed character in the code block (commonly KXWA, CEE, or MWA). Specifications for the width of U+1360 are not addressed by the Ethiopic script standard.
4. Note on ETHIOPIC QUESTION MARK and ETHIOPIC NON-SPACING GEMINATION MARK
With the exception of some Eritrean press, U+1367 is no longer found in modern use but
remains vital to the faithful republication of historic texts. Texts by Dehne
and Dawkins provide examples. Hadas Eritrea is a weekly newspaper, with international
distribution, demonstrating modern use.
The Ethiopic non-spacing gemination mark is described by Fulass:
Scholars Marcel Cohen and C.H. Dawkins authored two of the most important
studies on Amharic. Their works depend on, and provide examples of, the
gemination mark. The noncircular shape of the gemination marks is considered
both the identifying and the defining aspect of the marks' function.
The Unicode Consortium has indicated that the COMBINING DIAERESIS, U+0308, should be sufficient to indicate gemination and that a second gemination mark for Ethiopic is therefore unnecessary. For those wishing to use an Ethiopic stylized (if not functional) alternative, we recommend the private use address U+FDFA.
5. Note on Character Name Alternatives
Character name alternatives are given for most Ethiopic punctuation in U+1361 to U+1368.
Names are given in both Amharic and Tigrigna, the primary languages of Ethiopia and
Eritrea, for which the encoding is most important.
Reference 7 provides a listing of the character names in Amharic. Tigrigna language
names have been provided by native speakers from the EDIN computer network.
Name alternatives are given for homophonous Ethiopic letters following the practice
in Ethiopia. Names are given in Amharic; Tigrigna alternatives may exist but are not
known at the time of this writing (redundant homophonic series have been dropped by
some Fidel standards for Tigrigna). When a character is known by more than one common
name (``weha-he'' and ``bizuhan-he'' for U+1280, for instance), the name used in
reference 7 is offered, as that reference is
likely the one most available to developers.
Ethiopic Script Sorting Order
1. The sorting order follows the standard Unicode model of
Letters -> Punctuation -> Numerals.
a ) The order of the readable letters is adopted from the Desta
b ) The numeral order follows the value of the numeral.
c ) The punctuation order follows traditional schemes with the archaic members at the end.
2. The sorting order, therefore, adheres to the encoding order of the Ethiopic script set. If the need arises to adopt a different sorting order, it can be done at the implementation level without breaching the encoding order.
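One way an implementation might apply a different sorting order without touching the encoding is to sort with a key function. The sketch below is illustrative only: `class_rank` and `sort_key` are hypothetical names, the ranges are those given in this proposal, and the choice to rank the extended letters together with the base letters is an assumption made for the example, not something the proposal states.

```python
def class_rank(code_point):
    """Rank a code point by class: letters, then punctuation, then numerals."""
    # Assumption for this sketch: extended letters rank with the base letters.
    if 0x1200 <= code_point <= 0x135F or 0x1380 <= code_point <= 0x13BF:
        return 0
    if 0x1360 <= code_point <= 0x1368:
        return 1
    if 0x1369 <= code_point <= 0x137C:
        return 2
    return 3  # anything outside the proposal's ranges sorts last


def sort_key(text):
    """Build a sort key that compares class rank first, then encoding order."""
    return [(class_rank(ord(ch)), ord(ch)) for ch in text]


# Four single-character strings, one from each range.
samples = [chr(0x1369), chr(0x1380), chr(0x1361), chr(0x1200)]
tailored = sorted(samples, key=sort_key)
```

The stored text and its addresses are unchanged; only the comparison differs. Within a class, the key falls back to the encoding order, so the default behaviour described above is preserved.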
Characters added beyond this minimal requirement may be those important for the republication of historical documents, and letters devised to meet the needs of minority languages just beginning their literary histories.
Characters will not be submitted to the Unicode Consortium until sufficient information is available to make address assignments responsibly. This proposal gives address assignments to all characters for which the drafting committee has sufficient information at the time of submittal. Characters to be added in a future version of the Ethiopic script standard will follow the base set of the previous standard. The area following the base set is then to be known as the extension region. Section 2 of Appendix A discusses characters awaiting assignment in the extension region.
When it proves true that there is absolutely no alternative but to introduce a new character class into the writing system, the rules to follow are presented in Appendix C.