This page is provided to promote UTF-8 support for Ethiopic (Ge'ez/Fidel) electronic text. ``UTF-8'' is the 8-bit ``UCS Transformation Format'', an encoding of 16-bit text (``UCS'' is the ``Universal Character Set''). But why is 16-bit text important? Previously, PC operating systems would recognize only 7- or 8-bit character code systems, which limits the number of letters the computer can work with to 128 or 256. In these limited environments developers had to make do with what was available from the operating system. This meant breaking the Fidel syllabary into a group of 2-9 font sets, or even breaking the fidels themselves into pieces by creating diacritic marks. While this was extremely ingenious on the part of the developers and allowed for almost every kind of electronic publishing, the computer had not yet been conquered. It had merely been tricked into displaying Roman text with a Ge'ez typeface. The text was not yet Ge'ez.
Conquest:- 1996
Shortly after the 100th anniversary of the Adowa Victory, Fidel gained a universally recognized character code system, and the computer had at last been conquered. Or was the conquest just beginning?
The computer operating systems of 1997 (and languages such as Java, Limbo, and Alef) no longer have the ``ASCII'' or ``ANSI'' limitations. They can support Fidel as Fidel. Now that we are no longer working with limited systems, it is upon us to start using them. How to use the new operating system resources is what this page is about. Contributions from people willing to spend a little time experimenting are most welcome!
The Linux operating system supports UTF-8 text streams. This means it will
recognize Ethiopic file and directory names in addition to document text. You
may create files and directories and do anything you would normally do on a UNIX
system, with 65,536 (2^16) letters available to work with. However, in Linux
consoles only 512 characters may be used at a single time (this may be a limit
of the CGA/VGA system in the x86 architecture, or of terminals). 512 characters
(supported after Linux v1.3.28) are fortunately enough for both English and
Ethiopic scripts. GohaTibeb, Dashen Engineering, and the Ethiopian Science and
Technology Commission have so far contributed fonts for Linux use; the fonts
can be found individually at the ftp archive. You may like to download the
complete package, which details more on Ethiopic under Linux.
Linux also uses National Language Support (NLS), whereby software and the
operating system can adjust their dialog language for individual users. Linux's
UTF-8 support and ease of use make it an ideal test bed for NLS with the Ge'ez
script languages. The latest NLS package being developed with Addis Abeba
University's Computer Science Department may be downloaded from:
Linux Consoles & NLS
The UNIX world is moving towards Unicode and UTF formats. IBM's AIX is known to use UTF-8 streams natively (an Ethiopic tester is needed!). While Solaris and SGI have made strides in multilingualism, it is not known whether their latest OSs support UTF yet (help needed!).
The evolving HURD also supports UTF-8 streams but to date has not been evaluated for Ethiopic. The never quite complete Plan 9 and its follow-up Brazil are UTF-8 native systems. UTF-8's place in the UNIX future is clearly evident.
GNU, which is crafting the HURD, is responsible for a large library of UNIX resources that are often preferred over the vendor-supplied equivalents. GNU is rapidly internationalizing its software using the portable object approach to NLS. UTF-8 streams will be essential to supporting Eritrean and Ethiopian languages. The Addis Abeba University Computer Science Department will shortly be working with GNU in this effort.
The Unicode X-Term (UXterm) is an X11 resource that will interpret UTF-8 streams
on a variety of UNIX operating systems (even those that do not use UTF-8
natively). UXterm comes with its own Unicode font, but Ethiopic is not included.
However, you may download two Ethiopic varieties specially tailored for UXterm
from here.
UXterm in example.
UXterm is generally satisfactory for viewing Ethiopic UTF-8 text; however, line breaks may appear where they are not expected (usually not a problem in HTML). 9term, the Plan 9 terminal, also supports UTF-8 and has been ported to X11. Unfortunately, 9term is seen to die on Solaris and Linux systems when it tries to read the Ethiopic portion of UTF-8 text. These two examples highlight the need to evaluate UTF-8 applications in the Ethiopic (3-byte) text range.
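To make the "3-byte" remark concrete, here is a minimal sketch of how a single 16-bit UCS character maps onto UTF-8 bytes. The class and method names (EthiopicUtf8, encode, hex) are hypothetical and chosen only for illustration, and the sketch assumes a reasonably modern Java runtime. It handles plain 16-bit code units only and makes no attempt to treat surrogate pairs.

```java
class EthiopicUtf8 {
    // Encode one 16-bit UCS code unit as UTF-8 (1, 2, or 3 bytes).
    // Illustrative only: surrogate code units are not treated specially.
    static byte[] encode(char c) {
        if (c < 0x80) {
            // 7-bit ASCII passes through unchanged.
            return new byte[] { (byte) c };
        } else if (c < 0x800) {
            // 11 bits: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F))
            };
        } else {
            // 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F))
            };
        }
    }

    // Render bytes as space-separated uppercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode('A')));      // 41
        System.out.println(hex(encode('\u1200'))); // E1 88 80
    }
}
```

Every character in the Ethiopic range lies above 0x07FF, so each one takes the three-byte branch. That is the code path an application must get right to display Fidel.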
A number of web browsers are beginning to speak UTF-8. Netscape running under Windows NT is able to display UTF-8 text. The Accent web browser, which also comes as a Netscape plugin, needs further study. The Tango browser can read Ethiopic in UTF-8 now; see the sample files and setup information at the Ethiopian News Headlines.
When Lynx 2.7 is released it will support UTF-8 text for web browsing. You may add on the extensions to version 2.6 by downloading them from
-DSLANG_MBCS_HACK
.
If you have Ethiopic fonts installed either for Linux consoles or for UXterm, you may proceed:
lynx http://www.cs.abyssiniacybergateway.net/fidel/let/yoHens-utf8.html
Ge'ez output from Java interpreters can be viewed in Linux consoles and UXterms.
SelamAlem.java: Note that in JDK 1.0.2, System.out.println was unable to print Ethiopic UTF text to stdout. Perhaps this will be corrected in JDK 1.1. After numerous permutations, writeUTF to a file was found to work, as shown below:
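    import java.io.*;

    class SelamAlem {
        public static void main(String[] args) throws IOException {
            // writeUTF emits a 2-byte length prefix, then the string in (modified) UTF-8
            DataOutputStream out = new DataOutputStream(new FileOutputStream("alem.out"));
            out.writeUTF("\u1220\u120B\u121D\u1361\u12D3\u1208\u121D!");
            out.close();  // flush the bytes and release the file
        }
    }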
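As a possible alternative to writeUTF, the sketch below wraps an output stream in an OutputStreamWriter with an explicit UTF-8 encoding. It assumes a JDK 1.1 or later runtime, where OutputStreamWriter is available. The class name SelamAlemStdout and the helper writeUtf8 are hypothetical, introduced only for this illustration. Unlike writeUTF, this route writes no 2-byte length prefix, so the result is a plain UTF-8 text stream.

```java
import java.io.*;

class SelamAlemStdout {
    // Encode the given text as UTF-8 onto any byte stream (hypothetical helper).
    static void writeUtf8(OutputStream sink, String text) throws IOException {
        Writer w = new OutputStreamWriter(sink, "UTF-8");
        w.write(text);
        w.flush();  // push the encoded bytes through without closing the sink
    }

    public static void main(String[] args) throws IOException {
        // Same greeting as SelamAlem.java, sent to stdout instead of a file.
        writeUtf8(System.out, "\u1220\u120B\u121D\u1361\u12D3\u1208\u121D!\n");
    }
}
```

Whether the result is readable still depends on the console or UXterm having an Ethiopic font loaded and interpreting its input as UTF-8.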
The Unicode escape string above is not practical to type by hand. Were the above code written with a 1997 release of Multilingual Emacs and saved with a .java or .JAVA extension, the conversion to Unicode escapes would be automatic. The ``sera2any'' resource also offers Java output options.
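For readers without either tool, the conversion itself is simple to sketch. The example below is not how Multilingual Emacs or sera2any performs it; it is only a hypothetical helper (UnicodeEscaper.escape) showing the idea, and it assumes a reasonably modern Java runtime. Each non-ASCII character is replaced by a backslash, the letter u, and four hexadecimal digits.

```java
class UnicodeEscaper {
    // Replace every non-ASCII character with a Java-style Unicode escape.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);  // ASCII is copied through unchanged
            } else {
                // Backslash, 'u', then the code unit as four uppercase hex digits.
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes here only to keep this file pure ASCII;
        // at run time the string holds three Ethiopic characters and "!".
        System.out.println(escape("\u1220\u120B\u121D!"));
    }
}
```

Run on Ethiopic text typed directly in an editor, the helper's output can be pasted into a Java string literal such as the one in SelamAlem.java.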