AfricanLanguageComputing-forWWWandother

(from the list of my contributions to the Africa_web_content_owner (AWCO) list; source: AWCO archives)


[africa_web_content_owner] African language computing (for WWW & other)
Feb 11, 2000

Mike asked me to lead a discussion in the area of African languages and web content, though I doubt I am really qualified to do so. Nevertheless, with the caveat that this is an area in which I'm still learning, and with a focus on text content, permit me to pass on several items that may be of interest in computer work in African languages, including generating text web content. Hopefully some of this will be of interest and use...

First, three relevant software items (NB - I get no benefits from sale of any of them!). There are probably others...

The Somali wordprocessor with spellchecker that was mentioned is for sale at: http://www.somitek.com/

An Oromo wordprocessor (in Qubee transcription) with thesaurus is for sale at: http://www.oromosoft.com/

A multilingual translation package for 33 world languages (including Arabic & Swahili, as well as English, French and Portuguese) called "Universal Translator Deluxe" permits "omnidirectional" translation, creation of font sets, and production of web content in any of the languages. (I have never used or seen this and so have no idea how sophisticated any of these features are, but it is an indication of how things are developing and what will be possible.) It is for sale at: http://www.lingotalk.com/multilingual/unidelux.html

There are several points I've noted in my limited exposure to African languages on the web, and to the technical issues involved there as well as in African language wordprocessing:

1. For African languages that use the (unmodified) Latin alphabet, there are no special technical problems for localization or web content production. This is, I believe, the case for Somali (a non-Latin alphabet [presumably modified Arabic script, Abajada?] apparently also exists but is not much used).

2. For African languages that in officially adopted orthography use the Latin alphabet with a few extra or different characters/letters (I believe this includes almost all languages of West Africa) there are several approaches:
(a) The "correct" one. That is, in a wordprocessor, to have a font that includes these characters (such fonts often exist, but with little standardization, so that what works on one person's computer might not work on another's without reconfiguration of some sort). For the web, that means having these added characters in the text, with a standard code for each character and some standard set of glyphs that a browser would call up to represent them (somebody please correct as necessary). I'm not sure what would happen if one selected and copied such an African language web text into a wordprocessor - that's one of the kinds of questions that arise.
(b) The "old correct" or out-of-date one. That is, for some languages such as Bambara some digraphs or accented characters used in European languages were employed before the special characters were adopted (e.g. "ny" or "n tilde" for the "n with left hook"; "e accent grave" for the "open e"). This is what I do when typing Bambara in e-mail. It lets one produce & present text, but is not satisfactory to those who have learned in &/or are used to using the current orthography. Also, accents may be confused with tone indicators in some of the tonal languages (Bambara, Yoruba[?]).
(c) The capital letter solution. Use capital letters in place of the special characters (e.g., "E" for the "open e"). Personally I find this hard to read. An example in Bambara is at this address, but I've just had trouble getting it now: http://callisto.si.usherb.ca/~malinet/index_ba.html
(d) The "little image file" solution where little image files are used for the special characters inserted as needed in the text. A site where that was done, again for Bambara, is http://www.djembe.com
(e) The "big image file" solution, where the entire text is turned into an image file, rather like Adobe. I think this is what the "Universal Translator Deluxe" does. Anyway, an example in Swahili is at: http://willow.ncfes.umn.edu/pubs/howtos/ht_pruneswahili/kupogoa_mti.htm
(f) The "to h*** with it" solution. That is, just use the closest standard Latin letter for each special character (e.g., "e" for the "open e") as if it were in the category #1 above. This was done with Bambara, Wolof, and Soninke at: http://www.bok.net/pajol/index.ba.html The advantage is that it gets the material out there in readable form quickly, rather than waiting for all the technical solutions. The disadvantage
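
As a rough illustration of approaches (b), (c) and (f), here is a minimal Python sketch that treats each one as a simple substitution table for two of the letters mentioned above. The Unicode code points, the table and function names, the "N" substitute for the "n with left hook", and the sample string are all assumptions made for illustration; none of this is taken from the sites listed.

```python
# Two of the special letters discussed above, written as Unicode escapes.
# (Assumption: the text is stored as Unicode; the post does not say so.)
OPEN_E = "\u025b"       # LATIN SMALL LETTER OPEN E
N_LEFT_HOOK = "\u0272"  # LATIN SMALL LETTER N WITH LEFT HOOK

# Hypothetical substitution tables, one per approach.
FALLBACKS = {
    # (b) older orthography: "e accent grave" and the digraph "ny"
    "old_orthography": {OPEN_E: "\u00e8", N_LEFT_HOOK: "ny"},
    # (c) capital letters ("N" for the hooked n is an assumption)
    "capital_letter": {OPEN_E: "E", N_LEFT_HOOK: "N"},
    # (f) closest standard Latin letter
    "closest_latin": {OPEN_E: "e", N_LEFT_HOOK: "n"},
}


def apply_fallback(text, scheme):
    """Replace the special letters in text according to the chosen scheme."""
    table = str.maketrans(FALLBACKS[scheme])
    return text.translate(table)


if __name__ == "__main__":
    sample = N_LEFT_HOOK + OPEN_E  # arbitrary two-letter sample string
    for scheme in FALLBACKS:
        print(scheme, "->", apply_fallback(sample, scheme))
```

Note that these substitutions only go one way: once an "open e" has been written as a plain "e", a program can no longer tell the two apart.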

3. For African languages with their own scripts, such as Ethiopian & Eritrean languages, special coding may have to be done (someone may be working on it now for all I know), or else one form or another of image files used. As I mentioned earlier, this is a task somewhat similar to what has already been done or is being done for other Semitic languages (namely Arabic and Hebrew) and for languages of South Asia.

Anyone interested in looking at some of the special "extended Latin" characters used in some African languages and the codes assigned to these can access a database on letters available at: http://www.eki.ee/letter/
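
For anyone who wants to look at such codes directly, here is a small Python sketch. It assumes the codes in question are Unicode code points, which is an assumption here and is not stated above. It looks up the code point and standard name of a character, and rewrites text using HTML numeric character references, one way of putting these letters into a web page as plain ASCII. The function names describe and to_html are made up for illustration.

```python
import unicodedata


def describe(ch):
    """Return the code point, Unicode name and an HTML reference for one character."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "html_ref": f"&#x{cp:X};",
    }


def to_html(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # "open e" and "n with left hook", written as escapes
    for ch in ("\u025b", "\u0272"):
        print(describe(ch))
    print(to_html("\u0272\u025b"))  # prints &#626;&#603;
```

Whether a given browser then shows the right glyph still depends on a suitable font being installed, which is the standardization problem described in 2(a).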

All the best!

Don

Donald Zhang Osborn, Ph.D. osborndo@...
consultant @ NRMP-Assistant-Mali@...
ANRM, IK, & ICT in the vernacular bisharat@...

"A mechanism of world inter-communication will be devised, embracing
the whole planet, freed from national hindrances and restrictions,
and functioning with marvelous swiftness and perfect regularity."
Shoghi Effendi, 1936.

"The appropriate technology for the developing
third world is electronic digital technology."
Arthur C. Clarke, 1980.



Page last modified on 2016-11-04 17:34