Category Archives: ICT

WAWDT: FCC turning off low-income access to broadband?

The item that prompted me to begin writing about “Why are we doing this?” (WAWDT) was a news report about the decision by the US Federal Communications Commission (FCC) to rescind authorization for nine internet providers to offer subsidized broadband to low-income households under the Lifeline program (CNN; The Hill; Gizmodo). This is not the biggest issue out there, but in the torrent of news, it was in a way one item too many.

The details are a bit complicated, but the immediate effect seems to run contrary to the new FCC commissioner Ajit Pai’s stated desire to end the digital divide. The Lifeline program (FCC general & consumer pages) began in 1985 as a way of assuring telephone access to people otherwise unable to access essential communications services (such as the poor and elderly). Internet broadband was added to the Lifeline program in March 2016, in recognition of the increasingly essential nature of broadband – such as for students who need good internet access for their schoolwork.

The nine companies that had been granted this status so far (out of a total of 117 applicants listed on the FCC’s Lifeline Broadband Provider Petitions & Public Comment Periods page, accessed 4 Feb. 2017) have had their status downgraded to pending.

The timeline and final outcome are uncertain. According to the Washington Post,

By stopping companies … from accessing the Lifeline program, Pai may be signaling his intention to apply more restrictions to the Lifeline program, policy analysts said. One such restriction could be a strict cap on the program’s budget, which is indirectly funded through fees in the bills of telephone customers.

Expansion of the Lifeline program to include broadband seemed a positive way to address one aspect of increasing inequality – access to information via the internet. Its ending or curtailment would certainly be a loss. Hopefully this can be reinstated or otherwise moved forward again in a way that benefits eligible people.

The FCC has at least one other potential WAWDT item on its policy agenda – overturning net neutrality as a governing principle of the internet.


Visualizing language, development, education & ICT connections

A few years ago, I came across the following “model of development communication with regard to language(s) and education” by Ekkehard Wolff, a professor emeritus and former Chair of African Studies at the University of Leipzig. It was presented in a 2006 working document entitled “Optimizing Learning and Education in Africa – the Language Factor: A Stock-taking Research on Mother Tongue and Bilingual Education in Sub-Saharan Africa” (later revised and published in 2011 as “Optimising Learning, Education and Publishing in Africa: The Language Factor”).*

What first struck me was that this simple triangular model portraying the relative strength of links among development, language, and education captures the essence of the situation as regards African languages in development and education programming in Africa.

Secondly, the model could easily reflect development communication – or extension work – in a mostly monolingual country, where almost everyone speaks a single tongue as their first language (“L1”), and those who don’t mostly have that same language as an “L2.” Language is not a factor that needs particular attention beyond the appropriate use of the common tongue.

Third, it is significant, though not surprising, that this came in discussion of education. The field of education tends to give more attention to issues relating to language and languages, for instance in research and policy recommendations on mother-tongue based/multilingual education, than does the field of development studies. (For a more complete discussion, see Prof. Dr. Wolff’s chapter 1 in the last version of the above-cited document).

And finally, it also occurred to me that one could readily extend this model in a third dimension by adding another factor: information and communications technology (ICT). ICT after all is (1) a more or less established dimension of development assistance (per ICT4D), (2) a feature of some projects to assist education, and also (3) the focus of a range of language technology and localization efforts. So the connections of ICT with all three are natural.

Expanding the model

The expanded model with four factors – language, development, education, and ICT – is a triangular pyramid or tetrahedron that allows us to visualize six related pairs of factors and characterize their relative weight in development communication (programming, extension, etc.).

These six pairs with comments (those on the first three are Wolff’s) are:

  • Development ↔ Education: “Widely accepted on a priori grounds, but with little understanding of exact nature of relationship”
  • Education ↔ Language: “Little understood outside expert circles, particularly in terms of MoI [medium of instruction] vs. SoI [subject of instruction]”
  • Language ↔ Development: “Largely ignored”
  • Development ↔ ICT: Established in development thinking and practice as ICT4D
  • Education ↔ ICT: Established connection, often as part of ICT4D or as local-level projects
  • Language ↔ ICT: Linkage well established for major languages as “localization” (“L10n”), but not as well supported, in terms of policy or technology, for less-resourced languages
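The count of pairs follows directly from the model’s geometry: four factors yield C(4, 2) = 6 unordered pairs, the edges of the tetrahedron. A quick illustration in Python:

```python
from itertools import combinations

# The four factors at the vertices of the expanded model
factors = ["development", "education", "language", "ICT"]

# Each edge of the tetrahedron links an unordered pair of factors
pairs = list(combinations(factors, 2))

for a, b in pairs:
    print(f"{a} <-> {b}")

# A tetrahedron has C(4, 2) = 6 edges, matching the six pairs listed above
assert len(pairs) == 6
```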

This model also facilitates visualization of other dynamics beyond the language-development-education triangle introduced by Wolff, each of which involves ICT. Specifically:

  • Links among language-development-ICT (is L10n part of ICT4D projects? do L10n projects address development needs?)
  • Links among language-education-ICT (does use of ICT in education projects include localized content or interfaces?)
  • Links among development-education-ICT (how are ICT4D and ICT4E linked?)

Language belongs in the picture

Overall, any such model incorporating language among the dynamics of development helps expand thinking about development and learning processes. Communication is fundamental to development and education, and one of the principal uses of ICT, and language is fundamental to communication.

Why has language been so neglected in this regard (particularly in Africa)? That is another discussion. In the meantime, Prof. Dr. Wolff’s chapter (referenced above) is highly recommended as an analysis of the state of affairs and disciplinary divides involved.

* Hassana Alidou, et al. 2006. Optimizing Learning and Education in Africa – the Language Factor: A Stock-taking Research on Mother Tongue and Bilingual Education in Sub-Saharan Africa. Paris: Association for the Development of Education in Africa. (NB: This document carries the note that it is a draft and not for dissemination; however, it is widely available on the web and has been cited in at least two published books.)
Adama Ouane and Christine Glanz, eds. 2011. Optimising Learning, Education and Publishing in Africa: The Language Factor – A Review and Analysis of Theory and Practice in Mother-Tongue and Bilingual Education in sub-Saharan Africa. Hamburg: UIL & Tunis: ADEA.


International Decade of Languages?

As we draw to the end of 2008 – which is designated as, among other things, International Year of Languages (IYL) – I wanted to ask what’s next? And to propose the possibility of an International Decade of Languages to follow up on issues that the IYL dealt with as well as some others.

A year is a short time to do much more than raise awareness, achieve some limited project results, and begin to link and expand networks interested in such a vast topic as languages. Is it time to prepare the rationale and plans for a longer term campaign?

Issues that could be addressed by an International Decade of Languages might include:

  • Consider what more can be done for endangered languages and their speakers, from documentation and preservation to development and education.
  • Highlight the situation of languages that are not on lists of endangered languages like the Red Book, but are contracting or not being developed for the education and advancement of their first-language speakers.
  • Explore how the languages of the least powerful regularly get less attention in education and development than those of the more powerful, even when significant numbers of speakers are involved.
  • Related to the above, consider the importance of languages in achieving the Millennium Development Goals, the objectives of the UN Literacy Decade, etc.
  • Discuss how to develop language policy and planning worldwide, on country, regional and global levels.
  • Consider the importance of language education for individuals and in regard to other goals of education and language development.
  • Develop an official International Declaration of Linguistic Rights for ratification by the UN and the world’s countries.
  • Explore how localization of ICT and application of human language technologies can impact language preservation, development, arts, and learning.
  • Consider whether, how and when to adopt an official international auxiliary language (or to just let English continue to evolve into this role de facto).
  • And others.

There is a little bit of time yet to consider such a concept before the end of the IYL – which was officially launched on the last International Mother Language Day (21 Feb. 2008) and will officially close on the next (21 Feb. 2009). Should proclamation of an International Decade of Languages be a recommendation to come out of the IYL experience?


Mass digitization and oral traditions

In the previous post, I looked at a possible ramification of “mass digitization” of text. But what about the spoken word? And more precisely, verbal presentations, performance, and broadcast in languages often described as having “oral traditions” (and generally less material in writing)? Can we do something comparable to capture significant amounts of speech in such languages in digital recordings?

There are some projects to digitize older recordings on tape, and certainly a need for more effort in this area, but what I am thinking of here is recording contemporary use of language that is normally ephemeral (gone once uttered), along with gaining access to recordings of spoken language that may not be publicly accessible. One place to start might be community radio stations in regions where less-resourced languages are spoken.

The object would be to build digital audio libraries for diverse languages that don’t have much if any publication in text. This could permit various kinds of work. In the case of endangered tongues, this kind of thing would fall under the rubric of language documentation (for preservation and perhaps revitalization), but what I am suggesting is a resource for language development for languages spoken by wider communities.

Digital audio is more than just a newer format for recording. As I understand it, digital storage of audio has some qualitative differences, notably the potential to search by sound (without the intermediary of writing) and eventually, one presumes, to be manipulated and transformed in various ways (including rendering in text). Such a resource could be of use in other ways, such as collecting information on things like emerging terminologies in popular use (a topic that has interested me since hearing how community radio stations in Senegal come up with ways to express various new concepts in local languages). Altogether, digital audio seems to have the potential to be used in more ways than we are used to thinking about in reference to sound recordings.

Put another way, recordings can be transcribed and serve as “audio corpora” in the more established way. But what if one had large volumes of untranscribed digital recordings, with the potential to search the audio directly (without text) and later to convert it into text? (Accuracy will be one of the challenges here, since such conversion would not involve the speaker-specific training required by current speech-to-text programs.)
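As a rough illustration of what “searching by sound” might look like, the sketch below slides a short query clip’s spectral fingerprint along a longer recording to find where it occurs, with no transcription anywhere in the loop. It is a toy under simplifying assumptions (mono signals as NumPy arrays, fixed frame sizes), not a production audio-retrieval method:

```python
import numpy as np

def spectral_frames(signal, frame_len=512, hop=256):
    """Cut a mono signal into overlapping frames and take magnitude spectra."""
    window = np.hanning(frame_len)
    return np.array([
        np.abs(np.fft.rfft(signal[start:start + frame_len] * window))
        for start in range(0, len(signal) - frame_len + 1, hop)
    ])

def find_clip(recording, query, frame_len=512, hop=256):
    """Slide the query's spectral fingerprint along the recording's and
    return the sample offset of the best cosine-similarity match."""
    rec = spectral_frames(recording, frame_len, hop)
    qry = spectral_frames(query, frame_len, hop)
    best_offset, best_score = 0, -1.0
    for offset in range(len(rec) - len(qry) + 1):
        window = rec[offset:offset + len(qry)]
        # Cosine similarity between the flattened fingerprints
        score = np.dot(window.ravel(), qry.ravel()) / (
            np.linalg.norm(window) * np.linalg.norm(qry) + 1e-12)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset * hop  # sample index in the recording
```

Real systems use more robust fingerprints (MFCCs, landmark hashes, and the like), but the principle is the same: audio is matched against audio, with writing nowhere in the pipeline.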

Can digital technology do for audio content something analogous to what it can do for text? What sort of advantages might such an effort bring for education and development in communities which use less-resourced languages? Could it facilitate the emergence of “neo-oral” traditions that integrate somehow with developing literate traditions in the same languages?


Can we localize entire libraries?

How close are we to being able to localize entire libraries?

The question is not as crazy as it might seem. Projects for “mass digitization of books” have been using technology like robots for some years already with the idea of literally digitizing all books and entire libraries. This goes way beyond the concept of e-books championed by Michael Hart and Project Gutenberg. Currently, Google Book Search and the Open Content Alliance (OCA) seem to be the main players among a varied lot of digital library projects. Despite the closing of Microsoft’s Live Search, it seems like projects digitizing older publications plus appropriate cycling of new publications (everything today is digital before it’s printed anyway) will continue to expand vastly what is available for digital libraries and book searches.

The fact of having so much in digital form could open other possibilities besides just searching and reading online.

Consider the field of localization, which is actually a diverse academic and professional language-related field covering translation, technology, and adaptation to specific markets. The localization industry is continually developing new capacities to render material from one language in another. Technically this involves computer assisted translation tools (basically translation memory and, increasingly, machine translation [MT]) and methodologies for managing content. The aims so far have been focused mainly on the needs of companies and organizations to reach linguistically diverse markets; localization still plays a relatively minor role in international development and in markets that are less lucrative.
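To make the translation memory idea concrete, here is a minimal sketch of its core operation: a fuzzy lookup of a new segment against previously translated pairs. The segment pairs and the 0.75 threshold are illustrative assumptions, not any particular tool’s data or defaults:

```python
from difflib import SequenceMatcher

# A tiny translation memory: previously translated source -> target pairs
# (these English/French segments are made-up examples)
memory = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "The file could not be opened.": "Le fichier n'a pas pu être ouvert.",
    "Select a language from the list.": "Choisissez une langue dans la liste.",
}

def tm_lookup(segment, memory, threshold=0.75):
    """Return the best (source, target, score) fuzzy match at or above
    the threshold, or None if nothing in the memory is close enough."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best

# A near-repeat of a stored segment comes back with its stored translation
print(tm_lookup("Click the Save button", memory))
```

This is the “fuzzy match” at the heart of CAT tools; production systems add indexing, segmentation rules, and match-score penalties on top of the same idea.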

I suspect however that the field of localization will not remain confined to any particular area. For one thing, as the technologies it is using advance, they will find diverse uses. In my previous posting on this blog, I mentioned Lou Cremers’ assertion that improving MT will tend to lead to a larger amount of text being translated. His context was work within organizations, but why not beyond?

Keep in mind also that there are academic programs now in localization, notably the Localisation Research Centre at the University of Limerick (Ireland), which by their nature will also explore and expand the boundaries of their field.

At what point might one consider harnessing of the steadily improving technologies and methodologies for content localization to the potential inherent in vast and increasing quantities of digitized material?


Paradigm shift on machine translation?

The April-May issue of Multilingual, which I’m just catching up with, features seven articles on machine translation (MT). Having a long-term interest in this area (which is not to say any expertise) and in its potential for less-widely spoken languages, and having broached the topic on this blog once previously, I thought I’d take a moment to briefly review these articles. They are (links lead to abstracts for non-subscribers):

  • The evolution of machine translation — Jaap van der Meer
  • Machine translation: not a pseudoscience — Vadim Berman
  • Putting MT to work — Lou Cremers
  • Monolingual translation: automated post-editing — Hugh Lawson-Tancred
  • Machine translation: is it worth the trouble? — Kerstin Berns & Laura Ramírez
  • Challenges of Asian-language MT — Dion Wiggins & Philipp Koehn
  • Advanced automatic MT post-editing — Rafael Guzmán

In the first of the articles, Jaap van der Meer characterizes changes in attitudes about MT over the last 4 years as “revolutionary” — a move “from complete denial [of MT’s utility] to complete acceptance.” What happened? The answer seems to be a number of events and changes rather than a single triggering factor, perhaps an evolution to a “tipping point” of sorts. There have been ongoing improvements in MT, there was the establishment of the Translation Automation User Society (TAUS) in 2004 which “helped stimulate a positive mindset towards MT,” and the empowerment of internet users in the use of MT. Van der Meer also points out a shift in emphasis from finding “fully automated high quality translation” (FAHQT) to what he calls “fully automated useful translation” (FAUT – an acronym that presumably should not be read in French). The latter is not only a more realistic goal, but also one that reflects needs and uses in many cases.

As for the future, van der Meer sees a “shift from traditional national languages to ever more specialized technical languages.” My question is whether we can at the same time also see significant moves for less widely spoken languages.

Van der Meer’s article sets the tone and has me asking if indeed we are at a point where a fundamental shift is occurring in the way we think of MT. The other articles look at specific issues.

Vadim Berman looks at some hurdles to making MT work, highlighting the importance of educating users – including mention of a recurrent theme: the importance of clean text going into the translation.

Two of the articles, by Lou Cremers and by Kerstin Berns and Laura Ramírez, discuss the practical value of MT in enterprise settings.

Cremers has some interesting thoughts about the utility of MT in an enterprise setting, something that has long seemed impractical, certainly when compared to translation memory (TM). He begins by noting that “a high end MT system will really work if used correctly, and may save a considerable amount of time and money,” and then proceeds to discuss several factors he sees as key to getting good ROI: terminologies and dictionaries; quality input text; volume (pointing out, among other things, that good MT will tend to lead to a larger amount of text being translated – a key point, I might add, for considering the value of MT in other spheres of activity); and workflow.

The “correct use” of MT relates largely to the quality of the text: “surprisingly simple writing rules governing the use of articles and punctuation marks will drastically improve MT output.”

Cremers offers a summation which seems to speak for several of the articles:

It’s not the absolute quality of the MT output that is important, but rather how much time it saves the translator in completing the task. In that way it is not different from TM. In both cases, human intervention is needed to produce high-quality translations.

Berns and Ramírez walk through the costs and benefits of MT in a business context. Here the issue is investing in a system but the reasoning could be applicable to different settings. They suggest that the kind of material to be translated is (unsurprisingly) a good guide to the potential utility of MT:

Do you have large text volumes with very short translation times and a high terminology density? Then it is very likely that MT will be a good solution for you. On the other hand, if you have small text volumes with varying text types and complex sentence structures, then it probably will be too much effort to set up an effective process.

Two of the articles, by Hugh Lawson-Tancred and Rafael Guzmán, discuss “post-editing” as a tool to improve the output of MT.

Lawson-Tancred suggests – contrary to several of the other authors – that the utility of preparing the text going into MT may not be so critical, and that “the monolingual environment of the post-editor is a better place to smooth out the wrinkles of the translation process….” Interestingly, this concept focuses on context, with the basic unit for processing being 5-20 words (that is, between the word level of dictionaries and whole sentences). He concludes by speculating that automated post-editing could “develop into a whole new area of applied computational linguistics.”

Guzmán, who has written a number of other articles on post-editing, discusses the use of TM in the context of verifying (post-editing) the product of MT. This basically involves ways of lining up texts in the source and translated languages for context and disambiguation. There are several examples using Spanish and English.
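One crude way to line up source and machine-translated segments for checking, in the spirit described here, is to pair sentences by position and flag pairs whose length ratio looks wrong – a hint that the MT output dropped or padded content. This is a toy heuristic, not Guzmán’s method; the ratio bounds and example sentences are arbitrary assumptions for illustration:

```python
def flag_suspect_pairs(source_sents, target_sents, low=0.5, high=2.0):
    """Pair source and MT-output sentences by position and flag pairs
    whose character-length ratio falls outside [low, high] -- a crude
    cue that the pair needs a post-editor's attention."""
    flagged = []
    for i, (src, tgt) in enumerate(zip(source_sents, target_sents)):
        ratio = len(tgt) / max(len(src), 1)
        if not (low <= ratio <= high):
            flagged.append((i, src, tgt, round(ratio, 2)))
    return flagged

source = ["The file could not be opened.", "Please try again later."]
output = ["El archivo no se pudo abrir.", "X"]
# Only the truncated second pair is flagged for post-editing
print(flag_suspect_pairs(source, output))
```

Real alignment and verification tools work at a finer grain (terminology, punctuation, and regex patterns over aligned segments), but the positional pairing shown here is the simplest starting point.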

Finally, Dion Wiggins and Philipp Koehn discuss MT involving Asian languages, which most often entails different scripts. There are examples from several Asian languages illustrating the challenges involved.

This is an interesting set of articles to read to get a sense of the current state of the art as regards the application and applied research on MT. It’s a bit of a stretch for a non-specialist with limited context like me to wrap his mind around the ensemble of technical concepts and practices. One does come away, though, with the impression that MT is already a practical tool for a range of real-world tasks, and that we will be seeing much more widespread and sophisticated uses of it, often in tandem with allied applications (notably TM and post-editing). Are we seeing a paradigm shift in attitudes about MT?

At this time I’d really like to see a program to encourage young computer science students from diverse linguistic backgrounds in developing countries and indigenous communities to get into the field of research on MT. I’m convinced that MT, if approached strategically, has the potential to revolutionize the prospects for minority languages and the ways we think about “language barriers.” That is more than just words – it has to do with education, knowledge and enhanced modes of communication. By extension, the set of human language technologies of which MT is a part can, in one way or another, play a significant role in the evolution of linguistic diversity and common language(s) over the coming generations.