Technical decisions in software engineering are often driven by politics. The architecture committee decides that “all new code should use threading library X” or “we should be moving to e-mail system Y”, and everyone is supposed to follow. This may be annoying, but true pain begins when the architecture committee changes its mind. This, however, is not really new or specific to software development.
I was thinking about how Uzbek and many other languages of Soviet minorities went through four alphabets in as little as 70 years. Changing a language’s alphabet is a huge engineering project, not unlike changing the data format of a big database. You need to change how books and newspapers are printed, update road signs and education materials, design and manufacture new typewriters and/or computer keyboards, and teach millions of people how to use the new alphabet. In the process you break backward compatibility with the existing body of printed texts.
Uzbek Alphabet 1.0: Arabic based
The Uzbek language was originally written in the Arabic script, which was arguably a poor match for the structure of the language, and the literacy rate was low. When the Soviets came, their goal was to increase the literacy rate: partly because they shared some ideas of the Enlightenment, and partly because they wanted to influence the population via printed propaganda.
Uzbek Alphabet 2.0: Latin based
In 1924 a new Uzbek alphabet was designed based on the Latin script, and it was gradually released into production from 1928 to 1940. The early Soviets were not Russian nationalists; in fact, they were internationalists who expected the World Revolution to begin soon. Therefore, the Latin script was considered more suitable for new development, even though it was not compatible with Cyrillic, the script used by the majority of the population of the USSR.
Uzbek Alphabet 3.0: Cyrillic based
However, during this time the political tide had changed. The World Revolution did not happen. The USSR became a highly centralized state, with national republics having only token rights. Russian became virtually the only language of politics and business, and the Latin alphabet fell out of favor. In 1940 Uzbek was quickly transitioned to a new alphabet based on Cyrillic, and using the Latin script became gravely frowned upon. Uzbek is not unique in this transition: dozens of other languages were switched from Latin to Cyrillic around that time.
Uzbek Alphabet 4.0: Latin again
The Cyrillic alphabet was used exclusively until the early 1990s, when Uzbekistan gained independence. The new Uzbek state did not want to align itself with the Soviet/Russian legacy, and wanted closer ties with Turkey. It implemented another version of the alphabet based on the Latin script, but different from the one from 1924: it was more similar to Turkish. Deploying version 4.0 to production, however, met with some difficulties. Independent Uzbekistan did not have the resources or the political will of the Soviet Union, so the process has been a long one and still continues, with versions 3.0 and 4.0 coexisting, and with the new Latin script gradually replacing the Cyrillic.
Parallels with software development
Similar stories occur daily in the software development world, creating a huge mess. Imagine the state of the body of Uzbek literature: it currently consists of at least four mutually incompatible corpora, which probably resembles the state of many existing databases. Political powers rarely consider their own mortality when making important technical decisions. Unfortunately, it is usually impossible to avoid the devastating effects of such major changes, but our job as engineers is to mitigate them as much as possible.
If the alphabet has changed once, it will probably change again. So, each data entry in Uzbek should include the version of the alphabet somewhere in the metadata. Automatic procedures for conversion between alphabets would also be helpful. Or, we should just use the Latin language to store data: it is dead and therefore unlikely to change 🙂
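To make the "version in the metadata" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `Alphabet`, `VersionedText`, `CONVERTERS` and `convert` are hypothetical, the version labels simply follow the numbering used in this post, and the character table covers only a handful of letters. A real converter would need the full alphabet, capitalization and context-dependent rules.

```python
from dataclasses import dataclass
from enum import Enum


class Alphabet(Enum):
    # Version labels follow the numbering used in this post.
    ARABIC_1_0 = "1.0"
    LATIN_2_0 = "2.0"
    CYRILLIC_3_0 = "3.0"
    LATIN_4_0 = "4.0"


@dataclass(frozen=True)
class VersionedText:
    text: str
    alphabet: Alphabet  # the metadata that travels with every entry


# Deliberately incomplete, illustrative table (a few letters only).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "и": "i", "к": "k", "о": "o", "т": "t",
}


def cyrillic_3_to_latin_4(text: str) -> str:
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)
        elif ch in CYRILLIC_TO_LATIN:
            out.append(CYRILLIC_TO_LATIN[ch])
        else:
            # Fail loudly instead of silently producing mixed-script text.
            raise ValueError(f"no mapping for {ch!r}")
    return "".join(out)


# Registry of known migrations, keyed by (source, target) version.
CONVERTERS = {
    (Alphabet.CYRILLIC_3_0, Alphabet.LATIN_4_0): cyrillic_3_to_latin_4,
}


def convert(record: VersionedText, target: Alphabet) -> VersionedText:
    if record.alphabet is target:
        return record  # already in the requested version
    try:
        converter = CONVERTERS[(record.alphabet, target)]
    except KeyError:
        raise LookupError(
            f"no converter from {record.alphabet.value} to {target.value}"
        ) from None
    return VersionedText(converter(record.text), target)


old = VersionedText("китоб", Alphabet.CYRILLIC_3_0)
new = convert(old, Alphabet.LATIN_4_0)
print(new.text, new.alphabet.value)  # prints: kitob 4.0
```

Because the version is stored next to the text, a reader never has to guess which alphabet an entry uses, and supporting a hypothetical version 5.0 would only mean adding entries to the registry rather than rewriting every consumer.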