Is Turkish the only locale with unusual capitalization?

It is well known in some circles that the Turkish locale has unusual capitalization rules that can break otherwise solid code (see, e.g., “Turkish Java Needs Special Brewing”).

In a nutshell, we take it for granted that the upper case of “i” is “I” and the lower case of “I” is “i”. This is not true in Turkish, where the upper case of i is dotted İ and the lower case of I is dotless ı. This may have looked like a clever idea in the 1920s, but it sometimes causes a lot of grief in the 21st century.
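To make the i/I behavior concrete, here is a minimal C# sketch. It assumes the runtime has Turkish culture data available (for example, it is not running in invariant globalization mode); the class name is only illustrative. It prints code points rather than the letters themselves, so the result does not depend on console encoding.

```csharp
using System;
using System.Globalization;

class TurkishCaseDemo
{
    static void Main()
    {
        // Assumption: "tr-TR" culture data is installed on this machine.
        var turkish = new CultureInfo("tr-TR");

        // Case conversion using Turkish rules.
        string upper = "i".ToUpper(turkish);
        string lower = "I".ToLower(turkish);

        // With Turkish rules these should be U+0130 (dotted capital I)
        // and U+0131 (dotless small i) instead of U+0049 and U+0069.
        Console.WriteLine($"upper of i: U+{(int)upper[0]:X4}");
        Console.WriteLine($"lower of I: U+{(int)lower[0]:X4}");

        // Culture-independent conversion always maps i to plain I.
        Console.WriteLine("i".ToUpperInvariant() == "I");
    }
}
```

For identifiers, file names, protocol keywords and similar non-linguistic strings, `ToUpperInvariant()` and `ToLowerInvariant()` are the usual way to avoid depending on the user's culture.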

I was wondering: is Turkish the only locale that has “strange” capitalization rules?

To answer this, let’s first define what “normal” means. Since code, unless specifically localized, mostly deals with the 26 letters of the Latin alphabet, I define “normal” as

"abcdefghijklmnopqrstuvwxyz".ToUpper().Equals("ABCDEFGHIJKLMNOPQRSTUVWXYZ") &&
"ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToLower().Equals("abcdefghijklmnopqrstuvwxyz")

I wrote a little program in C# that does the “normality” test for each culture defined in the system. It turns out that on my machine out of 354 defined cultures the following exhibit unusual behavior:
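A program of this kind could look roughly like the sketch below. This is a reconstruction under assumptions, not the original source: it enumerates cultures with `CultureInfo.GetCultures(CultureTypes.AllCultures)` and passes each culture explicitly to `ToUpper` and `ToLower`, and the class and method names are made up for illustration. The number of cultures and the list of exceptions depend on the operating system and the .NET version.

```csharp
using System;
using System.Globalization;
using System.Linq;

class CasingSurvey
{
    const string Lower = "abcdefghijklmnopqrstuvwxyz";
    const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // The "normality" test from above, applied to one specific culture.
    static bool IsNormal(CultureInfo culture) =>
        Lower.ToUpper(culture).Equals(Upper, StringComparison.Ordinal) &&
        Upper.ToLower(culture).Equals(Lower, StringComparison.Ordinal);

    static void Main()
    {
        // Every culture known to this system (neutral and specific).
        CultureInfo[] cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);

        // Names of the cultures that fail the test, sorted for stable output.
        var unusual = cultures
            .Where(c => !IsNormal(c))
            .Select(c => c.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();

        Console.WriteLine($"{cultures.Length} cultures checked, {unusual.Count} unusual:");
        Console.WriteLine(string.Join(", ", unusual));
    }
}
```

Passing the culture as an argument avoids changing the current thread culture for each iteration, which is the other obvious way to run the same test.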

az, az-Latn, az-Latn-AZ, tr, tr-TR

All of them are Turkish or Azerbaijani locales, and the anomaly is limited to the i/I pair. Azerbaijan used Cyrillic script until independence from the Soviet Union, and then in 1991 switched to a slightly extended version of the Turkish alphabet, inheriting the i/I anomaly.
