Why identifiers with non-English letters are bad for your health

We just ran into a very funny story with Unicode identifiers in C#.

It turned out that an identifier like ХTransfer was spelled with the Russian letter Х (U+0425) instead of the Latin X (U+0058). Furthermore, when we tried to build this code on the server, the compiler complained that the identifier ÕTransfer was not found. Some source files containing the Russian identifier had been saved as Unicode, but others had been saved as ANSI on a machine with a Russian locale (code page 1251), where the Russian letter Х is encoded as the byte 0xD5. The build server used a Western European locale (code page 1252). It read the Unicode files correctly, keeping the Russian Х, but in the ANSI files it decoded 0xD5 as the letter Õ (U+00D5). Ka-boom! The identifiers no longer match, and the build fails.
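The byte-level mechanics are easy to reproduce. The following is only a sketch, not code from our build: it assumes modern .NET (.NET Core 3.0 or later), where the legacy Windows code pages have to be registered through CodePagesEncodingProvider, and the class CodePageDemo and the helper Decode are hypothetical names made up for the illustration.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Hypothetical helper: interpret raw bytes using the given Windows code page.
    public static string Decode(byte[] bytes, int codePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages such as
        // 1251 and 1252 are unavailable until this provider is registered.
        // Registering it more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }

    static void Main()
    {
        // The bytes an ANSI file saved under a Russian locale would hold for
        // the identifier: 0xD5 followed by the plain ASCII text "Transfer".
        byte[] ansi = Encoding.ASCII.GetBytes("?Transfer");
        ansi[0] = 0xD5;

        string onRussianMachine = Decode(ansi, 1251); // Cyrillic code page
        string onBuildServer = Decode(ansi, 1252);    // Western European code page

        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine($"{onRussianMachine} starts with U+{(int)onRussianMachine[0]:X4}");
        Console.WriteLine($"{onBuildServer} starts with U+{(int)onBuildServer[0]:X4}");
    }
}
```

Under those assumptions the same nine bytes should come out starting with U+0425 in the first case and with U+00D5 in the second, which is exactly the mismatch the build server reported.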

C# authors probably felt very proud of themselves when they allowed Unicode letters in identifiers. The following is legal C# code (compiles on my machine):

using System;
class Проверка {
    public static void הדפס_הודעה(string 信息, bool νέαΓραμμή) {
        if (νέαΓραμμή) Console.WriteLine();
    }
}

The trouble with it is two-fold:

1. Different identifiers may look the same. XTransfer, ХTransfer, and ΧTransfer are indistinguishable to a human, but are different to the compiler, because the first one uses the Latin X (U+0058), the second one uses the Russian Х (U+0425), and the third uses the Greek Χ (U+03A7).

2. The same identifier may “split” into a number of different identifiers because of code page differences. Unicode and ANSI files look the same to a human and to the compiler on a particular machine, but an ANSI file may be interpreted differently on a machine with a different locale.
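Since the eye cannot tell these letters apart, the only reliable way to see the difference is to look at the code points. Here is a minimal sketch of such a check; the class IdentifierCheck and the helper NonAsciiChars are hypothetical names invented for this example, and it simply flags everything outside 7-bit ASCII rather than doing any real script analysis.

```csharp
using System;
using System.Collections.Generic;

static class IdentifierCheck
{
    // Hypothetical helper: report every UTF-16 code unit outside 7-bit ASCII,
    // with its position and value. Characters beyond the Basic Multilingual
    // Plane would show up as two surrogate code units.
    public static List<string> NonAsciiChars(string identifier)
    {
        var found = new List<string>();
        for (int i = 0; i < identifier.Length; i++)
        {
            char c = identifier[i];
            if (c > 0x7F)
                found.Add($"index {i}: U+{(int)c:X4}");
        }
        return found;
    }

    static void Main()
    {
        // Written with escapes so the three spellings cannot be confused here:
        // Latin X, Russian Х (U+0425), Greek Χ (U+03A7).
        string[] names = { "XTransfer", "\u0425Transfer", "\u03A7Transfer" };
        foreach (string name in names)
        {
            List<string> hits = NonAsciiChars(name);
            Console.WriteLine(hits.Count == 0
                ? "ASCII only"
                : "non-ASCII at " + string.Join(", ", hits));
        }
    }
}
```

Only the first name should come back as ASCII only; the other two should be flagged at index 0. A check along these lines could run over identifiers before a commit, though how you extract them from the source is left out of this sketch.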

Summary: don’t do it. Stick to the Latin alphabet, for purely technical reasons.

P.S. I am not sure how people end up creating mixed-language identifiers by accident, but I was told it is not unheard of. I assume they forget to switch the keyboard layout to English in time, and then just copy and paste the identifier whenever the compiler complains.
