F# Enums and Discriminated Unions

Discriminated Unions
Degenerate Discriminated Unions
Enums
Annoyances

Discriminated Unions

In F# (and OCaml), a discriminated union is a type that can have one or more "branches", somewhat like the VARIANT type of COM days. Each branch behaves like a small record type with zero or more fields. For example, the following code

type Shape =
 | Point
 | Circle of double
 | Square of double
 | Rectangle of System.Drawing.Point
 | ArbitraryLine of System.Drawing.Point array

defines a (floating) shape that can be one of several different kinds. Internally, F# creates a nested class for each "branch" of the union, and all these nested classes derive from Shape:

// approximate reflector output
class Shape
{
  public class _Point : Shape {}
  public class _Circle : Shape { public double Value; }
  public class _Square : Shape { public double Value; }
  public class _Rectangle : Shape { public System.Drawing.Point Value; }
  public class _ArbitraryLine : Shape { public System.Drawing.Point[] Value; }
  
  public static Shape Circle(double x) { return new _Circle { Value = x }; }
}
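
To show how such a union is consumed, here is a minimal sketch of pattern matching over it. It assumes a hypothetical, simplified copy of Shape in which Rectangle carries (width, height) and ArbitraryLine carries (x, y) pairs instead of System.Drawing.Point values, so the example has no external dependencies; the area function is illustrative only.

```fsharp
// Hypothetical, simplified copy of the article's Shape: Rectangle holds
// (width, height) and ArbitraryLine holds (x, y) pairs instead of
// System.Drawing.Point values, so the sketch has no external dependencies.
type Shape =
    | Point
    | Circle of double
    | Square of double
    | Rectangle of double * double
    | ArbitraryLine of (double * double) array

// A match expression selects the branch and binds its fields in one step;
// the compiler warns if a branch is left unhandled.
let area shape =
    match shape with
    | Point -> 0.0
    | Circle radius -> System.Math.PI * radius * radius
    | Square side -> side * side
    | Rectangle (width, height) -> width * height
    | ArbitraryLine _ -> 0.0 // a polyline encloses no area

for s in [ Point; Circle 1.0; Square 2.0; Rectangle (2.0, 3.0) ] do
    printfn "%f" (area s)
```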

Discriminated unions come in very handy with pattern matching. One of their problems, however, is that by default they don't have a suitable ToString(), which makes testing in particular difficult. Diagnostic messages like "expected: _Rectangle, actual: _Rectangle" don't really help.

Fortunately, a ToString() implementation may be added by hand:

type Shape =
 | Point
 | Circle of double
 ...
 override self.ToString() =
  match self with
   | Point -> "Point"
   | Circle(x) -> String.Format("Circle({0})", x)
   ...
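
For reference, a complete version of such an override might look like the sketch below. It again assumes the hypothetical simplified Shape (Rectangle as width and height, ArbitraryLine as an array of pairs), and the exact text produced for each branch is an arbitrary choice; sprintf is used instead of String.Format so that no namespace needs to be opened.

```fsharp
// Same hypothetical simplified Shape, now with a hand-written ToString()
// that covers every branch explicitly.
type Shape =
    | Point
    | Circle of double
    | Square of double
    | Rectangle of double * double
    | ArbitraryLine of (double * double) array
    override this.ToString() =
        match this with
        | Point -> "Point"
        | Circle radius -> sprintf "Circle(%g)" radius
        | Square side -> sprintf "Square(%g)" side
        | Rectangle (width, height) -> sprintf "Rectangle(%g, %g)" width height
        | ArbitraryLine points -> sprintf "ArbitraryLine(%d points)" points.Length

// The string function calls the overridden ToString()
printfn "%s" (string (Circle 1.5))
```

Every new branch added to the union needs a matching line here, which is exactly the maintenance burden discussed later.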

The branches of a discriminated union may be accessed either by short name, e.g. Rectangle, or by long name, e.g. Shape.Rectangle. In case of ambiguity, the compiler does not warn you and appears to choose the most recent suitable definition:

type Pen =
 | Ball
 | Point
 
let x = Point       // Pen.Point
let y = Shape.Point // Shape.Point
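
The shadowing can be reproduced in a self-contained script like the one below (Shape is trimmed to two cases for brevity). The third type, Marker, is a hypothetical addition showing one way to avoid the problem: F#'s [<RequireQualifiedAccess>] attribute forces its cases to be written with the type name, so they never compete for the unqualified name.

```fsharp
type Shape =
    | Point
    | Circle of double

// A second union that reuses the case name Point
type Pen =
    | Ball
    | Point

let x = Point        // no warning: resolves to Pen.Point, the later definition
let y = Shape.Point  // the qualified name selects Shape.Point explicitly

// Hypothetical union whose cases must always be qualified (Marker.Point),
// so declaring it does not change what an unqualified Point means
[<RequireQualifiedAccess>]
type Marker =
    | Point
    | Cross

let m = Marker.Point
let z = Point        // still Pen.Point
```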

Degenerate Discriminated Unions

A degenerate discriminated union is a discriminated union where none of the branches has any fields. Note that this is not an official F# term; I just use it for the purpose of this article. A degenerate union may serve as an enum-like type:

type Color =
 | Red
 | Green
 | Blue

The F# compiler appears to have a special optimization for this case: no derived classes _Red, _Green, and _Blue are created. Instead, class Color has three static members for Red, Green, and Blue:

// approximate reflector output
class Color
{
  public static Color _red;
  public static Color _green;
  public static Color _blue;
  
  public static Color Red { get { return _red; } }
}

This optimization is not just an implementation detail: since all the "guts" of the generated classes are public, it affects how your F# types look to other .NET languages.

There are a number of problems with using degenerate discriminated unions as enums that may or may not be important:

  • Discriminated union branches are comparable via > and <, but "next" and "previous" are not defined. In particular, it is not possible to write for color in Red..Blue do ...
  • No conversion to integer
  • No combining values together: although you can happily apply [<Flags>] to the type, it has no effect
  • No default ToString() implementation as mentioned above
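
Some of these gaps can be worked around by hand. The sketch below assumes the Color union from above; toInt is a hypothetical helper (the numbering is an arbitrary choice), while the enumeration of all cases uses F#'s reflection API (FSharpType.GetUnionCases and FSharpValue.MakeUnion) as a substitute for the missing Red..Blue range.

```fsharp
open Microsoft.FSharp.Reflection

type Color =
    | Red
    | Green
    | Blue

// Comparison follows declaration order, so < and > work out of the box
let redBeforeBlue = Red < Blue

// Hypothetical helper: there is no built-in integer conversion,
// so the mapping has to be written (and maintained) by hand
let toInt color =
    match color with
    | Red -> 0
    | Green -> 1
    | Blue -> 2

// Enumerate all cases via reflection instead of a Red..Blue range
let allColors : Color list =
    FSharpType.GetUnionCases(typeof<Color>)
    |> Array.map (fun case -> FSharpValue.MakeUnion(case, [||]) :?> Color)
    |> List.ofArray

for c in allColors do
    printfn "%A = %d" c (toInt c)
```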

Enums

Enums are defined very similarly to degenerate discriminated unions, but each "branch" is assigned an integer value:

type ColorEnum =
 | Red = 1
 | Green = 2
 | Blue = 3

You must explicitly assign values to all branches. You may assign the same value to multiple branches. No warning is issued in this case.
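
The silent duplicate can be demonstrated as follows. Crimson is a hypothetical extra constant added only to trigger the problem, and duplicateValues is a hypothetical helper that a unit test could use to catch such mistakes, since the compiler does not; it relies on System.Enum.GetValues returning one entry per named constant.

```fsharp
open System

// Crimson deliberately reuses the value 1; the compiler accepts this silently
type ColorEnum =
    | Red = 1
    | Green = 2
    | Blue = 3
    | Crimson = 1

// Both names denote the same underlying value, so they compare equal
let same = (ColorEnum.Red = ColorEnum.Crimson)

// Hypothetical helper: lists every underlying value shared by more than one name
let duplicateValues (enumType: Type) =
    Enum.GetValues(enumType)
    |> Seq.cast<obj>
    |> Seq.map (fun (v: obj) -> Convert.ToInt64 v)
    |> Seq.countBy id
    |> Seq.filter (fun (_, count) -> count > 1)
    |> Seq.map fst
    |> List.ofSeq

printfn "%A" (duplicateValues typeof<ColorEnum>)
```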

The enums are translated into native .NET enums as expected:

// approximate reflector output
enum ColorEnum
{
  Red = 1,
  Green = 2,
  Blue = 3
}

There are a number of important distinctions between enums and discriminated unions:

  • Enum values must be specified by the fully qualified name: ColorEnum.Red.
  • Enums may be (explicitly) converted to and from integers and used in "for" loops.
  • Enum values may be combined using the ||| operator ([<Flags>] attribute is currently not checked).
  • Enums have a default ToString() implementation that returns the symbolic name, e.g. "Red".
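
The following sketch exercises these points on the ColorEnum type from above. Permissions is a hypothetical flags-style enum introduced only to show |||; its values are chosen as powers of two so that combinations stay distinguishable.

```fsharp
open System

type ColorEnum =
    | Red = 1
    | Green = 2
    | Blue = 3

// Explicit conversions in both directions
let green = int ColorEnum.Green      // enum -> underlying int
let blue = enum<ColorEnum> 3         // int -> enum

// Loop over the underlying integers, converting each one back
// and using the built-in ToString() to get the symbolic name
let names =
    [ for i in int ColorEnum.Red .. int ColorEnum.Blue -> (enum<ColorEnum> i).ToString() ]

// Hypothetical flags-style enum combined with |||
[<Flags>]
type Permissions =
    | NoAccess = 0
    | Read = 1
    | Write = 2
    | Execute = 4

let readWrite = Permissions.Read ||| Permissions.Write
let canWrite = (readWrite &&& Permissions.Write) = Permissions.Write

printfn "%A %s" names (readWrite.ToString())
```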

Annoyances

  1. The principle of least astonishment is violated: in other languages you can leave out the integer values and get an enum numbered from zero upward. In F#, leaving them out gives you a completely different kind of type.
  2. The syntactic distinction between degenerate unions and enums is quite subtle. The consequences, however, are significant, for both the external interface and internal usage.
  3. You must explicitly assign integral values to all enum constants, even if you don't care about them.
  4. There is no warning if you assign the same value to multiple constants by mistake. This may easily happen when adding or removing a value from a large enum.
  5. You must write a manual ToString() implementation if you choose to use discriminated unions. This implementation is tedious to write and may easily get out of sync with the actual branches.
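
One way to reduce the maintenance cost of item 5 is to derive the text from the union's metadata instead of a hand-written match. The sketch below is an assumption-laden alternative, not a built-in feature: formatUnion is a hypothetical helper built on FSharpValue.GetUnionFields from F#'s reflection API, the Shape type is again a simplified stand-in, and reflection is slower than a direct match.

```fsharp
open Microsoft.FSharp.Reflection

// Hypothetical helper: formats any union value as CaseName(field, ...) using
// F# reflection, so the output follows the declared cases automatically
let formatUnion (value: 'T) =
    let case, fields = FSharpValue.GetUnionFields(box value, typeof<'T>)
    if fields.Length = 0 then
        case.Name
    else
        let args = fields |> Array.map (fun (f: obj) -> string f) |> String.concat ", "
        sprintf "%s(%s)" case.Name args

type Shape =
    | Point
    | Circle of double
    | Rectangle of double * double
    // No per-branch match to keep in sync with the list of cases
    override this.ToString() = formatUnion this

printfn "%s" (string (Rectangle (2.5, 4.5)))
```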