Counting Your Characters in 9 Languages

 

A physicist, an engineer, and a prorgammer were required to prove that every odd number is prime. The physicist said: let's take a couple of first odd numbers: 1, 3,5,7 are all prime. Granted, 9 is not prime, but 11 and 13 are. Now, let's take a random odd number, say... 41. It is prime. Hence, all odd numbers are prime, and as for the number nine - it definitely was an experiment error. The engineer said: out of first 10 odd numbers (1, 3, ... 19) 80% are prime. This accuracy is good enough for practical needs.

The programmer wrote 14 programs in 9 different languages, all of which started to print continuously: 1 is prime, 1 is prime, 1 is prime...

This story started on a forum, as many other stories these days. Someone asked a question on how to count the number of occurances of each character in a string and output it to the screen. E.g. for string "dbaacbaa" the output would be

a - 4
b - 2
c - 1
d - 1

The original poster asked to write a program in C#, but since it is a such as textbook question, people started oferring their solutions in different languages. Indeed, it is quite educational how different languages approach this simple problem and how much code it requires. From one hand, it is simple enough to be programmed in several lines of code, from the other hand it is much more interesting than "hello, world!".

Below are solutions for various languages. Of course, implementation for each solution is somewhat subjective - while all the solutions do the job, how well they represent particular language ideology is, definitely a matter of opinion. For some languages I even included more than one solution, to reflect different styles.

But even such a short example can give us some valuable observations:

  1. Java and C# yield larger source, because of mandatory class and method declarations.
  2. A built-in "group by" functionality helps shorten the code.
  3. In Perl group-by is done manually, but the code is still short thanks to returning default value for items not present in hash table. However, it may be dangerous in other contexts.
  4. C++/STL suffers from lack of group_by, lack of anonymous functions, and various language pecularities such as 'cannot use bind2nd with a function that accepts a reference'.
  5. Python version is probably the most elegant (in my opinion), but the least optimal.
  6. F# is my favorite - short, readable version that uses reasonable algorithm.
C

#include <stdio.h>

 

int main()

{

    char* str = "dbaacbaa";

    int counts[256] = {0};

    char* pCurrentChar;

    int i;

 

    for (pCurrentChar = str; *pCurrentChar; ++pCurrentChar)

    {

        counts[(unsigned char)*pCurrentChar]++;

    }

 

    for (i=1; i<256; i++)

    {

        if (counts[i])

        {

            printf("%c - %d\n", (char)i, counts[i]);

        }

    }

 

    return 0;

}

C (hacker style)

main()

{

    char *d = "dbaacbaa", c=0;

    int counts[256] = {0}, *n = counts + (((char)255<0)?128:0);

    while ( ++n[*d],*++d );

    while (++c) n[c]?printf("%c - %d\n", c, n[c]):0;

}

C++ using vector

#include <vector>

#include <iostream>

#include <algorithm>

#include <functional>

 

using namespace std;

 

inline void increment( vector<int>* v, char c)

{

    ++(*v)[(unsigned char)c];

}

 

int main()

{

    string str = "dbaacbaa";

    vector<int> counts(256);

 

    for_each(str.begin(), str.end(), bind1st(ptr_fun(increment), &counts));   

 

    for (int i=0; i<256; ++i)

    {

        if (counts[i])

        {

            cout << char(i) << " - " << counts[i] << endl;

        }

    }

}

C++ using map

#include <map>

#include <iostream>

#include <algorithm>

#include <functional>

 

using namespace std;

typedef map<char, int> count_map;

 

inline void increment( count_map* m, char c)

{

    count_map::iterator ptr = m->find(c);

    if (ptr == m->end())

    {

        (*m)[c] = 1;

    }

    else

    {

        ++ptr->second;

    }

}

 

inline void print( pair<char,int> const& p )

{

    cout << p.first << " - " << p.second << endl;

}

 

int main()

{

    string str = "dbaacbaa";

    count_map counts;

 

    for_each(str.begin(), str.end(), bind1st(ptr_fun(increment), &counts));   

    for_each(counts.begin(), counts.end(), print);

}

C#

using System;

using System.Collections.Generic;

 

namespace CountCharactersCSharp

{

    class Program

    {

        static void Main(string[] args)

        {

            string data = "dbaacbaa";

            SortedDictionary<char, int> counts = new SortedDictionary<char, int>();

 

            foreach (char c in data)

            {

                int n;

                if (!counts.TryGetValue(c, out n)) n=0;

                counts[c] = n+1;

            }

 

            foreach (KeyValuePair<char, int> pair in counts)

            {

                Console.WriteLine("{0} - {1}", pair.Key, pair.Value);

            }

        }

    }

}

C#, using LINQ

using System;

using System.Linq;

 

namespace CountCharactersLinq

{

    class Program

    {

        static void Main(string[] args)

        {

            string str = "dbaacbaa";

 

            var counts =

                from c in str

                group c by c into g

                orderby g.Key

                select new { Symbol = g.Key, Count = g.Count() };

 

            foreach (var c in counts)

            {

                Console.WriteLine("{0} - {1}", c.Symbol, c.Count);

            }

        }

    }

}

C#, port of C hack

using System;

 

namespace CountCharactersUnsafeCSharp

{

    class Program

    {

        static void Main()

        {

            string data = "dbaacbaa";

            unsafe

            {

                fixed (char* dPtr = data.ToCharArray())

                {

                    char* d = dPtr;

                    int* count = stackalloc int[65536];

                    while (*d != 0) ++count[*d++];

                    char c = '\0';

                    while (++c != 0)

                        if (count[c] > 0)

                            Console.WriteLine("{0} - {1}", c, count[c]);

                }

            }

        }

    }

}

F#

#light

open Seq

 

"dbaacbaa" |> group_by (fun x->x) |> sort |>

iter (fun (a,b) -> printfn "%c - %d"  a  (length(b)))

Haskell

-- http://community.livejournal.com/ru_programming/1154661.html?thread=17787749#t17787749

import Data.List

import Control.Monad

 

printCount str = do

    forM_ (group $ sort str) $ \cs ->

        putStrLn $ [head cs] ++ " - " ++ (show $ length cs)

 

main = do

    printCount [ 'd', 'b', 'a', 'a', 'c', 'b', 'a', 'a' ]

J

# http://community.livejournal.com/ru_programming/1154661.html?thread=17785957#t17785957

# not verified

s=.'abaacdabba'

/:~ (0&{;#)/.~ s

Java, using array
import java.util.*;

public class Program { 
    public static void main(String argv[]) {
        String s = "dbaacbaa";
        int[] counts = new int[65536]; 

        for (char c:s.toCharArray()) {
            ++counts[(int)c];
        }

        for (int i=0; i<65536; ++i) {
            if (counts[i] > 0) {
                System.out.format("%c - %d%n", (char)i, counts[i]);
            }
        }
    }
}
Java, using map
import java.util.*;

public class Program { 
    public static void main(String argv[]) {
        String s = "dbaacbaa";
        Map<Character,Integer> counts = new TreeMap<Character, Integer>();

        for (char c:s.toCharArray()) {
            if (counts.containsKey(c)) {
                counts.put( c, counts.get(c) + 1 );
            }
            else {
                counts.put(c, 1);
            }
        }

        for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
            System.out.format("%c - %d%n", entry.getKey(), entry.getValue());
        }
    }
}
Perl
$str = 'dbaacbaa';
$h{$_}++ foreach split //, $str;
print "$_ - $h{$_}\n" foreach sort keys %h;
Python
# http://community.livejournal.com/ru_programming/1154661.html?thread=17777253#t17777253
a = "dbaacbaa"
print "\n".join("%s - %s" % (i, a.count(i)) for i in sorted(set(a)))