Set and MultiSet Collections for .NET 1.1

Motivation

Unfortunately, .NET framework collections lack support for an ordered set of objects, similar to STL's set, or Java's HashSet and TreeSet. SortedList comes close, but there are several problems with it:

Download

www.ikriv.com.Collections.zip contains source code (C#, VS.NET 2003), signed binaries, and documentation in .CHM format. See also readme.html inside the archive.

Design Goals

We attempted to create containers that behave similarly to STL's set and multiset. In particular, we support:

Why Yet Another Set Collection?

A number of existing set implementations can be found on the Internet:

Most of these implementations are ultimately based either on Hashtable or on a regular array. SortedList is also based on array. None of them is based on a balanced tree structure.

In particular, Extensions.Set and Iesi.Collections.HashedSet are based on Hashtable.
Lambda.Collections.Generic.Set is based on .NET 2.0 generic Dictionary, which in fact is also hash table.
Iesi.Collections.SortedSet and SetCollection.Set are based on array.
Iesi.Collections.ListSet stands out, because it is based on linked list, but this generally performs similar to or worse than array.

Hashtable-Based Implementations

Hash table has very fast insertion and removal. Ideally, these take constant time on average, but in real life it may be worsened by non-uniform hash function and excessive collisions. On the other hand, hash table has some significant downsides:

  1. Hash table is not sorted.
  2. Hash table requires good hash function.
  3. Hash function must be compatible with key comparison algorithm.
  4. Multiple equivalent keys are not allowed → support for multisets is not possible.

Hash table is not sorted: there is no efficient way to iterate through hash table elements in sorted order: from small to large, or vice versa.

Hash Table Requries Good Hash Function: Strings have a hash (allegedly good) hash function provided by .NET framework. Most other objects must invent their own, and creating a good hash function is not easy. Bad hash function may result in severe performance degradation, down to linear inserts, finds and removes.

Hash Function Must Be Synchronized with Key Comparison: It is imperative that if two keys compare equal, they have equal hash codes. Otherwise, hash table will not work properly. E.g., if we use case-insensitive comparison for string keys, we must use case-insensitive hash code provider. This requirement is easy to overlook and hard to verify. It makes putting non-trivial objects into hash table difficult and error-prone.

Multiple Equivalent Keys Are Not Allowed Hash table is not designed to hold several keys that compare equal. Therefore, it cannot support multisets.

Array-Based Implementations

Arrays are usually kept sorted, which allows for logarithmic search time. However, arrays take linear time to add or remove an element. Arrays can support multisets, although none of provided implementation does so.

Other Issues

None of the above implementations returns contained object as a result of search operation. They just return a Boolean that indicates whether an equivalent key has been found in the set. This is unsatisfactory for keys with non-trivial comparison, e.g. case insensitive strings. For instance, if we searched for "test" and the search returned true, there is no way to know (quickly) whether the set actually contains "test", or "Test", or "TEST", or, perhaps "TeSt".

Feature Matrix

Here is a small feature comparison chart for these implementations and www.ikriv.com.Collections.*Set:

Feature Extensions Iesi/SortedSet Iesi/HashedSet Lambda SetCollection www.ikriv.com
Sorted no yes no no yes yes
Add() complexity const n const const n*logn logn
Remove() complexity const n const const n2 logn
Find/Contains complexity const logn const const logn logn
Find/Contains returns object no no no no no yes
Union/Intersection/etc. yes yes yes yes yes yes
Multisets no no no no no yes
Supports .NET 1.x yes yes yes no yes yes
Tested with nUnit no no no no no yes

Legal Stuff

English

You can download the library and use it as you wish. If you change it, please clearly mark your version is derived from mine, and that it is modified.

Legalese

www.ikriv.com.Collections library is Copyright (c) 2005, Ivan Krivyakov.

Permission is hereby granted, free of charge, to any person obtaining a copy of library binaries, source code, documentation andother accompanying files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

  1. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
  2. Modified versions of the Software shall be clearly marked as such. In particular, original author (Ivan Krivyakov) and the fact that the version has been modified shall be mensioned in the copyright notice, license, or similar documentation of the modified version.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Feedback

Please kindly send your comments, bug reports, feature requests and other thoughts to this e-mail address.