Network Challenged Streams

Executive Summary

  1. Don't use StreamReader to read text from a socket. If you do, you open yourself to random hangs.
  2. You can use XmlTextReader to read from a socket, but you must be careful, or you open yourself to (predictable) hangs.

Downloads

Download test source code and binaries (25K).

Background

I was writing a piece of software that implements a request/response mechanism over a TCP socket. The protocol stipulates that both requests and responses are XML fragments. I ended up with a combination of an XmlTextReader, a StreamReader and a NetworkStream, and it worked all right until it started to hang on long messages. The client would behave as if it had received only part of the XML message and wait for the rest, while I could clearly see that the entire message had been delivered.

I started digging to the bottom of this, and found that the root cause of my headaches was the weird (not to say buggy) behavior of the StreamReader class when it reads from a network stream.

Streams and TextReaders

In .NET Framework, classes derived from Stream represent streams of binary data. They operate in terms of bytes. Classes derived from TextReader represent text input and operate in terms of characters. The class that converts a binary Stream to a text input is called StreamReader. Its job is to read the underlying binary stream and convert incoming bytes into characters using the specified encoding.

For some encodings this process is straightforward. ASCII encoding converts exactly one byte to exactly one character. UTF-16 encoding converts every two bytes to a character (four bytes for characters outside the Basic Multilingual Plane). Other encodings, such as the popular UTF-8, use a variable number of bytes per character. This creates certain complications. For example, if the binary input is broken into chunks of equal size, a multi-byte character may begin in one chunk and end in the next.

Bytes (UTF-8), split into two chunks of four bytes:

  Chunk 1: 97 97 97 194
  Chunk 2: 163 98 98 98

Characters:

  Chunk 1: a a a, plus the first byte of £
  Chunk 2: the second byte of £, then b b b
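
A minimal sketch of what this means for a decoder, using the two four-byte chunks from the illustration above. The class and method names are hypothetical and exist only for this example. Decoding each chunk on its own loses the pound sign, while a single stateful System.Text.Decoder carries the partial character over to the next call:

```csharp
using System;
using System.Text;

public static class ChunkedDecoding
{
    // The two four-byte chunks from the illustration: "aaa£bbb" in UTF-8.
    public static byte[][] SampleChunks()
    {
        return new[]
        {
            new byte[] { 97, 97, 97, 194 },
            new byte[] { 163, 98, 98, 98 },
        };
    }

    // Decodes every chunk independently. The pound sign is split across
    // the chunk boundary, so neither half decodes to a valid character.
    public static string DecodeNaively(byte[][] chunks)
    {
        var text = new StringBuilder();
        foreach (var chunk in chunks)
            text.Append(Encoding.UTF8.GetString(chunk));
        return text.ToString();
    }

    // Uses one Decoder for all chunks. The decoder remembers the trailing
    // byte 194 from chunk 1 and completes the character with byte 163.
    public static string DecodeStatefully(byte[][] chunks)
    {
        var decoder = Encoding.UTF8.GetDecoder();
        var text = new StringBuilder();
        foreach (var chunk in chunks)
        {
            var chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int count = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            text.Append(chars, 0, count);
        }
        return text.ToString();
    }
}
```

Calling ChunkedDecoding.DecodeStatefully(ChunkedDecoding.SampleChunks()) yields "aaa£bbb", while DecodeNaively on the same input does not reproduce the pound sign.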

Another important feature of network streams is that "no more data" and "end of stream" are clearly distinct conditions. The server may stop sending data, wait for 10 minutes and then close the connection. Or it may wait for 10 minutes and send more data. Until the connection is closed, there is no way to know whether more data will arrive or not. Contrast this with file streams, where "no more data" and "end of stream" are reached simultaneously at the end of the file.

Problems with StreamReader over a NetworkStream

StreamReader can accept any stream as its data source. Unfortunately, StreamReader does not behave well when working with a NetworkStream.

StreamReader.Read() will always request more data from the underlying network stream, even if there is some text left in the read buffer.

This may not sound like something important, but let's consider the following scenario. The server sends a message to the client which is 552 bytes long. The message ends with the characters "</message>". Then the server stops transmitting and waits for the client to acknowledge the message. Suppose the client reads the message 512 characters at a time:

// Requires: using System.IO; using System.Net.Sockets;
private static void ReadText(NetworkStream stream)
{
    var buffer = new char[512];

    using (var reader = new StreamReader(stream))
    while (true)
    {
        // Ask the StreamReader for up to 512 characters.
        var chars = reader.Read(buffer, 0, buffer.Length);
        if (chars == 0) return; // end of stream
        var s = new string(buffer, 0, chars);
        if (s.EndsWith("</message>"))
        {
            // send acknowledgement
        }
    }
}

This simplistic client ignores the fact that the character sequence "</message>" may be split between two chunks of input. Suppose the messages are padded so that this never happens.

Unfortunately, the code above does not work: the client hangs. The following sequence of events leads to the hang:

  1. The client requests up to 512 characters from the StreamReader.
  2. StreamReader requests up to 1024 bytes from the network stream. 1024 bytes is the stream reader's default buffer size.
  3. StreamReader returns the first 512 characters of the message to the client. The remainder of the message is stored in the stream reader's buffer.
  4. The client again requests up to 512 characters from the StreamReader.
  5. The StreamReader does not return the remainder of the message to the client. Instead, it requests an additional 1024 bytes from the network stream. The server is waiting for the client to respond, so it won't send any more data. The client is waiting for the StreamReader to retrieve the remainder of the message, and the StreamReader is waiting for more data from the server. The system enters a deadlock state and hangs.

You can observe this scenario in practice using server scenario 1 and client scenario 2 in my test code: see "Running the tests".

The hang could have been prevented if the StreamReader.Read() call returned immediately whenever at least some data was available, just like NetworkStream.Read() does. Unfortunately, this is not the way StreamReader.Read() works. This makes StreamReader unsuitable for request-response style network communication.

One may suggest that the problem would go away if the client always requested exactly 1024 characters from the stream reader. This way the client would exhaust the stream reader's buffer on every read, and no leftover characters would remain to cause the hang. This solution "almost" works. Keep in mind that the client requests 1024 characters, and the stream reader's buffer is 1024 bytes. If every character were exactly one byte, this solution would indeed work. However, if the stream can contain multi-byte (international) characters, we may end up in a situation where 1024 characters is not exactly 1024 bytes. We then lose the buffer alignment, leftover characters remain in the buffer, and the client hangs. This is demonstrated by server scenario 2 and client scenario 3.
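
The mismatch between character count and byte count is easy to check. The sketch below uses a hypothetical payload of 1023 'a' characters followed by one pound sign; the class and method names are made up for this example:

```csharp
using System;
using System.Text;

public static class BufferAlignment
{
    // Hypothetical payload: 1023 one-byte characters plus one pound sign (U+00A3).
    public static string SampleText()
    {
        return new string('a', 1023) + "\u00A3";
    }

    // Number of characters the client would ask the reader for.
    public static int CharCount(string text)
    {
        return text.Length;
    }

    // Number of bytes the same text occupies on the wire in UTF-8.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

For this payload CharCount returns 1024 and Utf8ByteCount returns 1025, so a buffer of 1024 bytes cannot hold exactly 1024 characters.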

Reading XML from NetworkStream

As you remember, in my real-life scenario the server and the client exchange XML messages. Naturally, I wanted to use .NET's built-in XML parser to parse those messages. Fortunately, this can be done without running into the StreamReader bugs, but there are plenty of caveats.

XmlDocument.Load

XmlDocument.Load(Stream) seems like a good way to load an XML document from a socket, but it will not return until the input stream is closed. This makes XmlDocument.Load() unsuitable for the request-response scenario.

As strange as it seems, XmlDocument.Load() has legitimate reasons not to return until the end of the stream. Even after the root element is closed, an XML document may still contain comments, processing instructions and whitespace. Thus, there is no way to know whether we have read the whole document until the input stream is closed.
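
To illustrate, here is a small sketch with a well-formed document whose root element is followed by a comment and a processing instruction. The sample document and the helper names are invented for this example; the text is loaded from a string rather than a socket, so the end of input is known:

```csharp
using System;
using System.IO;
using System.Xml;

public static class TrailingContent
{
    // The root element <message> is followed by two more top-level nodes.
    public const string Sample =
        "<?xml version='1.0'?><message>hi</message><!-- trailer --><?audit done?>";

    // Loads the document and counts the top-level nodes after the root element.
    public static int CountNodesAfterRoot(string xml)
    {
        var doc = new XmlDocument();
        using (var reader = new StringReader(xml))
            doc.Load(reader);

        int count = 0;
        for (var node = doc.DocumentElement.NextSibling; node != null; node = node.NextSibling)
            count++;
        return count;
    }
}
```

TrailingContent.CountNodesAfterRoot(TrailingContent.Sample) returns 2. A loader that stopped at </message> would have missed both nodes.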

XmlTextReader Class

To quote MSDN, XmlTextReader "provides fast, non-cached, forward-only access to XML data". It can work with either a binary Stream or a TextReader. Amazingly, when XmlTextReader is supplied with a binary stream, it does not create an intermediate text reader. Instead, it calls the binary stream directly and converts bytes to characters using its own mechanisms. In other words, XmlTextReader has two branches of code: one that works with text input and the other that works directly with binary input, bypassing TextReader parsing mechanisms.

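The binary parser built into the XmlTextReader does not suffer from the hanging bug like StreamReader. This allows us to parse XML from a network stream, provided we feed the network stream directly to XmlTextReader. If we create an intermediate StreamReader, the parser is exposed to the StreamReader behavior described above and can hang again, as client scenario 7 does against server scenario 4.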

Loading Root Element into an XmlDocument

XmlTextReader is a great class, but it parses the document one node at a time, which may be too low-level. If we want to retrieve the entire root element of the incoming XML message, we can use the ReadSubtree() method as follows:

private void ReadXmlSubtree(NetworkStream stream)
{
    using (var xmlReader = new XmlTextReader(stream))
    {
        while (xmlReader.Read())
        {
            if (xmlReader.NodeType != XmlNodeType.Element) continue;
            using (var subTreeReader = xmlReader.ReadSubtree())
            {
                var doc = new XmlDocument();
                doc.Load(subTreeReader);
                Console.WriteLine("Received XML document: {0}", doc.OuterXml);
                return;
            }
        }
    }
}

The subTreeReader exits when it reads the end element tag, and therefore does not hang forever. This method is illustrated by client scenario 6.

Summary of Test Results

I developed 4 server and 7 client scenarios that re-create different situations that may occur on the wire. The protocol for all scenarios is as follows:

  1. The client connects to the server.
  2. The client sends the scenario number to the server.
  3. The server produces valid XML whose exact content depends on the scenario.
  4. The server sends this XML to the client, either in one piece or in chunks.
  5. The client tries to process the XML on its end. Different client scenarios use different processing techniques.
  6. The client sends an ACK byte to the server.
  7. The connection is closed on both ends.
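
The exchange above can be sketched over the loopback interface as follows. This is not the downloadable test code: the class and method names, the single-byte scenario number and the ACK value of 6 are assumptions made for illustration. The client reads the root element with XmlTextReader directly over the NetworkStream, which corresponds to client scenario 6 against server scenario 1:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;
using System.Xml;

public static class ProtocolSketch
{
    private const byte Ack = 6; // assumed ACK value; the article does not state it

    // Runs one exchange and returns the XML that the client parsed.
    public static string RunOnce()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: the OS picks a free port
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        // Server: read the scenario number, send the XML, then wait for the ACK.
        Task<bool> server = Task.Run(() =>
        {
            using (var peer = listener.AcceptTcpClient())
            {
                var s = peer.GetStream();
                s.ReadByte();                               // step 2 (assumed to be one byte)
                var xml = Encoding.UTF8.GetBytes(
                    "<?xml version='1.0'?><message>" + new string('a', 512) + "</message>");
                s.Write(xml, 0, xml.Length);                // steps 3 and 4
                return s.ReadByte() == Ack;                 // step 6
            }
        });

        string received;
        using (var client = new TcpClient())
        {
            client.Connect(IPAddress.Loopback, port);       // step 1
            var stream = client.GetStream();
            stream.WriteByte(1);                            // step 2
            received = ReadRootElement(stream);             // step 5
            stream.WriteByte(Ack);                          // step 6
        }                                                   // step 7

        bool acked = server.Wait(TimeSpan.FromSeconds(5)) && server.Result;
        listener.Stop();
        if (!acked) throw new InvalidOperationException("The server did not receive the ACK.");
        return received;
    }

    private static string ReadRootElement(NetworkStream stream)
    {
        // Deliberately not disposed here: disposing the reader would close
        // the stream before the ACK can be sent.
        var xmlReader = new XmlTextReader(stream);
        while (xmlReader.Read())
        {
            if (xmlReader.NodeType != XmlNodeType.Element) continue;
            using (var subTreeReader = xmlReader.ReadSubtree())
            {
                var doc = new XmlDocument();
                doc.Load(subTreeReader);
                return doc.OuterXml;
            }
        }
        throw new XmlException("No root element received.");
    }
}
```

ProtocolSketch.RunOnce() returns the <message> element once the server has confirmed the ACK. Swapping ReadRootElement for one of the hanging techniques would block the call, because the server in this sketch never closes the connection before it sees the ACK.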

Note that in some scenarios client processing hangs. In this case the client must be forcefully terminated, and the server never receives the ACK byte.

Server Scenarios

There are four server scenarios that are defined below. In every scenario the server sends valid XML to the client:

  1. Send the text "<?xml version='1.0'?><message>aaa...aaa</message>", all in one transmission. There are 512 'a' characters in the message.
  2. Send the text "<?xml version='1.0' encoding='utf-8'?><message>", then wait 100 ms and send the text "aaa...aaa£</message>" with 1023 'a' characters. The message encoding is UTF-8. All characters except the pound sign £ are represented by one byte on the wire. The pound sign is represented by two bytes.
  3. Same as #2, but with 4095 'a' characters.
  4. Send the text "<?xml version='1.0' encoding='utf-8'?><message>", then wait 100 ms and send the text "aaa...aaa</message>" with 4048 'a' characters. The message encoding is UTF-8. All characters are represented by one byte on the wire.

Client scenarios and their interaction with the server scenarios are shown in the table below:

Client scenario | Server scenario 1 | Server scenario 2 | Server scenario 3 | Server scenario 4
1. Read as binary, 512 bytes at a time. | Works | Works | Works | Works
2. Read as text, 512 characters at a time. | Hangs | Hangs | Hangs | Hangs
3. Read as text, 1024 characters at a time. | Works | Hangs | Hangs | Works
4. Use XmlDocument.Load() directly from the NetworkStream. | Hangs | Hangs | Hangs | Hangs
5. Use XmlDocument.Load() from a TextReader that points to the NetworkStream. | Hangs | Hangs | Hangs | Hangs
6. Use XmlReader.ReadSubtree() directly from the NetworkStream. | Works | Works | Works | Works
7. Use XmlReader.ReadSubtree() from a TextReader that points to the NetworkStream. | Works | Works | Works | Hangs

Running the Tests

To run the server, open the command line, navigate to the directory with NetworkChallengedStreams.exe and issue the following command:

start NetworkChallengedStreams.exe /server

To run the client, issue the following command:

NetworkChallengedStreams.exe serverScenario clientScenario

where serverScenario is the server scenario number (1-4) and clientScenario is the client scenario number (1-7).

Conclusion

  1. Don't ever use StreamReader with NetworkStream. If you have to, read the bytes and decode them yourself. Keep in mind that a multi-byte character may straddle a chunk boundary.
  2. If your input is XML, you don't need to decode characters yourself. You can parse it by passing the NetworkStream directly to XmlTextReader and using the ReadSubtree() method.
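
For the first point, a minimal sketch of reading bytes and decoding them yourself might look like this. The MessageReader class and its ReadUntil method are hypothetical, and the sketch assumes UTF-8 input and a sender that stops transmitting after the terminator, as in the request/response protocol described earlier:

```csharp
using System;
using System.IO;
using System.Text;

public static class MessageReader
{
    // Reads from the stream until the decoded text ends with the terminator.
    // Returns null if the stream ends before the terminator is seen.
    public static string ReadUntil(Stream stream, string terminator, int chunkSize = 512)
    {
        var decoder = Encoding.UTF8.GetDecoder(); // keeps partial characters between reads
        var bytes = new byte[chunkSize];
        var chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        var text = new StringBuilder();

        while (true)
        {
            // Read the bytes directly; a result of 0 means end of stream.
            int byteCount = stream.Read(bytes, 0, bytes.Length);
            if (byteCount == 0) return null;

            int charCount = decoder.GetChars(bytes, 0, byteCount, chars, 0);
            text.Append(chars, 0, charCount);

            // Check the accumulated text, so a terminator that is split
            // between two reads is still found.
            if (EndsWith(text, terminator)) return text.ToString();
        }
    }

    private static bool EndsWith(StringBuilder text, string suffix)
    {
        if (text.Length < suffix.Length) return false;
        int offset = text.Length - suffix.Length;
        for (int i = 0; i < suffix.Length; i++)
            if (text[offset + i] != suffix[i]) return false;
        return true;
    }
}
```

A call such as MessageReader.ReadUntil(stream, "</message>") returns as soon as the terminator has arrived, without waiting for the connection to close. If the sender could transmit further data right after the terminator, the check would have to search the accumulated text instead of only looking at its end.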

Feedback

Questions? Comments?
Drop me a line