Disclaimer: this is a summary for my records after a deep dive into the current state of Redis. I tried to be as precise as possible (after all, I am writing it so I won’t forget what I learned), but some details may still be inaccurate or incomplete.
Basic Functionality
Redis stands for “remote dictionary server”, and at its core supports GET/SET operations for simple key-value pairs. Both the key and the value are strings (to be precise, the value is a byte array, but more often than not it stores an encoded string).
Basically, at its core Redis is a glorified hash table (Dict[str, str] in Python terms) that runs in a separate process, either on the local machine or on a remote one, and accepts updates over TCP.
```python
r = Redis(host, port, decode_responses=True)  # automatically convert byte array values to strings
r.set(key, value)
value = r.get(key)
```
From these humble roots, Redis grew to implement many other features:
- Automatic value expiration.
- Transactions/atomicity support.
- Complex value types: hash tables, lists, sets, sorted sets.
- Simple queues based on lists.
- More complex queues a.k.a. streams.
- Event broadcasting a.k.a. channels.
- Persistence support.
- Support for Redis clusters.
Running Redis
If you have Docker installed, it is very easy to run Redis on your machine, be it Windows, Linux or Mac:
```
docker pull redis:latest
docker run -d --name my-redis -p 6379:6379 redis:latest
```
You may need to use `sudo docker` instead of `docker` in some cases.
Sample code
I wrote a small set of Redis samples available at https://github.com/ikriv/redis-samples.
Of course, there are plenty of other Redis samples on the Internet.
ChatGPT was also very helpful in answering questions and providing sample code.
Expiration of values
It is possible to specify absolute expiration time or relative time-to-live when setting a Redis value. The value will be automatically deleted when the time is up.
Concurrency guarantees and transactions
In addition to simple get/set, Redis provides multi-key operations that can set/get several keys atomically, as well as more general transaction support. This is something a regular hash table can’t do.
There are also “incr” and “decr” operations for numeric values (which are still stored as strings: you can set a key to “1234” and then “incr” it, but if you set it to “bob”, “incr” will throw an error).
Value types beyond strings
In addition to string values, Redis supports “hashes” (sub-dictionaries), lists, regular sets, and sorted sets. I suppose all these can be emulated using simple keys and transactions, but having a dedicated API makes it faster and more convenient.
“Hash” is basically a dictionary within a dictionary.
```python
# set some subkey to some value
r.hset(key, mapping={subkey: value})

# simple login analytics: increment the user_name subkey by 1,
# or set it to 1 if it doesn't exist
r.hincrby("logins_by_user", user_name, 1)
```
The difference between a hash and a bunch of compound keys of the form “key:subkey” is concurrency guarantees.
- It is possible to set multiple values at once atomically.
- It is possible to retrieve all subkeys of a key atomically.
- It is possible to delete the entire hash atomically.
For lists, which are actually more like deques, one can efficiently push items into either end of the list, get a count, get a range of items, delete an item, delete the entire list etc. All these operations are atomic.
```python
r.rpush("my_list", value)
value = r.lpop("my_list")  # returns None if the list is empty
```
For sets, one can add and remove items from a set, get a count, and get the entire set. Set ignores duplicate items.
Sorted set (zset) is like a set, but it maintains a sort order.
basic-redis.py sample with get/set, expiration, list, etc.
Lists as queues
It is possible to repurpose a list as a queue.
```python
# producer
r.rpush("my_list", value)

# consumer
list_name, value = r.blpop("my_list")  # blocks until an item arrives
```
The reason `blpop` returns a tuple is that it can listen to multiple lists/queues.
This is a very basic queue: there is no redelivery, ACK, or monitoring of any kind. It is also entirely in memory by default: if the Redis process dies, unprocessed messages die with it.
list_as_queue.py sample, using multiple processes.
list_as_queue_asyncio.py sample, using a single process and async/await.
Channels
Channels stand out from the rest of Redis functionality because they don’t store any values at all. Channels are most similar to radio broadcasts: the producer sends a message on a specific channel, and anyone tuned to that channel will receive the message. If no receivers are tuned in when the broadcast occurs, the message is lost forever; there is no recording or storage.
```python
# producer
r.publish("redis_classic_rock_station",
          "yesterday all my troubles seemed so far away")

# consumer
with r.pubsub() as listener:
    listener.subscribe("redis_classic_rock_station")
    for message in listener.listen():  # infinite loop
        match message:
            case {"type": "message", "data": data}:
                print(data)
            # messages of other types, like 'subscribe', are ignored
```
In addition to the simple case above, more complex scenarios are supported:
- It is possible to subscribe one listener to multiple channels at once.
- It is possible to subscribe to a “pattern”, e.g. `redis_*_station`, i.e. to any channel whose name matches the pattern.
redis_channels.py sample, using multiple processes.
redis_channels_asyncio.py sample, using async/await.
Streams
Redis streams provide a more feature-rich interface for queues. On top of the read/write functionality, they support:
- Acknowledgment.
- Examination of “pending” messages that were not ACK’ed by any consumer.
- “Fan-out” or multiple delivery using groups, see below.
Groups provide fan-out delivery, somewhat like exchanges in RabbitMQ: every group receives each message, but within a group each message goes to only one consumer. When subscribing, a consumer specifies the stream and a group within the stream. E.g. when handling an orders stream, one group of consumers can do order fulfillment, and another group can do analytics. Each message would be delivered to one of the “fulfillment” consumers and to one of the “analytics” consumers.
The group is not specified when writing to the stream:
```python
r.xadd("my_stream", {"event_type": "login", "user": user})
```
Reading messages for a group requires a few more lines of code:
```python
# the group must exist; create it once with e.g.
# r.xgroup_create("my_stream", "my_group", id="0", mkstream=True)
while True:
    # response is an array of [(stream1, messages1), (stream2, messages2)];
    # ">" means "messages not yet delivered to this group"
    response = r.xreadgroup("my_group", consumer_name, {"my_stream": ">"})
    for _stream, messages in response:
        for message_id, message_data in messages:
            # message_data is what was passed to r.xadd()
            if message_data.get("event_type") == "login":
                ...  # do something
            # acknowledge, so the message leaves the group's pending list
            r.xack("my_stream", "my_group", message_id)
```
Complete sample: redis_streams.py.
Persistence
Redis operates strictly in memory by default. This means that whenever the Redis process crashes or is restarted, all pending data is lost.
There are two persistence modes to combat this: snapshots and the journal, a.k.a. append-only file or AOF.
Snapshots are taken periodically (the interval is configurable) and minimize, but don’t completely prevent, data loss.
The AOF records every change and thus prevents data loss, but it slows down operations considerably.
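As a sketch, the relevant redis.conf settings look like this (the values are illustrative, not recommendations):

```
# RDB snapshot: dump the dataset if at least 1 key changed in 900 seconds
save 900 1

# append-only file
appendonly yes
# fsync once per second: a common speed/durability compromise
appendfsync everysec
```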
According to ChatGPT, without persistence a Redis instance can potentially handle up to a million updates per second, and with AOF this drops to about 100K/second. Of course, exact figures depend on the hardware, network speed, data structure, and other factors, but these are the orders of magnitude one can expect.
Clusters
Initially Redis was limited to a single process, which naturally limits availability and the amount of data it can store.
Modern versions of Redis can run in cluster mode, but this introduces various limitations on transactions and concurrency, as atomic operations across node boundaries are problematic. E.g. two keys from two different nodes cannot be updated atomically.
Channel performance was not great in clusters up to Redis 6, as all messages were sent to all nodes, regardless of the presence of actual consumers. Redis 7 applies a better algorithm, but if many messages are sent between nodes, it can still cause issues.
Documentation
I found it somewhat difficult to find good documentation for the Python client specifically (redis-py).
Redis documentation is written in terms of text commands entered via `redis-cli`; e.g. see the documentation for BLPOP.
Redis-py documentation is rather rudimentary and mostly refers to the descriptions of the corresponding raw Redis commands; e.g. here’s the documentation for the blpop() method.
This sometimes makes it hard to figure out how exactly the Python client behaves. E.g. it looks like the list of message “types” for channel messages is not really documented anywhere except the source code.
In conclusion
Redis is a very popular and very fast data storage solution that can be used as a cache, a database, a message queue, and an event broadcasting service. For most of those things the functionality is rather limited in comparison to specialized software, but a lot of times what’s available in Redis is more than enough for a “normal” project.
Scalability is important, but most projects don’t start with a billion users and a trillion requests per second. It’s critical to recognize the limitations before they lead to problems, but at the same time there is no need to jump to overly complex solutions where a simple solution would work just fine.