Redis sharding with Erlang

I've been using a great Erlang Redis library erldis recently and as our data set has grown much larger, we have decided we need to start sharding. Unfortunately the erldis library does not have support for sharding so I've forked the repository and made the changes. You can find my fork here. Pull request here.

A little background

Redis has been really great for us so far but as we get more data and more operations per second, we realized we need to start utilizing multiple Redis instances. Sharding basically splits up your keyspace across multiple servers in a consistent fashion. The typical example for finding out where a key belongs is by doing something like:

hash(Key) mod N

Where N is the index to a list of shards. This allows all of your clients to be consistent with knowing where a key belongs. The above algorithm is rather naive and the implementation that I've added to erldis is a bit different.

Hash Ring

In erldis when you create a ring each item or node is replicated on the ring multiple times, (128 by default in erldis), so that the hash ring is more evenly distributed. When a key is hashed, it is placed somewhere on the ring, and if you move clockwise the next node that is encountered is the one that will contain the key. If you are already at the end of the ring it just moves around the circle to the first node. Here is the Erlang representation of a ring with NumReplicas=2 with three nodes:

Eshell V5.8  (abort with ^G)
1> hash_ring:create_ring(["redis01", "redis02", "redis03"], 2).
{2,
 [{"redis01",
   545684121738837007922020352670259130010627269168},
  {"redis03",615626879509609616059735092207848107514679530568},
  {"redis02",831940194106586542005694020470691157762681039493},
  {"redis02",862200733042270842852150771881884931000854854035},
  {"redis01",976184092715215764651301117507107557316103934589},
  {"redis03",
   1324913878318290717698351186295127020927484244520}]}

The hash algorithm used above is the name of the node concatenated with the replica number. So if you look at the ring above, the first item, 545684121738837007922020352670259130010627269168 was generated by doing:

Eshell V5.8  (abort with ^G)
1> <<Int:160/unsigned-integer>> = crypto:sha("redis012"), Int.
545684121738837007922020352670259130010627269168

The hashed string is "redis01" + 2 (which is the replica number). By knowing the algorithm for where a node lies on the ring, a client in any language can take advantage of the shards.

Adding/Removing nodes

The first thing I have to say about adding nodes is to not do it. Salvatore Sanfilippo wrote a great post about sharding Redis, and his recommendation is that re-hashing is difficult and instead you should create a large number of small Redis instances up front, that way you don't have to add more nodes down the road.

For example, you might want to create 128 small Redis instances (multiple per box) and as you grow, just move those instances to their own dedicated (or larger) boxes. This lets you have a large amount of shards up front so you don't have to worry about re-hashing for a hopefully long time.

And for removing nodes, failures etc,. try to use Redis's replication and use some form of HA so that if a Redis instance dies, you have another one that is up to date to takeover (keepalived, Linux HA, etc,.).

Final thoughts

Redis has been fantastic and I'm really looking forward to finishing up on our Redis "cluster" and modifying our clients to support sharding. As for right now, I'm going to be looking into ways to migrate our existing Redis data into the shards in real-time and then "flipping the switch" on the clients.

Written on April 10, 2011