I wrote a post last year on consistent hashing for Redis and Memcached with ketama: http://engineering.wayfair.com/consistent-hashing-with-memcached-or-redis-and-a-patch-to-libketama/. We’ve evolved our system a lot since then, and I gave a talk about the latest developments at Facebook’s excellent Data@Scale Boston conference in November: https://www.youtube.com/watch?v=oLjryfUZPXU. We have some updates to both design and code that we’re ready to share.
To recap the talk: at any given point over the last four years, we have had what I’d call a minimum viable caching system. The stages were:
- Stand up a Master-slave Memcached pair.
- Add sharded Redis, each shard a master-slave pair, with loosely Pinstagram-style persistence, consistent hashing based on fully distributed ketama clients, and Zookeeper to notify clients of configuration changes.
- Replace (1) with Wayfair-ketamafied Memcached, with no master-slaves, just ketama failover, also managed by Zookeeper.
- Put Twemproxy in front of the Memcached, with Wayfair-ketamafied Twemproxy hacked into it. The ketama code moves from clients, such as PHP scripts and Python services, to the proxy component. The two systems, one with configuration fully distributed, one proxy-based, maintain interoperability, and a few fully distributed clients remain alive to this day.
- Add Redis configuration improvements, especially 2 simultaneous hash rings for transitional states during cluster expansion.
- Switch all Redis keys to ‘Database 0′
- Put Wayfairized Twemproxy in front of Redis.
- Stand up a second Redis cluster in every data center, with essentially the same configuration as Memcached, where there’s no slave for each shard, and every key can be lazily populated from an interactive (non-batch) source.
The code we had to write was
- Some patches to Richard Jones’s ketama, described in full detail in the previous blog post: https://github.com/wayfair/ketama.
- Some patches to Twitter’s Twemproxy : https://github.com/wayfair/twemproxy, a minor change, making it interoperable with the previous item.
- Revisions to php-pecl-memcached, removing a ‘version’ check
- A Zookeeper script to nanny misbehaving cluster nodes. Here’s a gist to give the idea.
Twemproxy/Nutcracker has had Redis support from early on, but apparently Twitter does not run Twemproxy in front of Redis in production, as Yao Yue of Twitter’s cache team discusses here: https://www.youtube.com/watch?v=rP9EKvWt0zo. So we are not necessarily surprised that it didn’t ‘just work’ for us without a slight modification, and the addition of the Zookeeper component.
Along the way, we considered two other solutions for all or part of this problem space: mcRouter and Redis cluster. There’s not much to the mcRouter decision. Facebook released McRouter last summer. Our core use cases were already covered by our evolving composite system, and it seemed like a lot of work to hack Redis support into it, so we didn’t do it. McRouter is an awesome piece of software, and in the abstract it is more full-featured than what we have. But since we’re already down the road of using Redis as a Twitter-style ‘data structures’ server, instead of something more special-purpose like Facebook’s Tao, which is the other thing that mcRouter supports, it felt imprudent to go out on a limb of Redis/mcRouter hacking. The other decision, the one where we decided not to use Redis cluster, was more of a gut-feel thing at the time: we did not want to centralize responsibility for serious matters like shard location with the database. Those databases have a lot to think about already! We’ll certainly continue to keep an eye on that product as it matures.
There’s a sort of footnote to the alternative technologies analysis that’s worth mentioning. We followed the ‘Database 0′ discussion among @antirez and his acolytes with interest. Long story short: numbered databases will continue to exist in Redis, but they are not supported in either Redis cluster or Twemproxy. That looks to us like the consensus of the relevant community. Like many people, we had started using the numbered databases as a quick and dirty set of namespaces quite some time ago, so we thought about hacking *that* into Twemproxy, but decided against it. And then of course we had to move all our data into Database 0, and get our namespace act together, which we did.
Mad props to the loosely confederated cast of characters that I call our distributed systems team. You won’t find them in the org chart at Wayfair, because having a centralized distributed systems team just feels wrong. They lurk in a seemingly random set of software and systems group throughout Wayfair engineering. Special honors to Clayton and Andrii for relentlessly cutting wasteful pieces of code out of components where they didn’t belong, and replacing them with leaner structures in the right subsystem.
Even madder props to the same pair of engineers, for seamless handling of the operational aspects of transitions, as we hit various milestones along this road. Here are some graphs, from milestone game days. In the first one, we start using Twemproxy for data that was already in Database 0. We cut connections to Redis in half:
Then we take another big step down.
Add the two steps, and we’re going from 8K connections, to 219. Sorry for the past, network people, and thanks for your patience! We promise to be good citizens from now on.