Scaling Redis and Memcached at Wayfair

I wrote a post last year on consistent hashing for Redis and Memcached with ketama: http://engineering.wayfair.com/consistent-hashing-with-memcached-or-redis-and-a-patch-to-libketama/. We’ve evolved our system a lot since then, and I gave a talk about the latest developments at Facebook’s excellent Data@Scale Boston conference in November: https://www.youtube.com/watch?v=oLjryfUZPXU. We have some updates to both design and code that we’re ready to share.

To recap the talk: at any given point over the last four years, we have had what I’d call a minimum viable caching system. The stages were:

  1. Stand up a master-slave Memcached pair.
  2. Add sharded Redis, each shard a master-slave pair, with loosely Pinstagram-style persistence, consistent hashing based on fully distributed ketama clients, and Zookeeper to notify clients of configuration changes.
  3. Replace (1) with Wayfair-ketamafied Memcached, with no master-slaves, just ketama failover, also managed by Zookeeper.
  4. Put Twemproxy in front of Memcached, with our ketama patches hacked into it. The ketama code moves from clients, such as PHP scripts and Python services, to the proxy component. The two systems, one with configuration fully distributed, one proxy-based, maintain interoperability, and a few fully distributed clients remain alive to this day.
  5. Add Redis configuration improvements, especially 2 simultaneous hash rings for transitional states during cluster expansion.
  6. Switch all Redis keys to 'Database 0'.
  7. Put Wayfairized Twemproxy in front of Redis.
  8. Stand up a second Redis cluster in every data center, with essentially the same configuration as Memcached, where there’s no slave for each shard, and every key can be lazily populated from an interactive (non-batch) source.
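To make the consistent-hashing and dual-ring ideas in steps 2 and 5 concrete, here is a minimal Python sketch. This is illustrative, not our production client or proxy code; the server names and points-per-server count are made up:

```python
import bisect
import hashlib

class Ring:
    """A ketama-style consistent hash ring: each server gets many
    points on a circle keyed by MD5, and a key maps to the first
    server point at or after the key's hash."""
    def __init__(self, servers, points_per_server=160):
        self._points = []
        for server in servers:
            for i in range(points_per_server):
                h = hashlib.md5(f"{server}-{i}".encode()).hexdigest()
                self._points.append((int(h[:8], 16), server))
        self._points.sort()
        self._hashes = [p[0] for p in self._points]

    def lookup(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest()[:8], 16)
        i = bisect.bisect(self._hashes, h) % len(self._points)
        return self._points[i][1]

# During cluster expansion, run two rings at once: reads try the
# new layout first and fall back to the old one on a miss.
old_ring = Ring(["redis1", "redis2"])
new_ring = Ring(["redis1", "redis2", "redis3"])

def read(key, cache):
    value = cache.get((new_ring.lookup(key), key))
    if value is None:  # not yet migrated: try the old layout
        value = cache.get((old_ring.lookup(key), key))
    return value
```

The point of the two simultaneous rings is that only the keys whose shard assignment actually changes need to move, so the cluster can be grown without a flag-day rehash of everything.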

The code we had to write was:

  1. Some patches to Richard Jones’s ketama, described in full detail in the previous blog post: https://github.com/wayfair/ketama.
  2. Some patches to Twitter's Twemproxy: https://github.com/wayfair/twemproxy, a minor change making it interoperable with the previous item.
  3. Revisions to php-pecl-memcached, removing a 'version' check.
  4. A Zookeeper script to nanny misbehaving cluster nodes. Here’s a gist to give the idea.

Twemproxy/Nutcracker has had Redis support from early on, but apparently Twitter does not run Twemproxy in front of Redis in production, as Yao Yue of Twitter’s cache team discusses here: https://www.youtube.com/watch?v=rP9EKvWt0zo. So we are not necessarily surprised that it didn’t ‘just work’ for us without a slight modification, and the addition of the Zookeeper component.

Along the way, we considered two other solutions for all or part of this problem space: mcrouter and Redis Cluster. There's not much to the mcrouter decision. Facebook released mcrouter last summer. Our core use cases were already covered by our evolving composite system, and it seemed like a lot of work to hack Redis support into it, so we didn't. mcrouter is an awesome piece of software, and in the abstract it is more full-featured than what we have. But since we're already down the road of using Redis as a Twitter-style 'data structures' server, instead of something more special-purpose like Facebook's Tao, which is the other thing that mcrouter supports, it felt imprudent to go out on a limb of Redis/mcrouter hacking. The other decision, not to use Redis Cluster, was more of a gut-feel thing at the time: we did not want to centralize responsibility for serious matters like shard location with the database. Those databases have a lot to think about already! We'll certainly continue to keep an eye on that product as it matures.

There's a sort of footnote to the alternative-technologies analysis that's worth mentioning. We followed the 'Database 0' discussion among @antirez and his acolytes with interest. Long story short: numbered databases will continue to exist in Redis, but they are not supported in either Redis Cluster or Twemproxy. That looks to us like the consensus of the relevant community. Like many people, we had started using the numbered databases as a quick and dirty set of namespaces quite some time ago, so we thought about hacking *that* into Twemproxy, but decided against it. And then of course we had to move all our data into Database 0, and get our namespace act together, which we did.
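The migration itself boils down to a renaming rule: a key in numbered database N becomes a prefixed key in database 0. A sketch of that rule in Python (the `db{N}:` prefix is a hypothetical convention for illustration, not necessarily the namespace scheme we settled on):

```python
def to_db0_key(db_index, key, prefix_fmt="db{}:"):
    """Map a key from a numbered Redis database into a namespaced
    key in database 0. Keys already in database 0 are unchanged."""
    if db_index == 0:
        return key
    return prefix_fmt.format(db_index) + key

# A migration script would then SCAN each numbered database and
# copy every key into database 0 under its new name, e.g. with
# redis-py (sketch only; src has SELECTed database n):
#
#   for key in src.scan_iter():
#       dst.restore(to_db0_key(n, key.decode()), 0, src.dump(key),
#                   replace=True)
```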

Mad props to the loosely confederated cast of characters that I call our distributed systems team. You won't find them in the org chart at Wayfair, because having a centralized distributed systems team just feels wrong. They lurk in a seemingly random set of software and systems groups throughout Wayfair engineering. Special honors to Clayton and Andrii for relentlessly cutting wasteful pieces of code out of components where they didn't belong, and replacing them with leaner structures in the right subsystem.

Even madder props to the same pair of engineers, for seamless handling of the operational aspects of transitions, as we hit various milestones along this road. Here are some graphs, from milestone game days. In the first one, we start using Twemproxy for data that was already in Database 0. We cut connections to Redis in half:

[Graph: redis-twemproxy-migration-1]

Then we take another big step down.

[Graph: redis-twemproxy-migration-2]

Add the two steps, and we're going from 8K connections to 219. Sorry for the past, network people, and thanks for your patience! We promise to be good citizens from now on.

Front end talks

Andrew Rota and Matt DeGennaro of Wayfair Engineering are giving a talk on a JavaScript framework we have written at Wayfair called 'Tungsten,' which shares goals and ideas with React.js but interoperates with Backbone, which we use heavily. It's at the BostonJS meetup tonight at Bocoup, with a $5 cover: http://www.meetup.com/boston_JS/events/221038649/, on a double bill with Calvin Metcalf. It should be a great night.

Andrew has given a couple of talks at national conferences recently, on other front-end topics. First there was CSS Dev Conf 2014, on web components (slides here: http://www.slideshare.net/andrewrota/web-components-and-modular-css), and more recently React.js Conf 2015, where he spoke about the interoperation of web components and React. Wow, that was a hot conference! Tickets could be had for only a few minutes before it sold out. Fortunately the Facebook conference people are always really great about posting video, and here are his slides: http://www.slideshare.net/andrewrota/the-complementarity-of-reactjs-and-web-components.

Update: the slides from the presentation last Thursday night are up, here: http://www.slideshare.net/andrewrota/an-exploration-of-frameworks-and-why-we-built-our-own-46467292

PHP Static Analysis with HHVM and Hussar

Wayfair Engineering places special emphasis on software testing as a means of maintaining stability in production. The DevTools team, which I am a member of, has built and integrated a number of tools into our development and deploy process in order to catch errors as early as possible, especially before they land in master. If you missed it, last week we released sp-phpunit, a script to manage running PHPUnit tests in parallel.

Today we’re open-sourcing hussar, another tool we’ve been using as part of our deploy process, that performs PHP static analysis using HHVM. The name comes partly from the cavalry unit in Age of Empires II — a classic strategy game where a few of us on DevTools still fight to the end — but mainly from the fact that it’s a good, open name that shares a few letters with the tool’s main feature: HHVM static analysis.

Put simply, hussar builds and compares HHVM static analysis reports. It maintains a project workspace and shows errors introduced by applying patches or merging branches. With hussar, projects that cannot yet run on HHVM are able to realize the benefits of static analysis and catch potentially fatal errors prior to runtime. Here is a list of errors hussar can catch. The tool displays these errors in a formatted report. This means our engineers get the safety of strong typing and static code analysis in addition to the flexibility of development they’re accustomed to.
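At its core, comparing two static analysis reports is a set difference: the errors a patch introduces are those present in the patched report but absent from the baseline. A simplified Python sketch (hussar's real report entries carry more detail than these file/line/error tuples):

```python
def new_errors(baseline, patched):
    """Return the static-analysis errors introduced by a patch:
    those in the patched report but not in the baseline."""
    return sorted(set(patched) - set(baseline))

# Hypothetical report entries for illustration.
baseline = {("checkout.php", 14, "UndefinedVariable"),
            ("cart.php", 88, "TooManyArguments")}
patched = baseline | {("checkout.php", 102, "UnknownClass")}

assert new_errors(baseline, patched) == [("checkout.php", 102, "UnknownClass")]
```

Keying the comparison on what is *new* is what lets a codebase with a long backlog of pre-existing errors still gate its deploys usefully.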

We wrote hussar as a preparatory step toward possibly running Wayfair on HHVM. When we first tried to use HHVM, we discovered that it lacked support for a number of features and extensions used throughout our codebase. Recognizing both the performance and code quality benefits it could provide, we hacked together a script that would get at least the code analysis component working. Over the past few months this script has gone through several iterations as we worked on edge cases and ironed out false-positives to increase its accuracy and utility. The result is a tool that reliably reports legitimate errors.

We’re using hussar as part of our deploy process in addition to our integration and unit tests. Since we’ve started using the tool it has made us aware of a number of errors that slipped through our other tests. This multi-faceted approach to testing has allowed us to be more confident in deployments while keeping productivity high.

Our engineers are also able to run hussar against their code in advance of the deployment process, so ideally any errors are caught even before code review. We're using a remotely triggered Jenkins job to coordinate running hussar builds on a dedicated testing cluster. HHVM is a bit heavy, so each machine has 6 GB of RAM, and reports are written to a shared filesystem to avoid repeating work. Run time is usually five minutes or less.

We also generate a full report nightly and are working through the backlog of existing errors. Each resolved error improves our codebase and brings us one step closer to the possibility of running our sites on the HHVM platform.

We think hussar solves the "backsliding" problem faced by maintainers of any large PHP codebase considering a migration to HHVM. That is, it's usually impractical to address all the static analysis issues at once, since tech debt continues to accumulate while developers work through the backlog. Hussar solves this by focusing on preventing new errors, which lets momentum build in the effort to clean up the rest of the codebase. For us, the proof is that the number of errors found by static analysis across our codebase has been steadily declining since we started using hussar.

For more details on how hussar works and information on how you can start using it to analyze your own code, head over to the project’s GitHub page. We hope you find it useful, and welcome any contributions!

Sweet Parallel PHPUnit

We write a lot of PHP unit tests at Wayfair, and we want to be able to run them as fast as possible, which seems like a good use case for parallelization. Running tests in parallel is not built in to PHPUnit, but there are ways to do it. When we looked, we found three: parallel-phpunit, ParaTest, and GNU Parallel. All met some of our needs, but none was exactly what we wanted, so we got to work.

After hacking for a bit, we settled on these requirements:

  • Easy to set up
  • Fast to run
  • Minimalist configuration and resource usage
  • Not dependent on PHP, because chicken-before-egg

and these specifications for input, operation and output:

  • Use suites: work with existing test suites, or make suites as needed out of individual test files
  • Run suites in parallel
  • Preserve exit codes and errors

We looked at GNU Parallel. It worked, but it was an additional dependency, and it is not conveniently packaged for a broad set of platforms. It also ended up running more slowly than backgrounding in the shell, and since we didn’t need any of the fancier/nicer features of it, we cut it out of our scripts.

ParaTest is awesome, but it uses PHP, which complicates things when we’re testing new versions or features of PHP.

Parallel-phpunit was the closest existing tool to what we wanted, but we didn't like the overhead of invoking a separate PHPUnit process for each file. The logical design of our new 'Sweet Parallel PHPUnit' is a suite-enabled bash test-runner script, similar to parallel-phpunit, with output and error codes handled to our liking. Bash's PIPESTATUS array variable was the key to doing this last part.
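The runner's core logic is simple: chunk the test files into suites, launch the suites concurrently, and fail if any suite fails. Here is a Python sketch of that logic (the real tool is a bash script; the `phpunit` invocation and function names are illustrative):

```python
import subprocess

def partition(files, n):
    """Round-robin the test files into up to n suites. Note that
    this has no idea how long each file takes to run, which is the
    source of the scheduling 'bad luck' discussed in the TODOs."""
    suites = [[] for _ in range(n)]
    for i, f in enumerate(files):
        suites[i % n].append(f)
    return [s for s in suites if s]

def run_parallel(suites):
    """Launch one test process per suite; the overall exit code is
    non-zero if any suite failed, mirroring what PIPESTATUS gives
    the bash implementation."""
    procs = [subprocess.Popen(["phpunit"] + suite) for suite in suites]
    return max(p.wait() for p in procs)
```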

So we finally got everything working, as you can see at https://github.com/wayfair/sp-phpunit on GitHub, and it was time for the moment of truth. Did it actually run faster than our other options on our own largest battery of tests? YES! We cut our run time by 36% relative to the fastest alternative, while maintaining a small memory footprint. Before open-sourcing it, we also generated some generic tests, to convince ourselves that our success wasn't a coincidental artifact of our own test suite.

We wrote scripts to handle a few different scenarios. Here is what they generate:

  • One massive file with 2500 unit tests.
  • 25 folders, each with 100 files, each file containing exactly 1 unit test.
  • 10 folders, each containing 10 files that have 1 unit test that sleeps for between 0 and 2 seconds.
  • The same thing, but with one anomalous file, hand-edited, that sleeps for 30 seconds instead of 0-2.
  • 25 files, each containing 100 unit tests.

Below you can see the results of each suite run against the other parallel options, as well as PHPUnit directly for comparison.

Running with 6 parallel threads, average over 5 runs (minutes:seconds)

Test case                                      sp-phpunit  ParaTest  parallel-phpunit  phpunit (original)
2500 files with 1 test each                    00:08.93    02:00.00  03:05.67          00:34.87
25 files with 100 tests                        00:01.35    00:02.12  00:02.65          00:01.47
One file with 2500 tests                       00:01.83    00:01.99  00:01.73          00:01.49
100 files with sleeps                          00:18.51    00:19.45  00:22.85          01:40.36
100 files with sleeps (one sleeps 30 seconds)  00:45.47    00:30.47  00:32.55          02:10.55

You can see that the more files you have, the more sp-phpunit shines. We happen to have many small files, with many quick tests spread among them, so our real test suite is most like the first row of the table, and the improvement is dramatic.

The TODO list for this project is not empty. The way sp-phpunit generates its temporary suites has no knowledge of how long each test file will take. This can lead to some bad luck where, for example, if you do 6 parallel runs, 5 might finish in 3 seconds, but the one that happens to contain the slow tests will take, say, another minute to finish. This is clearly shown in the last row of our table. The 'sleep 30' is added to a bunch of other tests, and the cumulative effect, because of the grouping that we do, pushes the total time for sp-phpunit higher than the other frameworks.

In upcoming versions I'd like to implement a way to pass in a folder to create a suite from. Also, since this was created for our own system, I'm sure there are options other people will want or need that we haven't implemented, simply because the default behavior works for us. I hope this has given some insight into how and why we built sp-phpunit. We hope others find it as useful as we have. If you do check it out and have results you'd like to share, please reach out. We'd be excited to hear about it!

WebPerf on Location Boston, with Jack Wood

Catchpoint is running an event today, called "WebPerf on Location Boston", part of a series of such events in different cities. It starts at 2 pm, and our very own Jack Wood, Wayfair CIO, is speaking at 3:30. Details and the link to register are here. It's at Battery Ventures in the Seaport area, and it should be an excellent afternoon.

Web components talk at CSS Dev Conference

Andrew Rota, an engineer in our client technologies group, is speaking in a bit, at the CSS Dev conference in New Orleans, on “Web Components and Modular CSS #components”. It’s a great talk, about these emerging standards and what’s possible to do with them. If you’re there, check it out: http://cssdevconf2014.sched.org/. We’ll post the links to slides and video when they come out.

noc

Here’s our Frontline team at work, in our spiffy network operations center:

Wayfair network operations center

Selling home goods on the internet isn't rocket science, but if you actually wanted to send a couch to the moon, you'd want to plan for and monitor that from a room like this. If you can't make out the clocks on the wall, those are the times in Seattle; Ogden, Utah; Boston/NYC/Hebron, Kentucky; London/Galway; Berlin; and Sydney.

Wayfair (W): ready for lift-off.

Mobile Development Tech Talk Series, second installment

Wayfair’s mobile architect, Jeff Ladino, is giving the second of a series of tech talks, on Wayfair’s mobile applications.  The event will be held at Wayfair’s new offices, 4 Copley Place, on Tuesday, September 9th, from 6 pm to 7:30.  Pizza will be served.  Details on the talk, and free Eventbrite signup, are at this link: https://www.eventbrite.com/e/mobile-development-tech-talk-series-tickets-12526809023.

Mobile Development Tech Talk Series

Wayfair’s mobile architect, Jeff Ladino, is giving the first of a series of tech talks, on Wayfair’s mobile applications.  The event will be held at Wayfair’s new offices, 4 Copley Place, on August 5th, from 6 pm to 7:30.  Pizza will be served.  It should be a great talk, in our great new space.  Details on the talk, and free Eventbrite signup, are at this link: https://www.eventbrite.com/e/mobile-development-tech-talk-series-tickets-12304949435.