PHP Static Analysis with HHVM and Hussar

Wayfair Engineering places special emphasis on software testing as a means of maintaining stability in production. The DevTools team, which I am a member of, has built and integrated a number of tools into our development and deploy process in order to catch errors as early as possible, especially before they land in master. If you missed it, last week we released sp-phpunit, a script to manage running PHPUnit tests in parallel.

Today we’re open-sourcing hussar, another tool we’ve been using as part of our deploy process, that performs PHP static analysis using HHVM. The name comes partly from the cavalry unit in Age of Empires II — a classic strategy game where a few of us on DevTools still fight to the end — but mainly from the fact that it’s a good, open name that shares a few letters with the tool’s main feature: HHVM static analysis.

Put simply, hussar builds and compares HHVM static analysis reports. It maintains a project workspace and shows the errors introduced by applying a patch or merging a branch. With hussar, projects that cannot yet run on HHVM can still realize the benefits of static analysis and catch potentially fatal errors before runtime. The errors hussar can catch are listed on the project’s GitHub page, and the tool displays them in a formatted report. This means our engineers get the safety of strong typing and static code analysis on top of the flexible development style they’re accustomed to.
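To make that concrete, here is a small, purely hypothetical PHP snippet (the function names are invented) with the kind of mistakes HHVM’s static analysis flags: a call with too few arguments and a call to a function that is never defined. In plain PHP these only surface when the code path actually runs; in a hussar report they show up before the code is ever merged.

```php
<?php
// Hypothetical example; function names are made up for illustration.

function formatPrice($amount, $currency) {
    return $currency . number_format($amount, 2);
}

function renderCartRow(array $item) {
    // Too few arguments: the $currency parameter is never supplied.
    $price = formatPrice($item['price']);

    // Call to an undefined function (say, a helper that was renamed) --
    // a fatal error at runtime, but just one line in a static analysis report.
    return renderRowHtml($item['name'], $price);
}
```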

We wrote hussar as a preparatory step toward possibly running Wayfair on HHVM. When we first tried to use HHVM, we discovered that it lacked support for a number of features and extensions used throughout our codebase. Recognizing both the performance and code quality benefits it could provide, we hacked together a script that would get at least the code analysis component working. Over the past few months this script has gone through several iterations as we worked through edge cases and ironed out false positives to increase its accuracy and utility. The result is a tool that reliably reports legitimate errors.

We’re using hussar as part of our deploy process, in addition to our integration and unit tests. Since we started using the tool, it has made us aware of a number of errors that slipped through our other tests. This multi-faceted approach to testing has allowed us to be more confident in deployments while keeping productivity high.

Our engineers are also able to run hussar against their code in advance of the deployment process, so ideally any errors are caught even before code review. We use a remotely triggered Jenkins job to coordinate running hussar builds on a dedicated testing cluster. HHVM is a bit heavy, so each machine has 6 GB of RAM, and reports are written to a shared filesystem to avoid repeating work. Run time is usually five minutes or less.

We also generate a full report nightly and are working through the backlog of existing errors. Each resolved error improves our codebase and brings us one step closer to the possibility of running our sites on the HHVM platform.

We think hussar solves the “backsliding” problem faced by maintainers of large PHP codebases who are considering a migration to HHVM. It’s usually impractical to address all the static analysis issues at once, since tech debt continues to accumulate while developers work through the backlog. Hussar addresses this by focusing on preventing new errors, which lets momentum build in the effort to clean up the rest of the codebase. For us, the proof is that the number of errors found by static analysis across our codebase has been steadily declining since we started using hussar.

For more details on how hussar works and information on how you can start using it to analyze your own code, head over to the project’s GitHub page. We hope you find it useful, and welcome any contributions!

Sweet Parallel PHPUnit

We write a lot of PHP unit tests at Wayfair, and we want to be able to run them as fast as possible, which seems like a good use case for parallelization. Running tests in parallel is not built into PHPUnit, but there are ways to do it. When we looked, we found three: parallel-phpunit, ParaTest, and GNU Parallel. All met some of our needs, but none was exactly what we wanted, so we got to work.

After hacking for a bit, we settled on these requirements:

  • Easy to set up
  • Fast to run
  • Minimalist configuration and resource usage
  • Not dependent on PHP, because chicken-before-egg

and these specifications for input, operation and output:

  • Use suites: work with existing test suites, or make suites as needed out of individual test files
  • Run suites in parallel
  • Preserve exit codes and errors

We looked at GNU Parallel. It worked, but it was an additional dependency, and it is not conveniently packaged for a broad set of platforms. It also ended up running more slowly than backgrounding jobs in the shell, and since we didn’t need any of its fancier features, we cut it out of our scripts.

ParaTest is awesome, but it uses PHP, which complicates things when we’re testing new versions or features of PHP.

Parallel-phpunit was the closest existing tool to what we wanted, but we didn’t like the overhead of invoking a separate PHPUnit process for each file. The logical design of our new ‘Sweet Parallel PHPUnit’ is a suite-enabled bash test-runner script, similar to parallel-phpunit, with output and error codes handled to our liking. Bash’s PIPESTATUS array variable was the key to doing this last part.
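As a rough illustration of that trick (a sketch, not the actual sp-phpunit source; the suites/*.xml layout and the output prefix are invented): piping PHPUnit’s output through another command normally makes $? reflect the last command in the pipe, but PIPESTATUS still holds PHPUnit’s own exit code, so failures survive both the pipe and the backgrounding.

```bash
#!/bin/bash
# Rough sketch of the PIPESTATUS technique, not the actual sp-phpunit source.
# The suites/*.xml layout is hypothetical.

run_suite() {
    local suite="$1"
    # Prefix each output line with the suite name; without PIPESTATUS,
    # $? would now be sed's exit code instead of phpunit's.
    phpunit --configuration "$suite" 2>&1 | sed "s|^|[$suite] |"
    return "${PIPESTATUS[0]}"
}

overall=0
for suite in suites/*.xml; do
    run_suite "$suite" &            # run each suite in its own background job
done

for job in $(jobs -p); do
    wait "$job" || overall=1        # remember if any suite failed
done
exit $overall
```

Each suite runs as its own background job, and the final wait loop folds the per-suite exit codes into one, so a single failing suite still fails the whole run.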

So we finally got everything working, as you can see on GitHub at https://github.com/wayfair/sp-phpunit, and it was time for the moment of truth. Did it actually work any faster than our other options, on our own largest battery of tests? YES! We cut our run time by 36% relative to the fastest alternative, while maintaining a small memory footprint! Before open-sourcing it, we also generated some generic tests, to convince ourselves that our success wasn’t a coincidental artifact of our own test suite.

We wrote scripts to handle a few different scenarios. Here is what they generate:

  • One massive file with 2500 unit tests.
  • 25 folders each with 100 files, each containing exactly 1 unit test.
  • 10 folders, each containing 10 files, each with 1 unit test that sleeps for between 0 and 2 seconds.
  • Same thing, but with one anomalous file, hand-edited, that sleeps for 30 seconds instead of 0-2.
  • 25 files each containing 100 unit tests.

Below you can see the results of each suite run against the other parallel options, as well as PHPUnit directly for comparison.

Running with 6 parallel threads, average over 5 runs (minutes:seconds)

Test case                                                sp-phpunit   paratest   parallel-phpunit   phpunit (original)
2500 files with 1 test                                   00:08.93     02:00.00   03:05.67           00:34.87
25 files with 100 tests                                  00:01.35     00:02.12   00:02.65           00:01.47
One file with 2500 tests                                 00:01.83     00:01.99   00:01.73           00:01.49
100 files with sleeps                                    00:18.51     00:19.45   00:22.85           01:40.36
100 files with sleeps (one file sleeps for 30 seconds)   00:45.47     00:30.47   00:32.55           02:10.55

You can see that the more files you have, the more sp-phpunit really shines. We happen to have many small files, with many quick tests spread out among them, so our real test suite is most like the first line in the table, and the improvements are dramatic.

The TODO list for this project is not empty. The way sp-phpunit generates its temporary suites has no knowledge of how long each test or file will take. This can lead to some bad luck: with 6 parallel runs, for example, 5 might finish in 3 seconds while the one that happens to contain the slow tests takes another minute to finish. This is clearly shown in the last row of our table. The ‘sleep 30’ test gets grouped with a bunch of other tests, and because of that grouping the cumulative time for sp-phpunit ends up higher than that of the other frameworks.

In upcoming versions I’d like to add a way to pass in a folder to create a suite from. Also, since this was created for our own system, I’m sure there are options other people will want or need that we haven’t implemented, simply because the default behavior works for us. I hope this has given some insight into how and why we built sp-phpunit. We hope others find it as useful as we have. If you do happen to check it out and have some results you’d like to share, please reach out. We’d be very excited to hear about it!

WebPerf on Location Boston, with Jack Wood

Catchpoint is running an event today, called “WebPerf on Location Boston”, part of a series of such events in different cities. It starts at 2, and our very own Jack Wood, Wayfair CIO, is speaking at 3:30. Details and the link to register are here. It’s at Battery Ventures in the seaport area, and it should be an excellent afternoon.

Web components talk at CSS Dev Conference

Andrew Rota, an engineer in our client technologies group, is speaking in a bit, at the CSS Dev conference in New Orleans, on “Web Components and Modular CSS #components”. It’s a great talk, about these emerging standards and what’s possible to do with them. If you’re there, check it out: http://cssdevconf2014.sched.org/. We’ll post the links to slides and video when they come out.

noc

Here’s our Frontline team at work, in our spiffy network operations center:

[Photos: the Wayfair network operations center]

Selling home goods on the internet isn’t rocket science, but if you actually wanted to send a couch to the moon, you’d want to plan for and monitor that from a room like this. If you can’t make out the clocks on the wall, those are the times in Seattle; Ogden, Utah; Boston/NYC/Hebron, Kentucky; London/Galway; Berlin; and Sydney.

Wayfair (W): ready for lift-off.

Mobile Development Tech Talk Series, second installment

Wayfair’s mobile architect, Jeff Ladino, is giving the second of a series of tech talks, on Wayfair’s mobile applications.  The event will be held at Wayfair’s new offices, 4 Copley Place, on Tuesday, September 9th, from 6 pm to 7:30.  Pizza will be served.  Details on the talk, and free Eventbrite signup, are at this link: https://www.eventbrite.com/e/mobile-development-tech-talk-series-tickets-12526809023.

Mobile Development Tech Talk Series

Wayfair’s mobile architect, Jeff Ladino, is giving the first of a series of tech talks, on Wayfair’s mobile applications.  The event will be held at Wayfair’s new offices, 4 Copley Place, on August 5th, from 6 pm to 7:30.  Pizza will be served.  It should be a great talk, in our great new space.  Details on the talk, and free Eventbrite signup, are at this link: https://www.eventbrite.com/e/mobile-development-tech-talk-series-tickets-12304949435.

Wayfair at Boston TechJam 2014

Wayfair engineering will be out in force at Boston TechJam 2014, on Thursday, June 12, 2014, at Boston City Hall Plaza, starting at 4 pm.  We’ll be in the middle row, one slot away from the beer tent, and we will have some fun stuff, including a giant game of Jenga.  You can meet Steve Conine, co-founder, owner, chairman and CTO of Wayfair, as well as other engineering team members.  If you don’t know Wayfair engineering, we’re an intense but friendly team of ~300 people, at the largest and fastest-growing e-commerce company in Boston.  We used to call ourselves the biggest Boston tech company you’ve never heard of, but that’s all changing fast.  Come and hang out with us!

Consistent hashing with Memcached or Redis, and a patch to libketama

Summary

This is a howto for consistent hashing of keys in key-value stores, with a focus on cache servers.  We have code to share, in the form of a set of patches to ketama on github.  At Wayfair, we have been using this code, or slightly less evolved versions of it, with our production Redis servers for a long while, and our production Memcached servers for a short while.  It is now mature enough that we can release the code and describe our results.  The code is cache-server agnostic: there is nothing specific to Memcached or Redis in it, and it could be used with other servers.  The internal project that led to this was one of those little efforts intended to smooth over some blips in our fault log, and which has now produced a very useful set of building blocks for scalable systems.

Before I get started, I have three public messages, for three audiences:

  • To the non-free key-value-store salesmen of the world: one of my main purposes here is to continue to be able to use free key-value stores, as my team helps to scale Wayfair up into larger and more numerous data centers.  This relationship that you keep trying to initiate is just not going to work.  Please stop calling me.
  • To Brad Fitzpatrick and the other authors of Memcached: Memcached, you’re still the one, after all these years!  You rock the low-latency performance like nobody else.
  • To the author of Redis, with gratitude I say this: I’m sure Redis cluster is going to be an awesome thing for a lot of users.  But Salvatore, don’t sweat it too hard.  We’ve got you covered, with this technique, for some scaling scenarios that people can go very far with.  As long as plain old Redis is fast and awesome, we won’t be needing Redis cluster in order to go wide and large with Redis.

Background

Consistent hashing is an idea that has been around for a while. It solves a common problem with sharded cache server clusters, which can be disrupted when a single node fails and sets off a festival of rehashing, key shuffling and cache-filling database calls.

The history of the idea and its implementations is roughly this: Danny Lewin, an Akamai founder, thought of it. (I won’t dwell on the details, but if you don’t know the story, look it up: in addition to being a hero of applied mathematics for this invention, he is a hero, full stop, for what he tried to do just before he was killed during 9/11.) A group of MIT professors and students including him wrote a paper in 1997 (Karger, Leighton, Lewin et al.), and Akamai has an in-house implementation. Various people, at various times, have written versions in Perl, Python, PHP, and other languages. Richard Jones, then CTO of Last.fm, wrote an open source C library called ‘ketama’ in 2007, with wrappers or ports in a few languages. Akamai alumni at Basho have incorporated an Erlang version of the idea into Riak. Jeremy Zawodny and others have written excellent blog posts describing configuration niceties of the way the idea is used at Craig’s List and other places. I have put a list of links at the end of this post.

Of the implementations we looked at, ketama has the broadest language/platform coverage. We started using it early in our process of design and prototyping, and although we have found reasons to patch it, we never felt compelled to switch to something else or write an alternative from scratch. My thanks go to Peter Newell, now of Srsly Software, for finding it, and for convincing me that it was going to work. He did that by making sure you could write the keys from client code in any one of the supported languages, and read them from any of the other languages, and be sure they would always be on the right servers.

I’m not sure about the other implementations, but the Riak and ketama versions strictly observe the original Akamai design of scattering a large number of points (160 of them) around a unit circle, a new set for each shard. Ketama calls this circle the ‘continuum’. I call it the ‘ring’. This is done so that when a node disappears, the keys from the vanished shard are distributed to the other members of the ring in even quantities. If you have n shards, with m as the maximum amount of memory used on any one of them, you only need m/(n-1) extra space on each shard to hold all the extra keys for a single node failure, m/(n-2) to support 2 node failures, etc. If you were to divide the key space naively as with a pizza cutter, with one slice per shard, you would need 2m on each shard to support a single node failure, 3m to support 2 failures (worst case, when the nodes are contiguous in the ring), etc.
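To make the ring idea concrete, here is a deliberately simplified sketch in PHP. It is not ketama (which places 160 points per shard and uses MD5 or FNV-1a); it uses crc32 and a handful of points per server just to show the mechanics: each server owns several points on the circle, and a key lives on the server owning the first point at or after the key’s own hash.

```php
<?php
// Simplified illustration of a consistent-hash ring, not the ketama library.
class SimpleRing {
    private $points = array();   // hash position => server name

    public function __construct(array $servers, $pointsPerServer = 4) {
        foreach ($servers as $server) {
            for ($i = 0; $i < $pointsPerServer; $i++) {
                // Give each server several scattered points on the circle.
                $this->points[crc32($server . '#' . $i)] = $server;
            }
        }
        ksort($this->points);    // order the points around the circle
    }

    public function serverFor($key) {
        $hash = crc32($key);
        foreach ($this->points as $position => $server) {
            if ($position >= $hash) {
                return $server;        // first point at or after the key's hash
            }
        }
        return reset($this->points);   // wrap around to the start of the circle
    }
}

$ring = new SimpleRing(array('cache01', 'cache02', 'cache03'));
echo $ring->serverFor('customer:42'), "\n";
```

Because each server contributes many scattered points, removing one server hands its arcs to all of the survivors in roughly even shares, which is where the m/(n-1) headroom figure above comes from.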

What’s up with the name ‘ketama’?  A Google search tells us that ‘ketama’ can be a type of marijuana or a flamenco band.  Last.fm is a music site, so the flamenco interpretation is tempting.  On the other hand ‘hashing’ and ‘hash’ make an obvious play on words.  A quick look at the code removes any ambiguity.  There are functions in the library called ‘ketama_roll’ (create) and ‘ketama_smoke’ (destroy).  Draw your own conclusions about people in the music business, their sly senses of humor, and what they might be smoking at parties.  Not to mention that there were a couple of half-baked things in the code.  Heh. Absent our patches it used to read from the configuration files too often, didn’t use shared memory in exactly the way we wanted, and could not modify its configuration on the fly.  We’re actually not making a pull request yet, because we have a not-quite-fully-tested version that eliminates shared memory from the system altogether. But these are quibbles, about problems we were able easily either to solve, or to work around with wrapper code up the stack.  Mad props, and many thanks, to Mr. Jones for his excellent library.

Pre-existing state at Wayfair

Before implementing the consistent hash, we had been using Memcached at Wayfair in a very simple configuration: twin large-memory machines, one master, one slave.  Why not a sharded system?  Simple.  We could afford the big boxes, and we hadn’t ever had a very large key space.  But we knew that we would eventually come to a maximum height for those twin mounds of memory, and that we would want to move to a larger number of cache servers that we could scale horizontally.  As it says in all the Memcached documentation: if you need more space, just add more nodes!  So we decided to get ahead of the problem.

Why both Memcached and Redis, you may ask?  The short answer is that we’re in a period of experimenting with key-value stores for various purposes.  For now, we’re using Memcached for completely transient in-memory data, where every cache key has a lazy-initialization path.  We use Redis for in-memory data that is backed by disk storage, where each Redis node replicates to a slave, which writes to the disk.  The master does not write to disk, and when failover occurs to the slave, they switch roles.  This is typically, but not exclusively, for values that do not change very often, and are populated by a batch process of some kind.  If there’s no lazy-initialization path for a key, it’s going in Redis, at least for now.
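The distinction shows up directly in client code. Here is a hedged sketch of what we mean by a lazy-initialization path, using the stock PHP Memcached and PDO classes; the key name, table, and TTL are invented for illustration.

```php
<?php
// Sketch of a lazy-initialization path (names are illustrative).
// If the key is missing -- whether it expired or its shard just failed --
// we can always rebuild the value from the database and re-cache it.

function getProductSummary(Memcached $cache, PDO $db, $productId) {
    $key = 'product_summary:' . $productId;

    $summary = $cache->get($key);
    if ($summary !== false) {
        return $summary;                    // cache hit
    }

    // Cache miss: fall back to the source of truth, then repopulate.
    $stmt = $db->prepare('SELECT name, price FROM products WHERE id = ?');
    $stmt->execute(array($productId));
    $summary = $stmt->fetch(PDO::FETCH_ASSOC);

    $cache->set($key, $summary, 300);       // 5-minute TTL
    return $summary;
}
```

Values in the Redis ring have no such fallback, which is why that ring gets the hot-spare slaves described below rather than lazy refilling.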

Design considerations

How should the clients find and talk to the nodes of the cluster?  Put the list of nodes in a configuration file, and you’re all set, right?  Heh.  After looking at the literature and giving the matter some thought, we came up with three candidate architectures.

  1. 3-tiered design. All the clients talk to a central dispatcher (with a SPOF-avoiding twin, naturally), which keeps track of which nodes are in the cluster, probably with some sort of monitor background thread that updates an in-memory configuration. The dispatcher helps with race conditions and ‘thundering herd’ problems in failover situations, and reduces the number of connections among different servers.
  2. Fully distributed configuration. The clients are occasionally updated with fresh lists of all the nodes in the cluster, and they are set up to try a progression of candidates if their default choice is unavailable.  This is also good for thundering herds and race conditions.  However, it might lead to a lot of running through lists of possible shard locations under stress.  This seems wasteful, but is perhaps not disastrous.
  3. Distributed configuration, but with an external nanny, or let’s say an ‘animal trainer’.  In this scenario the clients would check a primary shard location, possibly with a locally configured failover-to-secondary option.  But the overall system would depend on an independent monitor of the members of the ring. Zookeeper is good at this kind of thing, so we started experimenting with it in this role.

Our decision-making process was primarily focused around failover scenarios.  We had two very different ones to consider, between Memcached and Redis.  We’ll call Memcached, as we use it, the lazy cache, and Redis the persistent cache.  With the lazy cache, if you can’t find the node on the ketama ring, you need to look for the next node on the ring. You don’t need a master-slave configuration, because the keys migrate lazily to different arcs on the ring when a node disappears.  However, once you’ve done that a few times, you might not want all your clients to be repeating the process of checking with the dead node on every request.  So we asked Zookeeper to monitor which members of the ring are available, and to notify the agents on the client nodes when the list of members of the ring has changed.

With the persistent cache, you can’t do that, because there is no lazy initialization path for the cached items.  We configure our persistent cache ring with the consistent hash for simplicity’s sake, and so we can scale all this out horizontally as far as we like.  But if an arc disappears, the failover is to a waiting hot-spare slave, which can be hustled into service quickly by Zookeeper, whose agents give the IP address of the shard to the slave node and turn it into the master.

Implementation

The first operating system on which we started to use ketama was FreeBSD. The library wouldn’t compile there, so I wrote some autotools scripts and hacked the code to get it to build on Mac OS X, Solaris, FreeBSD, and Linux.  Ketama uses MD5 as its hashing algorithm by default, but we discovered a patch file already in the tarball that implemented a faster algorithm, FNV-1a.  So Steve and Viktoras from our team merged all those changes together, ran the resulting system through Valgrind to look at memory usage, and made some improvements.  They also changed the way the code loads its configuration, to make it easier to change the list of servers on the fly.
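For reference, FNV-1a itself is a tiny algorithm. A sketch of the 32-bit variant in PHP (assuming 64-bit integers so the multiplication can be masked back to 32 bits without overflow) looks roughly like this; it illustrates the algorithm, not the patched ketama code:

```php
<?php
// 32-bit FNV-1a, for illustration only (assumes 64-bit PHP integers).
function fnv1a32($data) {
    $hash = 0x811c9dc5;                                 // FNV offset basis
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $hash ^= ord($data[$i]);                        // xor in the next byte
        $hash = ($hash * 0x01000193) & 0xffffffff;      // multiply by the FNV prime, keep 32 bits
    }
    return $hash;
}
```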

In the meantime we switched our servers from FreeBSD to Linux.  Thanks anyway, FreeBSD!  Your strict, old-school C compiler and standard library made it a lot easier to see the need for some improvements that will help us even on Linux, which had been allowing the looser code to run.

Deployment, Results

Rolling this out was quite a thing.  We handed the project off to Andrii from Site Reliability Engineering, and Aaron from our Storefront team, to tame the animal trainer, as it were: write a Zookeeper script and refactor the cache client wrapper code until it behaved as desired.  Here is a picture of a test we ran on the spiffy new 14-node cluster in our Massachusetts data center (it has a twin in Washington state, and more are coming soon).  In these graphs, we’re counting requests to the different nodes of the cluster.  The gaps are periods where we forced a particular node, or more than one, offline.  You can see that traffic to the other nodes picks up when that happens.  We set Zookeeper to fail over after 5 consecutive failed health checks, so it won’t be too twitchy.  The failover takes 5-7 seconds, and during that time it’s not as if the requests are failing; they are just going back to the source, or origin, as the content delivery network people would say.

[Charts: requests to the 14 nodes of a Memcached cluster, with distribution governed by ketama]

Happy hashing with this, if it fits the size and performance requirements of the cache that you have.  And if it doesn’t, and you’re willing to use some even more involved techniques, you can always scale Memcached to Facebook scale!  The Facebookies describe all that in this blog post, which has a link to a Usenix paper they wrote.

References

Here’s a little bibliography of consistent hashing.  It includes the original research paper, some blog posts about its use at various sites, and links to a bunch of code:

  • Wikipedia article on consistent hashing
  • Lecture: Consistent Hashing, Danny Lewin, and the Creation of Akamai, by Tom Leighton, Akamai founder
  • The MIT/Akamai paper (Karger, Leighton, Lewin et al.); other downloads are available here, if that goes away
  • Consistent hashing in Riak
  • Mathias Meyer, author of the Riak Handbook, on practical consistent hashing
  • Another general treatment, by the Audiogalaxy guy
  • On sharding in Redis, by the author of Redis
  • Jeremy Zawodny on sharding at Craig’s List
  • Richard Jones’s blog post on using his code at Last.fm
  • Richard Jones’s ketama code on GitHub
  • Perl consistent hashing module
  • Python ‘hash_ring’ module