Tungsten in the news

There’s a great interview with our own Matt DeGennaro by Paul Krill of InfoWorld that came out a few days ago. The topic is Tungsten.js, our awesome framework that ‘lights up’ the DOM with fast, virtual-DOM-based updates, React-style, and can be integrated with Backbone.js and pretty much whatever other framework you want. It’s spiffy, it has a logo, we do it github-first, and we’re getting a lot of mileage out of it at Wayfair. Matt mentions the templating aspect of our composite system: we use server-side PHP, including Mustache templates, and then our client-side pages, also including Mustache templates as needed, get dynamic updates via Tungsten.js. That works great for us because Mustache has implementations in both JavaScript and PHP, among many other languages.

What’s that you say? The PHP implementation of Mustache is not fast enough for you? Well, we’ve got you covered! Adam Baratz just put up a blog post yesterday on a server-side optimization that’s been working well for us. We use John Boehr’s excellent PHP mustache extension, which is written in C++, and is much faster than vanilla PHP Mustache. Inspired by another snippet of PHP/Mustache code, we’ve even added lambdas to that, as Adam explains. I had to do a double-take the first time he explained that to me. As far as I can tell, the PHP community, of all groups of web programmers, is the least likely to care about lambdas in particular, and any kind of functional programming in general. And yet, we’re finding lambdas very useful for our globalization efforts, and we’re starting to use them for other things as well.

We’re still working on a date, but Adam, Matt and Andrew Rota will be giving a talk on all of this at the Boston Web Performance Meetup, hosted at Wayfair, in the near future.

Wayfair Labs in the news

Scott Kirsner has a terrific piece about the tech talent wars in Boston, which appeared in Beta Boston on Friday and then in the print edition of the Boston Globe on Sunday, October 26th. It features Wayfair Labs, our hiring and onboarding program for level 1 engineers across most of the department (a few specialized roles excepted). I am the director of it, so if you have any questions, please reach out.


Rendering Mustache templates with PHP

For the past couple of years, Wayfair’s front-end stack has relied heavily on Mustache templates. They’ve let our growing front-end team focus on the front end, and they allow us to share more code between server and client as we push towards a Tungsten-powered future.

Anyone who’s seen a Mustache template knows that they’re pretty simple to write. Rendering them can be another story. We began with a pure PHP implementation. This got us off the ground, but as we expanded our use of templates, we ran into the unfortunate truth that such a library could never be faster than the pure PHP pages we were hoping to replace. Still, we wanted to make it work, for the organizational and architectural reasons mentioned above.

To understand why rendering Mustache in PHP is slow, first you have to understand what goes into rendering Mustache. Consider a canonical template/data example:

Hello {{name}}
You have just won {{value}} dollars!
{{#in_ca}}
Well, {{taxed_value}} dollars, after taxes.
{{/in_ca}}

{
  "name": "Chris",
  "value": 10000,
  "taxed_value": 10000 - (10000 * 0.4),
  "in_ca": true
}

This template must be tokenized, then each token must be rendered. Mustache is simple enough that tokenizing mostly amounts to splitting on curlies. Not a big deal for PHP. The real trick is replacing {{name}} with the correct content. That’s easy when you have a flat set of key/value pairs, but consider this example:


  "inner": 1,
  "outer": {
    "inner": 2

The output should be “12”. When rendering the {{#outer}} section, the renderer must know which “inner” to display. This is typically implemented by turning the data hash into a stack. When entering/exiting sections, data is pushed/popped. To get a value, start at the top and descend until you find a match.

This is an easy operation to describe, but it makes for some slow PHP. It was the major performance bottleneck with the Mustache implementation we first used, and it’s an issue with another popular implementation.
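
To make that concrete, here is a minimal sketch in plain PHP of the two steps described above: splitting on curlies, then walking the context stack for each tag. This is illustrative only, not code from any of the libraries mentioned; the point is that every interpolation pays for a loop like lookup(), plus the function-call overhead around it.

// Tokenizing really is just splitting on curlies:
$tokens = preg_split('/({{[^}]*}})/', 'Hello {{name}}', -1, PREG_SPLIT_DELIM_CAPTURE);
// => array('Hello ', '{{name}}', '')

function lookup(array $stack, $key) {
  // Walk from the innermost (most recently pushed) context outward.
  for ($i = count($stack) - 1; $i >= 0; $i--) {
    if (is_array($stack[$i]) && array_key_exists($key, $stack[$i])) {
      return $stack[$i][$key];
    }
  }
  return ''; // Mustache renders missing keys as nothing
}

// Rendering {{inner}}{{#outer}}{{inner}}{{/outer}} against the data above:
$data = array('inner' => 1, 'outer' => array('inner' => 2));
$stack = array($data);

echo lookup($stack, 'inner');       // "1" -- found in the root context
array_push($stack, $data['outer']); // entering {{#outer}} pushes its hash
echo lookup($stack, 'inner');       // "2" -- the innermost context wins
array_pop($stack);                  // leaving {{/outer}} pops it again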

Bearing this in mind, we sought a more radical solution. Enter php-mustache, a C++ implementation of Mustache as a PHP extension. C++ is much better than PHP at traversing stacks. Witness this before/after from when we first deployed php-mustache:

[Chart: php-mustache performance]

This chart shows the render time for the product grid on our browse pages (for example, Beds). It’s a complex Mustache template with a lot of data and a lot of nesting. The X-axis is clock time, the Y-axis is render time in milliseconds.

This kind of lift allowed us to justify making Mustache a standard instead of an occasional tool. And, courtesy of the open source world, we didn’t even have to write it. However, it became something of a double-edged sword. As Wayfair operates stores in multiple countries, we have to localize a lot of strings. We started handling this by building all of them in PHP and loading them into the template. This led to some thick code, which occasionally created friction around using Mustache. The typical i18n solution for Mustache involves lambdas, which unfortunately were not implemented in php-mustache… until now! If you’re a performance-minded Mustache user, we hope you’ll check it out.
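
For the curious, here is roughly what the lambda pattern looks like for i18n. This is a hand-rolled sketch, not the php-mustache extension’s actual API: the translate() lookup and the data-hash wiring are hypothetical. The idea is that the template says {{#i18n}}Add to Cart{{/i18n}} and the lambda supplies the localized string at render time, so the PHP that builds the view data no longer has to assemble every translated string itself.

// Hypothetical message catalog; in real life this would be a proper lookup.
function translate($text, $locale) {
  $catalog = array(
    'de_DE' => array('Add to Cart' => 'In den Warenkorb'),
  );
  return isset($catalog[$locale][$text]) ? $catalog[$locale][$text] : $text;
}

$locale = 'de_DE';

$data = array(
  'product_name' => 'Tungsten Floor Lamp',
  // A section lambda: the renderer hands it the raw section text
  // ("Add to Cart") and splices whatever it returns into the output.
  'i18n' => function ($text) use ($locale) {
    return translate($text, $locale);
  },
);

// Template: "{{#i18n}}Add to Cart{{/i18n}}: {{product_name}}"
// Invoking the lambda directly, the way a renderer would for that section:
echo $data['i18n']('Add to Cart') . ': ' . $data['product_name'];
// => In den Warenkorb: Tungsten Floor Lamp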


When you write your first web application, chances are you’re going to query a database. When you write it in PHP, chances are it’ll look like this:

$mysqli = new mysqli("example.com", "user", "password", "database");
$result = $mysqli->query("SELECT * FROM product");
$row = $result->fetch_assoc();

Before long, you have to start handling user input, which means escaping:

$mysqli = new mysqli("example.com", "user", "password", "database");
$result = $mysqli->query("SELECT * FROM product WHERE name = " . mysqli_real_escape_string($mysqli, $product_name));
$row = $result->fetch_assoc();

As your application grows, you start writing code like this a lot. You may start encapsulating it in DAOs, but those do little besides erect walls around this chimeric code. “Okay,” you say. “This is fine, because it’s only me. I’m a Responsible Engineer and I don’t have to sugar-coat things for myself.” But soon, this project is going gangbusters. You’ve got a team, and then a large one, and now there’s no rug large enough under which you can hide this mess. And woe unto you should you decide you need connection pooling or any other resource management.

One solution to this problem is an ORM. But some people prefer having their database interactions more “managed” than “abstracted away.” Instead, your code could look more like this:

$pdo = new PDO("mysql:host=example.com;dbname=database", "user", "password");
$statement = $pdo->prepare("SELECT * FROM product WHERE name = :name");
$statement->bindParam(":name", $product_name);
$statement->execute();
$row = $statement->fetch(PDO::FETCH_ASSOC);

A little more verbose, yes, but also easier to read and less error-prone. This is PDO. It’s a PHP extension that provides a vendor-agnostic interface to various relational databases. It pairs a well-structured API for performing queries with a series of different database drivers.

When Wayfair began adopting PDO, our database access was relatively managed. An in-house library managed connections over the course of a request, but building queries involved a whole lot of string concatenation. Complex queries would get unwieldy. Engineers with prior PDO experience wanted to know why we weren’t using it. However, to convince engineers new to PDO that it would make their lives easier, it had to be as low-friction as the existing library and produce output in the same format.

Simplifying PDO syntax was the easy part. Technically, the example given is shy on error handling. The PDO constructor can throw exceptions, and the related functions signal failure through their return values: prepare() returns false on error, and execute() returns a boolean indicating whether it succeeded. So a “correct” PDO example would look like this:

$pdo = null;
try {
  $pdo = new PDO("mysql:host=example.com;dbname=database", "user", "password");
} catch (Exception $e) {
  // logging, etc., if you want to note when you were unable to get a connection
}

$statement = false;  // PDO::prepare() will return false if there’s an error
if ($pdo) {
  $statement = $pdo->prepare("SELECT * FROM product WHERE name = :name");
}

$row = null;
if ($statement) {
  $statement->bindParam(":name", $product_name);

  if ($statement->execute()) {
    $row = $statement->fetch(PDO::FETCH_ASSOC);
  }
}

// now, do something with $row

Awesome, I know, right? Sure, the PDO API is, on the whole, “nicer,” but no one’s going to want to deal with it if they’re forced to jump through these kinds of hoops. And who could blame them? At Wayfair, we place a lot of value on developer ergonomics. These are problems we strive to solve well when rolling out new internal tools. We landed on a slight extension to PDO that yields this syntax:

$statement = PDO::new_statement("PT", "SELECT * FROM product WHERE name = :name"); // the first argument refers to the desired host/database
$statement->bindParam(":name", $product_name);
$row = $statement->fetch(); // PDO::FETCH_ASSOC is now the default fetch style

We pulled all the boilerplate into a factory function. It does the necessary error handling and reporting. If everything succeeds, it returns a standard-issue PDO statement object. If there are errors, it returns a special object that acts like a failed statement but returns empty result sets when asked. We felt comfortable that this would remove most of the friction around using PDO while preserving the underlying interface. Anyone who wants finer-grained control can still use the stock API.
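
As a rough illustration of the shape of that factory, here is a stripped-down sketch. It is not our actual implementation: the real version looks connections up by the host/database code shown above, reuses them across a request, defaults the fetch style, and logs failures properly, while this sketch just takes a DSN and skips all of that. The names here are hypothetical; the point is the fallback object, which lets calling code drop the error-handling branches entirely.

// A do-nothing stand-in that behaves like a statement whose query failed.
class NullStatement {
  public function bindParam($param, &$value) { return false; }
  public function execute() { return false; }
  public function fetch($style = null) { return false; }
  public function fetchAll($style = null) { return array(); }
}

function new_statement($dsn, $user, $password, $sql) {
  try {
    $pdo = new PDO($dsn, $user, $password);
    $statement = $pdo->prepare($sql);
    if ($statement !== false) {
      return $statement; // a normal PDOStatement
    }
  } catch (Exception $e) {
    // log the connection or prepare failure here
  }
  return new NullStatement(); // callers can treat it like any other statement
}

// Usage: no branching on connection or prepare errors.
$product_name = 'Tungsten Floor Lamp';
$statement = new_statement('mysql:host=example.com;dbname=database', 'user', 'password',
                           'SELECT * FROM product WHERE name = :name');
$statement->bindParam(':name', $product_name);
$statement->execute();
$row = $statement->fetch(PDO::FETCH_ASSOC);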

The trickier problem was “make output the same.” While PDO looks the same with each driver, the drivers don’t necessarily behave the same, and the documentation isn’t always clear about these differences. We needed to do a fair amount of testing and source code reading to suss them out.

While my examples have used MySQL, Wayfair is an MSSQL shop. We had been using the mssql extension. It uses a C API called DBLIB to talk to the server. Microsoft doesn’t maintain an open source version; FreeTDS is the commonly-used free implementation. One of the PDO drivers also uses DBLIB, but it returns column data differently. Instead of returning strings as strings and ints as ints, the PDO DBLIB driver returns everything as a string. We had to patch it to use the expected data types. To be able to differentiate between quoting strings as VARCHAR vs. NVARCHAR, we also added a parameter type. We also added support for setting connection timeouts (PDO defines a PDO::ATTR_TIMEOUT constant, but it has no effect with the DBLIB driver).

Another reason we were attracted to PDO was prepared statements. Since MSSQL supports them, it seemed like this could be an opportunity for a performance gain. However, after digging into the driver internals, we found that the DBLIB driver only emulates them. Microsoft has an ODBC driver for Linux. We tested it in conjunction with PDO’s ODBC driver, but found the two to be incompatible. We were able to get it working with the plain odbc extension, but (amazingly) found prepared statements to be slower than regular queries. Since using prepared statements would’ve necessitated a nontrivial change in coding style, we decided against investigating the speed difference further.

We’re currently working on deploying SQL Relay. Preliminary tests have shown that it reduces network load without adding much overhead. It has a PDO driver, so we’ll be able to swap it into our stack without having to change how queries are made.

Tungsten.js: UI Framework with Virtual DOM + Mustache Templates

Performance is a top priority here at Wayfair, because improved performance means an improved customer experience. A significant piece of web performance is the time it takes to render, or generate, the markup for a page. Over the last several months we’ve worked hard to improve the render performance on our customer-facing sites, and to make it easier for our engineers to write code that results in page renders with optimal performance.

We had been using Backbone.js and Mustache templates for our JavaScript modules at Wayfair for some time, but last year we realized that our front-end performance needed an upgrade. We identified two areas for improvement: making client-side DOM updates more efficient and abstracting DOM manipulation away from engineers.

The first issue was a result of the standard render implementation in Backbone. By default, Backbone’s render does nothing; it is up to developers to implement the render function as they see fit. A common implementation (and the example given in the Backbone docs) looks something like this:

render: function() {
  this.$el.html(this.template(this.model.toJSON()));
  return this;
}

The problem with this implementation is two-fold: first, the entire view is unnecessarily re-rendered with jQuery’s $().html() whenever render is called, and second, the render method always manipulates the DOM regardless of whether the data changed, so engineers must be explicit about when render is called to avoid unnecessary, expensive DOM updates. The usual workaround for both of these problems is a mix of only calling render when the engineer is sure the entire view needs to be re-rendered, and writing low-level DOM manipulation code when only portions of the view need to be updated or the update needs to be more precise (e.g., changing a single class on an element in the view). All of this means that engineers have to be very aware of the state of the DOM at all times, and of the performance consequences of any DOM manipulation. It makes for view modules that are hard to reason about and full of low-level DOM manipulation code.

To address both of these problems, we investigated front-end frameworks that would abstract the DOM away from the developer while also providing high-performance updates. The primary library we looked at was React.js, a UI library open-sourced by Facebook that utilizes a one-way data flow with virtual DOM rendering to support high-performance client-side DOM updates. We really liked React.js, but encountered one major issue: the lack of support for templates that would enable high-performance server-side rendering.

On modern web sites and applications, HTML rendering occurs at two points: once on page load, when the DOM is built from HTML delivered by the server, and again (zero to many times) when JavaScript updates the DOM after page load, usually as a result of the user interacting with the page. On a multi-page site like Wayfair, the initial rendering happens on the server, and we’ve put a lot of work into making sure it’s as fast as it can be. HTML markup is written in Mustache templates and rendered via a C++ Mustache renderer implemented as an extension for PHP. This gives us server-side rendering at speeds even faster than native PHP views.

Since server-side rendering is an important part of our web site, we were glad that React.js comes with this feature out of the box. Unfortunately, while server-side rendering is available with React.js, it’s significantly slower than our existing C++ Mustache setup. Beyond performance, rendering React.js on the server would have required Node.js servers to supplement our PHP servers. This new requirement for UI rendering would have introduced complexity, as well as a new single point of failure, into our server stack. For these reasons, as well as the fact that we already had existing Mustache templates we wished to reuse, we decided React.js wasn’t a good fit.

Where do we go from here? We liked many of the concepts React.js introduced us to, such as reactive data-driven views and virtual DOM rendering, but we didn’t want our choice of a front-end framework to dictate our server-side technologies and force us to replace our C++/PHP Mustache rendering. So, after some investigation of what else was available, we decided to take the concepts we liked from React.js and implement them ourselves with features that made sense for our tech stack.

Earlier this year, we wrote Tungsten.js, a modular web UI framework that leverages shared Mustache templates to enable high-performance rendering on both server and client. A few weeks ago we announced that we were open sourcing Tungsten.js, and today we’re excited to announce that primary development on Tungsten.js will be “GitHub first,” and all new updates to the framework can be found on our GitHub repo: https://github.com/wayfair/tungstenjs.

Tungsten.js is the bridge we built between Mustache templates, virtual-DOM, and Backbone.js. It uses the Ractive compiler to pre-compile Mustache templates to functions that return virtual DOM objects. It uses the virtual-DOM diff/patch library to make intelligent updates to the DOM. And it uses Backbone.js views, models, and collections as the developer-facing API. At least, it uses all these libraries for us here at Wayfair. Tungsten.js emphasizes modularity above all else: any one of these layers can be swapped out for a similar library paired with an adaptor. Backbone could be swapped out for Ampersand. virtual-DOM could be swapped out for another implementation. Mustache could be swapped out for Handlebars, Jade, or even JSX. So, more generally, Tungsten.js is a bridge between any combination of markup notation (templates), a UI updating mechanism, and a view layer for developers.

We don’t expect Tungsten.js to be the best fit for everyone, but we think it fits a common set of use cases very well. We’ve been using it on customer-facing pages in production for a while here at Wayfair, and so far we’ve been very happy with it. Our full-stack engineers frequently tell us they far prefer using Tungsten to vanilla Backbone.js + jQuery, and we’ve improved client-side performance now that DOM manipulation is abstracted away from developers. And while we weren’t trying to be the “fastest” front-end framework around, it turns out that when we re-implemented Ryan Florence’s DBMonster demo from React.js Conf in Tungsten.js, the browser’s frame rate ends up being, give or take, at the same level as both React and Ember with Glimmer.

Here at Wayfair we have a saying that “we’re never done”. That’s certainly the case with Tungsten.js, which we’re constantly improving. We have a lot of ideas for Tungsten.js in the coming months, so watch the repo for updates. And of course we welcome contributions!

No-follow SEO link highlighter Chrome extension

Cari, who is a developer on our SEO team, just wrote a Chrome extension that’s up on both github (https://github.com/wayfair/nofollow_highlighter) and the Chrome web store (click here to add to Chrome). If you don’t know this subject matter, here’s a classic explanation from Matt Cutts of Google, from a few years ago: https://www.mattcutts.com/blog/pagerank-sculpting/. A lot has changed in SEO since then, but this basic idea has become a constant in internet life: if you’re compensating people for a promotional activity, they need to make that clear to Google with a ‘nofollow’ link. Here’s an example of a blogger who is doing it right, with a disclaimer saying she was compensated with a gift card, to keep the FTC happy, and a ‘nofollow’ link with the yellow highlight from our plug-in, indicating that the link properly warns Google not to pass page rank, or whatever they’re calling ‘Google mojo’ these days, through to the destination. The other links on the page don’t show up color-coded one way or the other, because they go to domains we don’t care about.

A green highlight, indicating that Google thinks we want it to pass page rank, would be a problem. If you don’t like the idea of green being a problem, the default yellow and green colors are configurable to whatever you want.

If your promotions people are working with a blogger who forgets to do that, or misspells ‘nofollow,’ or anything along those lines, it’s on you to get that cleaned up in a hurry. It’s suboptimal to have to ‘view-source’ on every page or write your own crawler: enter the browser-based ‘no-follow’ extension. There have been a few of these for different browsers over the years, but none did exactly what we wanted, so we rolled our own.

The features of ours that we like are:

  • Configurable list of domains whose reputation you are trying to defend.
  • Click-button activation/deactivation on pages, which is persistent.
  • Aggressive defense against misspellings, bad formatting, etc.

The configurable list is important, because if you’re looking over a page that links to one of your sites, and it links to several other sites, it’s best if you don’t have to puzzle over which links you care about.

The persistent flagging of pages that you care about is important, because if you’re engaged in a promotional activity with a site, odds are someone at your company is going to be on that site from time to time. Tell your colleague to enable the plugin and be on the lookout for green links, and you’ve got a visual cue that’s hard to miss for problems that might arise.

The defense against misspellings, special characters, and the like, is for this scenario. Brian, head of Wayfair SEO: “Hey Bob, did you put ‘nofollow’ on those links?” Bob: “Yup”. Brian: “kthxbye”. But in fact, although Bob is telling the truth, he actually put smart quotes, rather than ASCII quotes, around ‘nofollow,’ so Google will not recognize the instruction. It’s funny: browsers do a great job of supplying missing closing tags, guessing common spelling mistakes, etc., because their mission in life is to paint the page, regardless of the foibles and carelessness of web page authors. Nationwide proofreaders’ strike? No problem, browsers will more or less read your mind! But Google’s mission in life is to crawl the web and pass page rank, so you have to tell it very clearly not to do that.

And of course, when all your links are clean, you can have a luau party in your Tiki hut, like our SEO team:
[Photo: SEO tiki hut luau]

TechJam 2015

Come hang out with us tomorrow, June 11th, from 4 pm to 9 pm, at TechJam. Not to be too transparent, but we’re hiring! We will be at booth 43, bostontechjam@wayfair.com, #btj2015. Steve Conine, Wayfair Founder and CTO, and I will be there, along with a bunch of our colleagues in Wayfair engineering.

We will have a Money Booth where people can enter by checking in on Facebook or tagging Wayfair in a picture on Instagram/Twitter, #wayfairbtj2015. The person who grabs the most money will win that amount in Wayfair Bucks.

We will also have some Wayfair swag at the booth.

vim emacs talk by Aaron Bieber

I can’t believe I’m writing a post about vim and emacs in the year 2015! But our very own Aaron Bieber just spoke at the vim meetup on how he’s been secretly using emacs all the time for a few months, and is now coming out of the closet as an emacs user. Vim vs. emacs is an eternal holy war, and pretty much the opposite of a topic that I would normally want to write about. But Aaron is the opposite of a holy warrior, as anyone at Wayfair Engineering can tell you. Here’s the announcement of the talk: http://www.meetup.com/The-Boston-Vim-Meetup/events/222395931/, here’s his personal blog post on the topic: http://blog.aaronbieber.com/blog/2015/01/11/learning-to-love-emacs/, and here’s the video: https://www.youtube.com/watch?v=JWD1Fpdd4Pc, with cool jazz!

Evil mode is what makes this possible, of course. I used to be Aaron’s manager, and as someone who has used his .vimrc / .vim-folder setup, after watching this talk I’m at least going to try his .emacs file, as an adjunct to my crusty old pseudo-Python-IDE thing. As an engineering manager, the key comment to me was the thing about how ctags (for tab completion) works just as well in both environments. As long as people are using something that helps them save time that would otherwise be spent on meaningless drudgery, to each his own!

Announcing Tungstenjs

Matt DeGennaro and Andrew Rota of our JavaScript team recently spoke at the BostonJS meetup on a library we have written called Tungsten.js, which we have open-sourced today. It takes the fast-virtual-DOM-update idea from React.js and makes it usable with other frameworks, including Backbone.js; it ships with a Backbone adapter. There’s a server-side component too, using npm and Mustache templating, but perhaps I should just let the ‘readme’ tell you: https://github.com/wayfair/tungstenjs. We’ve been using it on Wayfair, and it’s awesome.

Stackdive: the evolution of Wayfair’s stack

Jack Wood and I, CIO and Chief Architect of Wayfair, spoke at Stackdive, hosted at Wayfair’s offices on April 23. Here’s the matching blog post we published on the Stackdive site, now crossposted here.

Jack and I are both long-time software guys who now spend somewhat less of our time thinking about what to build, and somewhat more about how to keep a large number of systems running well. The emergent DevOps culture of the last few years has made it easy for people like us to move between these worlds. Are they really separate worlds anymore? In 2009 John Allspaw and Paul Hammond, heads at the time of ops and development, respectively, at Flickr, gave an influential talk at Velocity called “10+ deploys per day: Dev and Ops Cooperation at Flickr”. It’s about their deploy tool and IRC bots, sure, but it’s also about culture, and especially how to get dev and ops teams out of adversarial habits and into a productive state, with a combination of good manners and proper tooling. We have cribbed quite a bit from that family of techniques since 2010, but Wayfair has had an evolution of stack and culture that’s distinctive, and we’re going to try to give a close-up picture of it.

Let’s start with a brief overview of the customer-facing Wayfair stack over the years since our founding. We’re going to stick to the customer-facing systems, because although there is a lot more to Wayfair tech than what you see here, we’re afraid this will turn into a book if we don’t bound it somehow.

Early 2002 (founding, in a back room of Steve Conine’s house): Yahoo Shopping for about 5 minutes, then ASP + Microsoft Access.


Late 2002: ASP + SQL Server

The middle years: Forget about the tech stack for a while, add VSS (Windows source control) relatively early.  Hosting goes from Yahoo shopping, to Hostway shared, to Hostway dedicated, to Savvis managed, to a Savvis cage (now CenturyLink) in Waltham.  Programmers focus on building a custom RDBMS-backed platform supporting the business up to ~$300M in sales…

2010: Add Solr for search because database-backed search was slow, small experiments with Hadoop, Netezza, Seattle data center.  Yes, you read that right.  2010.  8. years. later.  That’s what happens when you’re serious about the whole lean-startup/focus-on-the-business/minimum-viable-stack thing.  But now we’ve broken the seal.

2011-2012: Add PHP on FreeBSD, Memcached, MongoDB, Redis, MySQL, .NET and PHP services, Hadoop and Netezza powering site features, RabbitMQ, Zookeeper, Storm, Celery, Subversion everywhere and git experiments, ReviewBoard, Jenkins, deploy tool.  Whoah!  What happened there!?  More on that below.

2013: Serve the last customer-facing ASP request, serve west-coast traffic from Seattle data center on a constant basis, start switching from FreeBSD to CentOS, put git into widespread use.

2014: Vertica.

2015: Dublin data center.

SiteSpect has been our A/B testing framework from very early on, and at this point we have in-line, on-premise SiteSpect boxes between the perimeter and the php-fpm servers in all three data centers.

Here’s a boxes-and-arrows diagram of what’s in our data centers, with a ‘Waltham only’ label on the things that aren’t duplicated, except for disaster recovery purposes.  Strongview is an email system that we’re not going to discuss in depth here, but it’s an important part of our system.  Tableau is a dashboard thing that is pointed at the OLAP-style SQL Servers and Vertica.



Why did we move from ASP to PHP, and why not to .NET?  That’s one of the most fascinating things about the whole sequence.  Classic ASP worked for us for years, enabling casual coders, as well as some hard-core programmers who weren’t very finicky about the elegance of the tools they were using, to be responsive to customer needs and competitor attacks.  Of course there was a huge pile of spaghetti code: we’ll happily buy a round of drinks for the people with elegant architectures who went out of business in the meantime!

But by 2008 or so, classic ASP had started to look like an untenable platform, not least because we had to make special rules for certain sensitive files: we could only push them at low-traffic times of day, which was becoming a serious impediment to sound operations and developer productivity. Microsoft was pushing ASP.NET as an alternative, and on its face that is a natural progression. We gave it a try. We found it to be very unnatural for us, a near-total change in tech culture, in the opposite direction from where we wanted to go: expensive tools, laborious build process, no meaningful improvement in the complexity of calling library code from scripts, etc., etc. We eventually found our way to PHP, which, like ASP, allowed web application developers to do keep-it-simple-stupid problem solving, while letting us rationalize caching and move our code deployment and configuration management into a good place.  In the early days of Wayfair, when there was not even version control, coders would ftp ASP scripts to production.  That’s a simple model with a fire-and-forget feel that application developers find very pleasant.  Something goes wrong?  Redeploy the old version, work up a fix, and fire again, with no server lifecycle management to worry about.  It is a lot easier to write a functional tool to deploy code when you don’t have to do a lot of server lifecycle management, as you do with Java, .NET, and in fact most platforms.  So we got that working for PHP on FreeBSD, but soon applied it to ASP, Solr, Python and .NET services, and SQL Server stored procedures or ‘sprocs’.  Obviously, by that point we had had to figure out server lifecycles after all, but it’s hard to overstate the importance of the ease of getting to a simple system that worked. The operational aspects of php-fpm were a great help in that area.  The core of Wayfair application development is what it has always been: pushing code and data structures independently, and pushing small pieces of code independently from the rest of the code base.  It’s just that we’re now doing it on a gradually expanding farm of more than a thousand servers that spans three data centers in Massachusetts, Washington state, and Ireland.

It’s funny.  Microservices are all the rage right now, and I was speaking with a microservices luminary a couple of weeks ago.  I described how we deploy PHP files to our services layer, either by themselves, or in combination with complementary front-end code, data structures, etc., and theorized that as soon as I had a glorified ftp of all that working in 2011, I had a microservices architecture.  He said something like, “Well, if you can deploy one service independently of the others, I guess that’s right.”   Still, I wouldn’t actually call what we have a microservices architecture, or an SOA, even though we have quite a few services now.  On the other hand, there’s too much going on in that diagram for it to be called a simple RDBMS-backed web site.  So what is it?  When I need a soundbite on the stack, I usually say, “PHP + databases and continuous deployment of the broadly Facebook/Etsy type.  With couches.”  So that’s a thing now.

Let’s dig in on continuous deployment, and our deploy tool.  Here’s a chart of all the deploys for the last week, one bar per day, screenshot from our engineering blog’s chrome:


Between 110 and 210 per day, Monday to Friday, stepping down through the week, and then a handful of fixes on the weekend.  What do those numbers really mean in the life of a developer?  There’s actually some aggregation behind the numbers in this simple histogram.  We group individual git changesets into batches, and then test them and deploy them, with zero downtime.  The metric in the histogram is changesets, not ‘deploys’.  Individual changesets can be deployed one by one, but there’s usually so much going on that the batching helps a lot.  Database changes are deployed through the same tool, although never batched.  The implementation of what ‘deploy’ means is very different for a php/css/js file on the one side, and a sproc on the other, but the developer’s interface to it is identical.  Most deploys are pretty simple, but once in a while, to do the zero downtime thing for a big change, a developer might have to make a plan to do a few deploys, along the lines of: (1) change database adding new structures, (2) deploy backwards-compatible code, (3) make sure we really won’t have to go back, (4) clean up tech debt, (5) remove backwards-compatible database constructs.  From the point of view of engineering management, the important thing is to allow the development team to go about their business with occasional check-ins and assistance from DBAs, rather than a gating function.

Memcached and Redis are half-way house caches and storage for simple and complex data structures, respectively, but what about MongoDB and MySQL?  Great question.  In 2010 we launched a brand new home-goods flash-sale business called Jossandmain.com.  We outsourced development at first, and the business was a big success.  We went live with an in-house version a year later, in November, 2011.  Working with key-value stores that have sophisticated blob-o-document storage-and-retrieval capabilities has been a fun thing for developers for a while now.  It had the freshness of new things to us in 2011.  There were no helpful DAOs in our pile of RDBMS-backed spaghetti code at the time, so the devs were in the classic mode of having to think about storage way too often.  Working with medium-sized data structures (a complex product, an ‘event’, etc.) that we could quickly stash in a highly available data store felt like a big productivity gain for the 4-person team that built that site in a few months.  So why didn’t we switch the whole kit and caboodle over to this productivity-enhancing stack?  First of all, we’re not twitchy that way.  But secondly, and most importantly, what sped up new feature development had an irritating tendency to slow down analysis.  And those document/k-v databases definitely slow you down for analysis, unless you have a large number of analysts with exotic ETL and programming skills.   We love how MongoDB has worked out for our flash sale site, but as we extrapolated the idea of adopting it across the sites that use our big catalogue, we foresaw a quagmire.  By 2011, we were a large enough business that a big hit to analyst productivity was much worse than a small cramp in the developers’ style.

Around the same time, we began to experiment with moving some data that had been in SQL Server into MySQL and MySQL Cluster.  The idea was to cut licensing cost and remove some of the cruft that had accumulated in our sproc layer.  We have since backed off that idea, because after a little experimentation it began to seem like a fool’s errand.  We would have been moving to a database with worse join algorithm implementations and a more limited sproc language, which in practice would have meant porting all our sprocs to application code, a huge exercise with zero obvious benefit to our customers.  Since the sprocs are already part of the deployment system, the only compensation besides licensing cost would have been increased uniformity of production operations, which would have been standardized on Linux, but in the end we did not like that trade-off.

Wow! Stored procedures along with application code, colo instead of cloud.  We’re really checking all the boxes for Luddite internet companies, aren’t we!?  I can’t tell you how many times I’ve gotten a gobsmacked look and a load of snark from punks at meetups who basically ask me how we can still be in business with those choices.

Let’s take these questions one at a time.  First, the sprocs.  When we say sproc, of course, we mean any kind of code that lives in the DBMS.  In SQL Server, these can be stored procedures, triggers, or functions.  We also have .NET assemblies, which you can call like a function from inside T-SQL.  Who among us programmers does not have a visceral horror of these things?  I know I do.  The coding languages (T-SQL, PL/SQL and their ilk) are unpleasant to read and write, and in many shops, the deployment procedures can be usefully compared to throwing a Molotov cocktail over a barbed-wire fence, to where an angry platoon of DBAs will attempt to minimize the damage it might do.  Not that we don’t have deployment problems with sprocs once in a while, but they’re deploy-tool-enabled pieces of code like anything else, and the programmers are empowered to push them.

Secondly, the cloud.  If we were starting Wayfair today, would we start it on AWS or GCP?  Probably.  But Wayfair grew to be a large business before the cloud was a thing.  Our traffic can be spiky, but not as bad as sports sites or the internet as a whole.  We need to do some planning to turn servers on ahead of anticipated growth, particularly around the holiday season, but it’s typically an early month of the new year when our average traffic is above the peak for the holidays of the previous year, so we don’t think we’re missing a big cost-control opportunity there.  Cloud operations certainly speed up one’s ability to turn new things on quickly, but the large-scale operations who make that economical typically have to assign a team to write code that turns things *off*.  One way or the other, nobody avoids spending some mindshare to nanny the servers, unless they don’t care about the unit economics of their business.  In early startup mode, that’s often fine.  Where we are?  Meh.  It’s a problem, among many others, that we throw smart people at.  We think our team is pretty efficient, and we know they’re good at what they do.  What is the Wayfair ‘cloud’, which is to say that thing that allows our developers to have the servers they need, more or less when they need them?  It looks something like this:


We’re afraid of vendor lock-in, of course, with some of the hardware, which we mostly buy and don’t rent:


But the gentleman on the right makes sure we get good deals.

That’s it for now.  Another day, we’ll dig in on the back end for all this.