Wayfair’s 3D Model API

When we started Wayfair Next, Wayfair’s R&D group that explores next-generation experiences, we spoke to a lot of AR and VR application developers who were interested in creating experiences for us. In line with our engineering culture, we wanted to create the experiences ourselves and figure out how to help our customers better visualize our products. Home decor lends itself very well to the application of AR and VR technology. As we ventured further into AR and VR territory, we realized that visualization technology was not the barrier. The bottleneck was the dearth of 3D content out there, especially in the E-commerce/V-commerce space.

In January 2016, we were beginning to digitize our catalog and create 3D models of our products to be used for photographic rendering. More on that later. While developing WayfairView, our AR application for Google’s Tango platform, we started laying down the foundation for a 3D content pipeline to create app-ready real-scale 3D models of our products.

We saw an opportunity here – why not empower other application developers to create amazing 3D V-commerce experiences in AR and VR just like we are, and push the envelope of mixed reality!

Real estate developers are beginning to use VR experiences to showcase spaces to their clients and customers. Why not decorate the space with real products!

Game developers can begin to embed 3D models of real furniture products in their games!

3D interior design applications that assist with room planning and creating inspirational content can use and offer Wayfair products!

Is this beginning to sound like an affiliate model for product sales via partners… but in mixed reality apps and 3D games? That’s exactly what it could be – developers can choose to redirect users to Wayfair and earn money when product purchases originate from their apps.

So lo and behold, we’re releasing our first public API, which offers 3D models of our products, so that you as developers can spend less time worrying about content and more time enhancing your app’s experience.

There are two ways to interact with our API: standard HTTP GET requests or GraphQL. If you haven’t heard of GraphQL, it’s Facebook’s internally developed interface to their data, which they have opensourced for general use, with implementations in a number of languages. It’s a very fast on-ramp for React.js-based apps, but we have found it quite useful at Wayfair, along with its ecosystem of tooling, even though we don’t use React. You as a developer can use Wayfair’s instance of GraphiQL to get information about our 3D ready products.

What kind of 3D models, you ask? For now we’re offering models in the universally supported Autodesk FBX format. If you’re a Unity developer, we also offer platform-specific Unity asset bundles.

Below are some examples of data users will be able to retrieve from the API –

Information about all dining chairs that are 3D ready –

  "pdp": "https://www.wayfair.com/Daniel-Tufted-Side-Chair-HOHN2381-HOHN2381.html?refid=3DAPI.2"
  "sku": "HOHN2563",
  "product_name": "Maza Parsons Chair",
  "sale_price": 153.99,
  "product_description": "A dining chair ...",
  "image_url": "https://secure.img2.wfrcdn.com/lf/43/hash/37313/28458225/1/custom_image.jpg",
  "class_name": "Dining Chairs",
  "class_id": 168,
  "pdp": "https://www.wayfair.com/Parsons-Chair-HOHN2563-HOHN2563.html?refid=3DAPI.2"
  "sku": "HOHN3044",

List of 3D models for the Maza Parsons Chair –

  "HOHN2563": {
    "viewer": "https://www.wayfair.com/v/api/model_viewer...",
    "android": "http://img.wfrcdn.com/docresources/37313/40/408920.android",
    "web": "http://img.wfrcdn.com/docresources/37313/44/441369.web",
    "fbx": "http://img.wfrcdn.com/docresources/37313/44/444894.fbx"

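To give a feel for consuming a model list like the one above, here is a minimal Python sketch that picks the first format an app can load. It assumes the response is a JSON object keyed by SKU, as the fragment suggests; the outer braces, the `pick_model_url` helper and the `unity` format name are illustrative assumptions, not part of the documented API.

```python
import json

# Response shape based on the fragment shown in the post. The outer braces
# are an assumption; the URLs are the ones from the example.
SAMPLE_RESPONSE = """
{
  "HOHN2563": {
    "viewer": "https://www.wayfair.com/v/api/model_viewer...",
    "android": "http://img.wfrcdn.com/docresources/37313/40/408920.android",
    "web": "http://img.wfrcdn.com/docresources/37313/44/441369.web",
    "fbx": "http://img.wfrcdn.com/docresources/37313/44/444894.fbx"
  }
}
"""


def pick_model_url(response_text, sku, preferred_formats):
    """Return (format, url) for the first preferred format the SKU offers, or None.

    Hypothetical helper for illustration only.
    """
    models = json.loads(response_text).get(sku, {})
    for fmt in preferred_formats:
        if fmt in models:
            return fmt, models[fmt]
    return None


if __name__ == "__main__":
    # Ask for a (hypothetical) "unity" entry first, then fall back to FBX.
    print(pick_model_url(SAMPLE_RESPONSE, "HOHN2563", ["unity", "fbx"]))
```

Keeping the preference order in the caller means an Android app and a desktop tool can share the same lookup and simply pass different lists.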
Sample GraphQL query for fetching 3D models for the Maza Parsons Chair –

  product(sku: "HOHN2563") {
    three_d_models(page: 2, user: "xxx", api_key: "xxx") {


Screenshot of the web viewer we built to view the 3D models of our products

Screenshot of the 3D model of the Maza Parsons Chair being displayed in a web viewer

At the time of writing this post, we have ~10,000 products, distributed across all our categories, that are 3D ready. We are hard at work applying our engineering chops to scale our 3D catalog. We’ll talk about how we’re doing that in another post.

Meanwhile, if you would like to start using this API, e-mail us at Next3DApi@wayfair.com with a short description of how you would like to use our 3D models, and we’ll hook you up with an API key and documentation on how to use the API.

We would love to hear from you!

Happy developing!

If you’re local, and this sounds exciting, we would highly recommend checking out the Reality, Virtually Hackathon hosted by the MIT Media Lab from Oct 7 – 10, 2016. We’re going to be giving participants beta access to the API for the weekend and are also conducting a workshop on how to use the API.

Testing PHP7

PHP7 is out. This isn’t news. It’s been out since last December, with nine minor revisions since then. What’s new is that it’s serving all of Wayfair’s customer-facing traffic.

Performance-wise, PHP7 is the rocket ship people said it would be. We’re nothing but pleased. If you can upgrade, you should do it yesterday. With barely any changes to our code, pages render in about half the time. CPU utilization dropped by about 30%.

Part of what made us nothing but pleased was that we haven’t had to roll anything back after flipping the switch. For context, Wayfair’s PHP presence is 3.5M LoC over 28K files. Coding conventions span several versions of PHP. There’s a similarly diverse range of third-party packages (composer and otherwise). We also use 66 PHP extensions – that is, C/C++ – including some forked third-party and some totally custom code. We serve 30M+ unique visitors a month from all around the world.

In a way, upgrading to PHP7 was less about the language and more about testing a complex codebase. Testing is a tricky thing with dynamically-typed languages. At the risk of stating the obvious, dynamic typing means that every error will be a runtime error. That is great for fast development, but it makes it difficult to ensure code “correctness.”

Here’s a brief guide to upgrading, as we experienced it. YMMV.


A first step was to get a lay of the land. Use PHP built-ins (phpinfo() or `php -i`) to find out which extensions you rely on and how they’re configured. Check out the official migration guide and the goPHP7 notes on extension compatibility.

With a decent mental map, you can grep your way to a punchlist of compatibility fixes. For example, knowing that the ereg extension has been dropped, apply /\b(ereg_replace|ereg|eregi_replace|eregi|split|spliti|sql_regcase)\(/ to your codebase.
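If plain grep gets unwieldy, the same search is easy to script. The following Python sketch walks a source tree and reports calls to the dropped ereg functions; the `find_ereg_calls` name and the `.php`-only filter are assumptions made for illustration, and the output is a punchlist for human review rather than a verdict (a method call such as `$obj->split(` would also match).

```python
import os
import re

# Calls to functions from the ereg extension, which PHP7 dropped.
# The trailing \( restricts matches to call sites.
EREG_CALL = re.compile(
    r"\b(ereg_replace|ereg|eregi_replace|eregi|split|spliti|sql_regcase)\("
)


def find_ereg_calls(root):
    """Yield (path, line_number, function_name) for each match under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".php"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    for match in EREG_CALL.finditer(line):
                        yield path, line_number, match.group(1)


if __name__ == "__main__":
    for path, line_number, name in find_ereg_calls("."):
        print(f"{path}:{line_number}: {name}()")
```

Because `\b` requires a word boundary, names like `preg_split(` and `mb_ereg(` are not reported: the underscore before `split` or `ereg` counts as a word character.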

Static Analysis

There are a few ways you can check your code without running it. The above grep example is, technically speaking, a form of static analysis. You can use `php -l` to check basic syntax. php7mar will pair that check with a group of others (mostly regex-based) for common migration gotchas.

If you want to be more robust, Phan and HHVM perform true static analysis on your code. (The Phan readme lists examples of what this includes.) It’s a tempting concept – marrying the rigor of strongly-typed languages with the agility of PHP – but it can be imperfect in practice. We used HHVM with PHP5.6 and found it difficult to maintain alongside our build process. We tested Phan and had some concerns about false positives and other noise on our aging codebase. That’s not to say either tool won’t have value for you, but that they require some careful evaluation.

Automated Testing

Automated testing comes in many guises: unit tests, integration tests, browser tests. There are no universal truths in this world. So much depends on how your team works. But codified, reproducible tests can reveal subtle regressions that could impact your customers.

Replay Testing

Use web logs and other information you already have about how people use your product. No unit test coverage? No problem! Take a day’s worth of URLs and curl a test server. Naturally, that means you’ll need to be able to detect when things go wrong.
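As a rough sketch of what replay testing can look like, the Python below pulls GET paths out of access log lines and requests each one from a test server. It assumes Common/Combined Log Format request strings; the function names and the test server URL you pass in are hypothetical, and a real replay would also want to watch the server's error logs rather than only HTTP status codes.

```python
import re
import urllib.error
import urllib.request

# Matches the request part of a log line, e.g. "GET /path HTTP/1.1".
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def paths_from_access_log(lines):
    """Return the distinct GET paths found in the log lines, in first-seen order."""
    seen = {}
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


def replay(paths, test_base_url, timeout=10):
    """Request each path from the test server; return (path, status or error text)."""
    results = []
    for path in paths:
        url = test_base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                results.append((path, response.status))
        except urllib.error.HTTPError as error:
            results.append((path, error.code))
        except OSError as error:  # connection failures, timeouts
            results.append((path, repr(error)))
    return results


def failures(results):
    """Keep only the results that are not a 2xx/3xx status."""
    return [(p, s) for p, s in results if not (isinstance(s, int) and 200 <= s < 400)]
```

Deduplicating the paths keeps a day of logs manageable; if request frequency matters for your test, drop that step and replay the raw sequence instead.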


Does your application already have a safety net? Logging and alerting are invaluable when preparing for large systems changes. (Really, they’re invaluable, period. If your application doesn’t have monitoring in place, stop reading this right now and set it up. Trees can fall in the woods without making a sound.)

Stress Testing

Before you roll out to real traffic, try hammering a test server with ApacheBench. You might have an extension with memory allocation issues that aren’t revealed until you apply load in parallel.


Writing custom extensions is a niche activity, but one that presents a significant testing burden. The internals of PHP have changed a bit – the slimmer data structures behind PHP variables are actually responsible for much of the speed gains in PHP7 – though there is code out there to help with some of the changes. A strong suite of .phpt tests, run with Valgrind, is invaluable. Code coverage tools, like gcov, can help validate the breadth and depth of your tests. A sandboxed development environment, like the php7dev vagrant box, can streamline your process.

Even if you’re not rolling your own, you may rely on pecl or other third-party extensions. You may need to isolate test cases to help guide others. Basic comfort with capturing and examining core dumps can go a long way here.

The Community

If you encounter issues, don’t forget about the existing PHP bug tracker. It’s a big community, so it’s not unlikely that someone else has already had your problem. Follow mailing lists. There’s a whole suite for PHP core. Third-party extensions and libraries may have their own. Familiarize yourself with the GitHub repositories for any third-party code. PHP7 is still relatively new. A bug fix you need might still be in an unstable release.

Above all, remember that open source projects work when you give back to them. Share findings on mailing lists, file bug reports, submit patches for improvements.

When You’re Done

A big question is how to determine when you’ve tested something just enough. No method is totally comprehensive. Diminishing returns are a fact of life. We didn’t do anything particularly special in this department. We kept good notes, recorded questions as they arose, and ensured low-friction communication within the working group. At a certain point, you – the group you – will feel comfortable enough with the state of things to move to the next step: asking manual testers to have a look, beginning stress testing, exercising your deploy scripts, and, finally, sharing your work with the world.

Wayfair Engineering FAQ, a conversation with the Chief Architect

Q1: What’s the tech stack in a nutshell?
A: PHP on Linux, and a few data backends, with continuous deployment at the pace of ~250 zero-downtime code pushes a day.

Q2: Whoa, that’s a lot of code pushes! Break that down for me.
A: What I’m actually counting there is git changesets that are going to some type of production system. We group them in batches, so merges and testing don’t get too hairy. The rumors that we’re so cowboy we only test in production are an exaggeration.

Q3: What’s the main idea of Wayfair Engineering?
A: Stay fast while growing to enormous size.

Q4: That sounds cool, but how do you measure ‘enormous size’?
A: Anything you can think to measure: revenue, active customers, daily visitors, site traffic, terabytes of data, engineering headcount, warehouse square footage, linear miles traveled by our packages being delivered, square miles covered by our proprietary transportation network, number of products we sell. The numbers are out there every quarter in our reports, and in the independent venues: however you count, we’re huge. I’d give you the actual numbers today, but I want to be able to leave this page up for a while, and at the rate we’re growing, snapshot numbers get stale fast.

Q5: How do you measure ‘fast’?
A: Two things. First and foremost, speed to market for new business ideas and features. We also look at performance metrics: web page load times, lead times for delivery, etc., etc. We measure everything.

Q6: How’s the relationship between tech and the business?
A: It’s very good, and it’s one of mutual respect and cooperation. It’s a founder-led business. Steve Conine (tech) and Niraj Shah (business), co-founders/owners, provide a good model of that kind of relationship for the rest of the company, and we take our cue from them. Steve is a very business-minded tech entrepreneur who’s also a phenomenal programmer, and Niraj is an unusually tech-savvy business person. They were both engineering majors at Cornell. There’s less of a divide there in the first place than at many companies. The core of Wayfair is an innovative, completely custom e-commerce platform, and management has consistently described that to the outside world as being a big part of the equity value of the company. On the tech side, let me ask people out there: how valuable would it be to you, to have a deep reserve of confidence that you’re working at a well managed business, where your engineering efforts aren’t going to be wasted on some improperly vetted hunch that will send the balance sheet into the red for no good reason? Combine that analytical sense and good judgment with Wayfair’s characteristic aggressive business innovation, and we all feel pretty good about working together. The home goods niche of the economy isn’t going to go on line all by itself: we’re going to need to make it easy for people, and that’s going to come from a strenuous, combined effort by all parts of Wayfair.

Q7: The home goods niche? You’re taking credit for a whole sector of the retail economy going on line?
A: Of course it’s not *all* us, but we do account for a hefty percentage of every dollar that goes on line for the purchase of home goods.

Q8: ‘Innovative, completely custom e-commerce platform,’ you say? Do you want to elaborate?
A: It’s not that we never buy third-party software, but we have a strong bias towards ‘build’, in build-vs-buy discussions, and that has only become more pronounced over time. At the core, there’s no third-party platform like DemandWare or Magento, just an evolving set of data models and architectural principles, a lot of code, and some great components that our developers know what to do with. We’re very patient with the early stages of DIY efforts that aren’t necessarily up to industry standards at first, if we think we can gain a sustainable advantage over time. Most recently, we’ve insourced some big parts of our marketing tech stack, that we formerly used outside vendors or commercial software for. It’s satisfying when we can leave behind vendor-based point solutions to individual problems, and stand up a new part of the living, breathing, integrated whole, which allows us to take advantage of everything our platform has to offer.

Q9: What languages do you write code in?
A: PHP and JavaScript are the bread and butter, including our opensource Tungstenjs framework. We have also written important things in Python and C#, and some key components in Java. Objective C for iOS mobile apps, Java for Android, and some emerging language platforms for VR and AR. We use Puppet for configuration management, so there’s a certain amount of Ruby hacking as well, and a lot of systems scripting with Python. Once in a while we write some C or C++, for optimized numerical work, PHP extensions, and opensource infrastructure like Twitter’s Twemproxy (patches) and Statsdcc (from scratch, inspired by things in Node.js and other languages).

Q10: So you opensource code. Where can I find that?
A: Yes, we do that all the time, most of it on https://github.com/wayfair. Check it out!

Q11: VR and AR?
A: Virtual and augmented reality. We’ve got a lot going on in that space, particularly in a small department we call Wayfair Next that Steve Conine is leading. Right now the biggest push is to model the catalogue. From this we get excellent 2D imagery for the site, and next-gen experiences on things like the Google Tango AR devices that are becoming available to the general public in September. If you have a dev kit, check out the ‘WayfairView’ app. Big picture: we want to make it easier and easier to buy, say, an easy chair from your couch. VR/AR is going to be a big part of that.

Q12: What are your data platforms?
A: We’re proud of how far we got as a business, from our founding in 2002 until 2010, on a keep-it-simple-stupid or KISS architecture of relational-database-backed web scripting. SQL Server was and still is our core for OLTP, and it allows us to plug new tools into our integrated operational and analytical infrastructure very quickly. But to drive innovative customer experiences, we now rely on Solr-backed search and browse, and Redis/Memcache for fast access to complex data structures and ordinary caching. We have a modern, on-premise big data infrastructure consisting of Hadoop and Vertica clusters, and some specialized, vertically-scaled big-memory and GPU machines, for analytical workloads. We do our machine learning, statistical analysis and other types of computation on that setup, and funnel the results to the ‘Storefront,’ as we call it, and to the operational business systems. RabbitMQ and Kafka provide a kind of circulatory system for the data, and they are gradually replacing what traditional ETL we have. As I speak with other architects and CTOs around the industry, I actually think it’s pretty rare, at the biggest and most successful companies, to junk your relational databases, even when you’re many years into adopting these next-generation auto-sharding platforms. We’re fine with that.

Q13: OK, so with all these relational databases, do you use an ORM?
A: There’s a joke around Wayfair Engineering, that if you use the word ‘ORM’ in a positive way, you might notice a sudden drop in your career prospects. Joking aside, I do think excessive reliance on ORMs tends to foster careless data access code, and excessive round trips to the back end. We mostly use the ‘phrasebook’ pattern and hand-make our data access layer. Besides, it’s not as if an easily generated mock would really help you: by the time you’ve horizontally partitioned your data to the extent we have, ORMs are close to useless. We try to make it easy for everybody to develop against a readily accessible development database infrastructure. On the other hand, there is actually a bit of Hibernate, SQLAlchemy and both the Entity Framework and NHibernate in the Java, Python and C# code, respectively. ORMs on language platforms like those can have some engineering benefits in addition to the convenience features, such as connection pooling, caching of various kinds, etc. None of that works, or at least works well, in PHP, so we just use PDO like the rest of the PHP world, and we’re experimenting with SQL Relay for some other kinds of optimization and encapsulation of the details of how we talk to the databases. At the higher levels, we have some pretty handy traits, which are the multiple inheritance thing in PHP, to inject common functionality into our codebase in a DRY way. No fanaticism, one way or the other.
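For readers who haven’t met the ‘phrasebook’ pattern, here is a small generic sketch in Python: every SQL statement lives in one named dictionary, and a hand-made data access class is the only code that touches it. This is an illustration of the general idea, not Wayfair’s code; the table, the class and the use of SQLite as a stand-in database are all assumptions.

```python
import sqlite3

# The "phrasebook": every SQL statement the application uses, keyed by name.
# Table and column names are made up for this example.
PHRASEBOOK = {
    "create_products": (
        "CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, class_id INTEGER)"
    ),
    "insert_product": (
        "INSERT INTO products (sku, name, class_id) VALUES (:sku, :name, :class_id)"
    ),
    "products_by_class": (
        "SELECT sku, name FROM products WHERE class_id = :class_id ORDER BY sku"
    ),
}


class ProductStore:
    """Hand-made data access layer: callers use methods, never raw SQL."""

    def __init__(self, connection):
        self._connection = connection

    def _run(self, phrase, **params):
        # Look the statement up by name and bind the named parameters.
        return self._connection.execute(PHRASEBOOK[phrase], params)

    def create_schema(self):
        self._run("create_products")

    def add(self, sku, name, class_id):
        self._run("insert_product", sku=sku, name=name, class_id=class_id)

    def by_class(self, class_id):
        return self._run("products_by_class", class_id=class_id).fetchall()


if __name__ == "__main__":
    store = ProductStore(sqlite3.connect(":memory:"))
    store.create_schema()
    store.add("HOHN2563", "Maza Parsons Chair", 168)
    print(store.by_class(168))
```

The appeal is that the SQL stays visible and reviewable in one place, so each query can be tuned by hand without an ORM generating statements behind your back.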

Q14: What are your thoughts on web services?
A: We have a handful of important web services behind the scenes at Wayfair. Search and browse for products, orders, and some other things, are powered by Solr, which is an opensource, Java-based web service that we have patched a few times for our own needs. Our Python-based customer recommendations and search enhancements, and our C#-based inventory service, deliver a lot of value. There are other examples.

Q15: Do you have any other kinds of services?
A: Good question. Some of the highest-value systems we have are data processing services that ingest data from our messaging platforms (Rabbit, Kafka) and push value-added results to where we can use them to move faster on behalf of customers and suppliers. There are some DIY ones, but most live in the frameworks Celery, Storm or Spark. You could call our caching system a service. It’s a composite Redis/Memcache/consistent-hashing thing with smart proxies everywhere. You’re using regular Redis and Memcache commands, not going through an adapter layer, but that’s true of Elasticache too, so we’re far from eccentric in this way. Unlike in Elasticache, the sharding is taken care of for you. We built it on the back of work by Twitter, Pinterest and Instagram, but we added some innovative elements of our own. It has some similarities with Facebook’s McRouter, which is pretty awesome, and which we might well have chosen instead, if it had Redis support.
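To make the consistent-hashing part concrete, here is a minimal, generic hash ring in Python. It is a textbook sketch, not a description of how the Wayfair proxies work: the node names, the number of virtual nodes and the choice of SHA-256 are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (generic illustration)."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each node is placed at many points so load spreads evenly.
        for i in range(self._replicas):
            point = self._hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def node_for(self, key):
        """Return the node owning the first ring position after the key's hash."""
        if not self._points:
            raise LookupError("ring has no nodes")
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[self._points[index]]


if __name__ == "__main__":
    ring = HashRing(["cache-a", "cache-b", "cache-c"])
    print(ring.node_for("product:HOHN2563"))
```

The property that matters for a cache tier is that removing a node only remaps the keys that node owned; every other key keeps its server, so most of the cache stays warm.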

Q16: What about micro-services, or SOA?
A: We’re not really into all that, although as I said, we have some pretty awesome services back there. Is your code base, or a big part of it, really a monolith, in any pejorative sense, if it has decent separation of concerns, and you can deploy small modifications to any layer of it without rebuilding the whole thing, and without down time? We’ve had all of that for years. Many of the best big tech companies have largely monolithic code bases, and they’re too busy adding awesome features to the core to want to replatform. But don’t get me wrong: there are some cool micro-service set-ups out there. If we keep developing very valuable *macro* services at the rate we’ve been doing that, we’ll eventually have so many of them that micro-service-style orchestration techniques will start to make sense for us. Our Python services are the most numerous ones we have, and we’re already experimenting with Docker, Mesos and Kubernetes, for them. It’s just that over time I have seen the importance of web services diminishing, as data platforms become easier to scale horizontally, and server-side-of-the-front-end programming becomes easier and more powerful. The data is just too readily available for these layer cakes of http indirection to make any sense in a well-designed, modern setup.

Q17: Why do you like PHP?
A: I’m not sure I *do* like it, but it attracts the right kind of people: neither ivory-tower language snobs, nor hipster code posers. No fanatics, but no luddites either. We have some fun with all that, when we’re trying to make sure the culture stays strong. I tried to depict both sides of the ivory-tower/hipster thing in this picture a few years ago, in a comic-strip-style blog post on our Python ops: I think the tweed jacket combined with the Brooklyn t-shirt really gets the point across. (To MIT professors, and to my former neighbors in Park Slope: I kid because I love!)
[Image: eye-rolling sysadmin]
With every other language, there are a lot of fanatics who think it’s the answer to every problem, and will wear your ears out explaining why. Even people who love PHP don’t think that about PHP. It’s just a solid platform for web development, the kind the tattooed web ops expert in the picture would think is a fine thing to have running on his servers. There’s no server lifecycle management to worry about, and practical problem-solvers gravitate to it. It’s also very readable, even if you don’t know it well, so let’s all just pause to give it the big thank you it deserves for killing Perl (with a substantial assist from Python, of course, on the systems scripting side). That needed to happen, and in retrospect it’s obvious that neither Java nor .NET had the slightest chance to do it. 80% of the internet runs on PHP, including a bunch of the biggest sites, which we’re rapidly becoming one of. It’ll do.

Q18: So PHP is a cultural thing?
A: Yes. Let me draw an analogy, which I sometimes use in talks for new hires. Do you remember the scene in Star Wars, when Luke Skywalker sees the Millennium Falcon for the first time, and says “What a piece of junk!”? Han Solo responds, “She’ll make point five past lightspeed. She may not look like much, but she’s got it where it counts, kid. I’ve made a lot of special modifications myself.” H/t @danmil for that analogy. Our PHP-and-friends stack might seem inelegant to language snobs, but it takes us where we want to go fast. The Millennium Falcon is still a space ship, after all! Let’s try a ‘car’ analogy: anyone who has ever done the coding equivalent of putting a Porsche engine in a VW Golf, or can show us the chops and attitude for that and wants to try, is welcome at Wayfair Engineering. Adding lambdas to the opensource php_mustache extension, which we did, is a great example of something that fits that mold. If you’re more of an “I won’t drive at all unless I have a Lamborghini” person, you should seek a company more willing to splurge on the shiniest tools, before thinking about whether there is really a need. If your mindset is more “My tank-like SUV keeps me safe, and I don’t care that it handles poorly,” there are plenty of J2EE shops out there.

Q19: OK, I’m getting the picture. Why use the other languages at all?
A: PHP is not a great fit for every programming task. Sometimes you need a long-running daemon that can respond to requests with little startup/wake-up overhead. We have excellent services in Python, C# and Java for that. The C# code grew out of our early-stage Microsoft heritage, but we are now doing some phenomenal things with it, and we have added some very elegant functional programming in F#. Python is our favorite language for data science, machine learning and the like, and it combines low-latency service qualities, the way we run it, with the convenience and productivity of loose typing, and that super-handy mix of the functional and object-oriented styles. Java allows us to tap directly into platform-level infrastructure such as Solr, Elastic Search, Hadoop, Kafka and Storm.

Q20: You’ve been talking about speed, and you mentioned that you measure web performance earlier. Can you give a bit more detail on that?
A: Sure. Web performance measurement is basically a 3-legged stool: RUM, or real user monitoring; synthetic monitoring, which is externally-located bots that measure page speed; and server-side execution metrics. We have a centralized performance team that makes sure we have the right tools and dashboards to be proactive about all of that. They also work on framework-level changes that can make a big impact, when those aren’t naturally more specialized with another group. They play a strategic role for us, but that team wouldn’t be very effective if we didn’t have a good culture of thinking about web performance in a broader context of putting a great user experience into the hands, and onto all the devices, of our customers. The RUM instrumentation gives us great insight into what our customers are actually experiencing. I’m not original with this name, but my joke name for that department is the RUM distillery, and you can imagine the joking about operating precise instruments in the right state of mind. We have some cool ‘responsive’ experiences here and there, but the RUM tells us that our decision to emphasize adaptive delivery over responsive design was a good one. Check out Wayfair mobile web, on a small iOS or Android device, and you’ll see what I mean. Our native apps are fast too, but that’s a separate discipline, where server-side execution and expert Java and Objective C programming are the key components.

Q21: Thoughts on the cloud?
A: We run a few elastic workloads on public cloud infrastructure, but that’s a drop in the ocean of Wayfair tech. Don’t get me wrong: if we were starting Wayfair today, we would do it on public cloud infrastructure, for the speed-to-market aspect, for sure. In fact, Wayfair was very briefly a Yahoo! Store in 2002, before Steve built the first version of the platform to run in a colocated cage in a data center. We run colo-style to this day. Wayfair was already a multi-hundred-million-dollar company before the cloud was a thing. We think about it, and do some analysis and experiments periodically. But ultimately our traffic is not extremely spiky, and we grow into the holiday spike provisioning pretty early the following year. The economics, control and convenience have not yet aligned to make it worthwhile to go through a big process of switching. We’re not big enough to have whole data centers, at least not yet, but we have our /22 ARIN range, and we use the border gateway protocol to make sure we have the kind of relationship with our ISPs where we have a lot more control than when we were smaller. Wrestling with these types of configurations is interesting work, and it attracts really good network and systems people. Let’s face it: the public cloud is awesome, but when the problem is under the hood of the hypervisor, you’re in for a frustrating day at the office. We do a lot of virtualization, and we like it, but when various types of systems become very cookie-cutter or have certain types of requirements, we run physical boxes. Virtualization adds overhead, and it’s one more thing that can break. If you can provision basic types ahead of demand, the IAAS side of the cloud becomes just another provider, and of course the higher-level services are fraught with problems of vendor lock-in. The way cloud adoption presents itself to new or small companies, it’s kind of ironic that we’re moving too fast to be bothered with moving to the cloud. But never say never.

Q22: OK, sign me up. How do you succeed in Wayfair Engineering?
A: It’s hard to answer that question without using some cliches, but I’ll try to use the ones that are characteristic and relevant. Programmers with the polyglot, DevOps-savvy innovator background tend to do really well here. Boy Scout principle for refactoring, rather than a penchant for from-scratch rewrites. Bias for action: if you’re not embarrassed by the first version, you waited too long to ship it. Just ship! If you find yourself tempted by a months-long science project, don’t do it. Instead, fast-follow/adopt something that’s already here in the general area (whether we wrote it or it’s open source from outside), and innovate at the margins for now. But when you see a quick win that you think is on a path to a real breakthrough, pounce.

Boston events during the week of 25 July 2016, on augmented/virtual reality, data science and PHP

There are three events in Boston this week where Wayfair engineers will be speaking.

On Monday, July 25th, I am on a panel called “What’s Hot in Tech: Providing Next Gen Experiences,” which is part of the MITX E-Commerce Summit 2016, being held all day, from 8:30 to 5, at Google in Cambridge. Details here: http://mitxecommerce.org. Fair warning: paid admission. I’ll be speaking about Wayfair Next and our augmented and virtual reality applications. The panel is from 1:45 to 2:30 in the afternoon (details here: http://mitxecommerce.org/session/tech/), but Shrenik Sadalgi of Wayfair Next and I will be there all day, demo-ing next-gen technology, including our Tango app WayfairView, which is already in the Tango app store in Google Play, and will be available to the general public when the Lenovo Tango devices hit the market in September.

On Tuesday, July 26th, at 7:30 pm, Robby Grodin of our Marketing Engineering group is speaking on data science at General Assembly in Boston: https://generalassemb.ly/education/data-science-lets-break-it-down/boston/27296.

On Wednesday, July 27th, it’s open mike night at the Boston PHP Meetup, hosted at Wayfair. Pizza at 6, talks from 7-8:30, Q&A, wrap-up, out for beers? after that. Adam Baratz, George Carrette and I will get the ball rolling with some material about PHP 7 and a couple of other things. If you’ve been wondering why we haven’t been blogging about PHP7, it’s because we’re just rolling it out now, after wrestling with some interesting APC issues. Details on Wednesday.


Statsdcc is a Statsd-compatible, high-performance, multi-threaded network daemon written in C++. It aggregates stats and sends the results to backends, especially Graphite. We are proud to announce that we are open-sourcing it today. Check out the code at https://github.com/wayfair/statsdcc.

At Wayfair we’re big believers in “measure anything, measure everything,” as the “Statsd is reborn in node.js at Etsy” announcement put it. We do application performance monitoring with the open source tools Graphite, Grafana, the ELK stack (Elasticsearch/Logstash/Kibana), and some homegrown tools. Until recently we had been using Flickr/Etsy’s 2nd-generation, node.js-based Statsd to collect metrics for Graphite. As the volume of these metrics increased, we noticed inconsistencies in the data, and realized that some metrics were being dropped. Long story short, we tried some architectural changes, scaling Statsd and Carbon horizontally (details below), but as the operational complexity of that increased, we began to wonder why we needed so many boxes. We found a bottleneck in the way Statsd buffers and flushes data to Carbon, and we decided we needed a different version.


There are already quite a few alternative Statsd implementations available, but none of them really came close to meeting all of our needs. Brubeck by GitHub is one that we found interesting, because it promised high throughput. Unfortunately, it was released after we had Statsdcc implemented and were ready to put it into production. At that point, we had no reason to take Brubeck and extend it to support the features we needed. However, we borrowed from Brubeck the idea of integrating a webserver to view application health. Statsdcc and Brubeck try to solve similar problems. I would recommend checking out all these implementations and picking the one that best fits your needs.


If you’re interested in what we tried before starting to hack the C++, read on.

Attempts at Horizontal Scaling with Statsd:

Statsd performs aggregations on incoming metrics and sends the aggregates to a Carbon process, which in turn saves received metrics to a Whisper database.

To scale, we use multiple Statsd/Carbon chains. Each chain goes to a different disk. Proxy daemons hash metric names to determine which chain to use, and round-robin DNS determines which proxy daemon receives a given metric. Consistent hashing keeps metric names well balanced across the chains.

The diagram below depicts the architecture.



A year ago we noticed that a certain set of metrics were being dropped, resulting in inconsistent monitoring data. We realized that this was due to a maxing out of UDP receive buffers on Statsd. So we tried adding more Statsd processes with increased UDP buffer sizes.

However, adding a new process is complicated. When a new Statsd instance is added, consistent hashing by a reverse proxy will re-route some metrics to the new process, resulting in duplicate files on different Carbon nodes for the same metric – one for the old data and one for the new data. To save space, and for Graphite to show all data, the old Whisper data files should be merged into the new ones.
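The re-routing described above can be sketched in a few lines. This is only an illustration of a consistent-hash ring, not the proxy’s actual code: the FNV-1a hash, the 64 points per node, the metric names, and the node names carbon-a through carbon-d are all assumptions made for the example.

```javascript
// Illustrative consistent-hash ring. The hash function, replica count, and
// node names are assumptions for this sketch, not taken from any real proxy.

// FNV-1a 32-bit hash, chosen only because it is short.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Place `replicas` points per node on the ring, sorted by position.
function buildRing(nodes, replicas = 64) {
  const ring = [];
  for (const node of nodes) {
    for (let r = 0; r < replicas; r++) {
      ring.push({ point: fnv1a(node + "#" + r), node });
    }
  }
  return ring.sort((a, b) => a.point - b.point);
}

// A metric belongs to the first point at or after its hash, wrapping around.
function ownerOf(ring, metricName) {
  const h = fnv1a(metricName);
  const hit = ring.find((entry) => entry.point >= h); // linear scan for clarity
  return (hit || ring[0]).node;
}

// List the metrics whose owner differs between two rings.
function movedMetrics(names, before, after) {
  return names.filter((n) => ownerOf(before, n) !== ownerOf(after, n));
}

const names = Array.from({ length: 1000 }, (_, i) => "web.page" + i + ".render_ms");
const before = buildRing(["carbon-a", "carbon-b", "carbon-c"]);
const after = buildRing(["carbon-a", "carbon-b", "carbon-c", "carbon-d"]);
const moved = movedMetrics(names, before, after);
console.log(moved.length + " of " + names.length + " metrics changed owner");
```

Every metric that changes owner in this sketch lands on the newly added node. Its earlier data stays behind on the node that used to own it, which is the split-file situation that makes a Whisper merge necessary.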

In the end we were unhappy with how much traffic an individual node could handle. We discovered that the problem was a design decision in Statsd, where the same thread is responsible for both buffering incoming metrics and performing aggregations on them at every flush interval. When computing aggregations, the thread stops listening for incoming metrics, which are stored in the UDP buffer. As the rate of metrics increases, the UDP buffer overflows and drops metrics. We use single-threaded, event-looping frameworks in a few places (Node.js-based daemons for a couple of things, Python-based gunicorn+gevent for several), and we have seen this type of problem before. The event loops don’t help you when you have a blocking IO operation that can bring processing to a halt. Sometimes we work around or solve such problems within the event-loop paradigm, and sometimes we take a completely different approach.
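A toy simulation makes the failure mode concrete. It is not a model of Statsd itself: the tick-based clock, the buffer size, and the arrival rates are invented numbers, and the only point is that packets keep arriving while a single thread is busy aggregating.

```javascript
// Toy discrete-time model of one thread that both receives and aggregates.
// All parameters are hypothetical; nothing here is measured from Statsd.
function simulate({ ticks, arrivalsPerTick, bufferSize, interval, flushTicks }) {
  let buffered = 0;
  let received = 0;
  let dropped = 0;
  for (let t = 0; t < ticks; t++) {
    // The thread spends the first `flushTicks` ticks of each interval aggregating.
    const busy = t % interval < flushTicks;

    // Packets arrive regardless of what the thread is doing.
    const accepted = Math.min(arrivalsPerTick, bufferSize - buffered);
    buffered += accepted;
    dropped += arrivalsPerTick - accepted;

    // Only an idle thread can drain the buffer.
    if (!busy) {
      received += buffered;
      buffered = 0;
    }
  }
  return { received, dropped };
}

const low = simulate({ ticks: 100, arrivalsPerTick: 10, bufferSize: 100, interval: 10, flushTicks: 2 });
const high = simulate({ ticks: 100, arrivalsPerTick: 60, bufferSize: 100, interval: 10, flushTicks: 2 });
console.log(low);  // { received: 1000, dropped: 0 }
console.log(high); // { received: 5200, dropped: 800 }
```

At the lower rate the buffer absorbs each aggregation pause. At the higher rate it fills before the pause ends, and everything that arrives after that point is lost.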

After finding the actual root cause, we decided to rewrite Statsd as a multi-threaded application with a focus on effective use of socket-IO and CPU cycles.


Statsdcc is an alternative implementation for Statsd written in C++ for high performance. In Statsdcc, one or more server threads actively listen for incoming metrics. Server threads distribute incoming metrics among multiple workers using the formula worker = hash(metric name) % #workers. Worker threads read from their dedicated queues and update their ledgers until signaled to flush by a clock thread. Upon receiving this signal, the worker threads hand off their ledgers to short-lived flush threads, and continue with new ledgers until the next signal. To avoid lock contention and to pass metrics faster between server and worker threads, boost’s lock-free queues are used.
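The data flow above can be sketched as follows. This is a single-threaded illustration of the routing formula and the ledger hand-off only. Real Statsdcc is multi-threaded C++ with lock-free queues, and the class name, the hash function, and the metric names below are hypothetical.

```javascript
// Single-threaded sketch of the routing and ledger hand-off. All names are
// hypothetical; this is not Statsdcc's code.

// FNV-1a 32-bit hash, used here only as a stand-in for "hash(metric name)".
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class Worker {
  constructor() {
    this.queue = [];         // stands in for the worker's dedicated queue
    this.ledger = new Map(); // metric name -> running counter total
  }
  // Fold everything queued so far into the current ledger.
  drain() {
    for (const { name, value } of this.queue) {
      this.ledger.set(name, (this.ledger.get(name) || 0) + value);
    }
    this.queue = [];
  }
  // On the flush signal, hand off the full ledger and continue with a new one.
  flush() {
    const full = this.ledger;
    this.ledger = new Map();
    return full;
  }
}

// worker = hash(metric name) % #workers
function workerIndex(name, workerCount) {
  return fnv1a(name) % workerCount;
}

function route(workers, metric) {
  workers[workerIndex(metric.name, workers.length)].queue.push(metric);
}

const workers = [new Worker(), new Worker(), new Worker(), new Worker()];
for (const value of [1, 1, 1]) {
  route(workers, { name: "checkout.count", value });
}
route(workers, { name: "search.count", value: 5 });
workers.forEach((w) => w.drain());
const ledgers = workers.map((w) => w.flush());
```

Because a given metric name always hashes to the same worker, no two ledgers in this sketch hold partial totals for the same metric, so nothing has to be merged at flush time.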


We have not gotten rid of consistent hashing, as we did not want to lose the ability to scale horizontally. However, to solve the scaling problem in our previous architecture, where adding a new process required cleanup on the Carbon end, we moved consistent hashing from proxies to aggregators. The proxies distribute the incoming metrics among multiple aggregators using the formula aggregator = hash(metric name) % #aggregators. Each aggregator then sends the metric aggregation to its respective Carbon process by using the consistent hash. The difference from the previous architecture is that the Carbon process has more TCP connections open, one with each aggregator. However, unlike Statsd, instead of reopening a connection on each flush, Statsdcc reuses established TCP connections, thereby avoiding the overhead of a TCP handshake. The diagram below describes the current architecture.


Statsdcc can handle up to 10 times more load (up to 400,000 metrics/sec) than Etsy’s Statsd. Only one instance of the Statsdcc aggregator handles all our production traffic, in contrast to the previous 12 Statsd instances. Statsdcc has been used in production for about 7 months. We hope more people will find Statsdcc as useful as we have at Wayfair.

Tungsten in the news

There’s a great interview with our own Matt DeGennaro by Paul Krill of Infoworld that came out a few days ago. The topic is Tungsten.js, our awesome framework that ‘lights up’ the DOM with fast, virtual-DOM-based updates, React-style, and can be integrated with Backbone.js and pretty much whatever other framework you want. It’s spiffy, it has a logo, we do it GitHub-first, and we’re getting a lot of mileage out of it at Wayfair. Matt mentions the templating aspect of our composite system: we use server-side PHP, including Mustache templates, and then our client-side pages, also including Mustache templates as needed, get dynamic updates via Tungsten.js. That works great for us because Mustache has implementations in both JavaScript and PHP, among many other languages.

What’s that you say? The PHP implementation of Mustache is not fast enough for you? Well, we’ve got you covered! Adam Baratz just put up a blog post yesterday on a server-side optimization that’s been working well for us. We use John Boehr’s excellent PHP mustache extension, which is written in C++, and is much faster than vanilla PHP Mustache. Inspired by another snippet of PHP/Mustache code, we’ve even added lambdas to that, as Adam explains. I had to do a double-take the first time he explained that to me. As far as I can tell, the PHP community, of all groups of web programmers, is the least likely to care about lambdas in particular, and any kind of functional programming in general. And yet, we’re finding lambdas very useful for our globalization efforts, and we’re starting to use them for other things as well.

We’re still working on a date, but Adam, Matt and Andrew Rota will be giving a talk on all of this at the Boston Web Performance Meetup, hosted at Wayfair, in the near future.

Wayfair Labs in the news

Scott Kirsner has a terrific piece about tech talent wars in Boston, that was in Beta Boston on Friday, and then in the print edition of the Boston Globe on Sunday, October 26th. It features Wayfair Labs, which is our hiring and onboarding program for level 1 engineers in most of the department (a few specialized roles excepted). I am the director of it, so if you have any questions, please reach out.


Rendering Mustache templates with PHP

For the past couple of years, Wayfair’s front-end stack has relied heavily on Mustache templates. They’ve let our growing front-end team focus on the front end. They allow us to share more code between server and client as we push towards a Tungsten-powered future.

Anyone who’s seen a Mustache template knows that they’re pretty simple to write. Rendering them can be another story. We began using a pure PHP implementation. This got us off the ground, but as we expanded our use of templates, we ran into the unfortunate truth that such a library could never be faster than the pure PHP pages we were hoping to replace. Yet, we wanted to make it work, for the organizational and architectural reasons mentioned above.

To understand why rendering Mustache in PHP is slow, first you have to understand what goes into rendering Mustache. Consider a canonical template/data example:

Hello {{name}}
You have just won {{value}} dollars!
{{#in_ca}}
Well, {{taxed_value}} dollars, after taxes.
{{/in_ca}}

{
  "name": "Chris",
  "value": 10000,
  "taxed_value": 10000 - (10000 * 0.4),
  "in_ca": true
}

This template must be tokenized, then each token must be rendered. Mustache is simple enough that tokenizing mostly amounts to splitting on curlies. Not a big deal for PHP. The real trick is in replacing {{name}} with the correct content. This is fine when you have a flat set of key/value pairs, but nested data makes it harder. Consider this example:

{{inner}}{{#outer}}{{inner}}{{/outer}}

{
  "inner": 1,
  "outer": {
    "inner": 2
  }
}

The output should be “12”. When rendering the {{#outer}} section, the renderer must know which “inner” to display. This is typically implemented by turning the data hash into a stack. When entering/exiting sections, data is pushed/popped. To get a value, start at the top and descend until you find a match.
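The stack lookup described above can be sketched like this. It is a minimal illustration rather than any particular library’s implementation. The tokens are written out by hand instead of being produced by a tokenizer, the template is assumed to be {{inner}}{{#outer}}{{inner}}{{/outer}}, and lists, inverted sections, and HTML escaping are left out.

```javascript
// Sketch of Mustache-style name resolution over a context stack.
// Token shapes and function names are assumptions for this example.

// Search from the top of the stack down to the root for the first frame
// that defines `key`.
function lookup(stack, key) {
  for (let i = stack.length - 1; i >= 0; i--) {
    const frame = stack[i];
    if (frame !== null && typeof frame === "object" &&
        Object.prototype.hasOwnProperty.call(frame, key)) {
      return frame[key];
    }
  }
  return ""; // unknown names render as empty text
}

function render(tokens, stack) {
  let out = "";
  for (const tok of tokens) {
    if (tok.type === "text") {
      out += tok.value;
    } else if (tok.type === "var") {
      out += String(lookup(stack, tok.name));
    } else if (tok.type === "section") {
      const value = lookup(stack, tok.name);
      if (value) {
        stack.push(value); // entering the section
        out += render(tok.body, stack);
        stack.pop();       // leaving the section
      }
    }
  }
  return out;
}

// Hand-built tokens for: {{inner}}{{#outer}}{{inner}}{{/outer}}
const tokens = [
  { type: "var", name: "inner" },
  { type: "section", name: "outer", body: [{ type: "var", name: "inner" }] },
];
const data = { inner: 1, outer: { inner: 2 } };
console.log(render(tokens, [data])); // prints "12"
```

Every variable reference walks the stack from the top, so deeply nested templates with many references repeat this search over and over. That repeated walk is the work that is cheap in C++ and comparatively slow in interpreted PHP.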

This is an easy operation to describe, but it makes for some slow PHP. It was the major performance bottleneck with the Mustache implementation we first used, and it’s an issue with another popular implementation.

Bearing this in mind, we sought a more radical solution. Enter php-mustache, a C++ implementation of Mustache as a PHP extension. C++ is much better than PHP at traversing stacks. Witness this before/after from when we first deployed php-mustache:

php-mustache performance

This chart shows the render time for the product grid on our browse pages (for example, Beds). It’s a complex mustache template with a lot of data and a lot of nesting. The X-axis is clock time, the Y-axis is render time in milliseconds.

This kind of lift allowed us to justify making Mustache a standard instead of an occasional tool. And, courtesy of the open source world, we didn’t even have to write it. However, it became something of a double-edged sword. As Wayfair operates stores in multiple countries, we have to localize a lot of strings. We started handling this by building all of them in PHP and loading them into the template. This led to some thick code in some cases, which occasionally created friction around using Mustache. The typical i18n solution for Mustache involves lambdas, which unfortunately were not implemented in php-mustache… until now! If you’re a performance-minded Mustache user, we hope you’ll check it out.


When you write your first web application, chances are you’re going to query a database. When you write it in PHP, chances are it’ll look like this:

$mysqli = new mysqli("example.com", "user", "password", "database");
$result = $mysqli->query("SELECT * FROM product");
$row = $result->fetch_assoc();

Before long, you have to start handling user input, which means escaping:

$mysqli = new mysqli("example.com", "user", "password", "database");
$result = $mysqli->query("SELECT * FROM product WHERE name = '" . mysqli_real_escape_string($mysqli, $product_name) . "'");
$row = $result->fetch_assoc();

As your application grows, you start writing code like this a lot. You may start encapsulating it in a DAO, but they do little besides erect walls around this chimeric code. “Okay,” you say. “This is fine, because it’s only me. I’m a Responsible Engineer and I don’t have to sugar-coat things for myself.” But soon, this project is going gangbusters. You’ve got a team, and then a large one, and now there’s no rug large enough under which you can hide this mess. And woe unto you should you decide you need connection pooling or any other resource management.

One solution to this problem is an ORM. But, some people prefer having their database interactions more “managed” than “abstracted away.” Instead your code could look more like this:

$pdo = new PDO("mysql:host=example.com;dbname=database", "user", "password");
$statement = $pdo->prepare("SELECT * FROM product WHERE name = :name");
$statement->bindParam(":name", $product_name);
$statement->execute();
$row = $statement->fetch(PDO::FETCH_ASSOC);

A little more verbose, yes, but also easier to read and less error-prone. This is PDO. It’s a PHP extension that provides a vendor-agnostic interface to various relational databases. It pairs a well-structured API for performing queries with a series of different database drivers.

When Wayfair began adopting PDO, our database access was relatively managed. An in-house library managed connections over the course of a request, but building queries involved a whole lot of string concatenation. Complex queries would get unwieldy. Engineers with prior PDO experience wanted to know why we weren’t using it. However, to convince engineers new to PDO that it would make their lives easier, it had to be as low friction as the existing library and produce output in the same format.

Simplifying PDO syntax was the easy part. Technically, the example given is shy on error handling. The PDO constructor can throw exceptions. Related functions return a boolean value, indicating whether they succeeded. So a “correct” PDO example would look like this:

$pdo = null;
try {
  $pdo = new PDO("mysql:host=example.com;dbname=database", "user", "password");
} catch (Exception $e) {
  // logging, etc., if you want to note when you were unable to get a connection
}

$statement = false;  // PDO::prepare() will return false if there’s an error
if ($pdo) {
  $statement = $pdo->prepare("SELECT * FROM product WHERE name = :name");
}

$row = null;
if ($statement) {
  $statement->bindParam(":name", $product_name);

  if ($statement->execute()) {
    $row = $statement->fetch(PDO::FETCH_ASSOC);
  }
}

// now, do something with $row

Awesome, I know, right? Sure, the PDO API is, on the whole, “nicer,” but no one’s going to want to deal with it if they’re forced to jump through these kinds of hoops. And who could blame you? At Wayfair, we place a lot of value on developer ergonomics. These are problems we strive to solve well when rolling out new internal tools. We landed on a slight extension to PDO that would yield this syntax:

$statement = PDO::new_statement("PT", "SELECT * FROM product WHERE name = :name"); // the first argument refers to the desired host/database
$statement->bindParam(":name", $product_name);
$statement->execute();
$row = $statement->fetch(); // PDO::FETCH_ASSOC is now the default fetch style

We pulled all the boilerplate into a factory function. It does the necessary error handling and reporting. If everything succeeds, it’ll return a standard-issue PDO statement object. If there are errors, it will return a special object which acts like a statement that’s failed, but will return empty result sets if asked. We felt comfortable that this would remove most of the friction around using PDO while preserving the underlying interface. Anyone who wants finer-grained control can still utilize the stock API.

The trickier problem was “make output the same.” While PDO looks the same with each driver, the drivers don’t necessarily behave the same. The documentation isn’t always clear about these differences. We needed to do a fair amount of testing and source code reading to suss out the effects.

While my examples have used MySQL, Wayfair is an MSSQL shop. We had been using the mssql extension. It uses a C API called DBLIB to talk to the server. Microsoft doesn’t maintain an open source version. FreeTDS is the commonly-used free implementation. One of the PDO drivers also uses DBLIB, but it returns column data differently. Instead of returning strings as strings and ints as ints, the PDO DBLIB driver returns everything as a string. We had to patch it to use the expected data types. To be able to differentiate between quoting strings as VARCHAR vs. NVARCHAR, we also added a parameter type. We also added support for setting connection timeouts (PDO defines a PDO::ATTR_TIMEOUT constant, but it has no effect with the DBLIB driver).

Another reason we were first attracted to PDO was for prepared statements. Since MSSQL supports them, it seemed like this could be an opportunity for a performance gain. However, after digging into the driver internals, we found that the DBLIB driver only emulates them. Microsoft has an ODBC driver for Linux. We tested it in conjunction with PDO’s ODBC driver, but found the two to be incompatible. We were able to get it working with the plain odbc extension, but (amazingly) found prepared statements to be slower than regular queries. Since using prepared statements would’ve necessitated a nontrivial change in coding style, we decided against investigating the speed difference.

We’re currently working on deploying SQL Relay. Preliminary tests have proven out that it reduces network load without adding much overhead. It has a PDO driver, so we’ll be able to swap it into our stack without having to change how queries are made.

Tungsten.js: UI Framework with Virtual DOM + Mustache Templates

Performance is a top priority here at Wayfair, because improved performance means an improved customer experience. A significant piece of web performance is the time it takes to render, or generate, the markup for a page. Over the last several months we’ve worked hard to improve the render performance on our customer-facing sites, and to ensure it’s easier for our engineers to write code that results in page renders with optimal performance.

We had been using Backbone.js and Mustache templates for our JavaScript modules at Wayfair for some time, but last year we realized that our front-end performance needed an upgrade. We identified two areas for improvement: efficiency of client-side DOM updates and abstracting DOM manipulation away from engineers.

The first issue was a result of the standard render implementation in Backbone. By default, the Backbone render implementation does nothing. It is up to developers to implement the render function as they see fit. A common implementation of this (and the example given in Backbone Docs) looks something like this:

render: function() {
  this.$el.html(this.template(this.model.attributes));
  return this;
}

The problem with this implementation is two-fold: first, the entire view is unnecessarily re-rendered with jQuery’s $().html() whenever render is called, and second, the render method always manipulates the DOM regardless of whether the data changed, so engineers must be explicit about when render is called to avoid unnecessary expensive DOM updates. The solution to both of these problems is a mix of only calling render when the engineer is sure the entire view needs to be re-rendered and then writing low-level DOM manipulation code when only portions of the view need to be updated, or the update needs to be more precise (e.g., changing a single class on an element in the view). All of this means that engineers have to be very aware of the state of the DOM at all times, and have to be aware of the performance consequences of any DOM manipulations. This makes for view modules that are hard to reason about and include low-level DOM manipulation code.

To address both of these problems, we investigated front-end frameworks that would abstract the DOM from the developer while also providing high-performance updates. The primary library we looked at was React.js, a UI library open-sourced by Facebook that utilizes a one-way data flow with virtual DOM rendering to support high-performance client-side DOM updates. We really liked React.js, but encountered one major issue: it lacked support for templates, which are what enable our high-performance server-side rendering.

On modern web sites and applications, HTML rendering occurs at two points: once on page load when the DOM is built from HTML delivered from the server, and again (0 to many times) when JavaScript updates the DOM after page load, usually as a result of the user interacting with the page. The initial rendering happens on the server with a multi-page site like Wayfair, and we’ve put a lot of work into making sure it’s as fast as it can be. HTML markup is written in Mustache templates and rendered via a C++ mustache renderer implemented as an extension for PHP. This gives us server-side rendering at speeds faster even than native PHP views.

Since server-side rendering is an important part of our web site, we were glad that React.js comes with this feature out of the box. Unfortunately while server-side rendering is available with React.js, it’s significantly slower than our existing C++ Mustache setup. In addition to performance, rendering React.js on the server would have required Node.js servers to supplement our PHP servers. This new requirement for UI rendering would have introduced complexity as well as a new single point of failure into our server stack. For these reasons, as well as the fact that we already had existing Mustache templates we wished to reuse, we decided React.js wasn’t a good fit.

Where do we go from here? We liked many of the concepts React.js introduced us to, such as reactive data-driven views and virtual DOM rendering, but we didn’t want our choice of a front-end framework to dictate our server-side technologies, and dictate a replacement of Mustache rendering via C++ and PHP. So, after some investigation of what else was available, we decided to take the concepts we liked from React.js and implement them ourselves with features that made sense for our tech stack.

Earlier this year, we wrote Tungsten.js, a modular web UI framework that leverages shared Mustache templates to enable high-performance rendering on both server and client. A few weeks ago we announced that we were open sourcing Tungsten.js, and today we’re excited to announce that primary development on Tungsten.js will be “GitHub first,” and all new updates to the framework can be found on our GitHub repo: https://github.com/wayfair/tungstenjs.

Tungsten.js is the bridge we built between Mustache templates, virtual-DOM, and Backbone.js. It uses the Ractive compiler to pre-compile Mustache templates to functions that return virtual DOM objects. It uses the virtual-DOM diff/patch library to make intelligent updates to the DOM. And it uses Backbone.js views, models, and collections as the developer-facing API. At least, it uses all these libraries for us here at Wayfair. Tungsten.js emphasizes modularity above all else. Any one of these layers in Tungsten.js can be swapped out for a similar library paired with an adaptor. Backbone could be swapped out for Ampersand. virtual-DOM could be swapped out for another implementation. Mustache could be swapped out for Handlebars, Jade, or even JSX. So, more generally, Tungsten.js is a bridge between any combination of markup notation (templates), a UI updating mechanism, and a view layer for developers.
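To make the diff/patch idea concrete, here is a minimal sketch of diffing two virtual trees. It is not the algorithm used by the virtual-DOM library or by Tungsten.js. The node shape, the h helper, and the patch format are all assumptions chosen for brevity, and applying the patches to a real DOM is left out.

```javascript
// Minimal virtual-tree diff. Node shape and patch format are hypothetical.

// Build a virtual node: a tag, a props object, and child nodes or strings.
function h(tag, props, ...children) {
  return { tag, props: props || {}, children };
}

// Compare two trees and collect the smallest set of changes this sketch knows
// about. `path` is the list of child indexes leading to the node.
function diff(oldNode, newNode, path = [], patches = []) {
  if (oldNode === undefined) {
    patches.push({ type: "insert", path, node: newNode });
  } else if (newNode === undefined) {
    patches.push({ type: "remove", path });
  } else if (typeof oldNode === "string" || typeof newNode === "string") {
    if (oldNode !== newNode) {
      patches.push({ type: "replace", path, node: newNode });
    }
  } else if (oldNode.tag !== newNode.tag) {
    patches.push({ type: "replace", path, node: newNode });
  } else {
    // Same tag: compare props one by one, then recurse into children.
    const keys = new Set([...Object.keys(oldNode.props), ...Object.keys(newNode.props)]);
    for (const key of keys) {
      if (oldNode.props[key] !== newNode.props[key]) {
        patches.push({ type: "setProp", path, key, value: newNode.props[key] });
      }
    }
    const len = Math.max(oldNode.children.length, newNode.children.length);
    for (let i = 0; i < len; i++) {
      diff(oldNode.children[i], newNode.children[i], path.concat(i), patches);
    }
  }
  return patches;
}

const before = h("ul", null, h("li", { class: "item" }, "Bed"), h("li", { class: "item" }, "Sofa"));
const after = h("ul", null, h("li", { class: "item selected" }, "Bed"), h("li", { class: "item" }, "Sofa"));
console.log(diff(before, after));
// one patch: set "class" to "item selected" on the node at path [0]
```

Re-rendering the whole template into a new virtual tree is cheap, and only the resulting patches touch the real DOM. In this example that is a single class change instead of a full html() replacement of the list.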

We don’t expect Tungsten.js to be the best fit for everyone, but we think it fits a common set of use cases very well. We’ve been using it on customer-facing pages in production for a while here at Wayfair and so far we’ve been very happy with it. Our full-stack engineers frequently tell us they far prefer using Tungsten to vanilla Backbone.js + jQuery, and we’ve improved client-side performance now that DOM manipulation is abstracted away from developers. And while we weren’t trying to be the “fastest” front-end framework around, it turns out that when we re-implemented Ryan Florence’s DBMonster demo from React.js Conf in Tungsten.js, the browser’s frame rate ends up being, give or take, at the same level as both React and Ember with Glimmer.

Here at Wayfair we have a saying that “we’re never done”. That’s certainly the case with Tungsten.js, which we’re constantly improving. We have a lot of ideas for Tungsten.js in the coming months, so watch the repo for updates. And of course we welcome contributions!