It started as a proof-of-concept prototype in the fall of 2010. The idea came from a meeting where we were discussing the porting of our storefront codebase from classic ASP to PHP. One of the discussion points was how to avoid simply porting the same logic from one scripting language to another, and instead find ways to move some of that logic to other, more suitable platforms, including service-oriented solutions. The little program was a self-hosted WCF service written in VB.NET, running on my Windows XP box as a console application. It implemented a RESTful API that returned the number of products in a customer’s shopping cart. Really simple and modest in scope, the thing worked like a charm. I presented it at one of our Wayfair Engineering Lunch and Learn sessions about a week later. So far, so good.
A few weeks ago, we celebrated Inc! Magazine’s great cover story about us, including an internal poll of our favorite item from the photo shoot (results: tie between the purple dragon and giant giraffe). Unbeknownst to us, however, the story was later picked up by Yahoo!’s news feed on April 10th and posted to the scroller on their homepage. This is where our story begins…
At Wayfair, we are big fans of Dell’s server platform, so naturally we were excited when their 12G line of servers started shipping. We are also very big fans of FreeBSD. Ahead of our first order for a new Dell PowerEdge R720, we did some research and found that the new PERC H710 RAID controllers used the LSI SAS 2208 controller chip. A quick look at the FreeBSD hardware compatibility list for 9.0 Release showed that this chip was supported by the mps driver. Knowing that PERC cards are usually supported by the mfi driver, we thought this was a bit weird, but we weren’t that concerned.
As has been discussed at great length on this blog recently, performance is a key part of the work we do here at Wayfair. Recently, we’ve been putting a lot of extra effort into our technology developments to make our pages load faster and more reliably. One of the most recent releases to this end was an update to how we store a lot of our data in our various caching systems. I am going to focus this post on the introduction of a new technology to our caching layers: CDB (Constant Database).
You would think data replication is a piece of cake these days given all the advances in database technology, and that’s true for the most part when you’re dealing with databases of the same type, but when you have to replicate parts of your product catalog with other companies, things get a bit tricky. At Wayfair Engineering we’ve figured out how to make it happen by creating a great software solution that keeps our retail partner operations working like a well-oiled machine.
At Wayfair, we are never done. And the DBA team here is a true example of it. We are constantly looking to improve performance and we rigorously tune our databases on a daily basis. We are always looking at ways to have our queries run faster – by maintaining indexes, optimizing queries and procedures, creating any missing indexes based on query usage, generating statistics on currently running queries, and filtering out queries with top CPU usage, among other improvements. Of late, we’ve been trying to eliminate any implicit data type conversions that happen at runtime. Implicit data type conversions come with a cost, especially when the conversion is performed on the column side of the query rather than the literal side. We have had scenarios where, for high-volume processing jobs (processing millions of records), queries fell back to index scans because of implicit conversions. A simple demonstration of an implicit conversion is: WHERE a.OrderID = b.OrderNo, with a.OrderID being varchar(30) and b.OrderNo being nvarchar(30). Here the execution plan would do an implicit cast of a.OrderID to nvarchar(30) and would perform an index scan operation on the millions of records – with you waiting endlessly for the query or job to finish.
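To make the cost concrete, here is a rough Python analogy, not SQL Server itself. It assumes a sorted list of byte strings standing in for an index on a varchar column; the names ORDER_IDS, index_seek and index_scan are made up for illustration. When the search key already has the stored type, a binary search touches only a handful of entries. When every stored value has to be converted before it can be compared, every entry gets touched.

```python
# Hypothetical stand-in for an index on a varchar column:
# 1,000 order IDs stored as bytes, kept in sorted order.
ORDER_IDS = sorted(f"{n:06d}".encode("ascii") for n in range(1000))


def index_seek(sorted_keys, key):
    """Binary search; works because the key has the same type as the index.

    Returns (found, entries_touched).
    """
    lo, hi, touched = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        touched += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, touched


def index_scan(sorted_keys, key_text):
    """Convert every stored key to text before comparing (column-side cast).

    The sort order of the stored type can no longer be relied on, so every
    entry is visited. Returns (found, entries_touched).
    """
    found, touched = False, 0
    for raw in sorted_keys:
        touched += 1
        if raw.decode("ascii") == key_text:
            found = True
    return found, touched


if __name__ == "__main__":
    # Literal-side conversion: convert the search key once, then seek.
    print(index_seek(ORDER_IDS, "000742".encode("ascii")))
    # Column-side conversion: convert every stored value, i.e. a full scan.
    print(index_scan(ORDER_IDS, "000742"))
```

The gap between the two counts grows with the table: the seek grows logarithmically while the scan grows linearly, which is why the effect is so painful on millions of records.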
Our story begins in Holland in 1997, where a researcher named Stijn van Dongen, who is pretty good at Go, has a 5-minute flash of insight into modeling flows with stochastic matrices. He writes a thesis about it and makes a toolkit called MCL with a free software license.
Flash forward to late 2011. It turns out that MCL is pretty useful if you are trying to sell home goods on the internet, and perhaps other types of goods as well. The search and recommendations team at Wayfair has just launched a simple recommender component, as described here. Our system is working pretty well and giving the people something like what they want, but we suspect we can find more interesting connections among people and things than the ones we are finding. Greg and I are reading academic and industry research papers, when Greg finds Stijn’s research and MCL. We give it a try, and our recommendations improve.
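For readers who have not met MCL before, here is a toy sketch of the Markov clustering idea in plain Python. It is not the MCL toolkit and not our production code; the function names, the self-loop weight, the inflation value of 2.0, the iteration count and the 1e-6 cutoff are all assumptions chosen for illustration. The algorithm alternates two steps on a column-stochastic matrix: expansion (squaring the matrix, which lets flow spread along longer paths) and inflation (raising each entry to a power and renormalizing each column, which strengthens strong flows and starves weak ones).

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def expand(m):
    """Expansion step: multiply the matrix by itself."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: elementwise power, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Toy Markov clustering over a symmetric adjacency matrix.

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Add self-loops (weight 1.0 is an assumption), then make columns stochastic.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that still carry flow belong to "attractor" nodes; the columns
    # with non-negligible mass in such a row form one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by a single edge between 2 and 3.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    graph = [[0.0] * 6 for _ in range(6)]
    for a, b in edges:
        graph[a][b] = graph[b][a] = 1.0
    print(markov_cluster(graph))
```

A fixed iteration count keeps the sketch short; a real implementation would stop once the matrix stops changing and would use sparse matrices with pruning to cope with a catalog-sized graph.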
When you sit down to write a recommendations system, there are quite a few well-practiced techniques you can use, and it’s difficult to know in advance how well they are going to work out when applied to your data. Thanks to the Netflix prize, which was initiated in 2006 and awarded in 2009, a lot has been written on recommender systems for the Netflix data set. If you happen to have a product catalogue similar to Netflix’s (those movies from the 60s are still being viewed and rated), and your users happen to have scored it with a 5-point explicit ratings system, there are some awesome advanced techniques and frameworks that you can take for a spin. Does that sound like you? Show of hands? I didn’t think so. Our data is certainly nothing like that.
We run a python/Tornado-based recommendations service behind the scenes at Wayfair. As part of our code deployments, we need to install various third-party libraries to our Tornado servers. The python tools that do this kind of thing are a bit half-baked, so we paper over their inadequacies with puppet.
A while back a fellow named Richard Crowley wrote a puppet-pip provider, which seems to have been folded into Puppet 2.7, or replaced by a module in Puppet 2.7, or something like that. So in a sense his little project is dead. But Karthick on our team has resurrected a fork of it, a hybrid provider using subcommands of setuptools (easy_install) and pip for different aspects of installation, version checking and uninstallation. We call it easypip (easypip.rb), and the forked project containing it is now up on GitHub. Enjoy!
It’s a new year and it’s time for another report on how fast (or slow) our sites are. If you are new here, our previous report looked at the average load time for our four major types of pages, as well as the 95th percentile load time. The inspiration for this type of post came from our friends at Etsy.
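If the two metrics are unfamiliar, here is a small sketch of how an average and a 95th percentile can be computed from a list of load times. The sample numbers are invented, and the nearest-rank percentile definition used here is an assumption; monitoring tools often interpolate instead, so their figures may differ slightly.

```python
def average(values):
    """Arithmetic mean of the samples."""
    return sum(values) / len(values)


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct percent of the
    samples are less than or equal to it.
    """
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Invented load times in seconds for one page type.
    load_times = [1.2, 0.9, 1.1, 1.4, 0.8, 1.0, 1.3, 6.5, 1.1, 0.9]
    print("average:", average(load_times))
    print("95th percentile:", percentile(load_times, 95))
```

The average tells you what a typical visit looks like, while the 95th percentile tells you how bad things get for the slowest one in twenty page loads, which is why it is worth reporting both.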