You would think data replication is a piece of cake these days, given all the advances in database technology. That’s true for the most part when you’re dealing with databases of the same type, but when you have to replicate parts of your product catalog with other companies, things get a bit tricky. At Wayfair Engineering, we’ve figured out how to make it happen by building a software solution that keeps our retail partner operations working like a well-oiled machine.
At Wayfair, we are never done, and the DBA team here is a true example of that. We are constantly looking to improve performance, and we rigorously tune our databases on a daily basis. We are always looking at ways to have our queries run faster: maintaining indexes, optimizing queries and procedures, creating any missing indexes based on query usage, generating statistics on currently running queries, and filtering out queries with top CPU usage, among other improvements. Of late, we’ve been trying to eliminate any implicit data type conversions that happen at runtime. Implicit data type conversions come at a cost, especially when the conversion is performed on the column side of the query rather than the literal side. We have had high-volume processing jobs (processing millions of records) whose queries executed with index scans because of implicit conversions. A simple demonstration of an implicit conversion is: WHERE a.OrderID = b.OrderNo, where a.OrderID is varchar(30) and b.OrderNo is nvarchar(30). Here the execution plan would implicitly cast a.OrderID to nvarchar(30) and perform an index scan over the millions of records, leaving you waiting endlessly for the query or job to finish.
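To make the cost concrete, here is a rough analogy in Python rather than actual SQL Server behavior. It assumes a sorted list of `bytes` keys stands in for an index on a varchar column and a `str` value stands in for an nvarchar operand; all function names are hypothetical. Converting the stored keys forces a walk over every row, while converting the single search value once keeps the binary-search "seek" available.

```python
import bisect

def seek(index_keys, wanted):
    """Binary search over a sorted index: examines O(log n) keys."""
    pos = bisect.bisect_left(index_keys, wanted)
    if pos < len(index_keys) and index_keys[pos] == wanted:
        return pos
    return None

def scan_with_column_conversion(index_keys, wanted_text):
    """Converts each stored key before comparing, so every row may be examined."""
    for pos, key in enumerate(index_keys):
        if key.decode("ascii") == wanted_text:
            return pos
    return None

def seek_with_literal_conversion(index_keys, wanted_text):
    """Converts the search value once, then uses the index as-is."""
    return seek(index_keys, wanted_text.encode("ascii"))

# Hypothetical order IDs, stored as bytes and kept in sorted order like an index.
order_ids = sorted(f"ORD{n:07d}".encode("ascii") for n in range(100_000))
slow = scan_with_column_conversion(order_ids, "ORD0099999")
fast = seek_with_literal_conversion(order_ids, "ORD0099999")
```

Both calls find the same row; the difference is how many keys are touched along the way, which is the same reason matching the data types on both sides of the comparison matters in the query above.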
Our story begins in Holland in 1997, where a researcher named Stijn van Dongen, who is pretty good at Go, has a 5-minute flash of insight into modeling flows with stochastic matrices. He writes a thesis about it and makes a toolkit called MCL with a free software license.
Flash forward to late 2011. It turns out that MCL is pretty useful if you are trying to sell home goods on the internet, and perhaps other types of goods as well. The search and recommendations team at Wayfair has just launched a simple recommender component, as described here. Our system is working pretty well and giving the people something like what they want, but we suspect we can find more interesting connections among people and things than the ones we are finding. Greg and I are reading academic and industry research papers, when Greg finds Stijn’s research and MCL. We give it a try, and our recommendations improve.
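For readers who have not met MCL, the core loop is small enough to sketch. What follows is a simplified pure-Python illustration of the expansion and inflation steps on a column-stochastic matrix, not the MCL toolkit itself and not our recommender code; the function names, the inflation value of 2.0, the fixed iteration count and the pruning threshold are all assumptions made for the example.

```python
def normalize_columns(m):
    """Scale each column to sum to 1, making the matrix column-stochastic."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def expand(m):
    """Expansion: square the matrix, letting flow spread along longer walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, power):
    """Inflation: raise entries to a power and renormalize, favoring strong flows."""
    return normalize_columns([[x ** power for x in row] for row in m])

def mcl(adjacency, inflation=2.0, iterations=30):
    """Alternate expansion and inflation, then read clusters off the nonzero rows."""
    n = len(adjacency)
    # Self-loops are commonly added before normalizing.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

Given the adjacency matrix of a graph made of two disconnected triangles, `mcl` returns the two triangles as separate clusters. On real data you would want sparse matrices and a convergence check rather than a fixed number of iterations.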
When you sit down to write a recommendations system, there are quite a few well-practiced techniques you can use, and it’s difficult to know in advance how well they are going to work out when applied to your data. Thanks to the Netflix prize, which was initiated in 2006 and awarded in 2009, a lot has been written on recommender systems for the Netflix data set. If you happen to have a product catalogue similar to Netflix’s (those movies from the 60s are still being viewed and rated), and your users happen to have scored it with a 5-point explicit ratings system, there are some awesome advanced techniques and frameworks that you can take for a spin. Does that sound like you? Show of hands? I didn’t think so. Our data is certainly nothing like that.
We run a Python/Tornado-based recommendations service behind the scenes at Wayfair. As part of our code deployments, we need to install various third-party libraries on our Tornado servers. The Python tools that do this kind of thing are a bit half-baked, so we paper over their inadequacies with Puppet.
A while back a fellow named Richard Crowley wrote a puppet-pip provider, which seems to have been folded into Puppet 2.7, or replaced by a module in Puppet 2.7, or something like that. So in a sense his little project is dead. But Karthick on our team has resurrected a fork of it, a hybrid provider using subcommands of setuptools (easy_install) and pip for different aspects of installation, version checking and uninstallation. We call it easypip (easypip.rb), and the forked project containing it is now up on GitHub. Enjoy!
It’s a new year and it’s time for another report on how fast (or slow) our sites are. If you are new here, our previous report looked at the average load time for our four major types of pages, as well as the 95th percentile load time. The inspiration for this type of post came from our friends at Etsy.
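As a reminder of what those two numbers mean, here is a minimal sketch of computing an average and a 95th percentile per page type. It assumes load times arrive as plain lists of seconds and uses the nearest-rank definition of a percentile; the page type names and the `summarize` helper are hypothetical, and this is not the tooling behind the report.

```python
import math
from statistics import mean

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(load_times_by_page):
    """Return the average and 95th percentile load time for each page type."""
    return {
        page: {"average": mean(times), "p95": percentile(times, 95)}
        for page, times in load_times_by_page.items()
    }

# Hypothetical samples, in seconds.
report = summarize({
    "home": [0.8, 0.9, 1.1, 1.0, 4.2],
    "product": [1.2, 1.3, 1.1, 1.4, 1.2],
})
```

The average is pulled around by a handful of very slow requests, while the 95th percentile shows what the slowest one in twenty visitors experiences, which is why it helps to track both.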
Here at Wayfair, we work with thousands of suppliers to provide our products to our customers. To automate the bulk of these interactions, we use Electronic Data Interchange (EDI), so that we can trade documents back and forth. FTP is still one of the predominant methods for transferring these documents, so we have had to build a robust FTP solution to handle this traffic.
As we have mentioned before, the main source control system we use at Wayfair is SVN, with TortoiseSVN as our client. One of the things we love about SVN is the ability to add commit hooks, or checks that run when someone tries to commit a file to source control. By having a few key checks we can prevent bugs, ensure consistent coding practices, and generally have a cleaner codebase.
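To give a flavor of what such a check can look like, here is a minimal sketch of a pre-commit hook body in Python. Subversion invokes a pre-commit hook with the repository path and the transaction name, and `svnlook changed` and `svnlook cat` can inspect the pending transaction. The two rules shown (conflict markers and a "DO NOT COMMIT" note) are hypothetical examples, not our actual checks.

```python
import re
import subprocess
import sys

# Hypothetical rules: (pattern, message shown to the committer).
RULES = [
    (re.compile(r"^(<{7}|={7}|>{7})(\s|$)"), "unresolved merge conflict marker"),
    (re.compile(r"DO NOT COMMIT"), "line is marked DO NOT COMMIT"),
]

def find_violations(path, text):
    """Return one message per line of text that breaks a rule."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        for pattern, message in RULES:
            if pattern.search(line):
                problems.append(f"{path}:{number}: {message}")
    return problems

def svnlook(*args):
    """Run svnlook and return its standard output as text."""
    result = subprocess.run(["svnlook", *args], check=True,
                            capture_output=True, text=True, errors="replace")
    return result.stdout

def changed_files(repos, txn):
    """List added or modified files in the pending transaction."""
    paths = []
    for line in svnlook("changed", "-t", txn, repos).splitlines():
        status, path = line[:4].strip(), line[4:]
        if not status.startswith("D") and not path.endswith("/"):
            paths.append(path)
    return paths

def main(argv):
    repos, txn = argv[1], argv[2]
    problems = []
    for path in changed_files(repos, txn):
        problems.extend(find_violations(path, svnlook("cat", "-t", txn, repos, path)))
    if problems:
        # Anything written to stderr is shown to the committer.
        print("\n".join(problems), file=sys.stderr)
        return 1  # a non-zero exit status rejects the commit
    return 0

# A real hook script would end with: sys.exit(main(sys.argv))
```

Keeping the rule logic in `find_violations`, separate from the `svnlook` calls, means the checks can be unit tested without a repository.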
FreeBSD has spent the last few years implementing awesome Solaris-based features such as DTrace and ZFS. Here at Wayfair, we use FreeBSD on our servers and our development environments. As a security engineer for Wayfair, I’m tasked with many fun projects. I get to test out different configurations and applications in FreeBSD. Naturally, I set up a FreeBSD 8-STABLE VM to do my testing.
ZFS gives you a lot of flexibility and stability. I decided to go with ZFS on my VM so that I could emulate OpenIndiana’s boot environment (BE) system. For an example of how boot environments work, please refer to the OpenSolaris documentation.
Progressive Enhancement is often described as an alternate approach to “Graceful Degradation” – it encourages focusing on the most basic functionality first and then building out from there. It also forms the core of the Yahoo! Graded Browser Support model, which we use as a guide for our own rules around browser support. This is an important topic, but it has been covered fairly extensively in other articles, so I’m not going to dive into it too much here. Instead I am going to talk about specific progressive enhancement techniques we use at Wayfair to improve site performance.