Dan R.

About Dan R.

Dan is the Director of Systems Engineering at Wayfair. He is responsible for maintaining, improving and scaling the infrastructure and systems that power Wayfair, both internally for employees and externally for customers. He is a proponent of FreeBSD, along with open source software in general. You can find him on Twitter at @draco2002.

Why not give Code Deploy Clients access to the repository?

We’ve received a few questions like this, both online and in person, so I figured it was worth explaining in a little more detail.

On the Deployment server, we deploy a variety of applications: Windows .Net Services, Python, Classic ASP, CSS/JS and PHP, to name a few.

We chose to standardize the interface to the Deployment server to make creating new code deployment clients simpler. Our Deployment server is essentially an on-demand package creation and deployment system.

Wayfair Code Deployment, Part 3

If you haven’t already read the overview of our deployment system or the architecture of our deployment server, you might want to check those posts out first.

In this article I will discuss how we deploy code in a unique way that gives us all the flexibility of the symlink method without requiring a symlink.

Part of the decision on how to set up our deployment system was driven by the need to roll our deployments forward and back as quickly as possible. We also needed to do this atomically, so that no errors occurred while the code was physically being copied to the webservers.
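For context, the symlink method mentioned above usually looks something like the sketch below. To be clear, this is the conventional technique that the post contrasts itself with, not the Wayfair mechanism. The function name `activate_release` and the directory layout are made up for illustration, and the sketch assumes a POSIX filesystem where renaming over an existing symlink is atomic.

```python
import os
import tempfile


def activate_release(release_dir: str, current_link: str) -> None:
    """Repoint `current_link` at `release_dir` in a single rename step."""
    tmp_link = current_link + ".tmp"
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # Renaming the temporary link over the live one replaces it in one step,
    # so a reader sees either the old release or the new one, never a mix.
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    # Hypothetical layout: <root>/releases/r1, <root>/releases/r2, <root>/current
    root = tempfile.mkdtemp()
    r1 = os.path.join(root, "releases", "r1")
    r2 = os.path.join(root, "releases", "r2")
    os.makedirs(r1)
    os.makedirs(r2)
    current = os.path.join(root, "current")

    activate_release(r1, current)  # initial deploy
    activate_release(r2, current)  # roll forward
    activate_release(r1, current)  # roll back
    print(os.path.realpath(current))
```

Rolling forward and rolling back are the same operation here: both just repoint the link at a release directory that is already fully copied into place.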


Have you heard about ESC Conference 2011?

If you haven’t heard about the ESC Conference yet, you should check it out!

It’s a new conference this year, giving us all another place to congregate and discuss running code and systems at scale. If you’ve been to a devopsdays conference before, the format will be familiar, and you will enjoy the opportunity to have a full day of Open Spaces!

The first day of the conference has a great line-up of talks, including one from our friends over at Etsy, and two talks from the Wayfair Engineering team!  Jonathan Klein will be presenting “Deep Dive Frontend Optimization” and Dan Rowe (me :) ) will be talking about monitoring with “Graphite, Statsd, and Graphite-Tattle : Simple metric collection meets self-serve alerts”.  We hope you can make it!

Eric Ries and the Lean Startup

Last January Eric Ries was kind enough to host a continuous deployment breakfast at Wayfair.

It was a great discussion, and we were happy to learn that over the last 9 years we had independently arrived at similar conclusions and built a system that aligns closely with some of the principles Eric advocates. The learning and continuous improvement aspects of the Lean Startup formula are an integral part of our method here, and that is why Eric’s message resonates with us.

Today Eric is back in town and speaking at an event at the Harvard i-lab.

To help spread the word about continuous improvement, the Lean Startup movement, and our new engineering blog describing how those fit in at Wayfair, we are giving away 10 copies of Eric’s book.

The authors of the first five blog posts mentioning the new engineering.wayfair.com site each get a copy of Eric’s new book. Post a comment below with a link to your post so we know where to look. To help get the word out, tweet this post and we’ll pick 5 random tweets to get books as well!

Wayfair Code Deployment, Part 2

The architecture of the deployment system

In the last post I explained our deployment system goals and the basic architecture.

Now that the stage is set, let’s get into the details of the architecture from the deployment server’s perspective. As we designed the deployment server, we wanted to keep its interface, logic and requirements as simple as possible.

The deployment server runs on our standard FLAMP stack (FreeBSD, Lighttpd, APC, MSSQL/MySQL/Memcache, PHP). We’ll save more details on those decisions for a later post.

On the deployment server there is a configuration file that defines each application and the details required to deploy it: basic things like the friendly name, the allowed code reviewers, the repository name and the repository folders to be included.
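As a rough illustration of what such a per-application definition might hold, here is a minimal sketch in Python. The real file's format and field names are not shown in this post, so everything here (`AppConfig`, the `storefront` key, the reviewer names, the folder paths) is hypothetical. Only the four kinds of information come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class AppConfig:
    """One deployable application (all field names are illustrative)."""
    friendly_name: str
    repository: str
    folders: Tuple[str, ...]
    allowed_reviewers: FrozenSet[str]

    def can_review(self, user: str) -> bool:
        """True if `user` is allowed to review code for this application."""
        return user in self.allowed_reviewers


# Hypothetical registry keyed by a short application id.
APPS: Dict[str, AppConfig] = {
    "storefront": AppConfig(
        friendly_name="Storefront PHP",
        repository="web",
        folders=("trunk/php", "trunk/static"),
        allowed_reviewers=frozenset({"alice", "bob"}),
    ),
}


def get_app(app_id: str) -> AppConfig:
    """Look up an application, failing loudly on an unknown id."""
    try:
        return APPS[app_id]
    except KeyError:
        raise KeyError(f"unknown application: {app_id!r}") from None
```

Keeping each definition as plain, immutable data means a deployment client only needs to name the application, and the server can look up everything else.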

Continuous Improvement

The Wayfair Engineering team isn’t really on the continuous integration bandwagon, but our deployment model comes pretty close naturally because of our desire for continuous improvement.

In the block to the right of the homepage we display the number of code deployments that engineers at Wayfair have done over the last 7 business days. This is a count of deployments to our new environment running our customer-facing websites. It represents only a small portion of our total code base, but it gives a good picture of how often we are improving the storefront for our customers. Internally we track a lot of other statistics about our deployments as well, like who did the deployment, the internal ticket number that the deployment was for, the SVN revision that was deployed, and who reviewed the code.
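A count like this could be computed along the following lines. This is only a sketch: the `Deployment` record and its field names are invented to mirror the statistics listed above, and "business day" is assumed to mean Monday through Friday with holidays ignored.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Set


@dataclass(frozen=True)
class Deployment:
    """One deployment record (field names are illustrative)."""
    deployed_on: date
    deployer: str
    ticket: str
    svn_revision: int
    reviewer: str


def last_business_days(today: date, n: int) -> Set[date]:
    """Return the n most recent weekdays, counting `today` if it is one."""
    days: Set[date] = set()
    day = today
    while len(days) < n:
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            days.add(day)
        day -= timedelta(days=1)
    return days


def count_recent(deployments: Iterable[Deployment], today: date, n: int = 7) -> int:
    """Count deployments that fall on one of the last n business days."""
    window = last_business_days(today, n)
    return sum(1 for dep in deployments if dep.deployed_on in window)
```

Because the window is built from weekdays only, a deployment pushed over the weekend does not count toward the total under this definition.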

Wayfair Code Deployment

Part 1 of a series on Code Deployment at Wayfair

Code Deployment at Wayfair has always been about creating the fastest and most friction-free deployment process possible.

Wayfair.com Architecture History
For the last 9 years our platform has been primarily a Classic ASP environment on a Windows Server stack. In order to facilitate code deployment in that environment we wrote a script that replicated file changes out to the webfarm once they were FTPed to a central server. Unfortunately as the webfarm grew, so did the amount of time it took to push code, and in the event of an issue, the time it took to roll back code. In this environment it took about 15 minutes to replicate code out to all of our servers.

From the chatter on the inter-webs and talks at Velocity and other conferences, I’m sure a lot of people would think that is “fast enough”. For us it was unacceptable: changes that had to go out together were not guaranteed to replicate at the same time, and with upwards of 50 deployments a day we needed to see our changes more quickly.