Wayfair @ Beanpot Hackathon

Wayfair was invited to be a sponsor at this year’s Beanpot Hackathon (link: http://www.hackbeanpot.com/), held last week at the Microsoft NERD Center in Cambridge. The concept of a hackathon is so closely related to our core values that we jumped at the opportunity to participate. Wylie Conlon, along with others from the nuACM (link: http://acm.ccs.neu.edu/), did a great job organizing this event.

For those unfamiliar, a hackathon is a fantastic display of creativity, technical skills, teamwork, problem solving, and time management, all compressed into a single marathon event. The Beanpot Hackathon produced 17 demos, impressive considering the event only lasted about 24 hours.

As the event got underway, the dinner area was buzzing with excitement. People huddled in informal groups, some sketching on a whiteboard and getting feedback, others researching on their laptops, everyone bouncing ideas back and forth. As teams solidified, they moved to the main conference room to start building their projects. The one theme consistent across all groups was a passion for technology and enthusiasm to get something ready to demo.

Most groups worked through the night, taking short naps between bursts of coding. We had some of our engineers available as mentors, although most groups seemed to be heads down and not looking for outside assistance. Near the entrance to the conference room, Wayfair set up a duck pond for anyone needing a fun distraction from their project. There was a fishing pole, and you could pull a duck from the pond to win a prize. A rubber duck also serves as a good sounding board for ideas, or for debugging code when you are stuck (link: http://en.wikipedia.org/wiki/Rubber_duck_debugging/).

By the time Saturday evening arrived, I was blown away by some of the projects that teams had put together. Not only were the demos cool applications or solutions to real problems, but the presentations were also well done. In many cases, the presenters talked about their inspiration, their thought process, and where they saw the idea going next. Questions from the audience were often constructive and suggested improvements.

Looking back on the event, I think one of the reasons we aligned so well with it is the similarity to our work environment: smart people using technology to solve problems quickly and get things done. I see that demonstrated every day in our engineering department, and it was refreshing to see so many talented students come together for an event like this. On a related note, we are hiring for summer internships in our software development group. If you were a participant at the Beanpot Hackathon, or this type of environment sounds good to you, please get in touch with us (link: eomeara@wayfair.com).

Brad S.

(contributors: Elias Y., Nishan S.)

Lessons from a datacenter move

Last winter we were discussing all of our upcoming projects and the new hardware they would require in the datacenter. Then we took a look at the room left in our cage at our main datacenter. It turned out we didn’t have enough space, and the facility wouldn’t give us any more power in our current footprint. There was also no room to expand our cage. We had two basic options. One was to add cage space, either in the same building or in another facility, and rely on cross connects or WAN connections. We weren’t wild about this approach because we knew it would come back to bite us later, as we would continually have to decide which systems belonged in which space. The other option was to move entirely into a bigger footprint. We opted to stay in the same facility, which made moving significantly easier, and moved to a space that is 70% larger than our old one, giving us lots of room to grow. Another major driver in the decision to move entirely was that it gave us the opportunity to rebuild our network infrastructure from the ground up, with a much more modular setup and, finally, 10Gb everywhere in our core and aggregation layers.

Some stats on the move:

  • Data migrated for NAS and SAN block storage: 161 TB
  • Network cables plugged in: 798
  • Physical servers moved or newly installed: 99 rack mount and 50 blades
  • Physical servers decommissioned to save power and simplify our environment: 49
  • VMs newly stood up or migrated: 619

It’s worth noting that the physical moves were done over the course of 2 months. Why so long? Unlike many companies that can take a weekend to bring things down, we aren’t afforded that luxury. We have customer service working in our offices 7 days a week in both the US and Europe, and we have our website to think about, which never closes. In fact, we were able to pull this off with only a single 4-hour outage to our storefront, and several very small outages to our internal and backend systems on weeknights throughout the project.

Lessons Learned:

No matter how good your documentation is, it’s probably not good enough. Most folks’ documentation concentrates on break/fix and the general architecture of a system: what’s installed, how it’s configured, etc. Since we drastically changed our network infrastructure, we had to re-IP every server when it was moved. We had to come up with procedures for what else needed to happen when a machine suddenly had a new IP address. We use DNS for some things, but not everything, so we had to ensure that inter-related systems were also updated when we moved things.
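One way to approach the "not everything uses DNS" problem is to scan configuration text for literal addresses in the range being retired. The following is only a minimal sketch, not the procedure we actually used: the subnet `10.1.0.0/16`, the sample config, and the function name `find_hardcoded_ips` are all hypothetical.

```python
import ipaddress
import re

# Hypothetical old network range; substitute the subnet being retired.
OLD_NETWORK = ipaddress.ip_network("10.1.0.0/16")

# Loose pattern for dotted-quad candidates; ipaddress does the real validation.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_ips(text, network=OLD_NETWORK):
    """Return (line_number, address) pairs for literal IPs inside `network`."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for candidate in IPV4_CANDIDATE.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not a valid IPv4 address, e.g. 999.1.1.1
            if address in network:
                hits.append((line_number, str(address)))
    return hits


# Hypothetical config contents, for illustration only.
sample_config = """\
db_host = 10.1.4.20
cache_host = cache01.example.internal
smtp_relay = 192.168.7.5
backup_target = 10.1.9.33:873
"""

for line_number, address in find_hardcoded_ips(sample_config):
    print(f"line {line_number}: {address}")
```

Entries that already use a hostname are skipped, which is the point: they follow DNS, while the literal addresses are the ones that need a manual update after a re-IP.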

Get business leads involved in the timeline. This sounds funny, but one of the biggest metrics in measuring the success of a project like this is the perception of the users. Since a good percentage of the systems moved had certain business units as their main “customers”, we worked with leaders from these business units to ensure we understood their use of the systems, what days or times of day they used them the most, and whether they had any concerns about off-hours operations at different times of the week. Once we had this info from many different groups, we sat down in a big room with all the engineers responsible for these systems and came up with a calendar for the move, then got final approval for dates from the business leads. This was probably the smartest thing we did, and it went a long way in helping our “customer satisfaction”.

Another thing we learned early on was to separate the physical moving of equipment from the work done by the subject matter experts to make system changes and ensure things were working properly after the physical move. This freed each subject matter expert to get right to work, without having to worry about other, unrelated systems that were being moved in the same maintenance window. How did we pull this off? Again, include everyone. We have a large Infrastructure Engineering team, 73 people as of this writing. We got everyone involved, from our frontline and IT Support groups all the way up to directors; even Steve Conine, one of our co-founders, did an overnight stint at the datacenter helping with the physical move of servers. It was an amazing team effort, and we would never have had such a smooth transition if everyone hadn’t stepped up in a big way.

I hope these little tidbits are helpful to anyone taking on such a monumental task as moving an entire datacenter. As always, thanks for reading.

The little program that could

It started as a proof-of-concept prototype in the fall of 2010. The idea came from a meeting where we were discussing the porting of our storefront codebase from classic ASP to PHP. One of the discussion points was how to avoid simply porting the same logic from one scripting language to another, and instead find ways to move some of that logic to other, more suitable platforms, including service-oriented solutions. The little program was a self-hosted WCF service written in VB.NET, running on my Windows XP box as a console application. It implemented a RESTful API service that returned the number of products in a customer’s shopping cart. Really simple and modest in scope, the thing worked like a charm. I gave a presentation about it at one of our Wayfair Engineering Lunch and Learn sessions about a week later. So far, so good.

Eliminate Implicit Conversions

At Wayfair, we are never done, and the DBA team here is a true example of that. We are constantly looking to improve performance, and we rigorously tune our databases on a daily basis. We are always looking at ways to make our queries run faster: maintaining indexes, optimizing queries and procedures, creating missing indexes based on query usage, generating statistics on currently running queries, and filtering out queries with top CPU usage, among other improvements. Of late, we’ve been trying to eliminate any implicit data type conversions that happen at runtime. Implicit data type conversions come with a cost, especially when the conversion is performed on the column side of the query rather than the literal side. We have had scenarios where, in high-volume processing jobs (processing millions of records), queries fell back to index scans because of implicit conversions. A simple demonstration of an implicit conversion is: WHERE a.OrderID = b.OrderNo, where a.OrderID is varchar(30) and b.OrderNo is nvarchar(30). Here the execution plan would do an implicit cast to nvarchar(30) and would perform an index scan over the millions of records, with you waiting endlessly for the query or job to finish.
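The varchar/nvarchar mismatch above is the kind of thing a small script can flag before it reaches production. Here is a minimal sketch, assuming the column types and the list of compared column pairs have already been collected somewhere (for example from the database's catalog views). The table names, the dictionaries, and both function names are hypothetical, and this is not our actual tooling.

```python
# Hypothetical schema metadata: (table, column) -> declared SQL type.
COLUMN_TYPES = {
    ("Orders", "OrderID"): "varchar(30)",
    ("Shipments", "OrderNo"): "nvarchar(30)",
    ("Shipments", "ShipmentID"): "int",
    ("Invoices", "ShipmentID"): "int",
}

# Hypothetical list of column pairs compared in joins or WHERE clauses.
JOIN_PAIRS = [
    (("Orders", "OrderID"), ("Shipments", "OrderNo")),
    (("Shipments", "ShipmentID"), ("Invoices", "ShipmentID")),
]


def base_type(sql_type):
    """Strip any length/precision suffix: 'varchar(30)' -> 'varchar'."""
    return sql_type.split("(", 1)[0].strip().lower()


def find_mismatched_pairs(pairs, column_types):
    """Return the pairs whose base types differ and so may convert implicitly."""
    mismatches = []
    for left, right in pairs:
        left_type = column_types[left]
        right_type = column_types[right]
        if base_type(left_type) != base_type(right_type):
            mismatches.append((left, left_type, right, right_type))
    return mismatches


for left, left_type, right, right_type in find_mismatched_pairs(JOIN_PAIRS, COLUMN_TYPES):
    print(f"{left[0]}.{left[1]} ({left_type}) vs {right[0]}.{right[1]} ({right_type})")
```

The check is deliberately crude: it compares only base type names, so it flags candidates for review rather than proving that a given query will scan.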

FreeBSD ZFS Boot Environments

FreeBSD has spent the last few years implementing awesome Solaris-based features such as DTrace and ZFS. Here at Wayfair, we use FreeBSD on our servers and our development environments. As a security engineer for Wayfair, I’m tasked with many fun projects. I get to test out different configurations and applications in FreeBSD. Naturally, I set up a FreeBSD 8-STABLE VM to do my testing.

ZFS gives you a lot of flexibility and stability. I decided to go with ZFS on my VM so that I could emulate OpenIndiana’s boot environment (BE) system. For an example of how boot environments work, please refer to the OpenSolaris documentation.
