Better Lucene/Solr searches with a boost from an external naive Bayes classifier

Me: Doug, what are you doing?

Doug: Solving the problem of class struggle with one of Greg’s classifiers.

Me:  Karl Marx should call his office.  What do you mean by that?

Doug: Let me explain… Continue reading

Better three-word searches with SOLR

We use the Apache SOLR search platform behind the scenes at Wayfair.  Sometimes, when vanilla SOLR doesn’t quite do what we want, we improve it for our purposes. When we suspect that others might have the same purposes, and we think that we have solved our problems in a generally useful way, we contribute our solutions back to the open source community, either on GitHub or through a more project-specific distribution channel.  SOLR is an Apache project, so for SOLR, this means attaching a patch to a ‘Jira’.  This blog post is about SOLR Jira 1093. Continue reading

Information Week Interviews Wayfair on its use of Markov Clustering

These days, in the big data community, we often hear how biologists have adopted distributed computing technologies that were first introduced to solve problems in software engineering. The fact that Wayfair has done the inverse, using a tool initially developed to help biologists cluster similar proteins to solve a problem in e-commerce, piqued the curiosity of Information Week magazine, which asked us for an interview about our February blog post on using Markov clustering to generate recommendations. Read the interview here.


Northeast PHP Recap

Last weekend was the inaugural run of the Northeast PHP Conference in Boston.  Wayfair was a gold sponsor, so we bought t-shirts, paid for apps and beer at the Saturday night event, and also sent about 15 engineers to the event.  I gave a talk on High Performance PHP, and we had a blast. Check out the slides from my talk. The feedback was great, and we look forward to sponsoring the conference again next year!

You can also take a look at some of the other talks that we really enjoyed:

Thanks to Michael Bourque and the other organizers for putting on a great event!

Measuring CDN Performance Benefits with Real Users

A couple of weeks ago I ran a test with WebPagetest that was designed to quantify how much a CDN improves performance for users that are far from your origin.  Unfortunately, the test indicated that there was no material performance benefit to having a CDN in place.  This conclusion sparked a lively discussion in the comments and on Google+, with the overwhelming suggestion being that Real User Monitoring data was necessary to draw a firm conclusion about the impact of CDNs on performance.  To gather this data I turned to the Insight product and its “tagging” feature.

Before I get into the nitty-gritty details I’ll give you the punch line: the test with real users confirmed the results from the synthetic one, showing no major performance improvement due to the use of a CDN. Continue reading

Webops for Python, part 2: the how-to

In part 1 of this 2-part series we used a comic strip to depict Python programmers and web operations folk working together to figure out how to deploy some scientific computing to an e-commerce site.  Joking aside, let’s describe exactly what we were trying to accomplish, and how we did it. Continue reading

WebOps for Python, part 1: the comic strip

Python is my favorite computer language for data science, but it is a poorly standardized beast when it comes to packaging, deployment, web operations, etc.  There are plenty of people who are deploying Python code to the web effectively, but especially in the data science area, there is no equivalent of the LAMP stack that you can just plug in and start coding against.  We have a way, among other possible ways, of solving these problems, that we think people might find useful, and I am going to describe our methods in a couple of blog posts.  The first one will tell the story as a comic strip.  The next one will have the code and instructions. Continue reading

REST, REST, REST, you can never get enough rest.

Representational State Transfer (REST) has been around for more than a decade and continues to grow in popularity. There are dozens of frameworks that can expedite implementing a solution, including 16 open source ones for PHP alone. So when Wayfair Engineering decided to evaluate migrating to Web Services (WS), it was a no-brainer. REST, right? Wait; there are other more mature frameworks. Continue reading

Wayfair Engineering Open Board Game Night

At Wayfair Engineering, we’re not just proud of the elegant technical solutions we implement, but we’re also proud of our team.  As part of our team bonding, we have frequent “Pod Outings,” activities that can be organized by any member of the Engineering team. Some recent Pod Outings have included a trip to play aerial dodgeball at SkyZone, a paintball outing where we honed our squad tactics, a relaxing day of golf, and our recurring breakfast club before work.  Sometimes the best outings are the ones we host at our offices, as we recently did when we decked out our 24th floor with food, beer, and every gaming system we could lay our hands on. Continue reading

The little program that could

It started as a proof-of-concept prototype in the fall of 2010. The idea came from a meeting where we were discussing the porting of our storefront codebase from classic ASP to PHP. One of the discussion points was how to avoid simply porting the same logic from one scripting language to another, and instead find ways to move some of that logic to other, more suitable platforms, including service-oriented solutions. The little program was a self-hosted WCF service written in VB.NET, running on my Windows XP box as a console application. It implemented a RESTful API service that returned the number of products in a customer’s shopping cart. Really simple and modest in scope, the thing worked like a charm. I made a presentation about it during one of our Wayfair Engineering Lunch and Learn sessions about a week later. So far, so good. Continue reading