These days, in the big data community, we often hear how biologists have adopted distributed computing technologies that were first introduced to solve problems in software engineering. Wayfair has done the inverse: we took a tool initially developed to help biologists cluster similar proteins and used it to solve a problem in e-commerce. That piqued the curiosity of Information Week magazine, which asked us for an interview about our February blog post on using Markov clustering to generate recommendations (http://engineering.wayfair.com/recommendations-with-markov-clustering/). Read the interview here: http://www.informationweek.com/big-data/news/big-data-analytics/240007850/online-retailer-uses-dna-research-to-connect-with-customers
Our story begins in Holland in 1997, where a researcher named Stijn van Dongen, who is pretty good at Go, has a 5-minute flash of insight into modeling flows with stochastic matrices. He writes a thesis about it and releases a toolkit called MCL under a free software license.
Flash forward to late 2011. It turns out that MCL is pretty useful if you are trying to sell home goods on the internet, and perhaps other types of goods as well. The search and recommendations team at Wayfair has just launched a simple recommender component, as described here. Our system is working pretty well and giving the people something like what they want, but we suspect there are more interesting connections among people and things than the ones we are finding. Greg and I are reading academic and industry research papers when Greg finds Stijn’s research and MCL. We give it a try, and our recommendations improve.
When you sit down to write a recommender system, there are quite a few well-practiced techniques you can use, and it’s difficult to know in advance how well they will work on your data. Thanks to the Netflix Prize, which was initiated in 2006 and awarded in 2009, a lot has been written on recommender systems for the Netflix data set. If you happen to have a product catalogue similar to Netflix’s (those movies from the 60s are still being viewed and rated), and your users happen to have scored it with a 5-point explicit ratings system, there are some awesome advanced techniques and frameworks that you can take for a spin. Does that sound like you? Show of hands? I didn’t think so. Our data is certainly nothing like that.