<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Wayfair Engineering</title>
	<atom:link href="http://engineering.wayfair.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://engineering.wayfair.com</link>
	<description>Building things bigger and faster everyday</description>
	<lastBuildDate>Tue, 08 May 2012 18:42:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Wayfair Engineering Open Board Game Night</title>
		<link>http://engineering.wayfair.com/wayfair-engineering-open-board-game-night/</link>
		<comments>http://engineering.wayfair.com/wayfair-engineering-open-board-game-night/#comments</comments>
		<pubDate>Tue, 08 May 2012 18:38:35 +0000</pubDate>
		<dc:creator>cbuchananhowland</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=736</guid>
		<description><![CDATA[At Wayfair Engineering, we’re not just proud of the elegant technical solutions we implement, but we’re also proud of our team.  As part of our team bonding, we have frequent “Pod Outings,” activities that can be organized by any member &#8230; <a href="http://engineering.wayfair.com/wayfair-engineering-open-board-game-night/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>At Wayfair Engineering, we’re not just proud of the elegant technical solutions we implement, but we’re also proud of our team.  As part of our team bonding, we have frequent “Pod Outings,” activities that can be organized by any member of the Engineering team. Some recent Pod Outings have included a trip to play aerial dodgeball at SkyZone, a paintball outing where we honed our squad tactics, a relaxing day of golf, and our recurring breakfast club before work.  Sometimes the best outings are the ones we host at our offices, as we recently did when we decked out our 24th floor with food, beer, and every gaming system we could lay our hands on.<span id="more-736"></span></p>
<p>The team bonding isn’t limited to after-hours; several different groups of engineers get together at lunch throughout the week to play board and card games – Tuesday is usually dominated by Dominion, and Thursday is all about Bridge, Wizard, and other trick-taking games.</p>
<p>It doesn’t take a Wayfair engineer to see that there’s some synergy between our small lunch gatherings and our big after-hours blowouts, so we’re putting together our first Wayfair Engineering Open Board Game Night. We’re reaching out to the Boston engineering community to join us on our 24th floor (check out the sweet view) on <strong>Wednesday May 23rd from 6:30 to 9:00 pm</strong>. There will be free food and brew, along with a selection of our favorite games.</p>
<p>Register <a href="http://wayfairgames.eventbrite.com/">here</a> to spend some quality time with the Wayfair engineering team. Feel free to bring your own favorite games and to bring your friends – but make sure to register them too, so we’re sure to have plenty of pizza to go around.  Since we will be ordering food and beverages based upon the number of registered users, please don’t sign up unless you will make a best effort to attend.</p>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/wayfair-engineering-open-board-game-night/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The little program that could</title>
		<link>http://engineering.wayfair.com/the-little-program-that-could/</link>
		<comments>http://engineering.wayfair.com/the-little-program-that-could/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 16:12:51 +0000</pubDate>
		<dc:creator>Bernardo</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=605</guid>
		<description><![CDATA[It started as a proof of concept prototype in the fall of 2010. The idea came from a meeting where we were discussing the porting of our storefront codebase from classic ASP to PHP. One of the discussion points was &#8230; <a href="http://engineering.wayfair.com/the-little-program-that-could/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It started as a proof-of-concept prototype in the fall of 2010. The idea came from a meeting where we were discussing the porting of our storefront codebase from classic ASP to PHP. One of the discussion points was how to avoid simply porting the same logic from one scripting language to another, and instead find ways to move some of that logic to other, more suitable platforms, including service-oriented solutions. The little program was a self-hosted WCF service written in VB.NET, running on my Windows XP box as a console application. It implemented a RESTful API service that returned the number of products in a customer’s shopping cart. Really simple and modest in scope, the thing worked like a charm. I made a presentation about it during one of our Wayfair Engineering Lunch and Learn sessions about a week later. So far, so good.<span id="more-605"></span></p>
<p>Nothing happened to it for about a year, until one day in early November of 2011. The port to PHP had been successful and we were gradually moving traffic from our ASP scripted web pages to the new PHP platform. The holiday season was approaching and the PHP platform was taking more and more traffic every day. The web platform started showing lots of faults and things were getting worse with the increase in traffic. Turns out that the FreeTDS driver we use in PHP scripts to access our MSSQL databases had trouble making connections under high traffic conditions. The efforts to resolve the issues with the driver produced no immediate results, and we started to brainstorm for alternative ways to solve the problem. Wayfair Engineering had to come up with something soon. The biggest chunk of the DB queries affected were identified, and it looked like our inventory lookup query was the easiest one to divert to an alternative solution. Without inventory lookups, the FreeTDS driver seemed capable of handling the expected traffic.</p>
<p>A few ideas came up. One of them was to take that year-old WCF prototype and turn it into a proxy service for MSSQL inventory lookups, so I started adding the necessary code to implement the inventory query as a pass-through operation. Essentially, the PHP client script would make a REST API call to the new service over HTTP, passing the query as a parameter, and get the query execution result in the response. I added code to implement the database connection logic from our standard method and tested that. I also needed to provide a good format for the response, so I wrote a function that formats a dataset object directly into a JSON object to facilitate the use of the response in PHP. Once all this was working I had to test it under heavy traffic conditions, so our DBAs provided captured query traffic from the production databases to use as realistic load. I wrote a simple client to call the service with the captured queries and monitored performance, memory, and CPU. At first it looked promising, but eventually memory use rose to high levels, so I checked the code and found ways to eliminate potential memory leaks. After that, the service was ready for a trial in the production environment, so the PHP team built a client function and prepared to send the traffic from one of the web servers to the new service. The service performed reasonably well, and more and more servers were added to the trial until all the inventory calls were using the service. The response time wasn’t as good as that of the FreeTDS driver’s call, but it was close enough and it didn’t cause connection faults. The new service was named “http query”. Once the service was running on its own application server, it took care of our high-traffic worries and sailed through “Black Friday”, “Cyber Monday” and the rest of the holiday season workloads without any major issues.</p>
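<p>The original service was written in VB.NET and its code is not reproduced here. Purely as an illustration of the dataset-to-JSON step, a rough Python sketch might look like the following; the function name <em>rows_to_json</em> and the <em>sku</em>/<em>quantity</em> columns are assumptions made up for the example, not the real schema.</p>
<pre><code class="language-python">import json


def rows_to_json(columns, rows):
    """Turn a tabular result set into a JSON array of objects."""
    # Pair every row tuple with the column names so that the PHP side
    # can decode the response into a list of associative arrays.
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records)


# Hypothetical inventory result: column names plus row tuples.
columns = ["sku", "quantity"]
rows = [("ABC123", 4), ("XYZ789", 0)]
payload = rows_to_json(columns, rows)
</code></pre>
<p>On the PHP side, a response in this shape could be decoded with a single json_decode call, which is the kind of convenience the JSON format was chosen for.</p>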
<p>This experience made Wayfair Engineering aware of alternative ways to solve scalability problems. Since then, a few new service oriented solutions have spawned from the little program that could. One solution uses the same concept to process order data to identify fraudulent trends. Another effort by the Recommendations team built upon the same foundations to implement a more efficient way to retrieve inventory status for multiple products in a single request.</p>
<p>The Wayfair Engineering Express has left the station, and I’m so glad I got a great window seat. Long live the little program that could!</p>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/the-little-program-that-could/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The 24-Hour News Cycle</title>
		<link>http://engineering.wayfair.com/the-24-hour-news-cycle/</link>
		<comments>http://engineering.wayfair.com/the-24-hour-news-cycle/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 15:25:03 +0000</pubDate>
		<dc:creator>eric</dc:creator>
				<category><![CDATA[Web Performance]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=699</guid>
		<description><![CDATA[A few weeks ago, we celebrated Inc. Magazine’s great cover story about us, including an internal poll of our favorite item from the photo shoot (results: tie between the purple dragon and giant giraffe).  Unbeknownst to us, however, the story &#8230; <a href="http://engineering.wayfair.com/the-24-hour-news-cycle/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, we celebrated Inc. Magazine’s great <a href="http://www.inc.com/magazine/201204/kasey-wehrum/the-road-to-1-billion-growth-special-report.html">cover story</a> about us, including an internal poll of our favorite item from the photo shoot (results: tie between the purple dragon and giant giraffe).  Unbeknownst to us, however, the story was later picked up by Yahoo!’s news feed on April 10<sup>th</sup> and posted to the scroller on their homepage.  This is where our story begins…<span id="more-699"></span></p>
<p>Around 10 a.m., we started to receive alerts from our internal and external monitoring indicating that something out of the ordinary was up with the site.  Response times were spiking and, within 15 minutes, we were returning error pages apparently caused by timeouts from our back-end.  Everyone scrambled to determine what was causing the site outage:</p>
<ul>
<li>Did we just push bad code?</li>
<li>Did something break?</li>
<li>Was this a DoS attack?</li>
</ul>
<p>Our analytics team was able to determine that referral traffic from Yahoo! had spiked by over 500%.  And they quickly realized that the Inc. story had been picked up by Yahoo! – thus leading us to conclude that the resulting interest was causing the capacity issues we were seeing.</p>
<p>Our site is designed to try to handle large traffic loads and we successfully handled the past holiday rush, which was significantly higher than our typical April load.  However, within the first 15 minutes, we had sustained a <em>forty-fold</em> increase in the amount of traffic to certain pages which brought our total load well beyond what we experienced on Cyber Monday, which had been our biggest day to date.  We had reached our capacity limit which caused the site to slow down for everyone and, at the peak, to return error pages for about half of all requests.</p>
<p>We had sufficient web server capacity, so that wasn’t the issue.  As we looked closer, we found that we had reached a connection pool limit within our load balancing layer, something that we had not experienced previously, so we quickly increased the cap.  By around 10:45 a.m., the error pages had stopped and the site started returning to normal.</p>
<p>Or so we thought…</p>
<p>Traffic continued to grow and we quickly reached another plateau, at twice our normal levels.  We were no longer returning error pages, but load times were still high.  We could see that something was overloading our front-end caching databases, causing application calls to queue and eventually time out.</p>
<p>Over the next hour, we determined that one routine in particular was causing most of our database problems.  The particular call provides us with geo-location information that we use, amongst other things, to provide localized content for our <a href="https://www.getitnearme.com/">Get it Near Me!</a> service.  Fortunately, our application makes use of internal “feature knobs”, which allow us to dial up and down the percentage of users who see elements on our site without requiring code changes, so we simply set this function to appear for zero users.  With this feature turned off, site performance returned to normal, but this time at four times our normal traffic volume.  About 90 minutes later, we had <a href="http://engineering.wayfair.com/wayfair-code-deployment-part-3/">new code in production</a> that resolved the issue and allowed us to turn the feature back on.</p>
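<p>For readers unfamiliar with the idea, a percentage-based feature knob can be sketched in a few lines. This is not our production code, just a minimal Python illustration that assumes each user has a stable identifier; the <em>knob_enabled</em> function and the <em>geo_lookup</em> feature name are hypothetical.</p>
<pre><code class="language-python">import hashlib


def knob_enabled(feature, user_id, percent):
    """Return True if this user falls inside the feature's rollout percentage."""
    # Hash the feature name together with the user id so that each user
    # lands in a stable bucket from 0 to 99 for a given feature.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    # percent=0 yields an empty range (nobody sees the feature);
    # percent=100 covers every bucket (everybody sees it).
    return bucket in range(percent)


# Dialing the hypothetical "geo_lookup" knob to 0 hides it from every user.
visible_to_anyone = any(knob_enabled("geo_lookup", uid, 0) for uid in range(1000))
</code></pre>
<p>Because the bucket depends only on the feature name and the user, the same user keeps seeing (or not seeing) the feature as the percentage is dialed up, and setting it to zero acts as an off switch without a code push.</p>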
<p>Overall, site traffic stayed up for the rest of the day and through 11 a.m. the following morning.  At that point, we saw a sharp drop in traffic, with levels returning to what we would expect for a normal April morning.  Our news story had cycled off of Yahoo!’s news page, thus ending the 24-hour news cycle and distinguishing Tuesday as having the largest overall traffic volume in our history, beating the previous Cyber Monday by over 20%.</p>
<p>We are now officially a member of a somewhat exclusive club of large web companies who have fallen victim to their own success.   The real test is to learn from the past, avoid the same mistakes, and continue to improve.  Yahoo! posted our story again on Sunday as part of their week in review, causing that day to have our fourth highest traffic count behind only Tuesday’s spike and the previous two Cyber Mondays.  The site worked fine.</p>
<p>With Sunday’s spike in traffic successfully behind us, we’re encouraged by our team’s ability to adapt quickly and improve, but by no means are we thinking our job is done &#8211; instead, we’re now ready and eager to face our next challenge!</p>
<div id="attachment_706" class="wp-caption aligncenter" style="width: 594px"><a href="http://engineering.wayfair.com/files/2012/04/Yahoo-Traffic-Spike.png"><img class="size-large wp-image-706" src="http://engineering.wayfair.com/files/2012/04/Yahoo-Traffic-Spike-1024x628.png" alt="" width="584" height="358" /></a><p class="wp-caption-text">(Requests per second over time)</p></div>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/the-24-hour-news-cycle/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>FreeBSD 9.0 on Dell PowerEdge 12G servers</title>
		<link>http://engineering.wayfair.com/freebsd-9-0-on-dell-poweredge-12g-servers/</link>
		<comments>http://engineering.wayfair.com/freebsd-9-0-on-dell-poweredge-12g-servers/#comments</comments>
		<pubDate>Wed, 28 Mar 2012 13:54:42 +0000</pubDate>
		<dc:creator>Dan C.</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=678</guid>
		<description><![CDATA[At Wayfair, we are big fans of Dell’s server platform, so naturally we were excited when their 12G line of servers started shipping.  We are also very big fans of FreeBSD.  Ahead of our first order for a new Dell &#8230; <a href="http://engineering.wayfair.com/freebsd-9-0-on-dell-poweredge-12g-servers/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>At Wayfair, we are big fans of Dell’s server platform, so naturally we were excited when their 12G line of servers started shipping.  We are also very big fans of FreeBSD.  Ahead of our first order for a new Dell PowerEdge R720, we did some research and found that the new PERC H710 RAID Controllers used the LSI SAS 2208 controller chip.  A quick look at the FreeBSD hardware compatibility list for 9.0 Release showed that this chip was supported by the mps driver.  Knowing that PERC cards are usually supported by the mfi driver, we thought it was a bit weird, but weren’t that concerned.<span id="more-678"></span></p>
<p>The hardware came in, we loaded up a FreeBSD 9.0 Release install CD, and the RAID controller was actually detected by an entirely different driver, the mpt driver.  It turns out there is a bug in the release version of that driver, which we were able to get around fairly easily (more on that later).  Still, none of the drivers in the release version would detect the RAID controller. Eventually we were able to find a project on FreeBSD’s SVN site where a new version of the mfi driver was being developed that supports the new line of Dell PERC cards.  Below is a quick how-to on making an install CD that will work perfectly with the new RAID controllers.  After we were able to get a clean install, we then discovered that the Broadcom LOM NIC is detected, but doesn’t function, so we just installed an Intel Pro1000 PCI NIC.  We later discovered that Intel has source code for a newer version of the igb driver for the optional Intel LOM NICs available <a href="http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&amp;DwnldID=15815&amp;ProdId=3356&amp;lang=eng&amp;OSVersion=FreeBSD*&amp;DownloadType=Drivers">here</a>.  We haven’t tested this yet, but the same method of injecting new driver code we use below should work for this driver, as well.</p>
<p>OK, start with a clean install of FreeBSD 9.0 Release. (We used a VM, but physical hardware is fine, too.)  Just make sure you have plenty of drive space, as we ended up using about 20GB by the time we were done.</p>
<p>1. Download the stable version of the source code, which we’ll use when building world and our kernel to build our install CD.</p>
<p><em>csup stable-supfile</em></p>
<p>2. Do a standard buildworld.  We don’t need to install it, but the output of this is used when building the install CD</p>
<p><em>cd /usr/src/</em></p>
<p><em>make buildworld</em></p>
<p>3. Now we need to get the new driver source code.  The SVN checkout link is <a href="http://svn.freebsd.org/base/projects/head_mfi/sys/dev/mfi">here</a>, and <a href="http://svnweb.freebsd.org/base/projects/head_mfi/sys/dev/mfi/">here</a> is the link to the websvn page if you want to take a look at the commit logs.  Once you have all the files downloaded, put them in the /usr/src/sys/dev/mfi folder, overwriting all existing files.</p>
<p>4. The new code actually added 2 new C files, so we need to add them to one of the included make files so when we build our kernel, it will see these new files.</p>
<p>Fire up your favorite text editor and edit /usr/src/sys/conf/files</p>
<p>Find the lines with the mfi driver files by searching for mfi, and add the following lines after those:</p>
<p><em>dev/mfi/mfi_syspd.c     optional mfi</em></p>
<p><em>dev/mfi/mfi_tbolt.c       optional mfi</em></p>
<p>5. Now, since we are using stable source files, we will get the updated mpt driver, which gets around the bug where this driver incorrectly tries to attach to our RAID controller.  Alternatively, you could edit the GENERIC kernel config file and comment out the mpt driver altogether.  Now, just build the kernel:</p>
<p><em>cd /usr/src</em></p>
<p><em>make buildkernel KERNCONF=GENERIC</em></p>
<p>6. Now we just need to make our release, which will generate our install ISO files.</p>
<p><em>cd /usr/src/release</em></p>
<p><em>make release</em></p>
<p>7. The ISO files are in /usr/obj/usr/src/release. Pull them off, then either boot from one via an iDRAC or burn it to a CD.</p>
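<p>If you rebuild install media often, the manual edit from step 4 can be scripted. The following is only a sketch in Python, not something we ran as part of this build: the <em>add_mfi_entries</em> helper is hypothetical, and it assumes the existing mfi entries in /usr/src/sys/conf/files start with <em>dev/mfi/</em>.</p>
<pre><code class="language-python">NEW_ENTRIES = [
    "dev/mfi/mfi_syspd.c\toptional mfi",
    "dev/mfi/mfi_tbolt.c\toptional mfi",
]


def add_mfi_entries(files_text):
    """Insert the two new mfi source entries after the last dev/mfi/ line."""
    lines = files_text.splitlines()
    # First field of every non-empty line is the source file path.
    present = {line.split()[0] for line in lines if line.strip()}
    missing = [entry for entry in NEW_ENTRIES if entry.split()[0] not in present]
    mfi_rows = [i for i, line in enumerate(lines) if line.startswith("dev/mfi/")]
    if not mfi_rows or not missing:
        return files_text  # nothing to anchor on, or already patched
    insert_at = mfi_rows[-1] + 1
    lines[insert_at:insert_at] = missing
    return "\n".join(lines) + "\n"


# Tiny made-up excerpt standing in for the real conf/files contents.
sample = "dev/mfi/mfi.c\t\toptional mfi\ndev/mii/mii.c\t\toptional miibus\n"
patched = add_mfi_entries(sample)
</code></pre>
<p>Running the helper a second time on its own output leaves the text unchanged, so it is safe to call from a build script that may be re-run.</p>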
<p><a href="http://engineering.wayfair.com/files/2012/03/clioutput.jpg"><img class="alignnone size-full wp-image-683" title="clioutput" src="http://engineering.wayfair.com/files/2012/03/clioutput.jpg" alt="" width="1112" height="190" /></a></p>
<p>For good measure, we installed <a href="http://www.iozone.org/">iozone</a> on the server in an attempt to stress test the RAID controller and drivers, and had no issues with the drivers or errors in the messages.log file. It&#8217;s worth noting that the stats below are based on an 8x 15k SAS disk RAID 5 array.</p>
<p><a href="http://engineering.wayfair.com/files/2012/03/diskread.jpg"><img class="alignnone size-full wp-image-684" title="Disk Read Performance" src="http://engineering.wayfair.com/files/2012/03/diskread.jpg" alt="" width="478" height="287" /></a><a href="http://engineering.wayfair.com/files/2012/03/diskwrite.jpg"><img class="alignnone size-full wp-image-685" title="Disk Write Performance" src="http://engineering.wayfair.com/files/2012/03/diskwrite.jpg" alt="" width="478" height="285" /></a></p>
<p>&nbsp;</p>
<p>We hope that you are as excited about the new line of servers as we are, and that this article is helpful not only to those trying to get FreeBSD running on Dell&#8217;s new hardware, but also to those who need a custom install CD for other reasons.  As always, thanks for reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/freebsd-9-0-on-dell-poweredge-12g-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Moving Constants out of APC and into CDB</title>
		<link>http://engineering.wayfair.com/moving-constants-out-of-apc-and-into-cdb/</link>
		<comments>http://engineering.wayfair.com/moving-constants-out-of-apc-and-into-cdb/#comments</comments>
		<pubDate>Wed, 14 Mar 2012 13:27:41 +0000</pubDate>
		<dc:creator>Jason</dc:creator>
				<category><![CDATA[Web Performance]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=618</guid>
		<description><![CDATA[As has been discussed at great length on this blog recently, performance is a key part of the work we do here at Wayfair.  Recently, we’ve been putting a lot of extra effort into our technology developments to make our &#8230; <a href="http://engineering.wayfair.com/moving-constants-out-of-apc-and-into-cdb/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As has been discussed at great length on this blog recently, performance is a key part of the work we do here at Wayfair.  Recently, we’ve been putting a lot of extra effort into our technology developments to make our pages load faster and more reliably.  One of the most recent releases to this end was an update to how we store a lot of our data in our various caching systems.  I am going to focus this post on the introduction of a new technology to our caching layers: CDB (Constant Database). <span id="more-618"></span>CDB is a key/value store built for quick lookup speeds, utilizing a perfect hash function.  CDB is a supported PHP extension that has been tested in a number of environments, generally outperforming other key/value store options (e.g. <a href="http://qdbm.sourceforge.net/benchmark.pdf">http://qdbm.sourceforge.net/benchmark.pdf</a>).  Our interest in adopting another caching mechanism stemmed from some issues we were having with APC (Alternative PHP Cache).  APC is a powerful local opcode cache, but in general the APC User Cache does not perform very well under heavy load.  Our performance metrics confirmed this, so we needed to replace it with some other system.  CDB boasts great performance metrics under heavy load, so we went ahead and mapped out a plan to incorporate CDB into our set of caching technologies.</p>
<p>At the outset of this project our cache system was essentially a two-tiered system with local APC caching on all servers, backed up by remote memcached servers.  One benefit of APC is that during code execution and page loads we can very easily set and get values in cache, which makes its use very dynamic.  CDB works very well under heavy load for read-only operations, but fails if multiple processes attempt to update or write new key/value pairs on the fly.  Therefore we envisioned a system in which both APC and CDB were utilized as local cache options, with memcached as a backup for both.  CDB would handle our constants, which in general are static and do not change often.  We update them once an hour with an asynchronous job, which keeps them fresh enough for our needs.  If CDB could handle the constant load well, we planned to move our language resources (how we localize our site for other countries) over from APC to CDB as well and observe how performance was affected.</p>
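<p>To make the lookup order concrete, here is a minimal Python sketch of the tiering described above. It is an illustration under simplifying assumptions rather than our PHP implementation: plain dictionaries stand in for the CDB file, the APC user cache, and memcached, and the <em>TieredCache</em> class and its method names are hypothetical.</p>
<pre><code class="language-python">class TieredCache:
    """Read-only constants, a local read/write cache, and a shared remote fallback."""

    def __init__(self, constants, remote):
        self.constants = dict(constants)  # stands in for a CDB file rebuilt hourly
        self.local = {}                   # stands in for the APC user cache
        self.remote = remote              # stands in for memcached (any object with .get)

    def get(self, key):
        # 1. Constants tier: never written during a request.
        if key in self.constants:
            return self.constants[key]
        # 2. Local dynamic tier.
        if key in self.local:
            return self.local[key]
        # 3. Remote tier backs up both local tiers.
        value = self.remote.get(key)
        if value is not None:
            self.local[key] = value  # warm the local tier for next time
        return value

    def rebuild_constants(self, fresh):
        # Swap in a whole new snapshot instead of writing keys one by one,
        # mirroring how a constant database is replaced rather than updated.
        self.constants = dict(fresh)


cache = TieredCache({"SITE_NAME": "Wayfair"}, remote={"greeting_fr": "Bonjour"})
site_name = cache.get("SITE_NAME")    # served from the constants tier
greeting = cache.get("greeting_fr")   # fetched remotely, then kept locally
</code></pre>
<p>The key design point is that the constants tier is only ever replaced wholesale by the hourly job, so request-time code never competes to write to it.</p>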
<p>While the initial move of constants to CDB did not show a huge performance improvement, we did notice some small gains once we added language resources to CDB.  On top of a marginal decrease in load time, CDB performed <em>much</em> better under load, showing significantly less periodic behavior than what we had seen previously with APC.</p>
<p>Below are graphs of homepage performance over an 8-hour window, before and after our change.  The huge drop seen in the first graph is due to clearing APC on our servers.  Clearing APC on a somewhat regular basis became necessary to keep our site performance stable under load.  With CDB taking over a huge portion of the storage that lived in APC previously, we have achieved a level of stability that was not possible with the APC user cache.</p>
<p>Before (November):</p>
<p><a href="http://engineering.wayfair.com/files/2012/03/Graph1.png"><img class="alignnone size-medium wp-image-664 aligncenter" src="http://engineering.wayfair.com/files/2012/03/Graph1-300x198.png" alt="" width="300" height="198" /></a></p>
<p>After (February):</p>
<p><a href="http://engineering.wayfair.com/files/2012/03/Graph2.png"><img class="alignnone size-medium wp-image-665 aligncenter" src="http://engineering.wayfair.com/files/2012/03/Graph2-300x215.png" alt="" width="300" height="215" /></a></p>
<p>Ignoring the blue line from the initial graph, what we see here is twofold.  First off, over the course of a couple months, our team has made huge strides to improve the performance of our servers and our code, decreasing the overall load time for the Wayfair.com homepage.  CDB contributed to this vast decrease in load times, but many other factors/projects contributed as well.  CDB’s main contribution comes with improved stability and reliability for our sites.  Moving forward, we have more ideas in the works to make our pages faster and more stable, and we plan to continue our efforts in this area well into the future.  As one of our mottos states very clearly, we are never done.</p>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/moving-constants-out-of-apc-and-into-cdb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software Adaptations &#8211; Wayfair&#8217;s Partners</title>
		<link>http://engineering.wayfair.com/software-adaptations-wayfairs-partners/</link>
		<comments>http://engineering.wayfair.com/software-adaptations-wayfairs-partners/#comments</comments>
		<pubDate>Mon, 05 Mar 2012 17:01:55 +0000</pubDate>
		<dc:creator>Bernardo</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=247</guid>
		<description><![CDATA[You would think data replication is a piece of cake these days given all the advances in database technology, and that’s true for the most part when you’re dealing with databases of the same type, but when you have to &#8230; <a href="http://engineering.wayfair.com/software-adaptations-wayfairs-partners/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>You would think data replication is a piece of cake these days given all the advances in database technology, and that’s true for the most part when you’re dealing with databases of the same type, but when you have to replicate parts of your product catalog with other companies, things get a bit tricky. At Wayfair Engineering we’ve figured out how to make it happen by creating a great software solution that keeps our retail partner operations working like a well-oiled machine.<span id="more-247"></span></p>
<p>The biggest challenge we face is the number of different communication methods and data formats we have to support to integrate with our partners. For example, with Amazon we build XML messages and send them using web services, for Walmart we create XML transaction files to upload via FTP, for Sears we submit XML transactions via REST API services, for eBay we use SOAP messages over HTTPS, and for Buy.com we generate tab-delimited text files and send them via FTP.</p>
<p>We’ve established a baseline of data structures and processing steps that we can use as a starting point for each new partner integration. But we didn’t get there overnight or even close to the beginning of this endeavor. For our first few partnerships we tried to accommodate their specifications independently, and that’s when we realized we wouldn’t get too far if we kept doing it that way forever. So we started building a common set of rules and data structures to use as a foundation for future integrations.</p>
<p>We also started building consolidated services designed to provide the necessary data for all our partners without duplicating efforts. For example, when we had about six partnerships already integrated, we had six different systems for preparing inventory information, and in many cases we were retrieving the exact same data six times every night. This was obviously overwhelming our servers and limiting our ability to scale and grow our partnerships. So we created a single system to function as an inventory data service for all partners. Before the improvements, the individual inventory processes took between 8 and 16 hours to run. After the consolidation, we got the whole process done in under 4 hours. Eventually we created similar services for price change tracking, product variations, kits, etc. We now have a collection of services available for any new retail partner integration. Bring them on!</p>
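<p>The shape of that consolidation can be sketched in a few lines of Python. This is a simplified illustration, not our actual system: the partner names, the <em>fetch_inventory</em> stand-in, and the two feed formats are assumptions chosen to show one retrieval feeding several partner-specific formatters.</p>
<pre><code class="language-python">import xml.etree.ElementTree as ET


def fetch_inventory():
    """Stand-in for the single shared inventory query (the data is made up)."""
    return [{"sku": "ABC123", "quantity": 4}, {"sku": "XYZ789", "quantity": 0}]


def to_tab_delimited(items):
    """Format inventory as a tab-delimited text feed with a header row."""
    rows = ["sku\tquantity"]
    rows += [f"{item['sku']}\t{item['quantity']}" for item in items]
    return "\n".join(rows)


def to_xml(items):
    """Format inventory as a small XML document."""
    root = ET.Element("inventory")
    for item in items:
        ET.SubElement(root, "item", sku=item["sku"], quantity=str(item["quantity"]))
    return ET.tostring(root, encoding="unicode")


# Hypothetical partner names mapped to the feed format each one expects.
FORMATTERS = {"partner_a": to_xml, "partner_b": to_tab_delimited}


def build_feeds():
    items = fetch_inventory()  # retrieved once, reused for every partner
    return {name: formatter(items) for name, formatter in FORMATTERS.items()}


feeds = build_feeds()
</code></pre>
<p>Adding a new partner in this arrangement means writing one more formatter and registering it, while the expensive data retrieval stays shared.</p>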
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/software-adaptations-wayfairs-partners/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eliminate Implicit Conversions</title>
		<link>http://engineering.wayfair.com/eliminate-implicit-conversions/</link>
		<comments>http://engineering.wayfair.com/eliminate-implicit-conversions/#comments</comments>
		<pubDate>Wed, 29 Feb 2012 20:28:03 +0000</pubDate>
		<dc:creator>Rukaiya</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[DBA]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=630</guid>
		<description><![CDATA[At Wayfair, we are never done. And the DBA team here is a true example of it. We are constantly looking to improve performance and we rigorously tune our databases on a daily basis. We are always looking at ways &#8230; <a href="http://engineering.wayfair.com/eliminate-implicit-conversions/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>At Wayfair, we are never done. And the DBA team here is a true example of it. We are constantly looking to improve performance and we rigorously tune our databases on a daily basis. We are always looking at ways to have our queries run faster – by maintaining indexes, optimizing queries and procedures, creating any missing indexes based on query usage, generating statistics on currently running queries, and filtering out queries with top CPU usage, among other improvements. Of late, we’ve been trying to eliminate any implicit data type conversions that happen at runtime. Implicit data type conversions come at a cost, especially when the conversion is performed on the column side of the query rather than the literal side. We have had scenarios where, for high-volume processing jobs (processing millions of records), queries fell back to index scans due to implicit conversions. A simple demonstration of an implicit conversion is <strong><em>WHERE a.OrderID = b.OrderNo</em></strong>, where a.OrderID is varchar(30) and b.OrderNo is nvarchar(30). Here the execution plan would do an implicit cast to nvarchar(30) and would perform an index scan operation on the millions of records – with you waiting endlessly for the query or job to finish.<span id="more-630"></span></p>
<p>Needless to say, in an in-house development environment like ours, it is difficult to hammer home the point about matching column types, especially when SQL Server has no foolproof mechanism like Oracle’s PL/SQL %TYPE syntax, which guarantees that a parameter’s type matches the type of the corresponding column.</p>
<p>We looked at methods to find these troublemakers, and developed a query which uses SQL Server’s Dynamic Management Views (DMVs) – sys.dm_exec_cached_plans and sys.dm_exec_query_stats. The query finds cached queries that perform implicit conversions and reports the metrics listed below, along with the physical operation performed because of the implicit conversion (an index scan, a full table scan, etc.). We were only interested in insert, update, delete, select, and execute proc operations that cause a convert_implicit. We also filtered out implicit conversions between the same data type at different lengths; for example, conversions from varchar(256) to varchar(200) were filtered.</p>
<p><strong>Metrics from the Query:</strong></p>
<p>-  SQL Query – the sql statement that causes the convert implicit to happen.<br />
-  SQL Type – the kind of SQL – whether &#8216;SELECT&#8217;, &#8216;INSERT&#8217;, &#8216;INSERT EXEC&#8217;, &#8216;UPDATE&#8217;, &#8216;EXECUTE PROC&#8217;, &#8216;DELETE&#8217;<br />
-  Databasename where the implicit conversion took place<br />
-  Schemaname – the schema where the table is stored<br />
-  Tablename – the table involved in the implicit conversion<br />
-  Columnname – the column involved in the implicit conversion<br />
-  ConvertFrom – the data type the column is implicitly converted from<br />
-  ConvertTo – the data type the column is implicitly converted to<br />
-  ConvertFromLength &#8211; the length of the data type being converted from<br />
-  ConvertToLength &#8211; the length of the data type being converted to<br />
-  Scalarstring – gives the internal convert implicit statement, that does the conversion.<br />
-  PhysicalOp – tells you whether it was a Clustered Index Scan, Index Scan, Seek, TableScan<br />
-  EstimateRows – estimated # of rows that the query affects<br />
-  EstimateIO – estimated IO<br />
-  EstimateCPU – estimated CPU<br />
-  AvgRowSize – estimated average size, in bytes, of the rows the operator returns<br />
-  EstimatedTotalSubtreeCost – estimated cost of the query<br />
-  Usecounts – the number of times the query plan/cache object is used since its inception<br />
-  size_in_bytes – number of bytes consumed by the cache object<br />
-  Object Type – whether the sql query is a proc or SQL statement<br />
-  exec_dbname – database where the query was actually run<br />
-  exec_schemaname – schema where the query was actually run<br />
-  exec_objectname – object name (stored procedure) where the query was actually run<br />
(for example, a stored procedure in one database, DB_1, can run a query against a table in another database, DB_2, where the implicit conversion takes place)<br />
-  query_plan – compile time show plan representation of the query execution plan.</p>
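<p>To give a feel for where several of these metrics come from, here is a minimal Python sketch that scans a showplan XML document for CONVERT_IMPLICIT and reports the enclosing operator. It is not the downloadable SQL script. The plan fragment is hand-built and heavily simplified, and the helper names (tag, build_sample_plan, find_implicit_conversions) are hypothetical; only the showplan namespace and the PhysicalOp, EstimateRows, and ScalarString attributes are taken from real plans.</p>

```python
import xml.etree.ElementTree as ET

# Namespace used by SQL Server showplan XML documents.
NS = "http://schemas.microsoft.com/sqlserver/2004/07/showplan"


def tag(name):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (NS, name)


def build_sample_plan():
    """Build a tiny, hand-written stand-in for a cached plan (not a real plan)."""
    root = ET.Element(tag("ShowPlanXML"))
    scan = ET.SubElement(root, tag("RelOp"),
                         {"PhysicalOp": "Index Scan", "EstimateRows": "2500000"})
    ET.SubElement(scan, tag("ScalarOperator"), {
        "ScalarString": "CONVERT_IMPLICIT(nvarchar(30),[a].[OrderID],0)=[b].[OrderNo]"})
    seek = ET.SubElement(root, tag("RelOp"),
                         {"PhysicalOp": "Index Seek", "EstimateRows": "1"})
    ET.SubElement(seek, tag("ScalarOperator"),
                  {"ScalarString": "[b].[OrderNo]=[@p1]"})
    return root


def find_implicit_conversions(root):
    """Return (PhysicalOp, EstimateRows, ScalarString) for each implicit conversion."""
    # ElementTree has no parent pointers, so build a child-to-parent map first.
    parent = {child: node for node in root.iter() for child in node}
    hits = []
    for scalar in root.iter(tag("ScalarOperator")):
        text = scalar.get("ScalarString", "")
        if "CONVERT_IMPLICIT" not in text.upper():
            continue
        # Walk up to the nearest enclosing operator to learn the physical operation.
        node = parent.get(scalar)
        while node is not None and node.tag != tag("RelOp"):
            node = parent.get(node)
        if node is not None:
            hits.append((node.get("PhysicalOp"), node.get("EstimateRows"), text))
    return hits


for hit in find_implicit_conversions(build_sample_plan()):
    print(hit)
```

<p>In this toy plan only the index scan is reported, because the seek compares the column to a parameter without any conversion.</p>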
<p>We’ve been able to identify several queries in our environment that were doing implicit conversions and saw a 1000x performance improvement in query execution times, with CPU usage humming back to normal after tuning them.</p>
<p>Implicit conversions probably aren’t the reason your site is having performance problems, but once you’ve taken care of the low hanging fruit (index maintenance, statistics, etc.), you need to take DB optimization to the next level and look for these types of queries which hopefully can be easily refactored (perhaps using temp tables) to give you that extra bit of speed.</p>
<p><a title="SQL Script " href="http://engineering.wayfair.com/wp-content/files/ImplicitConversion.sql">Download SQL Script Here</a></p>
<p>We’re always looking to squeeze out that last bit of performance, so we’ll let you know what needle in the haystack we find next.</p>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/eliminate-implicit-conversions/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Recommendations with Markov Clustering</title>
		<link>http://engineering.wayfair.com/recommendations-with-markov-clustering/</link>
		<comments>http://engineering.wayfair.com/recommendations-with-markov-clustering/#comments</comments>
		<pubDate>Thu, 23 Feb 2012 16:26:40 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[recommendations]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=462</guid>
		<description><![CDATA[Our story begins in Holland in 1997, where a researcher named Stijn van Dongen, who is pretty good at Go, has a 5-minute flash of insight into modeling flows with stochastic matrices.  He writes a thesis about it and makes &#8230; <a href="http://engineering.wayfair.com/recommendations-with-markov-clustering/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Our story begins in Holland in 1997, where a researcher named <a title="Stijn van Dongen's W3 node" href="http://www.micans.org/stijn/index.html">Stijn van Dongen</a>, who is pretty good at <a title="Game of Go" href="http://en.wikipedia.org/wiki/Go_%28game%29">Go</a>, has a <a title="mcl discovery process description" href="http://www.micans.org/mcl/index.html">5-minute flash of insight into modeling flows</a> with stochastic matrices.  He writes a <a title="Stijn van Dongen's thesis link" href="http://www.library.uu.nl/digiarchief/dip/diss/1895620/full.pdf">thesis</a> about it and makes a <a title="MCL software page" href="http://www.micans.org/mcl/index.html">toolkit</a> called MCL with a free software license.</p>
<p>Flash forward to late 2011.  It turns out that MCL is pretty useful if you are trying to sell home goods on the internet, and perhaps other types of goods as well.   The search and recommendations team at Wayfair has just launched a simple recommender component, as described <a title="Simple recommendations machine article" href="http://engineering.wayfair.com/recommendations-with-simple-correlation-metrics-on-implied-preference-data/">here</a>.  Our system is working pretty well and giving the people something like what they want, but we suspect we can find more interesting connections among people and things than the ones we are finding. <a title="Greg Padowski bio link" href="/author/gpadowski">Greg</a> and I are reading academic and industry research papers, when Greg finds Stijn&#8217;s research and MCL. We give it a try, and our recommendations improve.<span id="more-462"></span></p>
<p>That&#8217;s the short version of the story.  But we think it may interest some readers if we describe our thought process a little further.</p>
<p>There&#8217;s a passage in <a title="Philipp Janert's home page" href="http://www.beyondcode.org/">Philipp Janert</a>&#8217;s book <a title="Data Analysis with Open Source Tools" href="http://shop.oreilly.com/product/9780596802363.do">Data Analysis with Open Source Tools</a>, where he&#8217;s describing the state of machine learning, classification schemes in particular:</p>
<blockquote><p>The difficulty of developing some recommendations that work in general and for a broad range of application domains may also explain one particular observation regarding classification: the apparent scarcity of spectacular, well-publicized successes. Spam filtering seems to be about the only application that clearly works and affects many people directly. Credit card fraud detection and credit scoring are two other widely used (if less directly visible) applications. But beyond those two, I see only a host of smaller, specialized applications. This suggests again that every successful classifier implementation depends strongly on the details of the particular problem—probably more so than on the choice of algorithm. (page 424)</p></blockquote>
<p>More generally, all this machine learning, when you first hear about it, sounds as if it&#8217;s going to solve all your problems by crunching unattended through a bunch of simulations, cluster-finders and test/training sequences on a big computer in your back room.  But unless you&#8217;re working in one of the magical domains where the textbook techniques &#8216;just work&#8217;, you&#8217;re probably not going to be able to simply download a software library and plug your data in.  You&#8217;re going to need to get your hands into the seemingly shapeless muck, and try different approaches until you&#8217;ve molded it into a form where it does something useful for you.  The people who won the Netflix contest had an &#8216;everything-but-the-kitchen-sink&#8217; approach.  I suspect in the end, so will we.  Be warned, though: once you get past the naive approaches, you can try some very sophisticated techniques and move the needle of recommendations quality only a little.  But don&#8217;t be discouraged.  The sense of the community, and certainly our own experience, is that with a large data set, small improvements can make a big difference in relevance and ultimately profit.</p>
<p>So, the kitchen sink it is.  But where to start?  We can&#8217;t just choose our techniques at random.  We need to make some educated guesses as to what might be helpful.  Off to the library!  Or at least the internet.  There&#8217;s a pretty good general discussion of advanced techniques in the recommendations space <a title="link to paper &quot;Matrix Factorization Techniques for Recommender Systems&quot;" href="http://www2.research.att.com/%7Evolinsky/papers/ieeecomputer.pdf">here</a>, where the authors divide the families of approaches into the Pandora and Netflix types: content filtering and collaborative filtering, respectively.  Translated into our world, let&#8217;s call these Homedora and Homeflix (h/t <a title="Joss &amp; Main, which is part of Wayfair, and of which John Mulliken is a co-founder" href="https://www.jossandmain.com/">John Mulliken</a> for those labels).  Which one?   No doubt some sort of hybrid in the end, as <a title="Survey of collaborative filtering techniques" href="http://www.hindawi.com/journals/aai/2009/421425/">these nice people suggest</a>.  For now let&#8217;s start with collaborative filtering, so Homeflix.  It would be great if we could use one of the already well-researched recommendations models based on <a href="http://en.wikipedia.org/wiki/Singular_value_decomposition">singular value decomposition (SVD)</a>, which take explicit customer ratings of products and in some cases implicit customer preferences, to learn latent customer preferences and product attributes. Then it would just be a matter of running stochastic gradient descent or alternating least squares on top of a good linear algebra package and waiting a few hours for the customer preference and product attribute matrices to converge.
Yehuda Koren and Robert Bell describe the relevant ideas in <a title="Advances in collaborative filtering by Yehuda Koren and Robert Bell" href="http://research.yahoo.com/files/korenBellChapterSpringer.pdf">this article</a>, which is chapter 5 of <a title="Recommender Systems Handbook" href="http://www.springer.com/computer/ai/book/978-0-387-85819-7">this excellent book</a>. Something along those lines is apparently state of the art for Netflix. IBM has even worked out a <a title="matrix factorization technques for recommender systems, distributed stochastic gradient descent" href="http://www.almaden.ibm.com/cs/people/peterh/dsgdTechRep.pdf">way to distribute this type of work if your data is too big to fit in memory</a>, at least for SVD models based on purely explicit ratings. But all that gives pretty poor results when we try to shoehorn our data into it.</p>
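<p>For readers who have not seen the textbook approach, here is a minimal sketch of matrix factorization trained by stochastic gradient descent on explicit ratings. It is a generic illustration, not code we ran: the ratings are made up, the hyperparameters are arbitrary, and the function names (train_factors, rmse) are hypothetical.</p>

```python
import random


def train_factors(ratings, n_users, n_items, n_factors=2,
                  learning_rate=0.01, regularization=0.02, epochs=500, seed=0):
    """Learn user and item factor vectors by stochastic gradient descent."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            predicted = sum(pu * qi for pu, qi in zip(users[u], items[i]))
            error = r - predicted
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                # Move both factors along the error gradient, shrunk by regularization.
                users[u][f] += learning_rate * (error * qi - regularization * pu)
                items[i][f] += learning_rate * (error * pu - regularization * qi)
    return users, items


def rmse(ratings, users, items):
    """Root-mean-square error of the factor model on the given ratings."""
    total = 0.0
    for u, i, r in ratings:
        predicted = sum(pu * qi for pu, qi in zip(users[u], items[i]))
        total += (r - predicted) ** 2
    return (total / len(ratings)) ** 0.5


# Made-up (user, item, rating) triples on a 1-5 scale, for illustration only.
RATINGS = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
           (2, 1, 5), (2, 2, 2), (3, 0, 1), (3, 2, 5)]

untrained = train_factors(RATINGS, 4, 3, epochs=0)
trained = train_factors(RATINGS, 4, 3)
print(rmse(RATINGS, *untrained), rmse(RATINGS, *trained))
```

<p>The sketch depends on every training example carrying an explicit numeric rating, which is exactly what our flag data lacks.</p>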
<p>Our data just doesn&#8217;t seem to be shaped like the data that works well with these approaches.  How is it shaped?  Well, it turns out, a lot like the proteins Stijn van Dongen was analyzing, which he depicts like so:<br />
<img src="http://www.micans.org/mcl/img/fa75.png" alt="" /></p>
<p>As you can see from that progression, he&#8217;s gradually eliminating the wispy, more tenuous connections between clusters of proteins defined by the sturdier, weightier links.  We need to do something a lot like that.  To be specific, if you recall our definition of &#8216;flagging&#8217; an item in the previous article, then you can model a flag as an edge of weight &#8216;1&#8217; in a graph like the one depicted.</p>
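<p>The core loop of Markov clustering is small enough to sketch in plain Python. This is a toy illustration of the expansion and inflation steps on a six-node graph with edges of weight 1. It is not the MCL toolkit and not our production code, and the inflation value, iteration count, and function names are assumptions chosen for the example.</p>

```python
def normalize_columns(matrix):
    """Scale each column so it sums to 1 (a column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [[matrix[r][c] / sums[c] for c in range(n)] for r in range(n)]


def expand(matrix):
    """Expansion: square the matrix, letting flow spread along longer paths."""
    n = len(matrix)
    return [[sum(matrix[r][k] * matrix[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def inflate(matrix, power):
    """Inflation: raise entries to a power and renormalize, favoring strong links."""
    return normalize_columns([[value ** power for value in row] for row in matrix])


def markov_cluster(edges, n_nodes, inflation=2.0, iterations=30):
    """Cluster an undirected graph whose edges all have weight 1."""
    matrix = [[0.0] * n_nodes for _ in range(n_nodes)]
    for a, b in edges:
        matrix[a][b] = matrix[b][a] = 1.0
    for node in range(n_nodes):
        matrix[node][node] = 1.0  # self-loops, as commonly added before clustering
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    # Each column's largest entry marks the attractor that node flows to.
    clusters = {}
    for node in range(n_nodes):
        attractor = max(range(n_nodes), key=lambda r: matrix[r][node])
        clusters.setdefault(attractor, []).append(node)
    return sorted(clusters.values())


# Two triangles of items joined by a single tenuous bridge (2-3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(markov_cluster(EDGES, 6))
```

<p>Expansion lets flow spread through the graph, while inflation starves the weak bridge between the triangles until the two groups separate.</p>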
<p>And lo and behold, when we started recommending things based on connections as established by this process, our results improved again.  Better furniture shopping through protein analysis!  Who would have figured that?  This is perfect for e-commerce people in a hurry.  Both the science and the primary engineering (the toolkit with all its switches) are done!  Greg just had to realize it would help.  Of course, that&#8217;s a bit of a black art in itself.</p>
<p>I wonder if the author of the toolkit might take a dim view of all this, based on what he says <a title="Stijn van Dongen's portal pain rant" href="http://www.micans.org/stijn/views/portalpain.html">here</a>. Where is the line between helpfully eliminating the useless irrelevant nonsense from the lists of things we present to our users, and pre-emptively discarding things they might have been interested in?  We&#8217;ll have to be careful about how much, if any, of all this we use in our internal search results, I suppose.  But for showing similar items in scrollers down the page, or cross-selling at judiciously chosen spots, this will be a big help.</p>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/recommendations-with-markov-clustering/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recommendations with simple correlation metrics on implied preference data</title>
		<link>http://engineering.wayfair.com/recommendations-with-simple-correlation-metrics-on-implied-preference-data/</link>
		<comments>http://engineering.wayfair.com/recommendations-with-simple-correlation-metrics-on-implied-preference-data/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 18:59:54 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[recommendations]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=375</guid>
		<description><![CDATA[When you sit down to write a recommendations system, there are quite a few  well-practiced techniques you can use, and it&#8217;s difficult to know in advance how well they are going to work out when applied to your data.  Thanks &#8230; <a href="http://engineering.wayfair.com/recommendations-with-simple-correlation-metrics-on-implied-preference-data/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When you sit down to write a recommendations system, there are quite a few  well-practiced techniques you can use, and it&#8217;s difficult to know in advance how well they are going to work out when applied to your data.  Thanks to the <a title="Netflix prize link" href="http://www.netflixprize.com/">Netflix prize</a>, which was initiated in 2006 and awarded in 2009, a lot has been written on recommender systems for the Netflix data set.  If you happen to have a product catalogue similar to Netflix&#8217;s (those movies from the 60s are still being viewed and rated), and your users happen to have scored it with a 5-point explicit ratings system, there are some awesome advanced techniques and frameworks that you can take for a spin.  Does that sound like you? Show of hands?  I didn&#8217;t think so.  Our data is certainly nothing like that.</p>
<p><span id="more-375"></span>What to do? I decided to start with something simple, before our inevitable trek into the forests of matrix factorization, stochastic gradient descent, Markov clusters and other impressive-sounding stuff: more on all that in subsequent posts.</p>
<p>So <a title="George Carrette" href="/author/gcarrette/">George</a> and I began with the most obvious available literature, the O&#8217;Reilly book <a title="Programming Collective Intelligence by Toby Segaran" href="http://shop.oreilly.com/product/9780596529321.do">Programming Collective Intelligence</a> by Toby Segaran (if there&#8217;s an O&#8217;Reilly book on it, you must be able to just do it, right?), and with the simplest data set we could imagine: a set of relations between users and items, which we interpret as the user&#8217;s preference for the item.  This might be an item view, an item purchase (unless closely followed by a return), or any other event we think might come in handy.  We need a general term for this activity, for discussion purposes, not &#8216;view&#8217; or &#8216;purchase&#8217;: let&#8217;s call it &#8216;flagging&#8217;.  This data is most like the book&#8217;s del.icio.us example: people either linked to something or did not.  We&#8217;re also going to limit ourselves to the simplest possible tool, a sql interface to something that is more or less a bunch of tables.  We have tried the following, or at least parts of it, on MS SQL Server, <a title="Link to IBM Netezza" href="http://www.netezza.com">Netezza</a> and <a title="Link to Apache Hive, a Hadoop component" href="http://hive.apache.org/">Hive</a>.</p>
<p>It&#8217;s impractical for us to load our entire data set into memory, or even to represent the relationships of all users and all items as explicit data, so we look for a sparse durable representation: a relational table in which a record means that a user flagged an item within a certain context. The context depends on the data source: a user viewed an item in a particular month/week/day/session, a user purchased an item in a particular month/week/day/order, etc.</p>
<p>Now let&#8217;s compute the following:</p>
<ol>
<li>For each user-item pair: how many times did the user flag the item?</li>
<li>For each user: how many items were flagged? how many total flag events?</li>
<li>For each pair of items flagged by at least one user: how many users flagged both items? We call this &#8216;overlap&#8217;.  We exclude outliers at this point: users with too few items flagged, or too many flag events.</li>
<li>For each item: how many users flagged it?  We call this &#8216;popularity&#8217;.</li>
</ol>
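<p>The four computations above can be sketched against an in-memory SQLite database from Python. This is only an illustration under stated assumptions: the flags table, its column names, and the sample rows are made up, SQLite stands in for the engines we actually used, and the outlier filtering from step 3 is left out to keep the example short.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per flag event: a user flagged an item within some context.
conn.execute("CREATE TABLE flags (user_id INTEGER, item_id TEXT, context TEXT)")
conn.executemany("INSERT INTO flags VALUES (?, ?, ?)", [
    (1, "sofa", "2012-01"), (1, "sofa", "2012-02"), (1, "lamp", "2012-01"),
    (2, "sofa", "2012-01"), (2, "lamp", "2012-01"), (2, "rug", "2012-02"),
    (3, "lamp", "2012-02"), (3, "rug", "2012-02"),
])

# 1. Flags per user-item pair.
user_item = conn.execute(
    "SELECT user_id, item_id, COUNT(*) FROM flags "
    "GROUP BY user_id, item_id ORDER BY user_id, item_id").fetchall()

# 2. Distinct items and total flag events per user.
per_user = conn.execute(
    "SELECT user_id, COUNT(DISTINCT item_id), COUNT(*) FROM flags "
    "GROUP BY user_id ORDER BY user_id").fetchall()

# 3. Overlap: users who flagged both items of a pair (each pair listed once).
overlap = conn.execute(
    "SELECT a.item_id, b.item_id, COUNT(DISTINCT a.user_id) "
    "FROM flags a JOIN flags b "
    "  ON a.user_id = b.user_id AND b.item_id > a.item_id "
    "GROUP BY a.item_id, b.item_id ORDER BY a.item_id, b.item_id").fetchall()

# 4. Popularity: distinct users per item.
popularity = conn.execute(
    "SELECT item_id, COUNT(DISTINCT user_id) FROM flags "
    "GROUP BY item_id ORDER BY item_id").fetchall()

print(overlap)
print(popularity)
```

<p>The self-join in step 3 is the expensive part on real data, which is one reason to drop outlier users before running it.</p>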
<p>Now we&#8217;re ready to compute any of the correlation metrics in the book.  Which ones make sense?  Not Pearson correlation or Euclidean distance.  You can compute them well enough, but try to imagine what they mean in the context of this data. What kind of straight line, or triangle&#8217;s hypotenuse, are you fitting these data points to?  None that I can picture.  The data is too much of a degenerate case of anything to which those concepts might usefully apply: you get a lot of scores of exactly &#8216;1&#8217; or &#8216;0&#8217; or <img src="http://engineering.wayfair.com/wp-content/plugins/wpmathpub/phpmathpublisher/img/math_993.5_52ef8013c05ff62d72ce781af35d4c9e.png" style="vertical-align:-6.5px; display: inline-block ;" alt="sqrt{2}" title="sqrt{2}"/> .  Raw frequency makes sense, but it&#8217;s a bit of a blunt instrument.  Prior to our setting up this system, there was something on the site that essentially used frequency along the lines we&#8217;re talking about here.  It overvalued very popular things, to be sure, but in the end people clicked and bought things off those recommendations, so it wasn&#8217;t terrible.  But what about the <a title="Jaccard coefficient wikipedia article" href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard coefficient</a> (sometimes called the Tanimoto coefficient)?  J(A,B) =   <img src="http://engineering.wayfair.com/wp-content/plugins/wpmathpub/phpmathpublisher/img/math_983_a275c5ee5d81ca057e982132848cbd20.png" style="vertical-align:-17px; display: inline-block ;" alt="{A {inter} B} / {A {union} B}" title="{A {inter} B} / {A {union} B}"/> .  Sounds plausible.  We&#8217;ll score our items A and B as 1 minus (overlap of A and B)/(popularity of A + popularity of B &#8211; overlap of A and B).  Strictly speaking, the ratio itself is the Jaccard coefficient, and 1 minus the ratio is the Jaccard distance.  Makes sense to me, and it&#8217;s straightforward in sql!
Our final table (let&#8217;s call it &#8216;item_affinity_jaccard&#8217;) will have at least 3 columns: the id of A, the id of B, and the coefficient.</p>
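<p>As a sketch of how little SQL this takes, the snippet below builds item_affinity_jaccard from overlap and popularity tables in SQLite. The input table names, column names, and numbers are invented for the example. It stores the intersection-over-union ratio as a similarity and 1 minus that ratio as a distance, so either reading of the score is available.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_popularity (item_id TEXT PRIMARY KEY, popularity INTEGER);
    CREATE TABLE item_overlap (item_a TEXT, item_b TEXT, overlap INTEGER);
    INSERT INTO item_popularity VALUES ('lamp', 3), ('rug', 2), ('sofa', 2);
    INSERT INTO item_overlap VALUES
        ('lamp', 'rug', 2), ('lamp', 'sofa', 2), ('rug', 'sofa', 1);

    -- Intersection over union, using |A union B| = |A| + |B| - |A inter B|.
    CREATE TABLE item_affinity_jaccard AS
    SELECT o.item_a, o.item_b,
           1.0 * o.overlap
               / (pa.popularity + pb.popularity - o.overlap) AS similarity,
           1.0 - 1.0 * o.overlap
               / (pa.popularity + pb.popularity - o.overlap) AS distance
    FROM item_overlap o
    JOIN item_popularity pa ON pa.item_id = o.item_a
    JOIN item_popularity pb ON pb.item_id = o.item_b;
""")

rows = conn.execute(
    "SELECT item_a, item_b, similarity, distance FROM item_affinity_jaccard "
    "ORDER BY item_a, item_b").fetchall()
for row in rows:
    print(row)
```

<p>Multiplying by 1.0 forces floating-point division. Without it, many SQL engines would truncate the integer ratio to 0.</p>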
<p>We placed those results in a test harness, and the results were visibly, obviously better than the frequency-based thing that was there before.  But could we trust our eyes?  Hard to know without trying it.  We replaced it on the site, and clickthrough rose 18%.  That we can trust!  If you&#8217;re starting out with recommendations, I&#8217;d say give that a try.</p>
<p>For extra points, let&#8217;s move on to a less degenerate case.  We&#8217;ll add a new product C to our A and B, and observe that if A is connected to B, and B is connected to C, then A is connected, in a way, to C.  This will be quick, dirty, and not scalable at all (scalable in the sense that, if you wanted to add a D, E or F, you would quickly be out of luck). But if you&#8217;ve got this far, and you&#8217;ve never gone through an exercise to convince yourself that graph processing gets ugly fast when your only tools are a bunch of relational tables, try the following:</p>
<ol>
<li>Summarize previous results in a table: for each item, compute the count of users who flagged it, total flags, and the number of other items for which you can compute the Jaccard coefficient (let&#8217;s call this the &#8216;recommendation count&#8217;, and these items &#8216;recommendable items&#8217;).</li>
<li>Make an item_relationship_step2 table that contains all the connected pairs.  Avoid combinatorial explosion by only including items where recommendation count is greater than 0 and less than something that excludes items for which you already have so many direct pairs that you don&#8217;t really need the farther-away things.</li>
<li>Join item_affinity_jaccard to itself and then to item_relationship_step2, and compute the two-hop distance in whatever way you think best.</li>
</ol>
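<p>One possible reading of those three steps, again sketched with SQLite from Python. Everything here is an assumption made for illustration: the coefficient column is treated as a similarity, affinities are stored in both directions, the cutoff of 1 direct pair is arbitrary, and the two-hop score is taken as the best product of the two coefficients along any path through an intermediate item.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Direct affinities, stored in both directions to keep the joins simple.
    CREATE TABLE item_affinity_jaccard (item_a TEXT, item_b TEXT, coefficient REAL);
    INSERT INTO item_affinity_jaccard VALUES
        ('bed', 'lamp', 0.5), ('lamp', 'bed', 0.5),
        ('lamp', 'rug', 0.4), ('rug', 'lamp', 0.4);

    -- Step 1: recommendation count = number of directly related items.
    CREATE TABLE item_summary AS
    SELECT item_a AS item_id, COUNT(*) AS recommendation_count
    FROM item_affinity_jaccard GROUP BY item_a;

    -- Step 2: candidate A-C pairs reachable through some B, starting only
    -- from items with few direct pairs (the cutoff of 1 is arbitrary here).
    CREATE TABLE item_relationship_step2 AS
    SELECT DISTINCT ab.item_a AS item_a, bc.item_b AS item_c
    FROM item_affinity_jaccard ab
    JOIN item_affinity_jaccard bc ON bc.item_a = ab.item_b
    JOIN item_summary s ON s.item_id = ab.item_a
    WHERE bc.item_b != ab.item_a
      AND s.recommendation_count BETWEEN 1 AND 1;
""")

# Step 3: join the affinity table to itself and to the candidate pairs.
two_hop = conn.execute("""
    SELECT s2.item_a, s2.item_c, MAX(ab.coefficient * bc.coefficient) AS score
    FROM item_relationship_step2 s2
    JOIN item_affinity_jaccard ab ON ab.item_a = s2.item_a
    JOIN item_affinity_jaccard bc
      ON bc.item_a = ab.item_b AND bc.item_b = s2.item_c
    GROUP BY s2.item_a, s2.item_c
    ORDER BY s2.item_a, s2.item_c
""").fetchall()
print(two_hop)
```

<p>Here the bed and the rug are never flagged together, but both are tied to the lamp, so each picks up a two-hop score through it. Adding a third hop would mean yet another self-join, which is where the relational approach starts to hurt.</p>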
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/recommendations-with-simple-correlation-metrics-on-implied-preference-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Puppet provider for python library installation</title>
		<link>http://engineering.wayfair.com/puppet-provider-for-python-library-installation/</link>
		<comments>http://engineering.wayfair.com/puppet-provider-for-python-library-installation/#comments</comments>
		<pubDate>Tue, 24 Jan 2012 16:33:35 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[Code Deployment]]></category>
		<category><![CDATA[Puppet Python Pip Setuptools]]></category>

		<guid isPermaLink="false">http://engineering.wayfair.com/?p=352</guid>
		<description><![CDATA[We run a python/Tornado-based recommendations service behind the scenes at Wayfair.  As part of our code deployments, we need to install various third-party libraries to our Tornado servers. The python tools that do this kind of thing are a bit &#8230; <a href="http://engineering.wayfair.com/puppet-provider-for-python-library-installation/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We run a python/Tornado-based recommendations service behind the scenes at Wayfair.  As part of our code deployments, we need to install various third-party libraries to our Tornado servers. The python tools that do this kind of thing are a bit half-baked, so we paper over their inadequacies with puppet.</p>
<p>A while back a fellow named <a title="Richard Crowley's twitter stream" href="https://twitter.com/#!/rcrowley">Richard Crowley</a> wrote a <a title="Puppet provider for python's pip tool" href="https://github.com/rcrowley/puppet-pip">puppet-pip provider</a>, which seems to have been folded into Puppet 2.7, or replaced by a module in Puppet 2.7, or something like that. So in a sense his little project is dead. But <a title="Karthick Duraisamy Soundararaj" href="/author/ksoundararaj/">Karthick</a> on our team has resurrected a fork of it, a hybrid provider using subcommands of <a title="Setuptools (easy_install) project page" href="http://pypi.python.org/pypi/setuptools">setuptools</a> (easy_install) and <a title="pip project page" href="http://pypi.python.org/pypi/pip">pip</a> for different aspects of installation, version checking and uninstallation. We call it easypip (easypip.rb), and the forked project containing it is now up on <a title="Wayfair's modifications to puppet-pip" href="https://github.com/wayfair/puppet-pip">github</a>.  Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://engineering.wayfair.com/puppet-provider-for-python-library-installation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

