At Wayfair, we are never done. And the DBA team here is a true example of it. We are constantly looking to improve performance and we rigorously tune our databases on a daily basis. We are always looking at ways to have our queries run faster – by maintaining indexes, optimizing queries and procedures, creating any missing indexes based on query usage, generating statistics on currently running queries, and filtering out queries with top CPU usage, among other improvements. Of late, we’ve been trying to eliminate any implicit data type conversions that happen at runtime. Implicit data type conversions come with cost, especially when the conversion is performed at the column side of the query – not the literal side. We have had scenarios where for high volume processing jobs (processing millions of records) we had index scan execution on queries due to implicit conversions. A simple demonstration of an implicit conversion is: WHERE a.OrderID = b.OrderNo; a.OrderID being varchar(30) and b.OrderNo is nvarchar(30). Here the execution plan would do an implicit cast to nvarchar(30) and would perform an index scan operation on the millions of records – with you waiting endlessly for the query or job to finish.
Needless to say, in an in-house development environment like ours, it is difficult to hammer in the point about matching column types. Especially when there is no fool proof method in SQL Server, like Oracle’s PL/SQL %TYPE syntax which guarantees that the parameters’ type matches the type of the corresponding column.
We looked at methods to find these troublemakers, and developed a query which uses SQL Server’s Dynamic Management Views (DMVs) – sys.dm_exec_cached_plans and sys. dm_exec_query_stats. The query can find a query in the cache that performs implicit conversions, and gives us all of the below mentioned metrics along with the physical operation being performed due to the implicit conversion, whether it does an index scan, full table scans, etc. We were only interested in insert, update, delete, select, and execute proc operations that cause a convert_implicit. We also filtered out any implicit conversions on the same data type but on different lengths. For example, implicit conversions from varchar(256) to varchar(200) were filtered.
Metrics from the Query:
- SQL Query – the sql statement that causes the convert implicit to happen.
- SQL Type – the kind of SQL – whether ‘SELECT’, ‘INSERT’, ‘INSERT EXEC’, ‘UPDATE’, ‘EXECUTE PROC’, ‘DELETE’
- Databasename where the implicit conversion took place
- Schemaname – the schema where the table is stored
- Tablename – the table involved in the implicit conversion
- Columnname – the column involved in the implicit conversion
- ConvertFrom – tells you the implicit conversion, converts the column from this datatype
- ConvertTo – tells you the implicit conversion, converts the column to this datatype
- ConvertFromLength – tells you the implicit conversion, converts from this datatype length
- ConvertToLength – tells you the implicit conversion, converts to this datatype length
- Scalarstring – gives the internal convert implicit statement, that does the conversion.
- PhysicalOp – tells you whether it was a Clustered Index Scan, Index Scan, Seek, TableScan
- EstimateRows – estimated # of rows that the query affects
- EstimateIO – estimated IO
- EstimateCPU – estimated CPU
- AvgRowSize – average number of rows the query expects to return
- EstimatedTotalSubtreeCost – estimated cost of the query
- Usecounts – the number of times the query plan/cache object is used since its inception
- size_in_bytes – number of bytes consumed by the cache object
- Object Type – whether the sql query is a proc or SQL statement
- exec_dbname – database where the query was actually run
- exec_schemaname – schema where the query was actually run
- exec_objectname – object name (stored procedure) where the query was actually run
(for eg: a stored procedure in one database DB_1, can run a query against a table in another database DB_2, where the implicit conversion takes place)
- query_plan – compile time show plan representation of the query execution plan.
We’ve been able to identify several queries in our environment that were doing implicit conversions and saw a 1000x performance improvement in query execution times, with CPU usage humming back to normal after tuning them.
Implicit conversions probably aren’t the reason your site is having performance problems, but once you’ve taken care of the low hanging fruit (index maintenance, statistics, etc.), you need to take DB optimization to the next level and look for these types of queries which hopefully can be easily refactored (perhaps using temp tables) to give you that extra bit of speed.
We’re always looking to squeeze out that last bit of performance, so we’ll let you know what needle in the haystack we find next.
If this type of thing interests you, please visit our careers page, and join the team!