DDD Southwest Session Notes 1 : Performance & Scalability
The first session of my day at DDD Southwest was a talk by Marc Gravell of Stack Overflow [http://stackoverflow.com/] about how they approach performance and scalability issues. As with any talk specific to a workplace or site, your mileage may vary, but the two tools he demonstrated could certainly be used anywhere.
Performance Myth #1: Adding a server will make your site faster. This is NOT TRUE. If you add a server to your web farm or cluster, your site will not get served any faster. Websites do not trouble the CPU very much. Indeed, StackOverflow runs at 10% CPU load give or take. That’s for a website serving ~7 million page views a day according to Quantcast [http://www.quantcast.com/stackoverflow.com].
Profiling
When a performance-related issue is logged on a site, sometimes it is possible to use a coarse-grained event log or profiling tool to find and diagnose the issue. Most of the time, this is not possible.
Of course, the finer-grained the level of profiling on your live site, the greater the overheads and the slower the actual site. Beware net admins with guns. Especially NRA members (such as those at StackOverflow :-)
One tool written by the StackOverflow team to address this issue is called (MVC) MiniProfiler [http://miniprofiler.com/], a very lightweight, almost zero friction profiler for websites.
Install it into your site using NuGet:
PM> Install-Package miniprofiler
This adds two new references and two new files to your web project. Simply uncomment the call to RenderIncludes in your layout.cshtml page to have MiniProfiler start working. On each page, MiniProfiler (MP) shows basic profile info in the top left. Clicking on the MP tab displays the time each function took to complete before the page was rendered.
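To make the idea of per-step timings concrete, here is a minimal sketch of a step timer. It is a standard-library Python illustration of the general technique, not MiniProfiler's API: the StepProfiler class and its step method are hypothetical names I have made up, and the sleeps stand in for real work.

```python
import time
from contextlib import contextmanager


class StepProfiler:
    """Records how long each named step of a request takes (hypothetical helper)."""

    def __init__(self):
        self.timings = []  # (depth, step name, elapsed milliseconds)
        self._depth = 0

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        depth = self._depth
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            elapsed_ms = (time.perf_counter() - start) * 1000
            # A step is recorded when it finishes, so inner steps come first.
            self.timings.append((depth, name, elapsed_ms))


profiler = StepProfiler()
with profiler.step("render page"):
    with profiler.step("load questions"):
        time.sleep(0.01)  # stand-in for real work
    with profiler.step("build view model"):
        time.sleep(0.005)

for depth, name, ms in profiler.timings:
    print("  " * depth + f"{name}: {ms:.1f} ms")
```

The overhead here is two clock reads and a list append per step, which is roughly why this style of profiling can be cheap enough to leave switched on.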
MiniProfiler can also monitor database requests by wrapping your DbConnection object or DbContext object if your site uses Linq2SQL or the Entity Framework.
DbConnection conn = new SqlConnection(…);
conn = new ProfiledDbConnection(conn, MiniProfiler……);
// do something with profiled connection
When run, MP now also displays the number of SQL commands sent to the database and what they were (formatted nicely). It will also alert you if a page is running duplicate commands or hitting other N+1 scenarios, which you can then go back and improve. Look for the ! in the MP smart tag, which appears when this is happening. MP also has a share button (provider-based) for passing the profiling info to others.
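The wrap-and-record approach, and the duplicate-command check it makes possible, can be sketched roughly like this. It is a Python illustration using the standard sqlite3 module, not MiniProfiler's implementation; ProfiledConnection, duplicates and the posts/users tables are all names I have invented for the example.

```python
import sqlite3
from collections import Counter


class ProfiledConnection:
    """Wraps a DB-API connection and records every command sent through it."""

    def __init__(self, inner):
        self._inner = inner
        self.commands = []  # (sql, params) for each executed command

    def execute(self, sql, params=()):
        self.commands.append((sql, tuple(params)))
        return self._inner.execute(sql, params)

    def duplicates(self):
        """Return commands that ran more than once with identical parameters."""
        counts = Counter(self.commands)
        return {cmd: n for cmd, n in counts.items() if n > 1}


conn = ProfiledConnection(sqlite3.connect(":memory:"))
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, owner_id INTEGER)")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
for post_id in (1, 2, 3):
    conn.execute("INSERT INTO posts VALUES (?, 1)", (post_id,))

# N+1 pattern: one query for the posts, then one query per post for its owner.
posts = conn.execute("SELECT id, owner_id FROM posts").fetchall()
for _, owner_id in posts:
    conn.execute("SELECT name FROM users WHERE id = ?", (owner_id,)).fetchone()

for (sql, params), n in conn.duplicates().items():
    print(f"! ran {n} times: {sql} {params}")
```

Because the calling code only ever sees something that behaves like a connection, the recording can be added or removed without touching the queries themselves.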
Access to MP can be configured in App_Start/Miniprofiler.cs. By default, MiniProfiler appears onscreen only if the page request is to localhost but you can change that to something role-based, page-based etc.
- Q: MP should work with any MVC ViewEngine as it just wraps the ViewEngines much like the DB connections. See App_Start/Miniprofiler.cs for code.
- Q: MP can’t really account for the ‘works on my machine’ / ‘fails on production environment’ scenarios.
- Q: MP also works with an AJAX update. MP adds extra timings for the async call down the left-hand side of the screen.
- Q: You can use the DB profiling side of it without the web stuff. Ask the profiling object to get the DB data and then work with it.
- Q: MP would work with web services and command-line apps, but you need to think about how and when MP would show its info.
- Q: MP doesn’t currently support async multi-threaded stuff….
Introducing Dapper, A Read-Focused Data Helper class
StackOverflow was originally built using LINQ2SQL in .NET 3.5. When .NET v4.0 came out, LINQ2SQL appeared to stall reasonably frequently. A database request would suddenly take 400ms and not 4ms. The LINQ library wasn’t open source, so SO had to diagnose the problem and then figure out how to work around it. They tried two things first:
- Tried their own SQL generation trees.
- Tried just executing raw SQL rather than letting L2S generate it. That didn’t work either.
In the end they wrote their own data access stack called Dapper [http://code.google.com/p/dapper-dot-net/]. It heavily favours reads: roughly 1 in 500 commands in Stack Overflow are updates, and the rest are reads. Dapper has a very similar syntax to LINQ Context queries but uses anonymous objects to pass parameters to queries. The speed increases in Dapper come from improvements in the materialiser (getting the field names etc. and pushing them into the object), which appears to have been the issue in LINQ2SQL v4.
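As a rough picture of those two ideas (parameters passed as a name/value bag, and a materialiser that pushes columns into an object), here is a small Python sketch using sqlite3 and a dataclass. It is not Dapper's API or implementation; the query helper, the Post class and the posts table are hypothetical names chosen for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Post:
    id: int
    title: str
    score: int


def query(conn, cls, sql, params=None):
    """Run sql with named parameters and materialise each row into cls."""
    cursor = conn.execute(sql, params or {})
    # Read the column names once per query, not once per row.
    columns = [col[0] for col in cursor.description]
    return [cls(**dict(zip(columns, row))) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, "Why is my query slow?", 12), (2, "What is an ORM?", 3)],
)

# Parameters travel as a plain name/value mapping, never spliced into the SQL.
popular = query(
    conn, Post,
    "SELECT id, title, score FROM posts WHERE score >= :min_score",
    {"min_score": 10},
)
print(popular)
```

The objects that come back are plain data with no link to any context or change tracker, which fits a read-heavy workload.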
- Q/A : Dapper objects are not connected to the LINQ context object so calling SubmitChanges doesn’t work. SO still uses L2S for writes mostly (although Dapper is starting to do writes also).
- Q/A : Dapper wraps generic Connection objects. So it should work over any .NET connection object as long as the syntax for that particular DB is covered (PL/SQL syntax differs from T-SQL, etc.).
- Q/A : Dapper source includes perf tests against several other ORMs (run in release mode).
StackOverflow uses Redis [http://redis.io/], a high-performance key-value store, for caching. You could also use it for session state etc.
In conclusion, Marc offered some simple rules for improving the performance of your data access routines:
- Learn SQL; don’t rely on LINQ.
- Keep it simple.
- Cache, cache, cache.
- Investigate other tools: other ORMs, writing your own, or a NoSQL store such as Redis for caching and session state.
- Check the serialization. How much is serialized? Can it be reduced?
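On the last point, a quick way to see how much is being serialized is simply to measure it. The sketch below is a made-up Python example (the user object and its fields are hypothetical) comparing the size of a whole object against only the fields a page actually needs.

```python
import json

user = {
    "id": 1,
    "name": "alice",
    "about_me": "x" * 2000,      # large field the page never displays
    "badges": list(range(200)),  # likewise unused on this page
}

# Serialise the whole object versus only the fields the page needs.
full = json.dumps(user)
trimmed = json.dumps({k: user[k] for k in ("id", "name")})

print(len(full), len(trimmed))
```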