About the author

J Sawyer is a developer based in Houston, TX and loves to write code, especially ASP.NET and other web-related stuff. He is currently working on implementing Team Foundation Server at a large energy company in Houston and is loving that too.

He also loves to ride his Yamaha FZ1. And sometimes his Ninja 650.

But he doesn't code and ride at the same time. That would be bad.

Linq vs. ADO.NET - Simple Query

April 16, 2008 12:32 AM

In my last blog post, I took a look at how Linq handles anonymous types. I also promised to do some performance comparisons between Linq and traditional ADO.NET code. Believe it or not, creating a "fair" test is not as easy as one would think, especially when data access is involved. Due to the nature of connection pooling, whichever method is first to be tested gets hit with the cost of creating the connection ... which skews the test. Yeah, I'm sure this is out there in the blogosphere, but I do like to do these things myself. Call it the Not-Invented-Here syndrome.

This particular test set is for a very simple query. I created a set of 4 methods to test for performance within a standard Windows Console Application, which should give an overall comparison of data access. All tests used the AdventureWorks sample database, with the statement (or its Linq equivalent) Select FirstName, LastName From Person.Contact. This is about as simple a query as you can get. From there, each method concatenated the two field results into a single string value ... The Linq test used an anonymous type going against a data class created with the Data Class designer. Data Reader Test 1 (DataReaderIndex) used the strongly-typed DataReader.GetString(index) ... and I did cheat a little with this one by hardcoding the index rather than looking it up before entering the loop (though this is how I'd do it in the "real world"). In previous tests that I've done, I've found that this gives about 10-20% better performance than DataReader[columnName].ToString() ... though that does include the "lookup" that I mentioned previously. Data Reader Test 2 represents the more common pattern that I've seen out there ... using DataReader[columnName].ToString(). Now, I'm not sure which of these methods Data Binding uses and, honestly, that's not in the test ... though, now that I think of it, it may be a good thing to test as well. Finally, I included a test for DataSets (TestDataSet) ... using an untyped DataSet. I've found (again, from previous tests) that this performs far better than a typed DataSet ... the typed DataSet gets hit (hard) by the creation/initialization costs. Before running any tests, I included a method called InitializeConnectionPool, which creates and opens a connection, creates a command with the Sql statement (to cache the access plan), calls ExecuteNonQuery and then exits. This is not included in the results, but is a key part of making sure that the test is as fair as possible. Additionally, all of the tests access the connection string in the same way ... using the application properties. In looking at the code generated by the LinqToSql class, this is how they get the connection string. This ensures that the connection string for all methods is the same, which means that the connection pools will be the same.

To actually do the test, I called each method a total of 30 times from the applications Main, each function in the same loop. This would help to eliminate any variances. After running each test, I also called GC.Collect() to eliminate, as much as possible, the cost of garbage collection from the results.  I also closed all unnecessary processes and refrained from doing anything else to ensure that all possible CPU and memory resources were allocated to the test. One thing that I've noticed from time to time is that it seems to matter the order in which functions are called, so I made a total of 4 runs, each with a different function first. For each run, I tossed out the min and max values and then averaged the rest -- (total - min - max)/(numCalls -2). This gave me a "normalized" value that, I hoped, would provide a fair, apples-to-apples comparison. Each method had a set of 4 values, each with 30 calls, 28 of which were actually included in the normalized value. I then took the average of the 4 values. I know that sounds like an overly complex methodology ... and I agree ... but I've seen some weird things go on and some pretty inconsistent results. That said, in looking at the results, there was not a lot of difference between each of the 4 runs, which makes me feel pretty good about the whole thing.

So ... without further ado ... the results (values are in milliseconds):

Method Normalized Average
TestDataReaderIndex 56.64767857
TestLinq 75.57098214
TestDataSet 117.2503571
TestDataReaderNoIndex 358.751875


Now, I have to say, I was somewhat surprised by the TestDataReaderNoIndex results ... previous tests that I had done didn't show such a big difference between this and TestDataReaderIndex ... though I wonder if that has something to do with the way I did this test - hardcoding the indexes into TestDataReaderIndex. I'm not surprised that TestDataReaderIndex turned out the be the fastest. DataReaders have been, and still are, the absolute fastest way to get data from the database ... that is, if you do it using integer indexes. However, TestLinq didn't come that far behind and was certainly more performant than the untyped DataSet. So ... let's think about this for a second. The Linq collection that is returned is more like a DataSet than it is a DataReader. DataReaders are forward-only, read-only server-side cursors. Use them once and kiss them goodbye. Both the Linq collection and the DataSet allow random access and are re-startable ... and they are both updatable as well. I've had a lot of folks ask about the performance of Linq and now I can, without question and with all confidence, tell them that the performance is quite good.

Still, let's be honest ... the difference between the fastest and the slowest is a mere 300ms. Do you really think users will notice this?

UPDATE: You can download the code and the tests that I used for this at https://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=jdotnet&ReleaseId=948. If you get different results, I'd be interested to hear about it. Even more, I'd be interested in the methodology that you used to create the report.



Tags: , ,

.NET Stuff | Performance | Linq

Linq and Anonymous Types

April 14, 2008 9:20 PM

I've been playing with Linq quite a bit recently. I have to say ... it's some cool stuff and revolutionizes data access on the .Net platform. One of the things in Linq that I'm really fascinated with is anonymous types. These classes are created based on a Linq statement and only have the properties that you specified. They're nicely type-safe and work with IntelliSense. Beauty and goodness.

Now, for a time, I just played with them and used them without much thought about what's going on behind the scenes. But ... my curiosity got the better of me and I decided to dig a bit and see what's going on. And the best way to do this? Lutz Roeder's Reflector of course!

So first ... the code. Not much, pretty simple.

            using (DataClasses1DataContext dc = new DataClasses1DataContext())
            {
                var contactNames = from c in dc.Contacts
                                   select new { c.FirstName, c.LastName };
                foreach(var contactName in contactNames)
                {
                    Console.WriteLine(contactName.FirstName + contactName.LastName);
                }
            }

I could have made it even simpler ... remove the foreach loop. But that let's me know that all's well.

So ... what happens with the anonymous type? It's actually compiled in the assembly. Yup, that's right ... it's a compiled class, just like a case that you create. But there is some black voodoo majik going on and, I'm certain, some significant compiler changes to make this happen.

Here's the raw IL generated for the class (with attributes):

.class private auto ansi sealed beforefieldinit lt;<FirstName>j__TPar, <LastName>j__TPar>
    extends [mscorlib]System.Object
{
    .custom instance void [mscorlib]System.Diagnostics.DebuggerDisplayAttribute::.ctor(string) = 
         { string('\\{ FirstName = {FirstName}, LastName = {LastName} }') Type=string('<Anonymous Type>') } .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor()

And here's the C# version if the IL:

[DebuggerDisplay(@"\{ FirstName = {FirstName}, LastName = {LastName} }", Type="<Anonymous Type>"), CompilerGenerated]
internal sealed class <>f__AnonymousType0<<FirstName>j__TPar, <LastName>j__TPar>

 

If you had any doubt at all, the "CompilerGenerated" attribute pretty much says it all. All of the references to the anonymous type in the code are replaced by this class in the compiled IL. And the return value from the query? It's a generic class:
[mscorlib]System.Collections.Generic.IEnumerable`1<class <>f__AnonymousType0`2<string, string>>.

Pretty cool, eh?

Now I'm off to dig into the performance of these beasties when compared to a DataReader and a DataSet. Early results look promising, but I've got some work to do to make sure it's a valid and fair comparison.



Tags: , ,

.NET Stuff | Linq | Performance