About the author

J Sawyer is a developer based in Houston, TX and loves to write code, especially ASP.NET and other web-related stuff. He is currently working on implementing Team Foundation Server at a large energy company in Houston and is loving that too.

He also loves to ride his Yamaha FZ1. And sometimes his Ninja 650.

But he doesn't code and ride at the same time. That would be bad.

Linq Performance Part II - Filtering

October 22, 2008 1:43 PM

Continuing on the previous topic of Linq Performance … I’m now doing something a bit more interesting than just a “Select From”. All of the key conditions (machines, specs, methodology, blah blah blah) remain the same; no changes at all there. However, I’ll be digging around in filtering this time, comparing filtering between ADO.NET, Linq to SQL and, just for giggles, Linq to Objects and Linq to ADO.NET. Based on the previous results, I’m not using constructors for the custom classes, but rather property binding. The performance of the full property binding (rather than fields) is good and, let’s be honest here, that’s how you should be doing it anyway.

First, an overview of the different types of filters that I’m going to be running:

Find By First Letter: This will do a search/filter for Persons by the first letter of their first name … a LIKE query. Rather than getting the database all optimized and the query results cached, I select the first letter randomly from a cached copy of the table, but this is not included in the results. Yes, the query plan will be cached, but that’s normal and a part of the overall performance system that we want to test anyway.

Find By Non Key: This does a search/filter for Persons by First Name and Last Name. This uses an equality operator and will (most likely, though I didn’t check) return a single row. As before, the First Name/Last Name combination comes from a cached copy of the table and the values are randomly selected. As with the previous test, the query plan is cached and, again, that’s a normal thing.

Find Key: The last test does a search for a row by the primary key value. This does return a single row in all cases. The key to search for is randomly selected from a cached copy of the table.

For all of the tests, actual, valid values were used – hence the random selection from a cached copy of the table. Originally, this was not the case, but I quickly found that, in particular, the Linq tests that returned a single item would throw an exception if nothing was found – though this is likely because I used the First() method on the query return (the exception said that the list was empty). This would not have been an issue if I didn’t call this method and, instead, enumerated over the collection of 0 or 1 with the return.
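To illustrate the First() behavior described above, here is a minimal sketch using Linq to Objects. The Person class and the SingleItemLookup helper are hypothetical stand-ins, not the types used in the actual tests; the point is only that First() throws on an empty sequence while FirstOrDefault() returns null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Person table.
public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class SingleItemLookup
{
    // First() throws InvalidOperationException when nothing matches.
    public static Person FindWithFirst(IEnumerable<Person> people, int id)
    {
        return people.Where(p => p.Id == id).First();
    }

    // FirstOrDefault() returns null (the default for a class) when nothing matches.
    public static Person FindWithFirstOrDefault(IEnumerable<Person> people, int id)
    {
        return people.Where(p => p.Id == id).FirstOrDefault();
    }
}
```

If a lookup may legitimately find nothing, FirstOrDefault() plus a null check is one way to avoid the exception; enumerating the 0-or-1 result, as mentioned above, is another.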

For each of the test batches, five different methodologies were used.

Data View: This uses an ADO.NET DataView on an existing DataTable to do the filtering. The creation and filling of the table was not included in the test result. This is a method that you would use for cached data and tests the filtering capabilities of the DataView on its own.

DataSet Filter: This uses the Filter() method to retrieve a subset of the rows. As with the previous, the table that is used comes prefilled.

Linq Detached: Essentially, this is Linq to Objects. The results come from the database and are then detached from the database, putting the results into a generic List<> class. As with the previous, creating and filling the list is not included in the results.

Linq To ADO: For something different, this filters a DataTable using Linq. Again, this is something that you’d do with a cache. And, yet again (I’m beginning to feel like a broken record here), the filling of the DataTable that is used for this is not included in the results.

Linq To Sql: This uses pure Linq to Sql, retrieving the results from the database and then returning the results. In this case, the cost of actually hitting the database is included in the results. As you can, I’m sure, imagine, this is the only test where the query plan caching made any difference at all; the rest of the tests were working on data in memory.
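For reference, the four in-memory methodologies can be sketched roughly as follows for the Find By First Letter filter. This is an illustration under assumptions, not the actual test code: the table layout, the sample rows and the FirstLetterFilters class are made up, and the "DataSet Filter" variant is assumed here to be a filter expression passed to DataTable.Select().

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class FirstLetterFilters
{
    // Builds a tiny stand-in for the cached Person table (hypothetical data).
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable("Person");
        DataColumn id = table.Columns.Add("Id", typeof(int));
        table.Columns.Add("FirstName", typeof(string));
        table.Columns.Add("LastName", typeof(string));
        table.PrimaryKey = new DataColumn[] { id };
        table.Rows.Add(1, "Alice", "Smith");
        table.Rows.Add(2, "Adam", "Jones");
        table.Rows.Add(3, "Bob", "Brown");
        return table;
    }

    // Data View: filter an existing table through DataView.RowFilter.
    public static int CountWithDataView(DataTable table, char letter)
    {
        DataView view = new DataView(table);
        view.RowFilter = "FirstName LIKE '" + letter + "%'";
        return view.Count;
    }

    // DataSet Filter (assumed): a filter expression passed to DataTable.Select().
    public static int CountWithSelect(DataTable table, char letter)
    {
        return table.Select("FirstName LIKE '" + letter + "%'").Length;
    }

    // Linq To ADO: Linq over the rows of the DataTable.
    public static int CountWithLinqToDataSet(DataTable table, char letter)
    {
        string prefix = letter.ToString();
        return table.AsEnumerable()
                    .Count(r => r.Field<string>("FirstName").StartsWith(prefix));
    }

    // Linq Detached: Linq to Objects over a plain generic list.
    public static int CountWithLinqToObjects(List<string> firstNames, char letter)
    {
        string prefix = letter.ToString();
        return firstNames.Count(n => n.StartsWith(prefix));
    }
}
```

Note that the filter-expression variants and the StartsWith() variants do not necessarily treat case the same way, so they are only equivalent for data like the sample rows above.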

I did not include results where a DataSet returns results directly from the database; the performance characteristics of this with respect to the Linq To Sql tests would be the same as in the previous selection tests.

So, without further ado, the results:

Test Batch            Data View  DataSet Filter  Linq Detached  Linq to ADO  Linq to Sql
Find By First Letter     25.687          23.679         49.844       36.979       28.084
Find By Non Key          34.516         138.782          9.066       27.020       12.787
Find Key                 17.115           0.162          6.200        7.029        9.064
Average                  25.773          54.208         21.703       23.676       16.645

[Image: bar chart of the results above]

I have to say, I found the results quite interesting. There are some pretty wide variations in the methods, depending on what you are doing. I was also surprised to see that the Find By First Letter had the worst performance for Linq Detached … this was not what I was expecting and not something that I had seen in previous test runs on a different machine (but that was also testing against a Debug build rather than a Release build). The average time for the DataSet Filter was very highly impacted by the Find By Non Key batch … this is just really bad with DataSets. Find Key for the dataset was very fast though … so much so that you can’t even see the bar in the chart; this is due to the indexing of the primary key by the DataSet. Linq Detached was hurt by the Find By First Letter batch; my theory is that this is due to string operations, which have always been a little on the ugly side. Other than that, the find performance of Linq to Objects was quite good and finding by key and by non-key fields were little different – and this difference would, again, most likely be due to the string comparison vs. integer comparisons.
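To make the key vs. non-key difference concrete, here is a small sketch of the two kinds of lookup on a DataTable. It assumes the key lookup goes through DataRowCollection.Find(), which uses the primary key, and that the non-key lookup is a filter expression over two string columns; the KeyedTableLookup class and the sample rows are hypothetical.

```csharp
using System;
using System.Data;

public static class KeyedTableLookup
{
    // Builds a tiny stand-in table with a primary key (hypothetical data).
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable("Person");
        DataColumn id = table.Columns.Add("Id", typeof(int));
        table.Columns.Add("FirstName", typeof(string));
        table.Columns.Add("LastName", typeof(string));
        table.PrimaryKey = new DataColumn[] { id };
        table.Rows.Add(1, "Alice", "Smith");
        table.Rows.Add(2, "Bob", "Brown");
        return table;
    }

    // Key lookup: Find() requires a primary key and returns null when no row matches.
    public static DataRow FindByKey(DataTable table, int id)
    {
        return table.Rows.Find(id);
    }

    // Non-key lookup: an equality filter on two string columns.
    public static DataRow[] FindByName(DataTable table, string firstName, string lastName)
    {
        // Single quotes are doubled so that names like O'Brien do not break the expression.
        string filter = string.Format("FirstName = '{0}' AND LastName = '{1}'",
            firstName.Replace("'", "''"), lastName.Replace("'", "''"));
        return table.Select(filter);
    }
}
```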




Linq | Performance

Linq Performance - Part I

October 8, 2008 7:00 PM

Well, it’s been a while since I did my initial review of some simple Linq performance tests. Since then, I’ve done a bit more testing of Linq performance and I’d like to share that. The results are enlightening, to say the least. I did this because I’ve gotten a lot of questions regarding the performance of Linq and, in particular, Linq to Sql – something that is common whenever there is a new data-oriented API. Now, let me also say that performance isn’t the only consideration … there are also considerations of functionality and ease of use, as well as the overall functionality of the API and its applicability to a wide variety of scenarios. I used the same methodology that I detailed in this previous post.

Now, all of the tests were against the AdventureWorks sample database’s Person.Contact table with some 20,000 rows. Not the largest table in the world, but it’s also a good deal larger than the much-beloved Northwind database. I also decided to re-run all of the tests a second time on my home PC (rather than my laptop) as the client and one of my test servers as the database server. The specs are as follows:

Client                      DB Server
AMD Athlon 64 X2 4400+      AMD Athlon 64 X2 4200+
4 GB RAM                    2 GB RAM
Vista SP1 x64               Windows Server 2008 Standard x64
Visual Studio 2008 SP1      Sql Server 2008 x64

So, with that out of the way, let’s discuss the first test.

Simple Query

This is a simple “SELECT * FROM Person.Contact” query … nothing special or funky. From there, as with all of the tests, I loop through the results and assign them to temporary, local variables. An overview of the tests is below:

DataReaderIndex: Uses a data reader and accesses the values using the strongly-typed GetXXX methods (e.g. GetString(int ordinal)). With this set, the ordinal is looked up using GetOrdinal before entering the loop to go over the resultset. This is my preferred method of using a DataReader.
int firstName = rdr.GetOrdinal("FirstName");
int lastName = rdr.GetOrdinal("LastName"); 
while (rdr.Read())
{
    string fullName = rdr.GetString(firstName) + rdr.GetString(lastName);
}
rdr.Close();
DataReaderHardCodedIndex: This is the same as DataReaderIndex with the exception that the ordinal is not looked up before entering the loop to go over the resultset but is hard-coded into the application.
while (rdr.Read())
{
    string fullName = rdr.GetString(0) + rdr.GetString(1);
}
rdr.Close();
DataReaderNoIndex: Again, using a reader, but not using the strongly-typed GetXXX methods. Instead, this is using the indexer property, getting the data using the column name as an object. This is how I see a lot of folks using Data Readers.
while (rdr.Read())
{
    string fullName = (string)rdr["FirstName"] + (string)rdr["LastName"];
}
rdr.Close();
LinqAnonType: Uses Linq with an anonymous type.
var contactNames = from c in dc.Contacts
                   select new { c.FirstName, c.LastName };
foreach (var contactName in contactNames)
{
    string fullName = contactName.FirstName + contactName.LastName;
}
LinqClass_Field: Again, uses Linq but this time it’s using a custom type. In this class the values are stored in public fields, rather than properties.
IQueryable<AdvWorksName> contactNames = from c in dc.Contacts
                                        select new AdvWorksName()
                                            { FirstName = c.FirstName, LastName = c.LastName };
foreach (var contactName in contactNames)
{
    string fullName = contactName.FirstName + contactName.LastName;
}
DataSet: This test uses an untyped dataset. We won’t be doing a variation with a strongly-typed dataset for the select because they are significantly slower than untyped datasets. Also, the remoting format for the dataset is set to binary, which will help improve the performance for the dataset, especially as we get more records.
DataSet ds = new DataSet();
ds.RemotingFormat = SerializationFormat.Binary;
SqlDataAdapter adp = new SqlDataAdapter(cmd);
adp.Fill(ds);
foreach (DataRow dr in ds.Tables[0].Rows)
{
    string fullName = dr.Field<String>("FirstName") + dr.Field<String>("LastName");
}
cnct.Close();
LinqClass_Prop: This uses a custom Linq class with properties for the values.
IQueryable<AdvWorksNameProps> contactNames = from c in dc.Persons
                                             select new AdvWorksNameProps()
                                                 { FirstName = c.FirstName, LastName = c.LastName };
foreach (var contactName in contactNames)
{
    string fullName = contactName.FirstName + contactName.LastName;
}
LinqClass_Ctor: This uses the same Linq class as above but initializes the class by calling the constructor rather than binding to the properties.
IQueryable<AdvWorksNameProps> contactNames = from c in dc.Persons
                                             select new AdvWorksNameProps(c.FirstName, c.LastName);
foreach (var contactName in contactNames)
{
    string fullName = contactName.FirstName + contactName.LastName;
}

If you are wondering why the different “flavors” of Linq … it’s because, when I first started re-running these tests for the blog, I got some strange differences that I hadn’t seen before between (what is now) LinqAnonType and LinqClass_Field. On examination, I found that these things made a difference and wanted to get a more rounded picture of what we were looking at here … so I added a couple of tests.
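For context, the custom types behind these flavors might look something like the sketch below. The class names AdvWorksName and AdvWorksNameProps come from the listings above, but their bodies are my assumption of a minimal shape that fits the way they are used there; the ProjectionFlavors helper is purely illustrative.

```csharp
using System;

// LinqClass_Field flavor: values stored in public fields (assumed shape).
public class AdvWorksName
{
    public string FirstName;
    public string LastName;
}

// LinqClass_Prop and LinqClass_Ctor flavors: the same data exposed as properties,
// with a default constructor for property binding and a second one for the Ctor test.
public class AdvWorksNameProps
{
    public AdvWorksNameProps() { }

    public AdvWorksNameProps(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical helper showing the three ways the projections build their results.
public static class ProjectionFlavors
{
    public static AdvWorksName ViaFields(string first, string last)
    {
        return new AdvWorksName { FirstName = first, LastName = last };
    }

    public static AdvWorksNameProps ViaProperties(string first, string last)
    {
        return new AdvWorksNameProps { FirstName = first, LastName = last };
    }

    public static AdvWorksNameProps ViaConstructor(string first, string last)
    {
        return new AdvWorksNameProps(first, last);
    }
}
```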

And the results …

Test                        Average
LinqClass_Field              277.61
DataReaderIndex              283.43
DataReaderHardCodedIndex     291.17
LinqClass_Prop               310.76
DataSet                      323.71
LinqAnonType                 329.26
LinqClass_Ctor               370.20
DataReaderNoIndex            401.63

[Image: chart of the results above]

These results are actually quite different from what I saw when I ran the tests on a single machine … which is quite interesting and somewhat surprising to me. Linq still does very well when compared to DataReaders … depending on exactly how you implement the class. I didn’t expect that the version using the constructor would turn out to be the one that had the worst performance … and I’m not really sure what to make of that. I was surprised to see the DataSet do so well … it didn’t on previous tests, but in those cases, I also didn’t change the remoting format to binary; this does have a huge impact on the load performance, especially as the datasets get larger (XML gets pretty expensive when it starts getting big).

I’ve got more tests, but due to the sheer length of this post, I’m going to post them separately.




.NET Stuff | Linq | Performance

ASP.NET Async Page Model

October 1, 2008 7:07 PM

I just did a Code Clinic for the Second Life .NET User’s Group on using the ASP.NET async page model and it occurred to me that it’d be a good idea to do a little blog post about it as well. I’ve noticed that a lot of developers don’t know about this little feature and therefore don’t use it. It doesn’t help that the situations where this technique helps aren’t readily apparent with functional testing on the developer’s workstation or even on a separate test server. It only rears its head if you do load testing … something that few actually do (I won’t go there right now).

So, let me get one thing straight from the get-go here: I’m not going to be talking about ASP.NET AJAX. No way, no how. I’m going to be talking about a technique that was in the original release of ASP.NET 2.0 and, of course, it’s still there. There are some big-time differences between the async model and AJAX. First, the async model has nothing at all to do with improving the client experience (at least not directly, though it will tend to). Second, the async model doesn’t have any client-side goo; it’s all server-side code. And finally, there is no magic control that you just drop on your page to make it work … it’s all code that you write in the code-behind page. I do want to make sure that this is clear ‘cuz these days when folks see “async” in relation to web pages, they automatically think AJAX. AJAX is really a client-side technique, not server side. It does little to nothing to help your server actually scale … it can, in some cases, actually have a negative impact. This would happen when you make additional round trips with AJAX that you might not normally do without AJAX, placing additional load on the server. Now, I’m not saying that you shouldn’t use AJAX … it’s all goodness … but I just want to clarify that this isn’t AJAX. Now, you can potentially use this for AJAX requests that are being processed asynchronously from the client.

Now that we have that out of the way, let me, for a moment, talk about what it is. First, it’s a really excellent way to help your site scale, especially when you have long-running, blocking requests somewhere in the site (and many sites do have at least a couple of these). Pages that take a few seconds or more to load may be good candidates. Processes like making web services calls (for example, to do credit card processing and order placement on an eCommerce site) are excellent candidates as well.

Why is this such goodness? It has to do with the way ASP.NET and IIS do page processing. ASP.NET creates a pool of threads to actually do the processing of the pages and there is a finite number of threads that will be added to the pool. These processing threads are created as they are needed … so creating additional threads will incur some overhead and there is, of course, overhead involved with the threads themselves even after creation. Now, when a page is requested, a thread is assigned to the page from the pool and that thread is then tied to processing that page and that page alone … until the page is done executing. Requests that cannot be serviced at the time of the request are then queued for processing as a thread becomes available. So … it then (logically) follows that pages that take a long time and consume a processing thread for extended periods will affect the scalability of the site. More pages will wind up in the queue and will therefore take longer since they are waiting for a free thread to execute the page. Of course, once the execution starts, it makes no difference to the performance … it’s all in the waiting for a thread to actually process the page. The end result is that you cannot service as many simultaneous requests and users.

The async page model fixes this. What happens is that the long running task is executed in the background. Once the task is kicked off, the thread processing the page is then free to process additional requests. This results in a smaller queue and less time that a request waits to be serviced. This means more pages can actually be handled at the same time more efficiently … better scalability. You can see some test results of this on Fritz Onion’s blog. It’s pretty impressive. I’ve not done my own scalability testing on one of my test servers here, but I think, shortly, I will. Once I do, I’ll post the results here.

How do you do this? Getting started is actually quite easy. You need to add a page directive to your page. This is required regardless of which method you use (there are two). ASP.NET will then implement IHttpAsyncHandler for you behind the scenes. It looks like this:

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" Async="True" %>

Simple enough, right? Let me just add a couple of things that you need to make sure you have in place. You will need to follow the .NET asynchronous pattern for this to work … a Begin method that returns IAsyncResult and an End method that takes this result. It’s typically easiest to do this with APIs that already have this implemented for you (you just return their IAsyncResult object). There’s a ton of them and they cover most of the situations where this technique helps.
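As a rough illustration of that Begin/End pattern outside of a page, the sketch below wraps Stream.BeginRead and Stream.EndRead, which already implement it. The StreamReadOperation class is hypothetical and exists only to show the shape: the Begin method hands back the API’s own IAsyncResult and the End method takes that same result to finish the operation.

```csharp
using System;
using System.IO;

// Hypothetical wrapper showing the Begin/End shape on top of an existing async API.
public class StreamReadOperation
{
    private readonly Stream _stream;
    private readonly byte[] _buffer = new byte[256];

    public StreamReadOperation(Stream stream)
    {
        _stream = stream;
    }

    // Begin method: starts the read and returns the stream's own IAsyncResult.
    // The state object travels with the result and is exposed as ar.AsyncState.
    public IAsyncResult BeginRead(AsyncCallback cb, object state)
    {
        return _stream.BeginRead(_buffer, 0, _buffer.Length, cb, state);
    }

    // End method: takes the result from BeginRead and returns the number of bytes read.
    public int EndRead(IAsyncResult ar)
    {
        return _stream.EndRead(ar);
    }
}
```

In a page, the Begin and End handlers play the same two roles; they just carry the extra sender and EventArgs parameters that the page passes in.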

Now, to actually do this. Like I said, there are two different ways to use this. The first is pretty easy to wire up and you can add multiple requests (I misstated this during the Code Clinic), but all of the async requests run one at a time, not in parallel. You simply call Page.AddOnPreRenderCompleteAsync and away you go. There are two overloads for this method, as follows:

void AddOnPreRenderCompleteAsync(BeginEventHandler b, EndEventHandler e)
void AddOnPreRenderCompleteAsync(BeginEventHandler b, EndEventHandler e, object state)

The handlers look like the following:

IAsyncResult BeginAsyncRequest(object sender, EventArgs e, AsyncCallback cb, object state)
void EndAsyncRequest(IAsyncResult ar)

The state parameter can be used to pass any additional information/object/etc. that you would like to the begin and the end methods (it’s exposed as the AsyncState member of the IAsyncResult interface), so that can be pretty handy.

The code behind for such a page would look like the following:

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        LoadThread.Text = 
            Thread.CurrentThread.ManagedThreadId.ToString(); 
        AddOnPreRenderCompleteAsync(new BeginEventHandler(BeginGetMSDN),
            new EndEventHandler(EndAsyncOperation)); 

    }

    public IAsyncResult BeginGetMSDN(object sender, EventArgs e, AsyncCallback cb, object state)
    {
        BeginThread.Text =
            Thread.CurrentThread.ManagedThreadId.ToString(); 
        HttpWebRequest  _request = 
            (HttpWebRequest)WebRequest.Create(@"http://msdn.microsoft.com");
        return _request.BeginGetResponse(cb, _request);
    }

    void EndAsyncOperation(IAsyncResult ar)
    {
        EndThread.Text =
            Thread.CurrentThread.ManagedThreadId.ToString(); 
        string text;
        HttpWebRequest _request = (HttpWebRequest)ar.AsyncState;
        using (WebResponse response = _request.EndGetResponse(ar))
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                text = reader.ReadToEnd();
            }
        }

        Regex regex = new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);
        MatchCollection matches = regex.Matches(text);

        StringBuilder builder = new StringBuilder(1024);
        foreach (Match match in matches)
        {
            builder.Append(match.Groups[1]);
            builder.Append("<br/>");
        }

        Output.Text = builder.ToString();
    }

}

If you run this (on a page with the proper controls, of course), you will notice that Page_Load and BeginGetMSDN both run on the same thread while EndAsyncOperation runs on a different thread.

The other method uses a class called PageAsyncTask to register an async task with the page. Now, with this one, you can actually execute multiple tasks in parallel so, in some cases, this may actually improve the performance of an individual page. You have two constructors for this class:

public PageAsyncTask(
    BeginEventHandler beginHandler,
    EndEventHandler endHandler,
    EndEventHandler timeoutHandler,
    Object state)

and

public PageAsyncTask(
    BeginEventHandler beginHandler,
    EndEventHandler endHandler,
    EndEventHandler timeoutHandler,
    Object state,
    bool executeInParallel)


The only difference between the two is that one little argument … executeInParallel. The default for this is false, so if you want your tasks to execute in parallel, you need to use the second constructor. The delegates have identical signatures to the delegates for AddOnPreRenderCompleteAsync. The new handler, timeoutHandler, is called when the operation times out and has the same signature as the end handler. So … it’s actually trivial to switch between the two (I did it to the sample listing above in about a minute.) I, personally, like this method better for two reasons. One, the cleaner handling of the timeout. That’s all goodness to me. Second, the option to have them execute in parallel. The same page as above, now using PageAsyncTask, looks like the following:

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        LoadThread.Text = 
            Thread.CurrentThread.ManagedThreadId.ToString();
        
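        // No state object is needed here, so pass null; false runs tasks one at a time.
        PageAsyncTask t = new PageAsyncTask(
            BeginGetMSDN,
            EndAsyncOperation,
            AsyncOperationTimeout,
            null,
            false);
        // The task only runs once it has been registered with the page.
        RegisterAsyncTask(t);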
    }

    public IAsyncResult BeginGetMSDN(object sender, EventArgs e, AsyncCallback cb, object state)
    {
        BeginThread.Text =
            Thread.CurrentThread.ManagedThreadId.ToString(); 
        HttpWebRequest  _request = 
            (HttpWebRequest)WebRequest.Create(@"http://msdn.microsoft.com");
        return _request.BeginGetResponse(cb, _request);
    }

    void EndAsyncOperation(IAsyncResult ar)
    {
        EndThread.Text =
            Thread.CurrentThread.ManagedThreadId.ToString(); 
        string text;
        HttpWebRequest _request = (HttpWebRequest)ar.AsyncState;
        using (WebResponse response = _request.EndGetResponse(ar))
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                text = reader.ReadToEnd();
            }
        }

        Regex regex = new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);
        MatchCollection matches = regex.Matches(text);

        StringBuilder builder = new StringBuilder(1024);
        foreach (Match match in matches)
        {
            builder.Append(match.Groups[1]);
            builder.Append("<br/>");
        }

        Output.Text = builder.ToString();
    }

    void AsyncOperationTimeout(IAsyncResult ar)
    {
        EndThread.Text = Thread.CurrentThread.ManagedThreadId.ToString(); 
        Output.Text = "The data is not currently available. Please try again later.";
    }

}

Not much difference there. We have 1 additional method for the timeout and the registration is a little different. By the way, you can pass null in for the timeout handler if you don’t care about it. I don’t recommend doing that, personally, but that’s up to you.

There you have it … a quick tour through the ASP.NET asynchronous page model. It’s clean, it’s easy, it’s MUCH better than spinning up your own threads and messing with synchronization primitives (this is mucho-bad-mojo, just say NO) and it’s got some pretty significant benefits for scalability.

With that, I’m outta here. Happy coding!



Austin Code Camp Stuff ...

May 24, 2008 10:50 PM

I promised that I'd make the materials from my talk at the Austin Code Camp available for download. I've finally gotten it compressed and uploaded. It's 111 MB so be forewarned. Since I used WinRar (and that's not as ubiquitous as zip formats), I've made it a self-extracting archive. You'll need Visual Studio 2008 Team Edition for Software Developers (at least) to read all of the performance results. But I do have an Excel spreadsheet with the pertinent data.




.NET Stuff | Linq | Performance | User Groups

More Notes on Performance Testing

May 14, 2008 11:06 PM

Well, I wanted to provide a little update on my previous discussion of my performance testing methodology; I've refined it a bit while getting ready for the Austin Code Camp.

Of course, GC.Collect() is still very important ... but I must correct myself in the previous post. It's called before each test method run. This ensures that the garbage collector is all cleaned up and collected before the test run even starts executing.

Now, on the calculations. I still do a normalized (or perhaps weighted, but we're getting into semantics here) average. But ... I've altered the equation a bit to subtract the overhead associated with the profiler probe. These were, surprisingly, pretty different across the board with the different test methods. It really is appropriate to discount these from the overall results as they do impact the overall numbers. And, considering the differences between them in the various methods (in one set of tests, it ranged from .1 msec to 2.54 msec), they really needed to be removed from the results.

The final tweak was to make a call to each of the test methods before I went into the actual test. This was done in a separate Initialize method. This ensures that all of the classes being used (as was mentioned in the previous post) are loaded into memory and initialized. It also ensures that the methods themselves are JIT'd before the test runs begin as well; again, this is something that we need to take out of the final equation.
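The warm-up call and the GC.Collect() before each run can be sketched as a tiny harness like the one below. This is only an approximation under assumptions: the real numbers came from the Visual Studio profiler rather than a Stopwatch, the TestHarness class is hypothetical, and the result here is a plain mean with no normalization or probe-overhead correction.

```csharp
using System;
using System.Diagnostics;

public static class TestHarness
{
    // Runs a test method 'runs' times and returns the mean elapsed time in milliseconds.
    public static double Measure(Action test, int runs)
    {
        if (test == null) throw new ArgumentNullException("test");
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        // Warm-up call: loads the classes involved and gets the method JIT'd
        // so that neither cost lands in the timed runs.
        test();

        double totalMs = 0;
        for (int i = 0; i < runs; i++)
        {
            // Collect before each run so that garbage from earlier runs
            // is less likely to trigger a collection during the timing.
            GC.Collect();
            GC.WaitForPendingFinalizers();

            Stopwatch sw = Stopwatch.StartNew();
            test();
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
        }

        return totalMs / runs;
    }
}
```

Calling TestHarness.Measure(() => SomeTest(), 100) would then execute SomeTest 101 times: once for warm-up and 100 times under the timer.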




Performance | Visual Studio Tools