Techniques for cleaner dynamic SQL generation
Very often when working with databases we have to resort to dynamic SQL generation to solve a problem. Common instances would be performing an action across a number of different tables, or performing a PIVOT on a data set where the set of pivot columns is not known in advance. This latter example is the one I'll use for this walkthrough.
The scenario is that we have a table of Person data and a table of Payslip data, and the requirement is to return each Person with their salaries listed out year on year. It's a simple example, but not unlike a real-world problem.
First up we need some data to work with. See the appendix for some setup scripts, but below is the basic data.
Person Table
Payslip Table
The results we’re after should have one row per Person, with the years as columns and the values being the salary for that person/year:
Static (i.e. non-dynamic) query
So we’ll look at the query needed to get this result set, in its static form, and make clear the limitation of the static version.
Here’s the static pivot:
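A minimal sketch, assuming the illustrative Person/Payslip schema set up in the appendix (with an integer PayslipYear column):

```sql
-- Static pivot: the years are hard-coded in two places
SELECT PersonName, [2019], [2020], [2021]
FROM
(
    SELECT pe.PersonName, pa.PayslipYear, pa.Salary
    FROM dbo.Person pe
    INNER JOIN dbo.Payslip pa ON pa.PersonId = pe.PersonId
) AS src
PIVOT
(
    MAX(Salary) FOR PayslipYear IN ([2019], [2020], [2021])
) AS pvt
ORDER BY PersonName;
```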
This is fairly standard as pivots go, but it has a major limitation: in two separate places we need to enumerate the years, once for the pivot itself and once for the output of results. Any change to the list of years requires the SQL to be updated to include the new years.
A dirty workaround for this limitation would be to include lots of possible years in the SQL, but this would always cover a limited range, and would produce unnecessary columns in the results for years that don't have any data yet.
What we need is to output exactly the years that have data, which is where dynamic SQL is useful. We can generate the list of years in the pivot from the years that have entries in the Payslip table.
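With the illustrative schema from the appendix, the raw material for that column list is just:

```sql
-- Every year that actually has payslip data (illustrative schema,
-- with an integer PayslipYear column)
SELECT DISTINCT PayslipYear
FROM dbo.Payslip
ORDER BY PayslipYear;
```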
Cursor based dynamic SQL
Very often, a piece of dynamic SQL like this is generated by looping through the list of years, either with a cursor or with a temp table and a WHILE loop over its rows.
Here is what a cursor version looks like:
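The sketch below assumes the same illustrative schema as the appendix, and uses sp_executesql to run the assembled string at the end:

```sql
DECLARE @Years NVARCHAR(MAX) = N'';
DECLARE @Year  INT;
DECLARE @Sql   NVARCHAR(MAX);

-- Cursor over the distinct years that have payslip data
DECLARE YearCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT DISTINCT PayslipYear
    FROM dbo.Payslip
    ORDER BY PayslipYear;

OPEN YearCursor;
FETCH NEXT FROM YearCursor INTO @Year;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build up the comma-separated, quoted column list: [2019],[2020],...
    IF @Years <> N''
        SET @Years = @Years + N',';
    SET @Years = @Years + QUOTENAME(@Year);

    FETCH NEXT FROM YearCursor INTO @Year;
END;

CLOSE YearCursor;
DEALLOCATE YearCursor;

-- Splice the column list into what is otherwise the static pivot, then execute it
SET @Sql = N'
SELECT PersonName, ' + @Years + N'
FROM
(
    SELECT pe.PersonName, pa.PayslipYear, pa.Salary
    FROM dbo.Person pe
    INNER JOIN dbo.Payslip pa ON pa.PersonId = pe.PersonId
) AS src
PIVOT
(
    MAX(Salary) FOR PayslipYear IN (' + @Years + N')
) AS pvt
ORDER BY PersonName;';

EXEC sys.sp_executesql @Sql;
```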
So what’s wrong with this then?
- It’s non-linear - it requires the dynamic SQL to be built in a different sequence to that in which you’d write the equivalent static SQL. With large dynamic SQL jobs this becomes a problem for understanding and debugging.
- It’s rather long - we’ve gone from around a dozen lines (depending on formatting) to nearer 40-50 lines, much of which is concerned with managing the cursor rather than actually generating our results.
So what can we do to improve this? Here are a couple of alternative approaches to dynamic SQL generation that may help.
1: Using a SELECT concatenation
This uses the ability to concatenate onto a SQL variable for each row returned by a SELECT statement.
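A sketch against the same illustrative schema (note that SQL Server doesn't formally guarantee row order for this style of variable assignment):

```sql
DECLARE @Years NVARCHAR(MAX);
DECLARE @Sql   NVARCHAR(MAX);

-- Concatenate one quoted year onto @Years per row; COALESCE handles the first row
SELECT @Years = COALESCE(@Years + N',', N'') + QUOTENAME(y.PayslipYear)
FROM (SELECT DISTINCT PayslipYear FROM dbo.Payslip) AS y
ORDER BY y.PayslipYear;

SET @Sql = N'
SELECT PersonName, ' + @Years + N'
FROM
(
    SELECT pe.PersonName, pa.PayslipYear, pa.Salary
    FROM dbo.Person pe
    INNER JOIN dbo.Payslip pa ON pa.PersonId = pe.PersonId
) AS src
PIVOT
(
    MAX(Salary) FOR PayslipYear IN (' + @Years + N')
) AS pvt
ORDER BY PersonName;';

EXEC sys.sp_executesql @Sql;
```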
That’s better! Far fewer lines, and very little SQL that isn’t actually generating the SQL/results. It’s still non-linear though - we’re still building some SQL out of place and splicing it in lower down.
2: Using the XML PATH trick
This version uses the ability to concatenate strings together using the SELECT ... FROM ... FOR XML PATH('') pattern. Often this is paired with STUFF to strip out any extra characters (typically the leading separator) that aren't wanted in the final string.
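A sketch of that pattern, again assuming the illustrative schema from the appendix:

```sql
DECLARE @Years NVARCHAR(MAX);
DECLARE @Sql   NVARCHAR(MAX);

-- FOR XML PATH('') collapses the rows into ',[2019],[2020],...';
-- STUFF then removes the leading comma
SET @Years = STUFF(
    (
        SELECT N',' + QUOTENAME(y.PayslipYear)
        FROM (SELECT DISTINCT PayslipYear FROM dbo.Payslip) AS y
        ORDER BY y.PayslipYear
        FOR XML PATH('')
    ), 1, 1, N'');

SET @Sql = N'
SELECT PersonName, ' + @Years + N'
FROM
(
    SELECT pe.PersonName, pa.PayslipYear, pa.Salary
    FROM dbo.Person pe
    INNER JOIN dbo.Payslip pa ON pa.PersonId = pe.PersonId
) AS src
PIVOT
(
    MAX(Salary) FOR PayslipYear IN (' + @Years + N')
) AS pvt
ORDER BY PersonName;';

EXEC sys.sp_executesql @Sql;
```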
Wow! Now we’ve only got a handful more lines than the original static SQL, and, most importantly, the dynamic SQL build reads like the original static SQL too.
This method is very flexible, as each such sub-select can come from a completely different source, and yet still remain inline within the overall flow.
Wrapping up:
Personally I prefer the latter of these two options, although there have been occasions where the former has worked a little better.
Either, though, is far cleaner than the original cursor- or temp-table-based pattern.
In setting up these queries I’ve tried to keep the formatting of the SQL consistent, so that you get a true impression of the difference between them.
Appendix: Setup Scripts
If you want to try the above yourself, the following will create the test data I used for the scripts:
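A minimal setup sketch, assuming a simple Person/Payslip schema (the table definitions, names, and values below are purely illustrative, but they match the query sketches above):

```sql
-- Illustrative schema and sample rows for the walkthrough
CREATE TABLE dbo.Person
(
    PersonId   INT IDENTITY(1,1) PRIMARY KEY,
    PersonName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.Payslip
(
    PayslipId   INT IDENTITY(1,1) PRIMARY KEY,
    PersonId    INT NOT NULL REFERENCES dbo.Person (PersonId),
    PayslipYear INT NOT NULL,
    Salary      DECIMAL(18, 2) NOT NULL
);

INSERT INTO dbo.Person (PersonName)
VALUES (N'Alice Example'), (N'Bob Example');

-- Different people have payslips for different years, so the set of
-- pivot columns genuinely depends on the data
INSERT INTO dbo.Payslip (PersonId, PayslipYear, Salary)
VALUES (1, 2019, 45000), (1, 2020, 47000), (1, 2021, 50000),
       (2, 2020, 38000), (2, 2021, 40000);
```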