Scalar-Valued and Built-In Functions
Throughout my career, I have formed the strong opinion that the majority of performance problems are self-inflicted, caused by database design or coding that is less than optimal. This series will address common coding malpractices and ways to mitigate them.
I once heard that there are three types of developers in the IT world: application developers, database developers, and hybrids of the two. A successful hybrid of database developer and application developer is rare, because the two disciplines require different ways of thinking. Application development is object-oriented, built around methods and functions; database development is primarily set-based.
Many application developers carry those object-oriented coding practices into the database and apply the same techniques to T-SQL.
Scalar-Valued Functions as part of the SELECT clause
The first concept that I will focus on is scalar-valued functions. Scalar-valued functions are “typical” functions that accept a value or set of values and return a single value. This sounds fantastic, right? Reuse code to get my information. If your goal is to reuse code, then you can be successful using scalar-valued functions. If your goal is to allow your application and database to scale and grow with minimal performance degradation, then you may want to avoid them.
Create and use the Function
Let’s look at an AdventureWorks2012 example. Below, we create a function that returns the order quantity for a given sales order ID.
use AdventureWorks2012
go

if object_id('dbo.ufnSalesOrderQuantity') is not null
    drop function dbo.ufnSalesOrderQuantity
go

create function dbo.ufnSalesOrderQuantity ( @SalesOrderID int )
returns smallint
begin
    declare @OrderQty smallint

    select @OrderQty = OrderQty
    from Sales.SalesOrderDetailEnlarged
    where SalesOrderID = @SalesOrderID

    return @OrderQty
end
go
Now that we’ve got the function, how do we use it? We simply call it, as in the short example below. What’s great about this? We can change the code in the function and we won’t have to change code anywhere else. Less management, but what is the tradeoff?
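For illustration only, a standalone call looks like the sketch below; the SalesOrderID value is an arbitrary example, not one taken from the article.

select dbo.ufnSalesOrderQuantity(43659) as OrderQuantity   -- 43659 is an arbitrary example SalesOrderID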
Use the function within a select statement
Now we will use the function within a select statement. We don’t have to join to SalesOrderDetailEnlarged in the query because the function already references it for us.
print '
/*********************************************************************
** BEGIN Function
*********************************************************************/'
set statistics io on
set statistics time on

select top 100000
    s.OrderDate,
    s.SalesOrderNumber,
    s.PurchaseOrderNumber,
    s.SalesOrderID,
    dbo.ufnSalesOrderQuantity(s.SalesOrderID) as OrderQuantity
from Sales.SalesOrderHeaderEnlarged s
where TerritoryID = 6

set statistics time off
set statistics io off
print '
/*********************************************************************
** END Function
*********************************************************************/'
The execution plan shows an index seek. Just what we want, right?
Notice that we have included our statistics to show how many reads and writes we have against the database, as well as the time taken. ONLY 628 reads against the database. Fantastic! Let’s look at the same query written as a traditional join.
/*********************************************************************
** BEGIN Function
*********************************************************************/

(100000 row(s) affected)

Table 'SalesOrderHeaderEnlarged'. Scan count 1, logical reads 628, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 23904 ms, elapsed time = 23934 ms.

/*********************************************************************
** END Function
*********************************************************************/
Traditional Join
Now we will return the same results, but by joining to the SalesOrderDetailEnlarged table directly. So much more work, right?
print '
/*********************************************************************
** BEGIN Query
*********************************************************************/'
set statistics io on
set statistics time on

select top 100000
    s.OrderDate,
    s.SalesOrderNumber,
    s.PurchaseOrderNumber,
    d.OrderQty
from Sales.SalesOrderHeaderEnlarged s
inner join sales.SalesOrderDetailEnlarged d
    on d.SalesOrderID = s.SalesOrderID
where TerritoryID = 6

set statistics io off
set statistics time off
print '
/*********************************************************************
** END Query
*********************************************************************/'
Let’s take a look at the execution plan. 77% of the cost comes from a clustered index scan; that can’t be what I want.
Let’s now review the IO and time statistics for this query. So many more reads; the function must be the way to go, right?
/*********************************************************************
** BEGIN Query
*********************************************************************/

(100000 row(s) affected)

Table 'SalesOrderDetailEnlarged'. Scan count 1, logical reads 6093, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeaderEnlarged'. Scan count 1, logical reads 106, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 281 ms, elapsed time = 957 ms.

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 0 ms.

/*********************************************************************
** END Query
*********************************************************************/
Comparing the results
Did you notice the difference between the two sets of statistics? The statement calling the function shows roughly 10% of the reads of the traditional query, and its execution plan is a seek on a single index, or so it appears. There is one other glaring difference: the CPU time and elapsed time. For the call with the function, the CPU time is 23,904 milliseconds and the total elapsed time is 23,934 milliseconds. For the traditional call, CPU time is 281 milliseconds with a TOTAL elapsed time of 957 milliseconds. How can this be? The execution plan for the statement with the function call is much simpler and there are fewer reads. Could there be more to the story?
The problem is that the execution of the function is not included in the IO statistics or the execution plan, so the results are misleading. In this case, the best way to see the actual work is through either a server-side trace or Extended Events. I’ve taken the liberty of recording the results using a server-side trace and loading them into a table for analysis.
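As a rough sketch of the Extended Events alternative mentioned above (the article itself used a server-side trace), a session like the following captures both the outer statement and every statement executed inside the function. The session name, file path, and database filter are assumptions for illustration.

create event session TrackScalarUdf on server
add event sqlserver.sp_statement_completed      -- one event per statement executed inside the function
(
    action (sqlserver.sql_text)
    where (sqlserver.database_name = N'AdventureWorks2012')
),
add event sqlserver.sql_statement_completed     -- the outer ad hoc statement
(
    action (sqlserver.sql_text)
    where (sqlserver.database_name = N'AdventureWorks2012')
)
add target package0.event_file (set filename = N'C:\Temp\TrackScalarUdf.xel')
go

alter event session TrackScalarUdf on server state = start
go

The resulting .xel file can then be read with sys.fn_xe_file_target_read_file and aggregated in much the same way as the trace data below.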
The trace results show 100,000 statement executions for this one query call. Notice how the query was a “top 100000”: the function is being called 100,000 times, once for each row returned!
Now let’s take a look at the total cost of the query: 307,346 reads. The duration, in this case, was also about 23 seconds (I ran the query multiple times). How much of this cost came from the function? 306,716 of those reads came from the function alone.
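As a sketch of how numbers like these can be pulled from a server-side trace, the following loads the trace file into a table and aggregates executions, reads, and CPU per object. The file path and table name are assumptions, not the ones used for the article.

-- load the server-side trace output into a table (hypothetical path and table name)
select *
into dbo.FunctionTraceResults
from sys.fn_trace_gettable(N'C:\Temp\ScalarUdfTrace.trc', default)

-- how many executions, reads, and CPU does each object account for?
select ObjectName,
       count(*)   as ExecutionCount,
       sum(Reads) as TotalReads,
       sum(CPU)   as TotalCpuMs      -- CPU is reported in milliseconds
from dbo.FunctionTraceResults
group by ObjectName
order by TotalReads desc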
Conclusion
Scalar-valued functions are a great way to reuse code, but as we can see here, they cause hidden catastrophes when used against larger data sets, because they force SQL Server to treat each record individually (aka row-by-row processing). We may have more code to maintain by not using the function, but a little extra work when organizing a release will save migraines when analyzing performance.
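One middle ground the article doesn’t cover, offered here only as a sketch, is to rewrite the scalar function as an inline table-valued function and call it with CROSS APPLY; SQL Server expands an inline function into the calling query, so it performs like the traditional join while still keeping the logic in one place. The name ifnSalesOrderQuantity is hypothetical.

create function dbo.ifnSalesOrderQuantity ( @SalesOrderID int )
returns table
as
return
(
    select OrderQty
    from Sales.SalesOrderDetailEnlarged
    where SalesOrderID = @SalesOrderID
)
go

select top 100000
    s.OrderDate,
    s.SalesOrderNumber,
    s.PurchaseOrderNumber,
    s.SalesOrderID,
    q.OrderQty as OrderQuantity
from Sales.SalesOrderHeaderEnlarged s
cross apply dbo.ifnSalesOrderQuantity(s.SalesOrderID) q    -- expanded inline, no per-row function execution
where s.TerritoryID = 6
go

Note that, like the traditional join, this returns one row per matching detail line, whereas the scalar version silently picks a single value when an order has more than one.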
Up next: The cost of the union…
Great job Bill, I am looking forward to part 2!
In Oracle, you can use a WITH clause to force the database to create a temporary table, then join the temporary table to the main part of the query. It looks like SQL Server has something similar. Moving scalar functions into a WITH clause can help. Ideally, the WITH clause has a relatively low cardinality compared with the rest of the data.
Good article though.
Mr. Byrd,
Functions and temp tables are not one and the same. SQL Server has temporary tables as well that we can make proper use of. Good post idea!
The WITH clause in SQL Server creates a CTE, which is generally poor for performance as well.
Please. More. Now.