A Couple of notes on the STRING_AGG function

Just a couple of quick notes from playing around with the T-SQL STRING_AGG function.

As it says on the tin (well, in the online documentation actually): “STRING_AGG is an aggregate function that takes all expressions from rows and concatenates them into a single string.”

That sounds handy, so let’s put it to use in the heady glamour of the AdventureWorksDW2017 database.

I’m aiming to produce a mailing list for each city. Here’s my test query, based on a limited subset of just 10 cities:

--Generate list of e-mail addresses per city
select geog.City, STRING_AGG(EmailAddress,'; ') as EmailAddressList
from DimCustomer cust
left join DimGeography geog
on cust.GeographyKey = geog.GeographyKey
group by geog.City
order by geog.City desc
offset 0 rows fetch next 10 rows only 

Which produces my desired 10 city mailing list result:

[Screenshot: the 10-city result – one row per city with its aggregated e-mail address list]

Now let’s remove the 10 city limitation and see the full result:

--Generate list of e-mail addresses per city
select geog.City, STRING_AGG(EmailAddress,'; ') as EmailAddressList
from DimCustomer cust
left join DimGeography geog
on cust.GeographyKey = geog.GeographyKey
group by geog.City
order by geog.City desc

Msg 9829, Level 16, State 1, Line 50
STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.

Turns out that EmailAddress is an NVARCHAR(50), and according to the documentation…

[Documentation table: STRING_AGG return types – NVARCHAR(MAX)/VARCHAR(MAX) inputs return the same type; any other NVARCHAR input returns NVARCHAR(4000); any other VARCHAR input returns VARCHAR(8000)]

…that data type will produce an aggregated result of NVARCHAR(4000). That is to say that all the aggregated addresses in their entirety for a given city have to fit into 4000 (unicode) characters.
Clearly this isn’t sufficient for the aggregated email addresses of some of the larger cities.

The error message does tell us “Use LOB types to avoid result truncation”, and the table above shows that input expressions of NVARCHAR(MAX) and VARCHAR(MAX) produce a result of those same respective types.

This means that we can circumvent the issue by CASTing the input expression to one of these data types before feeding it into the STRING_AGG function:

--Generate list of e-mail addresses per city
select geog.City
,STRING_AGG(cast(EmailAddress as NVARCHAR(MAX)),'; ')
 as EmailAddressList
from DimCustomer cust
left join DimGeography geog
on cust.GeographyKey = geog.GeographyKey
group by geog.City
order by geog.City desc

Which then allows the full result to be produced:

[Screenshot: the full result set, now including the larger cities]

This got me thinking that some folks will be using STRING_AGG in a similar way to that shown in the original query. It will work well for perhaps weeks or months, until one day the data grows enough for a city’s aggregated addresses to exceed the NVARCHAR(4000) or VARCHAR(8000) limit. At that point the code will break suddenly, and without warning, with error 9829.

Which makes me believe that the safer and more scalable way to use STRING_AGG is to always consider CASTing the input expression to one of the MAX data types.

One other useful thing I realised with STRING_AGG, is that the aggregated result can be ordered. Although STRING_AGG doesn’t accept the OVER clause, it does have a WITHIN GROUP clause to impose an order.

So for e-mail addresses grouped by city, with addresses in reverse alphabetical order:

STRING_AGG(cast(EmailAddress as NVARCHAR(MAX)),'; ')
within group (order by EmailAddress desc)

[Screenshot: addresses within each city now in reverse alphabetical order]

Of course the ordering doesn’t have to just be by the expression being aggregated. For example given this table:

[Screenshot: table #t1 with ID and Word columns]

This query will aggregate the Word column, but order it with the ID column:

select STRING_AGG(Word,' ') 
within group (order by ID asc) as Phrase
from #t1

[Screenshot: the words aggregated into a single phrase, in ID order]

Posted in T-SQL | 40 Comments

You’re out of order! You’re out of order! The whole result is out of order!

The question of where NULL marks should appear within an ordered result is an interesting one.

The SQL standard states that NULLs should be placed together, but it is left to the implementation as to whether they appear before or after non-NULLs.

SQL Server places NULLs at the beginning of an ordered result, so for example given this table:

[Screenshot: the Person table – FirstName and Country columns, with NULL countries for some rows]

The following query sorts the rows by country in an ascending (A-Z) order, this being the default order that will be applied if the sort direction (ASC or DESC) is not specified:

SELECT FirstName, Country
FROM Person
ORDER BY Country ASC

[Screenshot: result – NULL countries listed first, then countries A–Z]

As per the standard, NULLs are ordered together, and as per the implementation, NULLs are placed first.

In a descending result, NULLs would therefore be placed last:

SELECT FirstName, Country
FROM Person
ORDER BY Country DESC

[Screenshot: result – countries Z–A, NULL countries listed last]

This then raises the question: what can be done if a different sort behaviour is required?
It turns out that Oracle has exactly this facility:

ORDER BY Country NULLS FIRST

ORDER BY Country NULLS LAST

 

Although we don’t have this within T-SQL, it’s easy enough to do the same thing:

SELECT FirstName, Country
FROM Person
ORDER BY IIF(Country IS NULL, 2, 1), Country ASC

[Screenshot: result – countries A–Z, NULL countries now listed last]

In a two-way decision like this, I like the brevity of IIF, but if you’re using a pre-2012 version of SQL Server, or you prefer standard SQL, then the following is exactly equivalent (IIF is just a convenient wrapper around CASE):

SELECT FirstName, Country
FROM Person
ORDER BY
CASE
       WHEN Country IS NULL THEN 2
       ELSE 1
END, Country ASC

 

Using CASE like this within the ORDER BY clause can be extended further.

Imagine a query coming from a webserver. The webserver passes the ID of the person making the search, and we have a table somewhere telling us the country where this person resides.

It’s conceivable that the person would be interested in seeing results relating to their own country first, so we’d like to prioritise the search based upon that.

A reasonable order to aim for might be their own country first, followed by other countries alphabetically, and finally NULLs:

DECLARE @UserCountry varchar(50) = 'France'

SELECT FirstName, Country
FROM Person
ORDER BY
CASE
       WHEN Country = @UserCountry THEN 1
       WHEN Country IS NULL THEN 3
       ELSE 2
END, Country ASC

[Screenshot: result – the France rows first, then other countries A–Z, NULLs last]

 

DECLARE @UserCountry varchar(50) = 'Germany'

SELECT FirstName, Country
FROM Person
ORDER BY
CASE
       WHEN Country = @UserCountry THEN 1
       WHEN Country IS NULL THEN 3
       ELSE 2
END, Country ASC

[Screenshot: result – the Germany rows first, then other countries A–Z, NULLs last]

This CASE-based ordering becomes really useful when supplying results to web servers or applications that flick through pages of results:

DECLARE @UserCountry varchar(50) = 'Germany'

SELECT FirstName, Country
FROM Person
ORDER BY
CASE
       WHEN Country = @UserCountry THEN 1
       WHEN Country IS NULL THEN 3
       ELSE 2
END, Country ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY

SELECT FirstName, Country
FROM Person
ORDER BY
CASE
       WHEN Country = @UserCountry THEN 1
       WHEN Country IS NULL THEN 3
       ELSE 2
END, Country ASC
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY

 

 

Posted in T-SQL | 2 Comments

Generate SSRS Subscription E-Mail Address Lists from AD with PowerShell


It takes a lot of time to create SSRS e-mail subscription recipient lists when there are a large number of recipients, and all you have are the recipients’ human names.

 

I wrote a PowerShell function to take a list of names and generate all the recipients’ e-mail addresses from AD in an SSRS friendly format. 

 

  

I hope this will save you some time when you are creating SSRS subscriptions like this too.

 

Here’s the scenario: you’re asked to create one or more new SSRS e-mail subscriptions, and you’re provided with the desired recipients’ human names in a long list, perhaps via an e-mail from someone, like this:-

 

Robert Smith

Alan Jones

William White

Victor Stephenson

Maria Brown

Helen Wood

etc.

 

Assuming that the organisation you’re working in doesn’t create an AD distribution group per subscription (and I’ve never seen that done), you will need to work through the list of recipients, and for each one look up their individual e-mail addresses manually, and then add them piecemeal to the subscription.

 

To save the time and hassle doing this, I wrote a PowerShell function to query Active Directory for the e-mail addresses of the accounts matching the names from the list, and then generate a long string of these e-mail addresses with semi-colon separators.

 

So given the above list, we will end up with something like this:-

 

robert.smith2@ricketyrocket.com; alan.jones1@ricketyrocket.com; will.white@ricketyrocket.com; victor.stephenson@ricketyrocket.com; m.brown@ricketyrocket.com; helen-jane.wood@ricketyrocket.com; mick.smith@ricketyrocket.com; julie.stirrup@ricketyrocket.com; d.delante@ricketyrocket.com; chris.woode@ricketyrocket.com; christian.browne@ricketyrocket.com; brenda.chapel@ricketyrocket.com; maurice.laurence@ricketyrocket.com; Dan.Delaney@ricketyrocket.com; cynthia.knott@ricketyrocket.com; linda.bell@ricketyrocket.com

The function then automatically copies the string to the clipboard so that it can then be pasted straight into the SSRS report subscription recipient box – saving time and avoiding the risk of a typo.

 

The function de-duplicates the source list of names, in case the same name has been inadvertently added to the list more than once (thus avoiding the same e-mail address being added to the output string more than once).

 

If multiple e-mail addresses exist for any name in the list, this most likely means that multiple AD accounts have the same name property, e.g. you have two people in your organisation named John Smith. In this situation the function adds all e-mail addresses matching the name in question to the output string. However, a warning is issued, advising that any undesired e-mail addresses should be removed from the list manually.

 

WARNING! – Multiple AD accounts exist with the name Will White (will.white@ricketyrocket.com will.white2@ricketyrocket.com)

Please review the final e-mail address string and remove any that are not required.

 

 

Naturally the human names specified in the list need to be the same as they are defined within Active Directory. So for example if the name is “Bob Smith” in the list, but “Robert Smith” in AD then there won’t be a match. Additionally people obviously need to have an e-mail account defined within Active Directory.

 

In the event of either of these two failings, an alert is displayed like this:-

 

Mark Madeup not found in Active Directory, or doesn’t have an e-mail address.

Please check the name e.g. Robert instead of Bob etc.

Also check that this person has an e-mail address defined.

 

 

To use the function, simply:-

 

(i) Create the function by running the PowerShell script below.

 

(ii) Copy the list of desired recipient names from the source (e.g. e-mail) into a text file.

 

(iii) Call the function with a single parameter, giving the path to the text file like this:-

    Get-AddressList -File C:\temp\Folks.txt

 

(iv) Paste the contents of the clipboard into the recipient box for the subscription:-

 

[Screenshot: the SSRS subscription recipient box with the pasted address list]

 

Here’s the script to create the function, I hope that you will find it useful:-

Function Get-AddressList

{

 

<#            

 .SYNOPSIS

 Given a list of human names, generates a string of their associated e-mail addresses.

            

 .DESCRIPTION

 Reads a text file containing a list of human names, and returns e-mail addresses

 for those names by querying Active Directory.

        

 E-mail addresses are returned in a string separated by a semi-colon and space,

 which is automatically copied to the clipboard ready to be pasted into an SSRS subscription.

 

 A problem is indicated for any names that cannot be resolved to an e-mail address.

 

 A warning is indicated for names that resolve to more than one e-mail address

 (e.g. multiple AD accounts with the same name).

 It is left to the operator to review the final address string and remove the e-mail

 addresses that are surplus to requirements.

 

 .NOTES

 Version: 1.0

 Author: Andy Hogg

 

 .PARAMETER File

 Specifies the text file containing the list of human names.

 

 .INPUTS

 None. You cannot pipe objects to Get-AddressList.

 

 .OUTPUTS

 System.String. Get-AddressList returns a string of e-mail addresses and copies this to the clipboard.

 

 

 .EXAMPLE

 Get-AddressList -File "c:\temp\folks.txt"

                      

 .LINK

 https://andyhogg.wordpress.com/2016/03/05/generate-ssrs-subscription-e-mail-addresses-from-ad-4

 

 .LINK

 http://msdn.microsoft.com/en-us/library/ms156307.aspx#bkmk_create_email_subscription

#>

 

[cmdletbinding()]

 

PARAM

(

 [parameter(Mandatory=$true)]

 [string] $File

)

 

 

 

#Read the list of names from the file

[Array]$Users=get-content $File

 

#Remove any empty lines from the end of the file

$Users = $Users | where {$_ -ne ""}

 

#Deduplicate names

$Users = $Users | Sort-Object | Get-Unique

 

[String] $Recipients = ""  #The final list of e-mail addresses

[String] $Multiples = ""   #Names and addresses matching more than one AD account

[Array]  $Addresses = @()  #The e-mail address(es) associated with the current name

 

#Iterate over each name from the list

foreach ($User in $Users)

 {

  Try

   {

   

    #Query AD for e-mail address for the current name

    $Addresses = Get-ADUser -Filter 'Name -like $User' -Properties mail |
     select -ExpandProperty mail

   

    #If more than one AD account matches the name, then note the details

    if ($Addresses.Count -gt 1)

     {

      $Multiples = $Multiples + $User + " " + "($Addresses) "

     }

   

    #If there’s no e-mail address then throw an error

    if (!$Addresses) {throw}

   

    #If there is an e-mail address or addresses then add them to the list of recipients

    for ($i=0; $i -lt $Addresses.count; $i++)

     {

      $Recipients = $Recipients + $Addresses[$i] + '; '

     }       

   }

    

  #Catch block for names not found in AD or names without an e-mail address

  Catch

  {

    cls

    Write-Host `n

    Write-Host "PROBLEM! – $User not found in Active Directory" -ForegroundColor Red

    Write-Host "Or doesn't have an e-mail address." -ForegroundColor Red

    Write-Host "Please check the name e.g. Robert instead of Bob etc." -ForegroundColor Red

    Write-Host "Also does this person have an e-mail address defined?" -ForegroundColor Red

    Write-Host `n

    Return

   }       

 }

 

#Remove the trailing space and semi-colon from the final list of recipients

$Recipients = $Recipients.Substring(0,$Recipients.Length-2)

 

cls

Write-Host `n

 

#If multiple AD accounts exist for the same name, alert to this fact

if ($Multiples.length -gt 0)

 {

  write-host "WARNING! – Multiple AD accounts exist with the name $Multiples" -ForegroundColor Red

  write-host "Review the final e-mail address string and remove those addresses not required." -ForegroundColor Red

 }

 

#Write the full list of e-mail addresses

Write-Host `n

Write-Host "The full list of recipient e-mail addresses is:-"

Write-Host $Recipients -ForegroundColor Cyan

 

#Copy them to the clipboard

$Recipients | clip

Write-Host `n

Write-Host "This string of addresses has already been copied to the clipboard"

Write-Host "and is ready to be pasted into the recipients box of an SSRS e-mail subscription."

Write-Host `n

 

} #Close function block

Posted in PowerShell, Reporting Services | 1 Comment

Repairing Dropped SSAS Performance Counters

[Screenshot: Performance Advisor persistently showing no data for the SSAS counters]

My last post covered rebuilding dropped performance counters for the SQL Server relational engine.

However SQL Sentry Performance Advisor can also monitor Analysis Services, which from time to time is prone to the same dropped performance counter malady.

If you’re suffering from dropped SSAS counters, you’ll persistently see something like the image above within Performance Advisor.

You can remedy this easily enough though by rebuilding the SSAS performance counters, and this can be accomplished without any downtime.

1) RDP to the server being monitored and open a command prompt (you will need to run this as Administrator).

2) In the command prompt, navigate to the directory:-

C:\Program Files\Microsoft SQL Server\MSAS<version>.<instance name>\OLAP\bin\Counters

Replace <version> with the version of SSAS you are running, and replace <instance name> with the SSAS instance name that has poorly performance counters.

For example:-

For an SSAS 2014 instance named LEYLAND, this would be MSAS12.LEYLAND

For an SSAS 2012 instance named QUACKERS, this would be MSAS11.QUACKERS

I’m using SSAS 2014 and my instance has the highly imaginative name INST2, so my target directory is:-

[Screenshot: command prompt in C:\Program Files\Microsoft SQL Server\MSAS12.INST2\OLAP\bin\Counters]

3) Now issue the command to unload the SSAS counters:-

UNLODCTR MSOLAP$INST2

(Replace INST2 with the name of your instance).

My command looks like this:-

[Screenshot: running UNLODCTR MSOLAP$INST2]

4) Now issue the command to reload the SSAS counters:-

LODCTR perf-MSOLAP$INST2msmdctr.ini

Once again replace INST2 with your instance name; with mine this looks like:-

[Screenshot: running LODCTR perf-MSOLAP$INST2msmdctr.ini]

Success is implied by the lack of an error message. You’ll only receive feedback if there is a problem.

5) Restart the Remote Registry Service.

6) Restart the Performance Logs and Alerts Service.

7) On the SQL Sentry server running the monitoring service, restart the SQL Sentry Monitoring Service. In a larger environment you may have more than one server running monitoring services, so you will need to establish which monitoring server to restart the service on.

You can find this out by looking at the SQL Sentry Client. Go to the Navigator pane on the left, and look under Monitoring Service => Connections List

This displays every site, every monitoring server monitoring the site, and every SQL Server instance being monitored by that monitoring server. Simply find the SSAS instance on which  you are seeing the problem, note the monitoring server responsible for monitoring that instance, and then restart the SQL Sentry Monitoring Service on that specific monitoring server.

You can do this with PowerShell (substituting your monitoring server’s name for SQLSentryServerName) with the script below:-

cls
$cred = Get-Credential
Invoke-Command -ComputerName SQLSentryServerName -Credential $cred -ScriptBlock {Stop-Service "SQL Sentry Monitoring Service"}
Invoke-Command -ComputerName SQLSentryServerName -Credential $cred -ScriptBlock {Start-Service "SQL Sentry Monitoring Service"}

8) Restart your SQL Sentry client. You should now start seeing the missing SSAS counters being collected.


Posted in Analysis Services | Leave a comment

SQL Sentry Tips and Tricks

I’ve used the SQL Sentry product suite (Performance Advisor and Event Manager) for some time now. It’s a great product, which has proven its value time and time again.

As well as experience in implementing and using the product, I had the very good fortune the other week to spend a day with a couple of the folks from SQL Sentry.

I thought I’d write about a few of the things that I’ve learned. This is partly for my own benefit so that I can find the details easily in the future; and partly in case these are of interest to anyone else.

The Actions Log is a store in the repository which contains details for any “actions” that have occurred in response to “conditions” being met. Example – A deadlock occurs (the condition) and an e-mail alert is sent (the action) in response to this.

It’s unlikely that you’ll want to be able to see actions such as this from way back, so it’s a good idea to reduce this retention figure from its generous default (of one year) in order to save space in the repository.

In the Navigator pane on the left, under Monitoring Service => Settings => Message Processor, reduce Purge Action Log History Older Than from the default of 365 days to a lesser number which still meets your requirements e.g. 60 days.


 

If you find yourself being inundated with e-mail alerts for a condition that you are already aware of, it’s possible to reduce the number of e-mails sent for the same condition re-occurring on the same object within a given time period.

Under Monitoring Service => Settings => Message Processor, reduce Don’t send more than n emails from the default of 10 to a smaller number e.g. 2.

Fewer e-mail alerts will now be sent for the same condition occurring on the same object.


 

Non-numeric data (e.g. deadlock graphs, blocking chains, query plans, query traces, top SQL) is stored in the repository for 15 days by default.

You may find a 15 day retention of such detail to be excessive, in which case you can reduce this in order to save space in the repository.

Under Monitoring Service => Settings => Performance Monitor, reduce the Purge History Older Than from the default of 15 to a more suitable figure e.g. 7.


 

If you find yourself seeing truncation when examining large queries within SQL Sentry, you may wish to modify the Text Data Maximum Length from its default length of 15,000 characters to something larger e.g. 30,000

You can find this setting under Monitoring Service => Settings => Performance Monitor


Note, the similarly named setting Text Data (Normalized) Maximum Length controls the size of queries which have been parameterised for aggregation purposes (what you see when you press the Sigma button with the tooltip “Show Totals” from the “Top SQL” tab). For example, one execution of…

EXEC SP_PremiumEarned @AccountNum=1234

…and one execution of…

EXEC SP_PremiumEarned @AccountNum=5678

…will be parameterised and aggregated to 2 executions of…

EXEC SP_PremiumEarned @AccountNum=?

There is less of a case for increasing the size of this, since you will be unlikely to be using this aggregated view to analyse queries in depth, rather merely to identify them. The default size of 4000 characters is likely to be quite sufficient.

__

The SQL Sentry repository will increase in size over time in proportion to the number of instances being monitored (and how noisy they are).

With a larger repository you can sometimes see slower performance when querying historic data from the SQL Sentry console.

There are a couple of things that you can use to improve this.

Firstly, configure a SQL Agent job on the SQL Sentry repository server to perform regular statistics updates with a higher sample rate and targeting certain important tables which can grow quite large.

The initial sample size that I’ve tried is 25%, and I schedule a job to run every 4 hours. This sample size and schedule aren’t set in stone though, and different values for both may well be more optimal for your environment. I suggest that you use these as a starting point and then experiment.

The tables to target are as follows:-

UPDATE STATISTICS PerformanceAnalysisData WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisDataRollup2 WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisDataRollup4 WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisDataRollup6 WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisDataRollup8 WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisDataTableAndIndexCounter WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisDataDiskCounter WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisDataDatabaseCounter WITH SAMPLE 25 PERCENT
UPDATE STATISTICS PerformanceAnalysisTraceData WITH SAMPLE 25 PERCENT
UPDATE STATISTICS EventSourceHistory WITH SAMPLE 25 PERCENT
UPDATE STATISTICS EventSourceHistoryDetail WITH SAMPLE 25 PERCENT

 

__

Secondly, to improve performance you can enable trace flag 2371 globally on the SQL Server hosting the SQL Sentry repository database.

Configure this trace flag to persist across SQL Server restarts. You can either add the -T2371 startup parameter to the SQL Server service, or configure a SQL Agent job, scheduled to run on SQL Server start-up, that runs this command:-

DBCC TRACEON(2371,-1)

There’s a nice explanation of what trace flag 2371 actually does here.

__

From time to time, Windows performance counters can be a little flaky, and you can occasionally see them “dropped”. When this happens, in Performance Advisor you will see a message such as “No data exists for this range and resolution”. In order to remediate this:-

1) RDP to the server being monitored and open a command prompt (you will need to run as Administrator).

2) Navigate to the Binn directory of the SQL Server instance in question. For example on my system that would require this command:-

CD C:\Program Files\Microsoft SQL Server\MSSQL12.INST2\MSSQL\Binn

3) Unload the counters by issuing the command:-

unlodctr MSSQL$INST2

(Substitute your instance name for mine, which is INST2).

4) Load the counters by issuing the command:-

lodctr perf-MSSQL$INST2sqlctr.ini

(Again, substitute your instance name for mine, which is INST2).

5) Restart the Remote Registry Service.

6) Restart the Performance Logs and Alerts Service.

7) On the SQL Sentry server running the monitoring service, restart the SQL Sentry Monitoring Service. In a larger environment you may have more than one server running monitoring services, so you will need to establish which monitoring server to restart the service on.

You can find this out by looking at the SQL Sentry Client. Go to the Navigator pane on the left, and look under Monitoring Service => Connections List

This displays every site, every monitoring server monitoring the site, and every SQL Server instance being monitored by that monitoring server. Simply find the SQL Server instance that you are seeing the problem with, note the monitoring server responsible for monitoring it, and then restart the SQL Sentry Monitoring Service on that specific server.

8) Restart your SQL Sentry client. You should now start seeing the missing counters being collected.

The above process should be achievable without causing any service outage to the SQL Server being monitored.

Posted in Monitoring | 2 Comments

If You Can’t Stand the Heat, Stay Out of the Kitchen


The Data Warehouse luminary Dr Ralph Kimball once made an analogy in one of his books comparing a Data Warehouse to the kitchen of a busy restaurant.

He said that if we were running a restaurant, we would never allow our diners to come into the kitchen. There is just too much danger – too many boiling liquids, hot surfaces, sharp knives, spillages on the floor etc.

Additionally we would not want diners dipping their fingers in sauces, or sampling food before it is fully cooked.

Instead diners should remain seated in the surroundings of the dining area, which has been especially designed to provide a safe and pleasant environment for them. Their dinner can then be served to them in the way that they expect – plated and presented nicely, and conforming to what they actually ordered in the first place.

What Dr Kimball was alluding to, was that customers of a data warehouse should always access data via some kind of presentation layer, and not query the Data Warehouse directly. Just like the restaurant kitchen, there are too many dangers for them to fall victim to, and the result that they get back from their queries is unlikely to be what they actually wanted.

I like this analogy a lot, and so here is a little SQL puzzle that illustrates the point nicely. This is not contrived – it actually happened, and I was asked to explain the results.

I’ve probably already given away the game with all the pre-amble above, but here it is anyway:-

 

I want to know the highest value for PuzzleID, so I run the query..

SELECT MAX(PuzzleID) as HighestValue
FROM dbo.KitchenPuzzle;

…and I receive the following result…

[Result: HighestValue = 9999]

Knowing from previous experience that there are rows with a PuzzleID greater than 9999, I try another query…

SELECT MAX(PuzzleID) as HighestValue
FROM dbo.KitchenPuzzle
WHERE PuzzleID > 9999;

…and I receive the following result.

[Result: HighestValue = 50000]

Explain why the MAX() function is behaving illogically.

 

As I am sure you’ve figured out, a fallacious assumption has been made about the data type that we are working with. And as that great 20th century philosopher Benny Hill once said “When you ASSUME you make an ASS out of U and ME”.

Here is the code to reproduce this little puzzle for yourself.

IF OBJECT_ID('dbo.KitchenPuzzle') IS NOT NULL
DROP TABLE dbo.KitchenPuzzle;
GO

CREATE TABLE dbo.KitchenPuzzle(
       PuzzleID NVARCHAR(50)
       );

WITH CTE1(n) as
       (
              SELECT ROW_NUMBER() OVER (ORDER BY t1.timestamp) as n
              FROM sys.dm_os_ring_buffers t1
              CROSS JOIN
              sys.dm_os_ring_buffers t2
       )
INSERT dbo.KitchenPuzzle(PuzzleID)
       SELECT CAST(n as nvarchar) from CTE1
       WHERE n <= 50000;

SELECT MAX(PuzzleID) as HighestValue
FROM dbo.KitchenPuzzle;

SELECT MAX(PuzzleID) as HighestValue
FROM dbo.KitchenPuzzle
WHERE PuzzleID > 9999;

 

Just to elaborate further with regard to data types and sort order (partial result shown)…

SELECT PuzzleID
FROM dbo.KitchenPuzzle
ORDER BY PuzzleID ASC;

[Screenshot: PuzzleID values sorted as text – 1, 10, 100, 1000, 10000, 10001, …]

An amusing diversion, but a somewhat contrived puzzle you might feel?

Then observe the schema of the DimBuild table in the Data Warehouse provided by Microsoft’s Team Foundation Server:-

CREATE TABLE [dbo].[DimBuild](
       [BuildSK] [int] IDENTITY(1,1) NOT NULL,
       [BuildBK] [nvarchar](64) NOT NULL,
       [BuildID] [nvarchar](64) NULL,
       [BuildName] [nvarchar](260) NULL,
       [BuildType] [nvarchar](260) NULL,
       [BuildDefinitionName] [nvarchar](260) NULL,
       [DropLocation] [nvarchar](512) NULL,
       [BuildStartTime] [datetime] NULL,
       [LastUpdatedDateTime] [datetime] NOT NULL,
       [TeamProjectCollectionSK] [int] NULL,
PRIMARY KEY CLUSTERED
(
       [BuildSK] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Posted in Data Warehouse, T-SQL | Leave a comment

Get-Curry | Look-Disinterested – How PowerShell has a part time job working in a takeaway on the Tottenham Court Road

Stepping out the other day for a quick lunchtime takeaway, my conversation with the staff went something like this:-

Me: “Hi there”.

Server: <Looks at me like I’ve just asked to defecate in his hat>

Me: “May I have a tofu curry please?”.

Server: <Shovels some tofu curry into a pot and passes it to the cashier>

Cashier: <Puts pot into a bag with a fork, and hands it to me> “£4.99 please”.

Me: <Touches card to reader> “Thank you”.

Cashier: <Yawns and looks at watch >

The problem here was a lack of interaction. I was receiving no feedback to what I was saying, and was left feeling like I’d had a very one-sided conversation. Interestingly enough, that very same afternoon, I had exactly the same experience with PowerShell.

The scenario was that I wanted to know if 3 hot-fixes (which were recommended for servers running Analysis Services) had been applied to a Windows 2008 R2 server (with remote PowerShell disabled). The best route, I thought, would be to RDP onto the server, export a list of applied hot-fixes to a file, copy the file off to my desktop, and then examine it offline for the hot-fixes of interest.

All went well to start with, I connected with RDP to the server and issued the PowerShell command…

Get-Hotfix | Export-Csv c:\temp\hotfix.csv

Which as I’m sure you are aware (or can probably work out for yourself) creates an inventory of hotfixes that have been installed and then writes these out to a CSV (Comma Separated Values) file:-

#TYPE System.Management.ManagementObject#root\cimv2\Win32_QuickFixEngineering

"PSComputerName","InstalledOn","__GENUS","__CLASS","__SUPERCLASS","__DYNASTY","__RELPATH","__PROPERTY_COUNT",
"__DERIVATION","__SERVER","__NAMESPACE","__PATH","Caption","CSName","Description","FixComments","HotFixID","InstallDate",
"InstalledBy","Name","ServicePackInEffect","Status"

"MyServerName","01/03/2014 00:00:00","2","Win32_QuickFixEngineering","CIM_LogicalElement","CIM_ManagedSystemElement",
"Win32_QuickFixEngineering.HotFixID=""KB974405"",ServicePackInEffect=""""","11","System.String[]","MyServerName",
"root\cimv2","\\MyServerName\root\cimv2:Win32_QuickFixEngineering.HotFixID=""KB974405"",ServicePackInEffect=""""",
"http://support.microsoft.com?kbid=974405","MyServerName","Update","","KB974405",,"MyDomain\MyLogin",,"",
"MyServerName","01/02/2014 00:00:00","2","Win32_QuickFixEngineering"

 

etc…..

 

I then copied the file to my workstation and logged off the server. PowerShell will helpfully consume a CSV file and allow its contents to be queried. Here’s the query that I ran:-

import-csv c:\temp\hotfix.csv |
select hofixid | ? {$_.hotfixid -eq 'KB958488'}

 

PowerShell just looked at me blankly, with the same disinterest as I’d experienced earlier:-

[Screenshot: PowerShell returns nothing at all]

It’s interesting to note the lack of feedback here compared to the results of the same query expressed in another language, like T-SQL querying SQL Server. If we imagine for a moment a nice table containing the contents of that same CSV file, then the corresponding query in T-SQL would look like:-

SELECT HotfixID
FROM dbo.Hotfix
WHERE HotfixID = 'KB958488'

And on submitting this query, we would see some nice helpful feedback in the form of a rowcount:-

[Screenshot: (0 row(s) affected)]

“Hi, this is SQL Server. You know that hotfix thingy you were on about? Well hey guess what – there isn’t one here with that name. Have a great day”.

OK, perhaps T-SQL querying SQL Server is not quite that friendly, but it has just told us two very important things.

Firstly that the query made sense to it, syntactically at least. That is to say that we have phrased our question in T-SQL in a way that SQL Server understands.

Logically, our query could be complete rubbish. Here’s an example of a query which is syntactically correct but logically nonsensical. You can see that SQL Server still dutifully humours us, and still gives us some feedback.

Set up something to query…

CREATE TABLE #TT
(
ValueCanOnlyBeOneOrZero bit
)

INSERT #TT(ValueCanOnlyBeOneOrZero)
VALUES (1)
INSERT #TT(ValueCanOnlyBeOneOrZero)
VALUES (0)

 

…then query it with an illogical query and observe the feedback…

 

[Screenshot: a syntactically valid but logically nonsensical query, which still gets feedback]

 

 

 

Secondly, the feedback is telling us that everything that we’ve mentioned in the query exists. In this example there is a table named #TT and there is a column in that table named ValueCanOnlyBeOneOrZero.

 

Look what happens when I mistype something in the query:-

 

[Screenshot: Invalid column name error]

 

Syntactically it still made sense, but SQL Server spotted the invalid column name and told me in no uncertain terms to buck my ideas up.

 

 

Dragging the discussion back to my PowerShell command from earlier, I’d completely failed to spot my typo:-

 

 

import-csv c:\temp\hotfix.csv |
select hofixid | ? {$_.hotfixid -eq 'KB958488'}

 

Of course, there is no property named hofixid; it is in fact hotfixid.

Did PowerShell tell me that I have sausage fingers? Nope, PowerShell will happily run your command featuring properties that don’t actually exist. It actually enjoys sitting back smirking whilst you stare at your screen shivering in the cold embrace of nothingness.

 

 

So I guess this post is about feedback, and recognising that sometimes you’re going to get feedback and it’s going to be helpful. Other times, you’re going to get diddly-squat and you’ll just have to do your best to figure out what’s going on.

Posted in PowerShell, T-SQL | Leave a comment

The Truth about Cats and Dogs (Why Stored Procedures don’t use Filtered Indices)

With SQL Server 2008 came an interesting enhancement – the filtered index. This is a really useful tool when trying to deal with columns containing a wildly unbalanced distribution of values. For example in a database of medical patients, where the very rare blood types (like AB negative) are in the minority amongst the more common types (like O positive).

However the filtered index is not without an interesting (although ultimately understandable) quirk. This is a link to all the T-SQL in this blog if you’d like to follow this example through and explore this quirk with me:-

Link to T-SQL Example Script

I’m going to assume that you already have a numbers table created. If you don’t, then just refer to the script above to create one.

Let’s start by creating a table named Animal:-

Let’s add a large number of dogs to our table…

…followed by a significantly lesser number of cats…

Now, let’s recap what we have…

…which yields a nicely unbalanced distribution of data:-
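The actual code lives in the linked script; pulled together as a rough sketch, it looks something like this. Only the Animal table, the Species column and the Dog/Cat row counts come from the post – the AnimalID column, and a dbo.Numbers helper table with a column n (and at least five million rows), are illustrative assumptions:

--A heap, to match the Table Scan operators seen in the plans (AnimalID name assumed)
CREATE TABLE dbo.Animal
(
    AnimalID INT IDENTITY(1,1) NOT NULL,
    Species  NVARCHAR(20) NOT NULL
);

--A large number of dogs (4,999,990 of them)
INSERT dbo.Animal (Species)
SELECT N'Dog' FROM dbo.Numbers WHERE n <= 4999990;

--...followed by a mere handful of cats (10)
INSERT dbo.Animal (Species)
SELECT N'Cat' FROM dbo.Numbers WHERE n <= 10;

--Recap the distribution
SELECT Species, COUNT(*) AS NumberOfRows
FROM dbo.Animal
GROUP BY Species;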

Now we’re ready to start running some queries to see what the optimiser makes of them. We’re going to want to view their execution plans, so be sure to switch this on by right-clicking in the query pane and from the resulting menu choosing “Include Actual Execution Plan“.

Let’s run the first query to ask for some dogs:-

We’re not terribly interested in the result here (10000 consecutive numbers). However, if we click on the Execution plan tab, and hover our mouse over the farthest right node…

…we’ll see that the optimiser elected to use a table scan on the Animal table:-

If we run the same query but this time with Cat as the search predicate, we see similar behaviour (since no indices exist):-

Now let’s try to make things easier for the optimiser by adding a couple of filtered indices. One on Dog, one on Cat:-
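Something along these lines – the post names the cat index “CatIndex”; the dog index name here is just a guess:

CREATE NONCLUSTERED INDEX DogIndex
ON dbo.Animal (Species)
WHERE Species = N'Dog';

CREATE NONCLUSTERED INDEX CatIndex
ON dbo.Animal (Species)
WHERE Species = N'Cat';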

We can then examine the result of our handiwork:-

This confirms that the properties of the statistics for this particular index (filtered on “Dog”; 4,999,990 rows in the index; 5,000,000 rows in the table) are exactly as we might expect.

We can do a similar check on the Cat index too:-

Filtered on “Cat”; 10 rows in index; 5,000,000 rows in the table.

Now let’s try again running the first of those two queries that we ran previously:-

You’ll see that despite the index that we created, the optimiser is still electing to use a table scan. This is because for the data distribution that we have, this is actually a reasonable choice.

However, if we now re-run the Cat query again we’ll see something different to before:-

The optimiser has realised that it has a really useful filtered index. Since there are significantly fewer rows returned for “Cat”, an index seek of the filtered index “CatIndex” is an efficient way to service the query (as opposed to a table scan).

To summarise so far, the optimiser is making good choices in devising plans to service our queries depending upon the distribution of the data.

Now let’s see how everything changes when we try to encapsulate the same query within a parameterised Stored Procedure:-
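The procedure body isn’t reproduced in the post, but the plans show it filtering on a @What parameter; a sketch (procedure name and column list assumed) might be:

CREATE PROCEDURE dbo.GetAnimals
    @What NVARCHAR(20)
AS
BEGIN
    SELECT AnimalID, Species
    FROM dbo.Animal
    WHERE Species = @What;
END;
GO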

Let’s now clear down the plan cache…

…and then execute the procedure with “Dog” as the parameter:-
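With the sketch procedure, that’s:

--Clear the plan cache (fine on a test box, not something to run in production)
DBCC FREEPROCCACHE;
GO

EXEC dbo.GetAnimals @What = N'Dog';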

The resulting plan shows us that as before, with “Dog” the optimiser utilises a table scan:-

(Note in the above, the predicate is now showing [Species]=[@What] which is different to what we were seeing previously. This is going to prove significant in a minute).

However, if we now clear down the plan cache again…

…and then execute the procedure again but this time with the parameter “Cat”, we see (and this is the important bit):-

Another table scan!!!

This is despite the fact that the optimiser knows that it has a nice shiny filtered index to use which would perfectly service this query, much more efficiently than a table scan.

We can try modifying our procedure to use WITH RECOMPILE; this will mean that the plan for the procedure will not be saved in the plan cache, and a new plan will be devised upon each execution:-
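Against the sketch procedure, that modification would look like this:

ALTER PROCEDURE dbo.GetAnimals
    @What NVARCHAR(20)
WITH RECOMPILE
AS
BEGIN
    SELECT AnimalID, Species
    FROM dbo.Animal
    WHERE Species = @What;
END;
GO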

You might think this would help since the optimiser will be generating a new plan every time that the procedure is executed, so let’s try re-running the procedure with “Cat” as a parameter:-

Unfortunately we see that although the optimiser is calculating a new plan every time the procedure is run, the new plan for “Cat” still uses a table scan:-

As a last resort we’ll try using the newer OPTION RECOMPILE hint.
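Again sketched against the assumed procedure, the hint goes on the statement itself:

ALTER PROCEDURE dbo.GetAnimals
    @What NVARCHAR(20)
AS
BEGIN
    SELECT AnimalID, Species
    FROM dbo.Animal
    WHERE Species = @What
    OPTION (RECOMPILE);
END;
GO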

Now we’ll execute the procedure again, first for “Dog”:-

Which yields the familiar and reasonable table scan:-

Crucially, the predicate is now showing as [Species]=’Dog’ whereas in our earlier experiments it was showing as [Species]=[@What]

Let’s find out what a difference this is going to make for our problem case:-

Finally! The optimiser is using an index seek with the filtered index that we created.

In fact clearing the plan cache in this particular case is not necessary (I only did it here to validate the experiment by repeating exactly the same sequence as used previously). You can prove this yourself by executing the procedure several times alternating between “Dog” and “Cat” and you’ll still see plans which use the appropriate filtered indices.

As well as now allowing us to use filtered indices to optimise stored procedures, this nicely illustrates the difference between WITH RECOMPILE and OPTION (RECOMPILE).

With the older WITH RECOMPILE clause, the optimiser will generate a new and complete plan in its entirety every time that the procedure is executed.

With the newer OPTION (RECOMPILE), a given statement within the procedure is recompiled independently of the procedure as a whole. An overall plan for the procedure is stored in the plan cache; however the plan for just that one marked statement is created for each execution.

Using WITH RECOMPILE, the optimiser is bound to the immutable rule that any plan it devises must be good for any possible parameter value (Cat or Dog or Crocodile or Shark or….). It therefore cannot make use in its plan of anything value specific – such as the filtered index for “Cat” (since it would not be useful if the parameter given were “Budgerigar”).

Using OPTION (RECOMPILE), the optimiser understands that this statement level plan is for use with one single execution only. With this in mind, the optimiser is therefore able to consider using value specific indices to furnish the query result (like using the filtered index for “Cat” when the parameter given is “Cat”).

Posted in Indexing | Leave a comment

Confusing Signage in SQL Server 2008 R2 Service Pack 2

The other day whilst applying SQL Server 2008 R2 Service Pack 2 to an instance, I had cause to cancel out of the installation. I did this at the pre-flight check stage by clicking the “Cancel” button.

Upon doing this I was rather surprised to be told:-


“If you quit the Setup operation, you must remove SQL Server 2008 R2 before you run setup again”.

 

 

This really didn’t sound right. Cancelling out of a service pack installation would necessitate completely uninstalling SQL Server???

I had to investigate and so I clicked the “Yes” button.

Of course the service pack installation exits cleanly, leaving SQL Server intact. Furthermore it can be re-applied at any time with no trouble at all and without requiring anything to be uninstalled.

It’s just a case of some very confusing signage.

Posted in Uncategorized | 6 Comments

Puzzling Lock Escalation Behaviour in a Partitioned Table

Here’s some lock escalation behaviour that I found interesting and thought I’d share.

This is a link to all the T-SQL in this blog if you’d like to play along with me:-

Play along with SQL

Let’s start by creating a table with an identity column and some kind of data value that we’re interested in also stored as an INT.

By default, SQL Server creates a clustered index when a primary key is declared, but sometimes we might not want this. Here, ID is an artificial key whose value has no real world meaning. Queries are unlikely to reference it as a predicate, so instead we’ve declared the clustered index on another column which does contain real world values more likely to be used as query predicates. This is not an uncommon thing to do.

We’ll now insert some values into the table using a pre-created numbers table. If you don’t have one of these already then just refer to the attached script to create one.
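The exact script is in the link above; a sketch of the table and the initial load might look like this. The ID and Puzzle_Year columns, the nonclustered primary key and the clustered index arrangement are as described in the post; the table, constraint and index names, and the dbo.Numbers table with a column n, are illustrative:

CREATE TABLE dbo.LockPuzzle
(
    ID          INT IDENTITY(1,1) NOT NULL,
    Puzzle_Year INT NOT NULL,
    CONSTRAINT PK_LockPuzzle PRIMARY KEY NONCLUSTERED (ID)
);

CREATE CLUSTERED INDEX CIX_LockPuzzle_Year
ON dbo.LockPuzzle (Puzzle_Year);

--100 rows for each of the two years
INSERT dbo.LockPuzzle (Puzzle_Year)
SELECT 1666 FROM dbo.Numbers WHERE n <= 100
UNION ALL
SELECT 1812 FROM dbo.Numbers WHERE n <= 100;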

So let’s recap – we have a 200-row table: 100 rows for the year 1666 and 100 rows for the year 1812.

Let’s try to delete all the rows for one of those years (we’ll do this inside a transaction that we’ll keep open just so we can monitor the locking and then roll back the deletion).

Now let’s open up a second query session in Management Studio, and from this other query we’ll delete the rows corresponding to the other year value.
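Sketched out against the assumed table, the two sessions are doing something like this:

--Session 1
BEGIN TRAN;
DELETE dbo.LockPuzzle WHERE Puzzle_Year = 1666;
--(leave the transaction open so the locks can be inspected)

--Session 2, in a separate query window
BEGIN TRAN;
DELETE dbo.LockPuzzle WHERE Puzzle_Year = 1812;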

Both deletions are able to succeed, and if we execute sp_lock we can see why.

SQL Server has effectively taken out a lock on every individual row that we are asking it to delete.

Actually, there are 400 locks, because 100 rows are being deleted in each of the 2 sessions – but a lock is acquired on both the clustered index and the primary key for each row (100 x 2 x 2).

So, all seems good so far. Our 2 separate sessions have happily deleted the rows that we have asked them to, and neither has trodden on the other’s toes.

Let’s roll back the deletions in both the first session…

…and second query session that we have open.

Now let’s see how things behave when we repeat the same test but with a larger quantity of data. First we’ll increase the number of occurrences of each of our 2 values up from 100 to 15000.
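One way to do that with the sketch table is simply to empty it and reload it with 15,000 rows per year:

TRUNCATE TABLE dbo.LockPuzzle;

INSERT dbo.LockPuzzle (Puzzle_Year)
SELECT 1666 FROM dbo.Numbers WHERE n <= 15000
UNION ALL
SELECT 1812 FROM dbo.Numbers WHERE n <= 15000;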

Now, exactly as before we’ll try to delete all the rows for one of those years…

…and in the second query session we’ll delete the rows corresponding to the other year value.

Things behave a little differently this time. The first deletion completes but the second sits there whirring away indefinitely – unable to complete. A call to our friend sp_lock tells us what has happened:-

We can see that this time, instead of a series of individual row locks, the first session has taken out a lock on the whole table. This is now blocking the second session since all the rows it needs to get at have already been locked.

The reason for this behaviour is well documented (and is not the puzzling behaviour to which I was referring). Each lock that SQL Server has to create takes resources and creating many locks therefore takes a lot of resources.

SQL Server defends itself against running out of resources by a process known as lock escalation. Books Online describes this fairly succinctly…

“Lock escalation is the process of converting many fine-grain locks into fewer coarse-grain locks, reducing system overhead while increasing the probability of concurrency contention”.

Indeed this is exactly what we are seeing here. SQL Server realises that creating a lock on so many rows is going to be a very resource intensive operation for it, so it has locked the whole table instead. Good news for resource management, bad news for concurrency (our second session trying to delete rows).

Let’s again roll back the transaction in the first session…

…and stop the deletion and roll back the transaction in the second query session.

So, we need a way to keep this resource preserving behaviour that we’re seeing, but at the same time without completely sacrificing concurrency. One solution to this is of course table partitioning.

With a partitioned table, we increase concurrency by allowing SQL Server an additional level of locking to employ – the partition level. This preserves resources (by allowing many row locks to escalate to a single partition lock) without locking the entire table – thus giving other sessions a chance to do some work at the same time.

Let’s see how this idea works out for us. We’ll create the partition function and scheme:-
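Sketched with assumed names – two boundary values, which is why three partitions appear later:

CREATE PARTITION FUNCTION pf_PuzzleYear (INT)
AS RANGE RIGHT FOR VALUES (1666, 1812);

CREATE PARTITION SCHEME ps_PuzzleYear
AS PARTITION pf_PuzzleYear ALL TO ([PRIMARY]);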

We’ll change the lock escalation on the table to allow SQL Server the ability to lock at the partition level:-
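That’s a single ALTER TABLE, setting LOCK_ESCALATION to AUTO (TABLE is the default; AUTO is what permits partition-level escalation on a partitioned table):

ALTER TABLE dbo.LockPuzzle SET (LOCK_ESCALATION = AUTO);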

We’ll then partition the table by recreating the clustered index on the partition scheme:-
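With the sketch names, that’s a DROP_EXISTING rebuild of the clustered index onto the partition scheme:

CREATE CLUSTERED INDEX CIX_LockPuzzle_Year
ON dbo.LockPuzzle (Puzzle_Year)
WITH (DROP_EXISTING = ON)
ON ps_PuzzleYear (Puzzle_Year);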

Our table is now nicely partitioned, separating out the 2 values for year that we have in our table into 2 separate partitions. This should solve our previous concurrency problem.

Now, once again we’ll try to delete all the rows for one of those years…

…and in the second query session we’ll delete the rows corresponding to the other year value.

We can now see how well partitioning has fixed the concurrency problem.

It hasn’t.

The second session is still unable to complete and a quick look at sp_lock will once again show that the first session has acquired a table lock – even though SQL Server should surely be taking out a much more environmentally friendly partition lock:-

This behaviour had me scratching my head.

After all, we’ve partitioned the table on the Puzzle_Year column, and our delete commands are predicated on Puzzle_Year. Surely SQL Server should be using partition-level locking here?

The catalog view sys.partitions gives us a clue:-

Index 1 (the clustered index) is clearly partitioned into 3.

Index 2 (the primary key) clearly isn’t.

So SQL Server can’t escalate from key lock to partition lock for this primary key (since it isn’t partitioned) – and is therefore escalating straight to table lock.

Once again let’s roll back the transaction in the first session…

…and stop the deletion and roll back the transaction in the second query session.

Now let’s align the Primary key to the partitioning scheme. We can do this by adding the partition key to the primary key:-
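Against the sketch table, that means dropping and re-creating the primary key with Puzzle_Year included, and placing it on the partition scheme:

ALTER TABLE dbo.LockPuzzle DROP CONSTRAINT PK_LockPuzzle;

ALTER TABLE dbo.LockPuzzle
ADD CONSTRAINT PK_LockPuzzle
PRIMARY KEY NONCLUSTERED (ID, Puzzle_Year)
ON ps_PuzzleYear (Puzzle_Year);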

Let’s repeat our test one more time…

…and in the second query session.

Finally, we have the behaviour that we hoped for:-

You can see that the first session has a single partition lock on the primary key and clustered index, whilst the second session has locked another single partition on the primary key and clustered index.

(The table locks indicated are not eXclusive).

Finally we have achieved our economy and concurrency.

Just worth mentioning though that in the process of solving this puzzling behaviour we’ve also created what database theorists would call a “super key”.

Despite its name, a super key isn’t good. It means that we’ve taken a unique key (the ID column) and then added something else to it (the Puzzle_Year column) which it didn’t need, since it was already unique in the first place.

However in this scenario it seems to be the only way to achieve the outcome that we wanted.

Thanks to my friend Ceri for helping me with this puzzling behaviour.

Posted in Indexing, Table Partitioning | Leave a comment