Removing columns named “Column” when importing Excel files with Power Query

When importing Excel files using Power Query, any columns without a name in the first row will end up titled “Column123”, with an number at the end that increments for each column. This happens whether they contain actual data, or whether they’re just formatted columns that contain nothing at all.

Example of imported Excel file with two named columns and two just called “Column”.

If you’re importing a single file, you can rename or delete these columns as you need to. However, if you’re importing an entire folder of Excel files or the format changes unpredictably, you may want to automatically remove all o these columns at the end.

To do this, you’ll use the “Advanced Editor” to add a line of M-code:

"Remove Columns Named Column" = Table.RemoveColumns(#"Promoted Headers", List.FindText(Table.ColumnNames(#"Promoted Headers"), "Column")), 
After adding the new step, columns named “Column” are now gone.

After you add this step, you may need to modify your “Changed Type” step to only set the type on columns you’re keeping, as it automatically mentions every single column by name.

Adding this step will remove the additional columns, dynamically, every time the data is imported. Now when you import a folder full of Excel files and some of them contain additional columns or rogue formatting, the step will drop them if they don’t get named.

View SQL Server table updates per second

When trying to guage the level of database activity, you can use SQL Profiler to view the type and volume of transactions in motion at any given time and to view the overall level of database IO, but you can’t use it to directly tell which database tables are being updated.

However, there’s a handy dynamic management view called sys.dm_db_index_usage_stats that tells you the number of rows that have been updated in each database index since the instance was last restarted (or since the table/index was created, if that happened more recently):

SELECT *
FROM sys.dm_db_index_usage_stats[plain]

The view also has some additional information on index usage, including the number of scans, seeks, and lookups performed on each index – super helpful information if you’re looking for unused indexes or which objects are heaviest-hit. If you look at indexes 0 and 1 (zero is the table heap, 1 is the clustered index), you’ll see activity on the underlying table data itself.

I needed to see the row updates per second for every table in the database, so rather than run that select over and over (and compare the results), I wrote a quick script to do the comparison repeatedly for me:

SET NOCOUNT ON

-- Remove the working table if it already exists
-- so it doesn't get in the way
IF OBJECT_ID('tempdb..#TableActivity_After') IS NOT NULL
DROP TABLE #TableActivity_After

-- Collect our working data
SELECT object_name(us.object_id) as TableName,
		user_updates as UpdatedRows,
		last_user_update as LastUpdateTime
INTO #TableActivity_After
from sys.dm_db_index_usage_stats us
join sys.indexes si
	on us.object_id = si.object_id
	and us.index_id = si.index_id
where database_id = db_id()
and user_seeks + user_scans + user_lookups + user_updates > 0
and si.index_id in (0,1)
order by object_name(us.object_id)

-- Figure out if we're running it the first time or again
-- Put the data into the correct tables
IF OBJECT_ID('tempdb..#TableActivity_Before') IS NULL
BEGIN
	-- First time it's being run - stage the existing data
	PRINT 'Initial table usage collected - execute again for changes'

END
ELSE
BEGIN
	-- Running script a subsequent time
	-- Compare this set of data to our last set

	-- See how long it's been since we ran this script last
	-- Or at least since last change in any table in the database
   DECLARE @SecondsSince DECIMAL(10,2)
	SELECT @SecondsSince = CONVERT(FLOAT, DATEDIFF(ms, MAX(LastUpdateTime ), GETDATE()))/1000
	  FROM #TableActivity_BEFORE

	SELECT @SecondsSince as 'Seconds since last execution'

	-- Do actual table comparison and give results
	SELECT a.TableName,
		   a.updatedrows - isnull(b.UpdatedRows,0) as RowsUpdated,
		  CONVERT(INT, (a.updatedrows - isnull(b.UpdatedRows,0)) / @SecondsSince) as RowsPerSecond
	 FROM #TableActivity_After a
	 LEFT
	 JOIN #TableActivity_Before b
	   ON b.TableName = a.TableName
    WHERE a.updatedrows - isnull(b.UpdatedRows,0) > 0
	ORDER BY RowsUpdated DESC

END

-- Swap the tables so the AFTER table becomes the new BEFORE
-- Then clean up AFTER table since we'll get a new one next time
IF OBJECT_ID('tempdb..#TableActivity_Before') IS NOT NULL
DROP TABLE #TableActivity_Before

SELECT *
  INTO #TableActivity_Before
  FROM #TableActivity_After

DROP TABLE #TableActivity_After

Running that script the first time will grab an snapshot of table activity. Running it again will tell you what has changed since you ran it the first time, and running it again will continue to tell you (updating the “before” image each time so you’re getting an update on only the most recent database activity).

If you wanted to see activity on all database indexes, you could update the query at the top to show index name and remove the “WHERE si.index_id in (0,1)” and you’d see all the index details.

I hope this is helpful – if you have any feedback or would like to see something added, please feel free to leave a comment below!

Download the full script here

Querying Active Directory from SQL Server

SQL Server provides some pretty flexible integration with Active Directory through the ADSI Linked Server provider, something that’s present by default when you install SQL Server. If you’ve never used it before, it allows you to connect to a domain controller and query AD the same way you’d query any other linked server. For example, it gives you the option to:

  • Identify when logins to SQL Servers or databases that support financial applications exist, but have no matching AD account (either direct integrated logins, or if SQL logins or rows in a “User” table have been set up to match the AD login)
  • Kick off alerts to provision the user in various systems based on their AD group membership
  • Automatically trigger an action when a new account appears in active directory (for example, we auto-provision security badges and send an email alert to our head of security to assign the appropriate rights)

While much of this could also be done from Powershell as well, we use the SQL Server Agent to manage many of our scheduled job (because it’s so handy to have the agent remotely accessible), as well as sometimes just needing data from AD in a query. To support a number of processes we have in place, we run a synchronization job every so often throughout the day that pulls about two dozen fields for all users and synchronizes them into a table if anything has changed.

Setting up the linked server itself is pretty straightforward (courtesy of http://community.spiceworks.com/how_to/show/27494-create-a-sql-linked-server-to-adsi):

  1. Create the linked server itself
  2. Set the security context (if you want to query AD as something other than the SQL Server Service account – by default, all domain users can do this and it’s only required if the domain is remote or if, for some reason, your SQL Service account’s AD rights have been restricted, like if you’re running as “LOCAL SERVICE”)
  3. Enable OPENQUERY (Ad Hoc Distributed Queries)

You’ll notice that setting up the linked server itself doesn’t actually specify where Active Directory is located or what domain/forest you’ll be querying – that’s actually done in the query itself. In each query, you’ll need to specify the FQDN (Fully-qualified domain name) of the domain (or OU) of the domain you’re querying. For example, we’d get all users from a domain by issuing the following query (in this example, “ADLinkedServerName” is the linked server we just created, and our domain is “corp.mycompany.local”):

SELECT EmployeeNumber, Name AS FullName, givenName as FirstName, sn as LastName,
L AS Location, samAccountName as ADAccount
FROM OPENQUERY(ADLinkedServerName,'SELECT Name, L, givenName, sn,
EmployeeNumber, EmployeeID,samAccountName,createtimestamp
FROM ''LDAP://OU=Users,DC=corp,DC=mycompany,DC=local''
WHERE objectClass =''user''') ad

This query will search that OU (“Users”, in this case) and everything below it, so changing the FROM to “LDAP://DC=corp,DC=mycompany,DC=local” would fetch the entire directory (for all the “user” objects), regardless of what folder they appeared it – if your directory puts users in another OU (like “Associates”, for example), you should adjust the query accordingly.

For column names, you can pull any AD properties at all that you’re looking for – even custom ones that aren’t part of a standard AD configuration. To get an easy list of AD properties to choose from, I like using ADSIEDIT (part of Microsoft’s Remote Server Administration Tools – download RSAT for Windows 7 or RSAT for Windows 8.1) – just drill down all the way down to an object, like a user, right click on them and select “Properties”, and you can see a list of all the properties on that account. If you’ve got Domain Admin rights, this tool can be used to modify these values too, but for querying, you only need to be a domain user or somebody who has rights to browse AD. Make a note of the names of particular properties that you’re interested in – also note that AD queries are case-sensitive, so you’ll need to note the casing of these properties as well.

One potential gotcha that I’ve run into is that maximum result size that AD will return in a single query can be set as part of domain policy – by default it’s 1000 records at once, and can be configured by setting or adjusting the “PageSize” property on your domain controllers (see https://support.microsoft.com/kb/315071/en-us). Also, there’s a “MaxResultSetSize” property as well that’s set to 256KB by default, but I’ve never hit it – unless you’re pulling every single property back, you’d likely hit the PageSize row limit before you hit the ResultSize byte limit, but remember that both are there. If you do hit the AD result count limit, it will return the rows up to the limit, but then execution stops with a kind of cryptic error:

Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "ADsDSOObject" for linked server "YOURDOMAIN".

If your domain is larger than the PageSize limit, you’ll need to cut your query into multiple return sets of data so you don’t exceed the limit on any single query. Since our domain contains about 2400 users, we were able to do it in two queries, broken up like this:

SELECT samAccountName
  FROM OPENQUERY(ADLinkedServerName,'SELECT samAccountName
                                       FROM ''LDAP://OU=Users,DC=corp,DC=mycompany,DC=local''
                                      WHERE objectClass =''user''
                                        AND givenName<''L''') as c
UNION ALL
SELECT samAccountName
  FROM OPENQUERY(ADLinkedServerName,'SELECT samAccountName
                                       FROM ''LDAP://OU=Users,DC=corp,DC=mycompany,DC=local''
                                      WHERE objectClass =''user''
                                        AND givenName>=''L''') as c

By dividing the names on L, this cut the directory roughly in half – if yours was larger, you could divide it by querying each OU separately, or by looping through letters of the alphabet, or whatever makes sense in your setting. You could even do something dynamic like pull as many records as you can, then grab the value from the last record you pulled and use it as the baseline to pull the next set as far as you can, and then repeat until you run out of records. Linked servers don’t allow you to dynamically assemble your query at run-time – it has to be hard-coded in the query – but there are some ways around that (like building your OPENQUERY as a string and then executing it via sp_executesql, for example).

Now that you have your AD records stored in a temp table, you can identify new/changed records and merge them into a SQL table you already have ready using an INSERT/UPDATE/DELETE or MERGE statement, or possibly trigger notifications or some other business process.

I hope this is helpful – if you’d like some more detail, please leave a comment and I’m happy to elaborate where it’s necessary!

Refreshing changed .NET SQL CLR assemblies after patching/updates

After applying some Windows updates to one of my servers, I started getting the following error when I ran a customer .NET SQL-CLR stored proc:

Server: Msg 6522, Level 16, State 2, Line 1
A .NET Framework error occurred during execution of user defined routine or aggregate ‘somemethodyourecalling’:

System.IO.FileLoadException: Could not load file or assembly ‘System.Drawing, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’ or one of its dependencies. Assembly in host store has a different signature than assembly in GAC. (Exception from HRESULT: 0x80131050)

I’d imported some additional assemblies into SQL Server for use in SQL CLR mapped stored procedures, and the Windows updates had included a service pack for .NET, which changed some of the assemblies. Now the version I’d linked to SQL Server no longer existed on disk, and SQL couldn’t load it – that made sense, and is documented pretty clearly in this MS Support article: http://support.microsoft.com/kb/949080

However, I had dozens of servers with SQL CLR components enabled, and hundreds of different assemblies loaded across them all, and not always the same in each server, so a standard update script wouldn’t work to refresh all the changed assemblies (the MS Support link provides a list of the standard ones that cause that error, but if you’ve got custom assemblies loaded, or you’ve loaded an assembly that’s not specifically cleared for SQL CLR, then it’s not on the list either). To deal with this, I wrote a script that fetches the assembly list for a database and attempts to refresh every one of them from their disk location. If they haven’t changed, the update attempt will fail with a specific error message about the MVID, and there’s no change for that assembly.

Also, I’ve commented out the line that restricts it to just framework assemblies (System.* and Microsoft.*), but you can uncomment that line if you’d like to restrict the refresh from attempting to reload your custom assemblies as well.

DECLARE @AssemblyName VARCHAR(255),
	    @AssemblyLocation VARCHAR(255),
	    @AlterAssemblyCommand NVARCHAR(1024),
	    @DotNetFolder VARCHAR(100)

   SET @DotNetFolder = 'C:\Windows\Microsoft.NET\Framework\v2.0.50727'

CREATE TABLE #Results (
		AssemblyName VARCHAR(255),
		AssemblyLocation VARCHAR(255),
		AlterAssemblyCommand NVARCHAR(1024),
		Results VARCHAR(1024)
)

select sa.name as AssemblyName,
		saf.name as Assemblylocation,
		case when charindex('', saf.name) = 0
			then 'ALTER ASSEMBLY [' + sa.name + '] FROM ''' + @DotNetFolder
			else 'ALTER ASSEMBLY [' + sa.name + '] FROM '''
		end + saf.name + (case right(saf.name, 4) when '.dll' then '' else '.dll' end) + ''''
		as AlterAssemblyCommand
INTO #Refresh
from sys.assemblies sa
join sys.assembly_files saf
  on sa.assembly_id = saf.assembly_id
where sa.name <> ('Microsoft.SqlServer.Types')
  --and (sa.name like 'System.%' or sa.name like 'microsoft.%')

DECLARE Commands CURSOR FAST_FORWARD FOR
SELECT AssemblyName,
	   AssemblyLocation,
	   AlterAssemblyCommand
  FROM #Refresh

OPEN Commands

FETCH NEXT FROM Commands
INTO @AssemblyName,
	   @AssemblyLocation,
	   @AlterAssemblyCommand

WHILE @@FETCH_STATUS = 0
BEGIN

	BEGIN TRY
		exec sp_executesql @AlterAssemblyCommand

		insert into #Results
		SELECT @AssemblyName,
				@AssemblyLocation,
				@AlterAssemblyCommand,
				'Assembly refreshed successfully'

	END TRY
	BEGIN CATCH

		insert into #Results
		SELECT @AssemblyName,
				@AssemblyLocation,
				@AlterAssemblyCommand,
				CASE ERROR_NUMBER()
					WHEN 6285 THEN 'No update necessary (MVID match)'
					WHEN 6501 THEN 'Physical assembly not found at specified location (SQL Error 6501)'
					ELSE ERROR_MESSAGE() + ' (SQL Error ' + convert(varchar(10), ERROR_NUMBER()) + ')'
				END

	END CATCH

	FETCH NEXT FROM Commands
	INTO @AssemblyName,
		   @AssemblyLocation,
		   @AlterAssemblyCommand

END

CLOSE Commands
DEALLOCATE Commands

SELECT * FROM #Results

drop table #refresh
drop table #Results

While troubleshooting the error, I came across this as well – I don’t have a SQL 2012 server handy to check with, it looks like this problem might be resolved with a reboot in SQL 2012:

http://msdn.microsoft.com/en-us/library/hh479773.aspx

I’m not sure if that’s the case, but it would make things easier. Also, it would be nice if there was some kind of “ALTER ASSEMBLY [SomeAssembly] REFRESH” command to address this problem, but maybe in a future version.

Additionally, this error can be caused if the signing of an assembly has changed, but not necessarily the signature, but just reloading it from disk won’t work because the method that SQL Server uses to determine if there’s been a change is the MVID, and that doesn’t change unless there’s been a code change of some kind (see the Connect bug here). In those cases, you’ll actually need to drop and recreate the assembly, including any supporting objects that rely on that assembly.

Export from SQL Server to XLS and email results

Sometimes you want to take some query results and export them directly to an XLS file – here’s how you can set that up in SQL Server. The biggest caveat is that you need to run it from an x86 instance of SQL Server – the x64 instance won’t have access to the Jet driver needed to write the Excel file (Microsoft.Jet.OLEDB.4.0), where the x86 version will. In fact, we maintain an older x86 instance of SQL Server for random processes like this that need it – x64 is better in almost every case, but we can’t see to completely ditch x86… 🙂

I use a stored proc that I call from a SQL Agent Job, which works great. The actual process is a bit awkward – for starters, you’ll need access to xp_cmdshell. SQL Server can’t create a new Excel file from scratch, so you have to keep a blank Excel file around, make a copy of it, and then insert into the copy to get your final result.

That said, here’s the code to generate the XLS file from your query results:

SELECT Column1, Column2, Column3, Column4
  INTO ##YourTempTable
  FROM SomeOtherTable

SET @Folder = 'C:\Temp\'
SET @DocumentBlank = 'Your Document - Blank'
SET @DocumentLong = 'Your Document - ' + CONVERT(VARCHAR(10), GETDATE(), 120)

DECLARE @CMD NVARCHAR(4000)
SET @CMD = 'COPY "' + @folder + @DocumentBlank + '.xls" "' + @Folder + @DocumentLong + '.xls"'
exec master..xp_cmdshell @CMD

-- Export the Excel sheet
SET @CMD = 'insert into OPENROWSET(''Microsoft.Jet.OLEDB.4.0'',
	''Excel 8.0;Database=' + @Folder + @DocumentLong + '.xls;'',
	''SELECT * FROM [Sheet1$]'')
	select Column1, Column2, Column3, Column4 from ##YourTempTable'

exec sp_executesql @CMD

Once that’s exported, you can just set up the email process using sp_send_dbmail and attach the file you just generated:

DECLARE @Body VARCHAR(2000)

SET @Attachments = @Folder + @DocumentLong  + '.xls'
SET @Body = 'Your file has been generated for ' + CONVERT(VARCHAR(10), GETDATE(), 120)

exec msdb..sp_send_dbmail @profile_name = 'YourMailProfile',
	@Recipients = 'Recipients@YourDomain.biz',
	@subject = 'Your file is ready',
	@Body = @Body,
	@file_attachments = @DocumentLong