Excel – SQL Ryan

Removing columns named “Column” when importing Excel files with Power Query

When importing Excel files using Power Query, any columns without a name in the first row will end up titled “Column123”, with an number at the end that increments for each column. This happens whether they contain actual data, or whether they’re just formatted columns that contain nothing at all.

Example of imported Excel file with two named columns and two just called “Column”.

If you’re importing a single file, you can rename or delete these columns as you need to. However, if you’re importing an entire folder of Excel files or the format changes unpredictably, you may want to automatically remove all o these columns at the end.

To do this, you’ll use the “Advanced Editor” to add a line of M-code:

"Remove Columns Named Column" = Table.RemoveColumns(#"Promoted Headers", List.FindText(Table.ColumnNames(#"Promoted Headers"), "Column")),

After adding the new step, columns named “Column” are now gone.

After you add this step, you may need to modify your “Changed Type” step to only set the type on columns you’re keeping, as it automatically mentions every single column by name.

Adding this step will remove the additional columns, dynamically, every time the data is imported. Now when you import a folder full of Excel files and some of them contain additional columns or rogue formatting, the step will drop them if they don’t get named.

Import first sheet of an Excel workbook into Power Query, regardless of its name

I recently had to import a large number of Excel workbooks into Power Query (using “From Folder” – if you haven’t used this feature, it’s a huge time-saver when you’ve got a ton of matching files to import). The problem was that each workbook had a different name for the first tab and the import process gets data from tabs with the same name (assuming all your files match).

By default when you create a workbook, Excel names the first sheet “Sheet1”, but if the files you’re importing have sheets with different names, you can tell Power Query to load the first sheet of each file, regardless of the name – here’s the process (skip to step 5 to see the specific M-query solution):

Select “Get Data” -> “From File” -> “From Folder” – this lets you import all the files from a folder at once (assuming their format matches).
Select the folder where your files are stored (I have three files in the folder in this example).
Select an example file and which sheet you’d like to import – this is what you’ll use to create your transformations (before the files are merged together) – in this example, my sheet is called “Bob’s sheet” (which is specific to just that single file).
When you select “Import”, you’ll receive the error “[Expression.Error] The key didn’t match any rows in the table.” – this is because the other workbooks don’t contain any sheets with that name (though the error is a bit cryptic).
If you double-click on the “Data Files” query with the error, you can see that it loaded the data from the first file, but nothing from the others:
To correct this issue and tell Power Query to just use the first tab in the workbook in every case, open the “Transform Sample File from Data Files” query and click “Advanced Editor”.
You’re now looking at the M-query behind this object (if you want it, here’s a great video on M if you’ve never worked it before and want a primer – I watched a bunch of different videos and this one really connected the dots for me). The section with the sheet name is highlighted:
Replace the highlighted text with “Item=Source{0}[Item]” so that it looks like this:
(The “0” refers to the first sheet – if you want the second sheet, you’d use “1” and so on). If you’re using an XLS file rather than an XLSX file, the text looks a little bit different: “Name=Source{0}[Name]“. The XLSX file has a Name field as well, but it can vary from the “Item” column.
Once you make change, you can see in the Power Query window that it’s now getting data from every file.
Once you click “Close and Load” to get back to Excel, click on the “Refresh All” button to force a data refresh and you’ll see it import all the files, regardless of their sheet name:

That’s it – you’re pulling in all the workbooks regardless of their sheet name!

I’d never used the “From Folder” feature, but I’d definitely recommend trying it out – it was a huge time-saver for me and allowed me to import hundreds of matching Excel files into a single model in minutes!

Update: When you’re using older XLS files

Somebody brought to my attention that this technique doesn’t work when you’re using XLS files, only when you’re using the XLSX/XLSB format (I won’t call it new as it’s been around since Office 2007, but still – newish).

If you’re using XLS files, you need to change step 6 slightly as M uses a different syntax to drill into sheets in those files. Here’s what the M query looks like when you drill in manually:

Replace the highlighted portion ([Name=”Sheet1″]) with just an index number – “0” in this case:

Once you do that, Power Query will drill into the sheet in the the first row of your dataset.

However, I did notice an issue – the sheet in position “0” wasn’t the first sheet in the workbook. I added a number of sheets in my Excel doc, but I did it them in a random order (I renamed “Sheet1” to “Original” and then created the others in the order they’re named):

However, when you import the workbook in Power Query, the physical order isn’t reflected – it seems to be the order they were created, but with the original sheet always last (not sure what the thinking is there – maybe something in the physical file format?):

If you want the “Original” sheet, you can use the Power Query step “Keep Bottom Rows” and give it a value of “1”:

If you want one of the other sheets, though, and the name keeps changing, I’m not sure how you’d handle ensuring you get the right one. If you’ve got suggestions, I’d love to hear them and I’ll include them here!

Update 2

Another question from the comments – what if you want to get the last sheet in the workbook, not the first one? We can make that happen with one additional line right after Source and right before the line where you select the sheet:

"Kept Last Rows" = Table.LastN(Source, 1),

This is the command you get when you select “Keep Rows” -> “Keep Bottom Rows” from the toolbar up top, and you’ll be left with a single row to drill into. You’ll still need to drill into it as Row 0 so that it will work no matter what the sheet name is.

Export from SQL Server to XLS and email results

Sometimes you want to take some query results and export them directly to an XLS file – here’s how you can set that up in SQL Server. The biggest caveat is that you need to run it from an x86 instance of SQL Server – the x64 instance won’t have access to the Jet driver needed to write the Excel file (Microsoft.Jet.OLEDB.4.0), where the x86 version will. In fact, we maintain an older x86 instance of SQL Server for random processes like this that need it – x64 is better in almost every case, but we can’t see to completely ditch x86… 🙂

I use a stored proc that I call from a SQL Agent Job, which works great. The actual process is a bit awkward – for starters, you’ll need access to xp_cmdshell. SQL Server can’t create a new Excel file from scratch, so you have to keep a blank Excel file around, make a copy of it, and then insert into the copy to get your final result.

That said, here’s the code to generate the XLS file from your query results:

SELECT Column1, Column2, Column3, Column4
  INTO ##YourTempTable
  FROM SomeOtherTable

SET @Folder = 'C:\Temp\'
SET @DocumentBlank = 'Your Document - Blank'
SET @DocumentLong = 'Your Document - ' + CONVERT(VARCHAR(10), GETDATE(), 120)

DECLARE @CMD NVARCHAR(4000)
SET @CMD = 'COPY &quot;' + @folder + @DocumentBlank + '.xls&quot; &quot;' + @Folder + @DocumentLong + '.xls&quot;'
exec master..xp_cmdshell @CMD

-- Export the Excel sheet
SET @CMD = 'insert into OPENROWSET(''Microsoft.Jet.OLEDB.4.0'',
	''Excel 8.0;Database=' + @Folder + @DocumentLong + '.xls;'',
	''SELECT * FROM [Sheet1$]'')
	select Column1, Column2, Column3, Column4 from ##YourTempTable'

exec sp_executesql @CMD

Once that’s exported, you can just set up the email process using sp_send_dbmail and attach the file you just generated:

DECLARE @Body VARCHAR(2000)

SET @Attachments = @Folder + @DocumentLong  + '.xls'
SET @Body = 'Your file has been generated for ' + CONVERT(VARCHAR(10), GETDATE(), 120)

exec msdb..sp_send_dbmail @profile_name = 'YourMailProfile',
	@Recipients = 'Recipients@YourDomain.biz',
	@subject = 'Your file is ready',
	@Body = @Body,
	@file_attachments = @DocumentLong