Removing columns named “Column” when importing Excel files with Power Query

When importing Excel files using Power Query, any columns without a name in the first row will end up titled “Column123”, with an number at the end that increments for each column. This happens whether they contain actual data, or whether they’re just formatted columns that contain nothing at all.

Example of imported Excel file with two named columns and two just called “Column”.

If you’re importing a single file, you can rename or delete these columns as you need to. However, if you’re importing an entire folder of Excel files or the format changes unpredictably, you may want to automatically remove all o these columns at the end.

To do this, you’ll use the “Advanced Editor” to add a line of M-code:

"Remove Columns Named Column" = Table.RemoveColumns(#"Promoted Headers", List.FindText(Table.ColumnNames(#"Promoted Headers"), "Column")), 
After adding the new step, columns named “Column” are now gone.

After you add this step, you may need to modify your “Changed Type” step to only set the type on columns you’re keeping, as it automatically mentions every single column by name.

Adding this step will remove the additional columns, dynamically, every time the data is imported. Now when you import a folder full of Excel files and some of them contain additional columns or rogue formatting, the step will drop them if they don’t get named.

Import first sheet of an Excel workbook into Power Query, regardless of its name

I recently had to import a large number of Excel workbooks into Power Query (using “From Folder” – if you haven’t used this feature, it’s a huge time-saver when you’ve got a ton of matching files to import). The problem was that each workbook had a different name for the first tab and the import process gets data from tabs with the same name (assuming all your files match).

By default when you create a workbook, Excel names the first sheet “Sheet1”, but if the files you’re importing have sheets with different names, you can tell Power Query to load the first sheet of each file, regardless of the name – here’s the process (skip to step 5 to see the specific M-query solution):

  1. Select “Get Data” -> “From File” -> “From Folder” – this lets you import all the files from a folder at once (assuming their format matches).
  2. Select the folder where your files are stored (I have three files in the folder in this example).
  3. Select an example file and which sheet you’d like to import – this is what you’ll use to create your transformations (before the files are merged together) – in this example, my sheet is called “Bob’s sheet” (which is specific to just that single file).
  4. When you select “Import”, you’ll receive the error “[Expression.Error] The key didn’t match any rows in the table.” – this is because the other workbooks don’t contain any sheets with that name (though the error is a bit cryptic).
    If you double-click on the “Data Files” query with the error, you can see that it loaded the data from the first file, but nothing from the others:
  5. To correct this issue and tell Power Query to just use the first tab in the workbook in every case, open the “Transform Sample File from Data Files” query and click “Advanced Editor”.
  6. You’re now looking at the M-query behind this object (if you want it, here’s a great video on M if you’ve never worked it before and want a primer – I watched a bunch of different videos and this one really connected the dots for me). The section with the sheet name is highlighted:
    Replace the highlighted text with “Item=Source{0}[Item]” so that it looks like this:
    (The “0” refers to the first sheet – if you want the second sheet, you’d use “1” and so on)
  7. Once you make change, you can see in the Power Query window that it’s now getting data from every file.
  8. Once you click “Close and Load” to get back to Excel, click on the “Refresh All” button to force a data refresh and you’ll see it import all the files, regardless of their sheet name:

That’s it – you’re pulling in all the workbooks regardless of their sheet name!

I’d never used the “From Folder” feature, but I’d definitely recommend trying it out – it was a huge time-saver for me and allowed me to import hundreds of matching Excel files into a single model in minutes!

Update: When you’re using older XLS files

Somebody brought to my attention that this technique doesn’t work when you’re using XLS files, only when you’re using the XLSX/XLSB format (I won’t call it new as it’s been around since Office 2007, but still – newish).

If you’re using XLS files, you need to change step 6 slightly as M uses a different syntax to drill into sheets in those files. Here’s what the M query looks like when you drill in manually:

Replace the highlighted portion ([Name=”Sheet1″]) with just an index number – “0” in this case:

Once you do that, Power Query will drill into the sheet in the the first row of your dataset.

However, I did notice an issue – the sheet in position “0” wasn’t the first sheet in the workbook. I added a number of sheets in my Excel doc, but I did it them in a random order (I renamed “Sheet1” to “Original” and then created the others in the order they’re named):

However, when you import the workbook in Power Query, the physical order isn’t reflected – it seems to be the order they were created, but with the original sheet always last (not sure what the thinking is there – maybe something in the physical file format?):

If you want the “Original” sheet, you can use the Power Query step “Keep Bottom Rows” and give it a value of “1”:

If you want one of the other sheets, though, and the name keeps changing, I’m not sure how you’d handle ensuring you get the right one. If you’ve got suggestions, I’d love to hear them and I’ll include them here!

Migrate database indexes to a new file group

I recently had to mass-migrate all the indexes from a database to a new file group since we’d added some additional storage to our database server. I found¬†this article at SQL Server Central¬†(unfortunately, registration required, so I’ve included a copy of the original script in the download at the end). While it worked okay, there were some things I didn’t like about it:

  • Assumed 90% index fill rate
  • “Moved” indexes were all created as non-unique, regardless of original
  • Fail during index creation left you without an index (drop and then create, with no rollback)
  • Table was un-indexed during the move (index dropped and then created)
  • Script re-created indexes without any “Included” columns, even if original index had them

To address these limitations, I rebuilt the process using that script as a starting point. The new script:

  • Uses 90% fill rate by default, but if the original index had a different rate specified, it will use that
  • Re-creates indexes as unique if the source index was unique
  • Rollback problem resolved – new index is created with different name, old index is dropped, and then new index is renamed, all in a TRY-CATCH block
  • Since the new index is created and then the old one dropped, table indexing remains “online” during the move
  • Migrates “Included” columns in index
  • Updated the script to use SYS views (breaks compatibility with SQL 2000, since SYS is 2005/2008/beyond only)
I welcome any feedback on the script, and would love to know if you see any improvements that should be made.

Download .SQL scripts (contains both Original and Modified scripts)