Converting two-column DataFrame to a dictionary in Python

If you have a Pandas dataframe like the following (if you already have a DataFrame based on a query result, you can use that):

import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
Basic Pandas DataFrame

I wanted a dictionary where I could provide the name (“tom”, “nick”, or “juli”) and receive their age as the result. To convert your DataFrame to a dictionary like this, you’d use the following command:

my_dict = pd.Series(df.Age.values,index=df.Name).to_dict()
print(my_dict)
Python dict with three key:value pairs

Now that you have a dictionary, you can do lookups or whatever else you want. Make sure you take note above that the first parameter uses “values” where the second one doesn’t. If you use values in the second parameter, or you just try to use the “.todict()” function on the DataFrame directly, without converting it to a series first as above, you can end up with Dictionaries with an entry for each column, or dictionaries that contain dictionaries, or a number of other things that don’t work.

Short and to the point – this took me about an hour to find the answer to and hopefully you find it before losing that much time!

Run Windows Troubleshooting Wizards manually (from the command line)

There are some great resources online that show you how to manually run various Windows troubleshooters from the command line (a great summary, or the details from Technet), but they didn’t include some of the newer items in Windows 10, including the Windows Store Apps Troubleshooter.

To run these manually, you can use the following commands (in some cases, you’ll be prompted to elevate to admin rights if you’re not already running CMD or Powershell as an admin):

Bluetooth:

%systemroot%\system32\msdt.exe -id  BluetoothDiagnostic

Keyboard:

%systemroot%\system32\msdt.exe -id  KeyboardDiagnostic

Speech:

%systemroot%\system32\msdt.exe -id  SpeechDiagnosticCalibrate

Video Playback:

 %systemroot%\system32\msdt.exe -id  VideoPlaybackDiagnostic

Windows Store Apps:

%systemroot%\system32\msdt.exe -id AppsDiagnostic

Additionally, here are the Troubleshooting packages listed at the Technet site in case that’s unavailable in the future:

Aero effects, like transparency:

%systemroot%\system32\msdt.exe -id  AeroDiagnostic

Troubleshoots problems connecting to a workplace network over the Internet using Direct Access:

%systemroot%\system32\msdt.exe -id  NetworkDiagnosticsDA

Troubleshoots problems using hardware and access devices connected to the computer:

%systemroot%\system32\msdt.exe -id  DeviceDiagnostic

Troubleshoots problems viewing computers or shared files in a homegroup:

%systemroot%\system32\msdt.exe -id  HomeGroupDiagnostic

Troubleshoots problems with allowing other computers to communicate with the target computer through Windows Firewall:

%systemroot%\system32\msdt.exe -id  NetworkDiagnosticsInbound

Troubleshoots problems connecting to the Internet or to a specific Web site:

%systemroot%\system32\msdt.exe -id  NetworkDiagnosticsWeb

Helps the user prevent add-on problems and optimize temporary files and connections:

%systemroot%\system32\msdt.exe -id  IEDiagnostic

Helps the user prevent malware, pop-ups, and online attacks:

%systemroot%\system32\msdt.exe -id  IESecurityDiagnostic

Troubleshoots problems with Ethernet, wireless, or other network adapters:

%systemroot%\system32\msdt.exe -id  NetworkDiagnosticsNetworkAdapter

Helps the user adjust settings to improve operating system speed and performance:

%systemroot%\system32\msdt.exe -id   PerformanceDiagnostic

Troubleshoots problems playing sounds and other audio files:

%systemroot%\system32\msdt.exe -id  AudioPlaybackDiagnostic

Helps the user adjust power settings to improve battery life and reduce power consumption:

%systemroot%\system32\msdt.exe -id  PowerDiagnostic

Troubleshoots problems printing:

%systemroot%\system32\msdt.exe -id  PrinterDiagnostic

Helps the user configure older programs so that they can run in the current version of Windows:

%systemroot%\system32\msdt.exe -id  PCWDiagnostic

Troubleshoots problems recording audio from a microphone or other input source:

%systemroot%\system32\msdt.exe -id  AudioRecordingDiagnostic

Troubleshoots problems with search and indexing using Windows Search:

%systemroot%\system32\msdt.exe -id  SearchDiagnostic

Troubleshoots problems accessing shared files and folders on other computers over the network:

%systemroot%\system32\msdt.exe -id  NetworkDiagnosticsFileShare

Helps the user clean up unused files and shortcuts and perform other maintenance tasks:

%systemroot%\system32\msdt.exe -id  MaintenanceDiagnostic

Troubleshoots problems playing a DVD using Windows Media Player:

%systemroot%\system32\msdt.exe -id  WindowsMediaPlayerDVDDiagnostic

Troubleshoots problems with adding media files to the Windows Media Player library:

%systemroot%\system32\msdt.exe -id  WindowsMediaPlayerLibraryDiagnostic

Helps the user reset Windows Media Player settings to the default configuration:

%systemroot%\system32\msdt.exe -id  WindowsMediaPlayerConfigurationDiagnostic

Troubleshoots problems that prevent Windows Update from performing update tasks:

%systemroot%\system32\msdt.exe -id  WindowsUpdateDiagnostic

To get a full list of Troubleshooters currently installed on your system, you can run the following command and then select “Additional Troubleshooters” if necessary:

%systemroot%\system32\control.exe /name Microsoft.Troubleshooting

“There is no data in the database for @catalogitemid” in Power BI Report Server

After updating from the May 2020 release of Power BI Report Server to the January 2021 release, we started to receive this error during our report refresh schedules:

An error has occurred. There is no data in the database for @CatalogItemId

This was happening when multiple reports were refreshing at once, during our overnight schedules, as well as when multiple reports were manually refreshed at the same time. I found a post that seemed to confirm this (https://community.powerbi.com/t5/Report-Server/Scheduled-Refresh-error-quot-There-is-no-data-in-the-database/m-p/1505277) and suggested that you change your schedules so multiple reports don’t refresh at once – since were using shared schedules (including a schedule that we trigger via the API when our ETL finishes), that wasn’t an option for us.

To work around it, we had to refresh our reports one at a time, waiting until the currently one had completed before starting the next one – if you started the next one before the previous was complete, it seemed like about a 50/50 chance that the second report would fail with this error. This left a 0% chance that the overnight schedules that were attached to dozens of reports would even refresh 2-3 successfully.

I came across a post in the Power BI Community forum (https://community.powerbi.com/t5/Report-Server/There-is-no-data-in-the-database-for-catalogitemid/m-p/1721489) where somebody shared that it was caused by a code change to the code that loads the data model from the SQL database in the October 2020 release – switching a query hint in a Stored Procedure (GetCatalogExtendedContentData) from READPAST to NOWAIT (it was even annotated with a comment from the PBIRS developer that made it – nice work, whoever this was! This is a great example of a comment that has the “Why” and not the “What”, so it’s actually useful):

FROM [CatalogItemExtendedContent] WITH (NOWAIT) -- DevNote: Using NOWAIT here because for large models the row might be locked for long durations. Fail fast and let the client retry.

You can read more about query hints in the Microsoft Docs, but at a high level, READPAST (the previous code) causes a query to skip past and locked rows as if they’re not even in the table – NOWAIT, however, causes a query to fail with a lock timeout error as soon as it encounters any locked rows that it wants to read. The code change results in a failure any time we attempted to refresh a report while a previous refresh has any data model contents locked.

I couldn’t force this error to happen to one of our test environments, no matter how many reports I refreshed at once or large the data models were – given this, I wasn’t sure that fixing this code would resolve our error since the test environment had the code with the NOWAIT hint. I wanted to be sure, so I ran a SQL Profiler trace while refreshing some reports in production so I could see the error happen (WARNING – Profiler in production is usually a bad idea as it drags down performance, so proceed with caution here). Here were the three lines I saw that confirmed it:

SQL Profiler trace showing Lock Request Timeout exception followed by the SP call that’s failing

This was exactly where the forum said the code had changed – I was seeing the stored procedure throw an immediate Lock Timeout, which was the confirmation that I needed. I ran the following script with SQL Management Studio in the Power BI Report Server database to revert the code to the May 2020 query hint (red is the code I commented out and blue is the code I added):

 USE [PBIReportServer]
 GO
 ALTER PROCEDURE [dbo].[GetCatalogExtendedContentData]
     @CatalogItemID UNIQUEIDENTIFIER,
     @ContentType VARCHAR(50)
 AS
 BEGIN
     SELECT
         DATALENGTH([Content]) AS ContentLength,
         [Content]
     FROM
         [CatalogItemExtendedContent] WITH (READPAST) -- Reverting this back to May 2020 code. Commented line below is original in January 2021 version.
 --        [CatalogItemExtendedContent] WITH (NOWAIT) -- DevNote: Using NOWAIT here because for large models the row might be locked for long durations. Fail fast and let the client retry.
     WHERE
         [ItemID] = @CatalogItemID AND ContentType = @ContentType
 END
 GO

With this change deployed, we didn’t have a single report refresh failure over the weekend when our shared schedules ran! I’m not sure why I couldn’t force the issue in our lower environments, though I suspect it has something to do with either lower total data model size stored there, or index maintenance, or something else related to the data storage. I checked and all the SQL Server settings I could see where the same (as they should be) so I can’t confirm what additional situation causes this error (since clearly it doesn’t happen in every installation).

If you are able to use this successfully or have issues, please weigh in on the community post above or in the comments below!

Setting your queue name with Microsoft Hive ODBC

When setting up an ODBC connection to a Hive server, you sometimes need to change your queue name – if the default queue you’re assigned doesn’t work for what you need to do, here’s how you change it in the DSN settings. You can use this process to set any other “Server Side Property” (or SSP) that your server requires.

  1. Open your DSN settings window for the data source where you want to set to queue name, and click “Advanced Options…” in the lower left:
The DSN Settings window that shows the common settings. This window is currently showing the default settings for the ODBC driver and your settings may be different.
  1. Click “Server Side Properties” in the lower-left corner of the Advanced Options window:
The “Advanced Settings” window with default settings.
  1. In the Properties window, click “Add..” on the right side to create a new property:
The window where you add new server side properties.
  1. Add any properties required by your Hive instance, including the mapreduce queue name if needed. You may have to ask your administrator what the name of this property is – some servers use “mapreduce.job.queuename” by default, and one server I use has this defined as “mapred.job.queue.name“, so you’ll need to confirm your server’s value. Once you know what it is, create it and click OK, and then it should look like this:
Show the results of the server side property for queue name once it’s set.
  1. That’s it! Go ahead and click “OK” and save the data source name settings, then you should be good to go!

If you have the need to use different queues for different purposes (or any other settings change based on your work), you can create multiple versions of your ODBC DSN and connect to whichever one is appropriate for the work you’re currently doing.

If you’re looking for a Hive driver, I’m showing the Microsoft Hive ODBC driver in this walkthrough and I’ve had a great experience with it (and it’s free – most other drivers at least require you to give your contact information, but this one doesn’t). There are other drivers you can use, and each has a place you can set any SSP that you need for your connection – please check the documentation for your specific driver if you need details.

Enjoy!

Power BI Report Server Oct 2020 – “Same key already added” error on refresh

If you’ve upgraded to the October 2020 release of Power BI Server, you might be seeing the following error when you refresh some of your reports:

[0] -1056505856: COM error: mscorlib, An item with the same key has already been added..

This error appeared when we install the November 18th patch for Power BI Server (15.0.1104.264) and wasn’t present in the original October 2020 release. The especially frustrating part was that reports refreshed just fine in the Power BI Desktop client, and the issue only appeared after publishing the report up to Power BI Server, making it much more difficult to troubleshoot.

However, the resolution ended up being pretty straightforward – there are two different ways to define a connection to a SQL Server database, and if you use both methods in the same data model and then merge the results, Power BI Server has a problem with it (and Power BI Desktop does not). To correct this issue, you need to pick a method and ensure that all the SQL Server connections in your data model use this single method to connect to the database.

One method is used when you specify a database server, but don’t specify a database:

SQL Server connection with no database specified

This results in the following M query, with the connection taking up the first two lines (one for server, one for database):

M-query showing two-line database connection

If you specify the database in the connection window instead, it merges the connection statement onto a single line and uses Sql.Database (singular) instead of Sql.Databases (plural). Since you’re required to specify a database name when you provide a SQL query, so a custom query always uses this one-line method to connect:

Connection window showing database name
M-query database connection merged into a single line

Regardless of whether you enter the database name (and have the one-line connection method) or don’t enter a database (and use the two-line method), it looks exactly the same in the step window in Power Query:

“Source” is the database connection, whether it’s one or two lines. Navigation is the step that specifies the table/view you’re connecting to, and is the “MyTable” step in the examples above.

I’d recommend using method two for all your connections, as that’s what’s needed to run a specific SQL query – if any of your data sources and query-based and not just selecting a table, you’ll have to use this approach.

I hope this helps somebody else! While I’d love it if Power BI Server avoided errors like this one, I’d also really appreciate it if Power BI Server and Desktop used the same code for shared functionality so that you can be aware of issues before you publish your reports and they fail to refresh.

This took some frustration to resolve and I’m thankful for finally finding the resolution to this issue through a bug report that “JeanMartinL” had filed: https://community.powerbi.com/t5/Issues/BUG-Report-Server-refresh-fails-if-data-source-are-created/idi-p/1528954. Whoever you are, thank you!