Converting two-column DataFrame to a dictionary in Python

If you have a Pandas dataframe like the following (if you already have a DataFrame based on a query result, you can use that):

import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
Basic Pandas DataFrame

I wanted a dictionary where I could provide the name (“tom”, “nick”, or “juli”) and receive their age as the result. To convert your DataFrame to a dictionary like this, you’d use the following command:

my_dict = pd.Series(df.Age.values,index=df.Name).to_dict()
print(my_dict)
Python dict with three key:value pairs

Now that you have a dictionary, you can do lookups or whatever else you want. Make sure you take note above that the first parameter uses “values” where the second one doesn’t. If you use values in the second parameter, or you just try to use the “.todict()” function on the DataFrame directly, without converting it to a series first as above, you can end up with Dictionaries with an entry for each column, or dictionaries that contain dictionaries, or a number of other things that don’t work.

Short and to the point – this took me about an hour to find the answer to and hopefully you find it before losing that much time!

Identifying row in SQL view (or table) with a calculation error

Computed columns can be a great tool if you want to add a value to a table that’s dependent on other columns in the table, and you don’t want to maintain it on its own every time the source columns change. In a view, these can provide a single, consistent calculation to the end users – say, for a

-- Set up a view with the same columns and calculation errors in it
-- The first calculation will always work
-- The second gives a divide by zero error on every 10th row
CREATE VIEW SourceObject AS
SELECT object_id AS KeyColumn1,
       column_id as KeyColumn2,
       object_id - column_id as CalcColumn1,
       (object_id - (column_id % 10)) as CalcColumn2
  FROM msdb.sys.columns

Now that it’s set up, we can try selecting all the rows from the view we just created, and we’ll see about 100 rows output, and then the query will stop with a “Divide by zero” error:

SELECT * FROM SourceObject

The calculation in this query is pretty straightforward, and you can see which rows are causing a problem (where column_id is divisible by 10), but what if it was more complicated? The problem is that SQL doesn’t display the row that had a problem – it stops on the row before the problem, so finding the row with the error is bit more difficult. If there were multiple columns involved in the calculation, or different combinations of values that could result in an error? Tracking down the rows causing an error can be difficult – you have to find all the possible conditions that could cause an error, and then query for each of them.

This script will allow you to find all the rows in a view with a calculation error, all at once. It uses a cursor to pull the rows from the view one at a time, test the calculation, and then write any errors it finds to a table where you can see the rows that are causing problems. Using a CURSOR generally isn’t ideal, but in this case, it’s the only way to react to a bad calculation on a row-by-row basis and deal with it.

The script can use two key values from your view – they’re called KeyColumn1 and KeyColumn2 – and you can modify the script to name them whatever you want, or just a single a value if that makes more sense in your scenario. It also supports two computed columns – CalcColumn1 and 2 – though again, it could be changed to just check a single column.

 -- Set up variables
DECLARE @KeyColumn1 INT,
		@KeyColumn2 INT,
		@CalcColumn1 INT,
		@CalcColumn2 INT
		
DECLARE @CurrentRow BIGINT
	SET @CurrentRow = 1

-- Set up a place to hold key values for rows that work  
  SELECT TOP 0 KeyColumn1, KeyColumn2
    INTO #WorkingRows
    FROM SourceObject

-- Set up a place to hold errors for rows that don't work    
CREATE TABLE #ErrorRows (
	RowNumber BIGINT,
	KeyColumn1 INT,
	KeyColumn2 INT,
	[ERROR_NUMBER] INT,
	[ERROR_MESSAGE] nvarchar(4000)
)    

-- Begin loop to look through rows in the view 
  DECLARE cur INSENSITIVE CURSOR FOR
  SELECT KeyColumn1, KeyColumn2
  FROM SourceObject
  ORDER BY KeyColumn1, KeyColumn2
  
  OPEN cur
  
  FETCH NEXT FROM cur
  INTO @KeyColumn1, @KeyColumn2
	
  WHILE @@FETCH_STATUS = 0
  BEGIN
  
	BEGIN TRY

		-- Try to select the calculated columns	
		-- If there's an error, it will jump to the CATCH block
		SELECT @CalcColumn1 = CalcColumn1,
				@CalcColumn2 = CalcColumn2
		  FROM SourceObject
		 WHERE KeyColumn1 = @KeyColumn1
		   AND KeyColumn2 = @KeyColumn2
		   
		-- This lookup succeeded
		INSERT INTO #WorkingRows
		SELECT @KeyColumn1, @KeyColumn2
		
	END TRY
	BEGIN CATCH
		
		-- The lookup failed - save details
		INSERT INTO #ErrorRows
		SELECT @CurrentRow,
				@KeyColumn1,
				@KeyColumn2,
				ERROR_NUMBER(),
				ERROR_MESSAGE()			
		
	END CATCH
	
	SET @CurrentRow = @CurrentRow + 1
	    
	FETCH NEXT FROM cur
	INTO @KeyColumn1, @KeyColumn2
	    
  END
  
  -- Show the key columns of rows with errors
  SELECT * FROM #ErrorRows
  
  -- Show the key columns of working rows
  SELECT * FROM #WorkingRows
  
  -- Clean things up
  close cur
  deallocate cur
  drop table #ErrorRows
  drop table #workingrows

At the end, you’ll have two tables with results in them – #ErrorRows, which contains the key values for rows with errors in them, as well as details about the error message, and #WorkingRows, which contains the key values for all of the working rows from the view.

Note: I could just as easily set up a table with a computed column in it that causes the same problem You’d be unable to select the entire table without an error, and hunting down the row with an error is painful. The script to find the error is the same, but here’s an example of a table that has a computed column with this problem:

-- Set up table with a list of numbers in it
SELECT object_id AS KeyColumn1,
		RANK() OVER (PARTITION BY 0
					 ORDER BY NEWID()) as KeyColumn2
  INTO SourceObject
  FROM msdb.sys.columns
  
-- Add two calculations to the table
-- The first will always work
-- The second will give a "Divide by zero" every 100 rows
  ALTER TABLE SourceObject ADD CalcColumn1 as (KeyColumn1 - KeyColumn2)
  ALTER TABLE SourceObject ADD CalcColumn2 as (KeyColumn1 / (KeyColumn2 % 100))

-- Note that you can't add a persisted computed column to a table
-- if there's a calculation error in any of the rows, so this
-- command will fail with a "Divide by zero" error
  ALTER TABLE SourceObject ADD CalcColumn3 as (KeyColumn1 / (KeyColumn2 % 100)) PERSISTED