Converting two-column DataFrame to a dictionary in Python

If you have a Pandas dataframe like the following (if you already have a DataFrame based on a query result, you can use that):

import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
Basic Pandas DataFrame

I wanted a dictionary where I could provide the name (“tom”, “nick”, or “juli”) and receive their age as the result. To convert your DataFrame to a dictionary like this, you’d use the following command:

my_dict = pd.Series(df.Age.values,index=df.Name).to_dict()
print(my_dict)
Python dict with three key:value pairs

Now that you have a dictionary, you can do lookups or whatever else you want. Make sure you take note above that the first parameter uses “values” where the second one doesn’t. If you use values in the second parameter, or you just try to use the “.todict()” function on the DataFrame directly, without converting it to a series first as above, you can end up with Dictionaries with an entry for each column, or dictionaries that contain dictionaries, or a number of other things that don’t work.

Short and to the point – this took me about an hour to find the answer to and hopefully you find it before losing that much time!

Clustering walkthrough for SQL Server 2008 on Windows 2008

I recently stumbled across a great walk-through for clustering SQL Server on newer versions of Windows. It’s really thorough – everything from setting up iSCSI (in this case, to simulate a shared disk when it’s physically attached to one node – not ideal, but lets you test the walk-through), adding the required server roles, preparing the servers, and then a walkthrough of every screen in the SQL installation process. Thanks to the writer of this awesome blog!

http://dbperf.wordpress.com/2010/07/10/walkthrough-cluster-setup-sql-win-2008/

Roll your own lightweight SQL Server source control

I’ve wanted to implement some kind of source control on my SQL Servers before, but the only product available at the moment is Red-Gate’s SQL Source Control, and I didn’t need all the functionality it offered (or want to pay for it). Also, it relies on developers checking-in their changes, and that’s prone to forgetfulness anyways, as well as leaving your database prone when somebody just changes something in production, without using their development tool – ouch. Sure,  you’re protected against accidental drops, but what if somebody tweaks something in production without checking it back in? You’re hosed.

All I wanted was a simple process that would run automatically, taking periodic snapshots of the database objects and recording any changes. I decided to roll my own – it’s quick, simple, can be set up to run on a schedule, and automatically includes any new databases created on the server without any intervention.

This Stored Procedure goes through the following steps:

  1. If the Master.dbo.coSourceControl table (used to store the history) doesn’t exist, it creates it
  2. For each database on the server (so new databases are added automatically), it:
    1. Grabs the text contents of all the user objects (not flagged as “IsMsShipped”)
    2. Compares the contents of each to the last known copy (if there is one)
    3. If the object is new or has changed, add a new copy to the source control table in master
  3. Output the number of objects updated
  4. Optionally, it could email somebody to tell them about the results, but it currently does not

The history is kept in a single table – master.dbo.coSourceControl – which has the database it came from, the object_id, the object name, object contents, and the timestamp. Since it uses the object_id to track things, it will also record a name change in an object, even if the contents didn’t change.

To implement it, just grab the script and run it in the master database – it will create the stored procedure coSourceControlRefresh. That’s it – now either run it on demand, or you can schedule it. It will create the supporting table (if it’s missing) and scan every database every time it’s run. To see the history for an object, just do:

  SELECT db_name(databaseid) as [Database],
         object_name(objectid) as [Object Name],
         SourceDate,
         ObjectText
    FROM master.dbo.coSourceControl
   WHERE object_name(objectid) LIKE '%The name of some object%'
ORDER BY SourceDate DESC

Restoring a dropped or changed database object should be as simple as running the query above, grabbing the contents of ObjectText you’re interested in, and then pasting it in another window and executing it. Bam – previous version of the object restored (and this stored proc should, the next time it runs, see that you’ve altered the object and record that there’s a “new” version of it).

If you run it and like it – or don’t like it – please leave a comment to let me know – nothing expected in return, but it’s nice to know when people find it useful. I’m happy to make any enhancements you’d like to see. I hope you enjoy it and it’s able to save you from the headache of a dropped database object to which you can’t find the source!

Download the Source Control database script