As a follow-up to my last entry (attempting to clear up some Windows Clustering terminology), I’ve found an article that makes another distinction that I forgot to include – the difference between an active/passive and an active/active cluster:
The understanding among those new to cluster seems to be that a/a vs. a/p is a licensing question, and then if you’re licensed for it, you just turn it on. In reality, it really just describes whether you have clustered services living on only one node or split between both nodes (during normal operation – during a cluster failover, any cluster might be active/active for a short period of time. Or, I suppose, your cluster is active/active if your quorum drive lives on the opposite node from your clustered service). There’s no load-balancing involved in clustering at all – at any time, only one node owns a particular resource, and only that node is responding to client requests for that resource.
In SQL Server 2012 AlwaysOn, the new high-availability feature, the SQL Server service is running on both cluster nodes, but client access (through the “Availability Group”) is controlled by the cluster service. That means that all clients making a connection go first to the active server, and then the SQL Service there might send them to get their data from one of the other nodes (it’s worth reiterating here that, in AlwaysOn, SQL Server isn’t clustered, but the SQL services operate independently on each node).
I wanted to clear up some terminology around Windows Clusters that seems to cause a bit of confusion. I’ve stumbled across a few questions on StackOverflow and Experts-Exchange that seem to have some basic confusion around “clustering servers” and “how to install an application to a cluster”, and I’m hoping to set a few things straight.
There’s really no such thing as a clustered server. Servers can have clustering enabled and configured, but the servers themselves aren’t really clustered – they’re just set up to enable clustered applications. When servers are part of a cluster, they still do all their thinking on their own, including running their own applications, services, and tasks, without the other servers in the cluster even being aware.
You don’t cluster servers, you cluster applications and resources. Once servers have had clustering installed and are configured, you can cluster an application or a resource. This clustering is really just telling the cluster manager that you want it to control which server clients talk to when they want to access the resource. The cluster manager ensures that the application (or service or resource) is running on only one node at any given time, and to the extent it’s able, it ensures that it’s always running (watching for a failure and bringing the resource online on another node and then directing clients to that node instead).
Applications don’t have to be “cluster-aware” to be clustered. I work mostly with SQL Server, which is cluster-aware, but applications you cluster don’t need to be. You can cluster any service, or resource on a server by just adding it to the cluster manager – the cluster manager will ensure it only runs on one server at a time, not allowing it to start on other nodes. For example, we use a monitoring tool that runs as a service – we installed the service on each cluster node and then added to the cluster manager – it now can be failed back and forth between nodes as a clustered resource, so it’s always online, is failure-resistant, and shares a segment of the HKLM in the registry between nodes – all without being explicitly cluster-aware.
SQL Server doesn’t need to be clustered when it’s installed on a cluster. While you can install a clustered instance of SQL Server (which automatically registers everything with the cluster manager), you can also install stand-alone instances of SQL Server (or any other application) on a cluster. That’s actually how a new feature in SQL 2012 – AlwaysOn – works: You install a non-clustered instance of SQL Server on different cluster nodes, and then you let the cluster manager coordinate client connections to the SQL Servers, but they still operate independantly and replicate their data between each other.
Hopefully this clears things up and doesn’t lead to more confusion. When I first started working with clustering, I had the impression that setting up a cluster caused the servers to act as one and share all their resources, but that understanding led to a lot of confusion when it came time to set something up or troubleshoot an issue. With the understanding that “clustered servers” are really just servers with clustered resources, and not actually clustered themselves, hopefully it will simply things!
Recently I added a second LUN to a clustered instance of SQL Server to isolate the logs files (and another for the TempDB) and I mounted them to an empty NTFS folder as I’ve done before, but when I started SQL Server, I recieved the following error:
CREATE FILE encountered operating system error 5 (Access is denied.)
while attempting to open or create the physical file
The error means that SQL Server doesn’t have NTFS rights to the location of the TempDB, but when I tried to add those rights, the permissions weren’t granted to the domain proxy account as I’d expected, but were instead granted to the service SID account, MSSQL$InstanceName. I attempted to grant the permissions to this account at the new location, but couldn’t get it to resolve to an actual account. What finally worked was:
In the permissions change dialog box, change the “Location” from the domain to the local machine (even though it’s a cluster and your using a domain account to run the service)
In the text box, type “NT Service\MSSQL$INSTANCENAME” and click “Check Names”
Even though that appears to be a local account, it will resolve properly on all the cluster nodes involved. This step, as opposed to using the domain proxy account the service is running as, was necessary because (during the initial SQL Server setup process) I’d selected to use the proxy account SID to host permissions rather than a domain group. The better choice permissions-wise, but the source of some confusion!
I recently stumbled across a great walk-through for clustering SQL Server on newer versions of Windows. It’s really thorough – everything from setting up iSCSI (in this case, to simulate a shared disk when it’s physically attached to one node – not ideal, but lets you test the walk-through), adding the required server roles, preparing the servers, and then a walkthrough of every screen in the SQL installation process. Thanks to the writer of this awesome blog!
We recently ran into an issue where linked server queries where sometimes taking significantly longer than they should have – queries that executed in under a second on the source server took two minutes when executed from a remote SQL Server over a linked server connection. We assumed it was something to do with our Windows Failover Cluster network connections, but it was happening on non-clustered servers as well.
After some digging, we discovered that it happened when the source server was Windows Server 2008 (or 2008 R2), regardless of what OS the remote linked server was running, and the issue wasn’t tied to any particular version of SQL Server. As it turns out, it related to everybody’s least favorite network setting, TCP Auto-Tuning – disabling it on the affected source servers resolved the problem immediately, and the queries returned instantly again.
To remedy the situation, follow these steps:
On the source SQL Server (the one executing the query, not the linked server), open an elevated command prompt (Right click on CMD.EXE and click “Run as Administrator”)
To disable TCP Auto-tuning, run the following command:
netsh int tcp set global autotuninglevel=disabled
To disable Windows Scaling Heuristics (another feature that can speed up clients, but is no good for servers), run:
netsh int tcp set heuristics disabled
This will automatically take effect after Windows has the chance to refresh some network communications – you can either wait about an hour, or, if you’d prefer it take effect immediately, restart the SQL Instance. A reboot isn’t necessary for this setting to take effect.
I’m not sure why these settings are on by default for servers – they seem to speed up internet browsing and other client-type activities (though there’s no shortage of complaints online about them grinding Windows Vista/7 clients to a halt as well), but can cause big problems on servers.
When I clustered SQL Server 2005 the first time, it bothered me that I had to access each clustered instance using both the cluster DNS name and the instance name. If my SQL Cluster is called SQL-CLUSTER and the DNS alias of my first instance is SQL-INSTANCE1, I had to connect to SQL-INSTANCE1INSTANCE1. Since each instance has a dedicated name and IP address that uniquely identifies it, shouldn’t that be enough? Why is the instance name required?
It turns out it’s not! With a little tweaking, you can access the instance using just the SQL DNS name, without the instance name, and it works for as many SQL instances as you have on a cluster, not just one (you’ll need to repeat this process for each instance). Here’s how:
On the active node for the instance you’re adjusting, open the “SQL Server Configuration Manager” from the start menu.
Expand “SQL Server Network Configuration” and select the instance you want to edit.
In the right window, right-click “TCP/IP” and select “Properties.”
On the “Protocol” tab, ensure that “Listen All” is set to “Yes”
On the “IP Addresses” tab, scroll all the way to the bottom, to the “IPAll” section.
The “TCP port” box is blank by default – set it to 1433, the default port for SQL Server.
Click “OK” – you’ll receive a message that you need to restart the SQL Service.
Restart the SQL Service on that node.
Presto – done! You can now access this SQL Server instance using either “SQL-INSTANCE1\INSTANCE1” or just “SQL-INSTANCE”. It’s also worth noting that this network settings follows the instance between active nodes – if you failover the SQL instance to the other node, it should still be addressable the same way, and if you check the protocol settings, you’ll see that your change is active on that server now as well.
To me, it’s so much easier to not have to use the instance name when addressing the server, and since the service is responding on port 1433, the default port, this change should be compatible with every application that connects, and I haven’t had any problems at all.
If you try this and run into any problems, please leave a comment below.
After playing with this a little more, it seems like it only works with the SQL Native Client. If you attempt to connect via the SQL Server OLEDB driver or the old SQL 2000 driver, you get the error that you’re not able to connect to the specified instance. Switching your connection method to the SQL Native Client (SQLNCLI) allows you to connect, assuming your application allows that. Has anybody else experienced this problem?
I ran into this error while deploying SQL Server 2005 Enterprise to a two-node Windows Server 2003 cluster. The SQL Server installation checked all the prereqs with no problems, and as soon as it was time to actually do the installation, it paused for about 5 minutes, with the message “SQL Server Setup is preparing to make the requested configuration changes…” After a few minutes with no activity, the installation fails with the following error message:
I checked the “Scheduled Tasks” list on the passive node, and found the following:
Starting the task manually, even while setup was still running on the active node, had no effect. Also, the following entry appeared in the Log (Advanced Menu -> View Log):
“SQL Server Remote Setup .job” (setup.exe) 5/1/2009 11:29:45 AM ** ERROR **
Unable to start task.
The specific error is:
0x80070005: Access is denied.
Try using the Task page Browse button to locate the application.
It turns out it’s not an “Access denied” message at all! This occurs when you’re logged in to the desktop of the passive node while you’re doing the SQL installation. I had a remote desktop session open to both nodes, which caused this problem. Simply logging out of the passive node, then attempting the installation again from the active node, will allow setup to complete successfully.
Please, Microsoft: Add a pre-check to the setup process, or at the very least, give me a real error!