A cluster is a group of independent computers working together. When you use cluster configurations, you enhance the availability of your servers. Clustering allows you to join two to four Windows servers, or nodes, using a shared disk subsystem. The nodes can then share data, which provides high server availability.
Clusters consist of many components, such as nodes, cluster objects, Microsoft Cluster Server (MSCS) virtual servers, and the underlying hardware and software. If any one of these components is missing, the cluster cannot work.
Nodes have the following characteristics:
When a node starts, it searches for active nodes on the networks designated for internal communication. If it finds an active node, it attempts to join the node's cluster. If it cannot find an existing cluster, it attempts to form a cluster by taking control of the quorum resource. The quorum resource stores the most current version of the cluster database, which contains cluster configuration and state data. A server cluster maintains a consistent, updated copy of the cluster database on all active nodes.
A node can host physical or logical units, referred to as resources. Administrators organize these cluster resources into functional units called groups and assign these groups to individual nodes. If a node fails, the server cluster transfers the groups that were being hosted by the node to other nodes in the cluster. This transfer process is called failover. The reverse process, failback, occurs when the failed node becomes active again and the groups that were failed over to the other nodes are transferred back to the original node.
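The join-or-form startup decision and the failover and failback transfers described above can be summarized in a short simulation. The sketch below is illustrative only; every class and function name in it is hypothetical, and it is not an MSCS or TSM API.

```python
# Illustrative simulation of node startup, failover, and failback.
# All names are hypothetical; this is not an MSCS or TSM API.

class Cluster:
    def __init__(self):
        self.quorum_owner = None   # node controlling the quorum resource
        self.active_nodes = []     # nodes currently participating
        self.groups = {}           # resource group name -> hosting node

    def node_starts(self, node):
        """Join an existing cluster if one is found; otherwise form a
        cluster by taking control of the quorum resource."""
        if self.active_nodes:
            self.active_nodes.append(node)
        else:
            self.quorum_owner = node
            self.active_nodes = [node]

    def node_fails(self, failed):
        """Failover: transfer the failed node's groups to a survivor."""
        self.active_nodes.remove(failed)
        for group, host in self.groups.items():
            if host == failed:
                self.groups[group] = self.active_nodes[0]

    def node_returns(self, node, preferred):
        """Failback: the returning node reclaims its preferred groups."""
        self.node_starts(node)
        for group in preferred:
            self.groups[group] = node

cluster = Cluster()
cluster.node_starts("node-a")              # no cluster found: forms one
cluster.node_starts("node-b")              # finds node-a: joins its cluster
cluster.groups["TSMSERVER1"] = "node-a"
cluster.groups["TSMSERVER2"] = "node-b"

cluster.node_fails("node-a")               # TSMSERVER1 fails over to node-b
assert cluster.groups["TSMSERVER1"] == "node-b"

cluster.node_returns("node-a", ["TSMSERVER1"])
assert cluster.groups["TSMSERVER1"] == "node-a"   # failback complete
```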
Nodes, resources, and groups are three kinds of cluster objects. The others are networks, network interfaces, and resource types. All server cluster objects are associated with a set of properties, with data values that describe an object's identity and behavior in the cluster. Administrators manage cluster objects by manipulating their properties, typically through a cluster management application such as Cluster Administrator (part of the MSCS application).
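Because every cluster object is essentially an identity plus a set of properties, the management model can be pictured with a small data structure. This is a minimal sketch; the property names below are invented for illustration and are not actual MSCS property names.

```python
# Minimal model of a cluster object as an identity plus properties.
# Property names are invented for illustration only.
from dataclasses import dataclass, field

@dataclass
class ClusterObject:
    kind: str        # "node", "resource", "group", "network", ...
    name: str
    properties: dict = field(default_factory=dict)

    def set_property(self, key, value):
        # Administrators manage objects by manipulating properties,
        # typically through a tool such as Cluster Administrator.
        self.properties[key] = value

group = ClusterObject("group", "TSMSERVER1",
                      {"PreferredOwner": "node-a", "AutoFailback": True})
group.set_property("AutoFailback", False)
```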
MSCS lets you place TSM server cluster resources into a virtual server. A virtual server is an MSCS cluster group that looks like a Windows server. The virtual server has a network name, an IP address, one or more physical disks, and a service. A TSM server can be one of the virtual services provided by an MSCS virtual server.
The virtual server name is independent of the name of the physical node on which the virtual server runs. The virtual server name and address migrate from node to node with the virtual server. Clients connect to a TSM server by using the virtual server name, rather than the Windows server name. The virtual server name is implemented as a cluster network name resource and maps to the primary or backup node, depending on where the virtual server currently resides. Any client that uses WINS or directory services to locate servers can automatically track the virtual server as it moves between nodes, with no client modification or reconfiguration.
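This name tracking works like a directory lookup: the virtual server name and address stay fixed while the record behind them points at whichever node currently hosts the group. The sketch below uses hypothetical names and addresses to show why clients need no reconfiguration after a failover.

```python
# Hypothetical directory records mapping virtual server names to hosts.
# The name and IP address stay constant; only the hosting node changes.
directory = {
    "TSMSERVER1": {"ip": "192.0.2.11", "host": "node-a"},
    "TSMSERVER2": {"ip": "192.0.2.12", "host": "node-b"},
}

def resolve(virtual_name):
    """Clients resolve the virtual server name, never a physical node name."""
    return directory[virtual_name]["ip"]

addr_before = resolve("TSMSERVER1")
directory["TSMSERVER1"]["host"] = "node-b"   # failover: the group moves
addr_after = resolve("TSMSERVER1")
assert addr_before == addr_after             # clients see no change
```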
As mentioned earlier, each virtual server has its own disk as part of a cluster resource group; therefore, virtual servers cannot share data. Each TSM server that is implemented as a virtual server has its own database, recovery log, and set of storage pool volumes on a separate disk owned by that virtual server.
Because the server's location is transparent to client applications, TSM gains maximum ease of failover and failback with minimal impact on TSM clients.
The following example demonstrates how the MSCS virtual server concept works.
Assume a clustered TSM server called TSMSERVER1 is running on node A and a clustered TSM server called TSMSERVER2 is running on node B. Clients connect to the TSM server TSMSERVER1 and the TSM server TSMSERVER2 without knowing which node currently hosts their server. The MSCS concept of a virtual server ensures that the server's location is transparent to client applications. To the client, it appears that the TSM server is running on a virtual server called TSMSERVER1.
Figure 38. Clustering with TSMSERVER1 on node A and TSMSERVER2 on node B
When a software or hardware resource fails, failover occurs. Resources (for example, applications, disks, or an IP address) migrate from the failed node to the remaining node, which takes over the TSM server resource group, restarts the TSM service, and provides access to administrators and clients.
If node A fails, node B assumes the role of running TSMSERVER1. To a client, it is exactly as if node A were turned off and immediately turned back on again. Clients experience the loss of all connections to TSMSERVER1 and all active transactions are rolled back to the client. Clients must reconnect to TSMSERVER1 after this occurs. The location of TSMSERVER1 is transparent to the client.
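From the client's point of view, a failover is therefore just a dropped connection followed by a successful reconnect to the same virtual server name. The retry loop below is a generic sketch with a simulated connect() stand-in; it is not the TSM client API.

```python
import time

_attempts = {"count": 0}

def connect(server_name):
    """Simulated stand-in for a real client connection call (hypothetical).
    Fails twice to mimic the window while the service restarts."""
    _attempts["count"] += 1
    if _attempts["count"] < 3:
        raise ConnectionError("connection lost during failover")
    return f"session-to-{server_name}"

def connect_with_retry(server_name, retries=10, delay=30.0):
    """Reconnect to the same virtual server name after a failover.
    Any transaction in flight was rolled back, so the client retries."""
    for _ in range(retries):
        try:
            return connect(server_name)   # e.g. "TSMSERVER1"
        except ConnectionError:
            time.sleep(delay)             # wait for the surviving node
    raise RuntimeError(f"could not reach {server_name}")

session = connect_with_retry("TSMSERVER1", delay=0.1)
```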
Figure 39. Clustering with node B hosting TSMSERVER2 and assuming the role of TSMSERVER1 after node A fails
Generally, the following considerations are accounted for during the cluster planning stage, before actual installation. However, because of their importance to the overall success of a working cluster, they are restated here. The following table compares two options for attaching the tape device to the cluster.
| Attach the tape device to the node on which the TSM server instance is currently active | Attach the tape device to a third, nonclustered system on which an additional instance of the TSM server is active |
| --- | --- |
| This configuration allows high-performance backup and restore. However, it is not entirely automated: operator intervention is required to service a failover when repair delays exceed two days. | This configuration may not be acceptable in installations with low-bandwidth communications between the servers in the cluster and the tape device controller server. |
| Define enough disk-based data volume space to keep more than two days' worth of average data. | Define enough disk-based data volume space to keep more than two days' worth of average data. |
| Set up a storage pool hierarchy so that data is migrated efficiently to the tape device. | Use virtual volumes to migrate data from the local disk volumes to the tape device. |
| When a failover occurs, manually disconnect the tape device and reattach it to the newly active node. | When a failover occurs, no operator intervention is required; the newly active server continues to use the virtual volumes as before. |