The Admission Control component Web service protects the platform on which the Service Platform components run from overload conditions and can control the types of traffic admitted into the platform. The Admission Control component Web service offers two types of admission limit enforcement: locally tracked limits and cluster-wide tracked limits.
Local limits are useful for bounding the rate of requests accepted on an individual server and are tracked using only local request information. Cluster limits are useful for bounding the rate of requests accepted across the cluster as a whole. A distributed reservation algorithm allocates rate to individual cluster members as needed, while ensuring that the total limit across the cluster is not exceeded. This approach can handle different distributions of load across the cluster, which may arise from the use of session affinity (directing all requests that are part of a session to the same application server) for converged HTTP and SIP applications.
The Admission Control component Web service allows hierarchical rate limits to be set for service and operation traffic within the TWSS Administration Console. Services and operations form a hierarchy, where each service has a set of child operations. Limits set at the service level constrain the rate of requests for all operations executed against that service, regardless of how the requests are distributed among the individual operations. Limits set at the operation level constrain the rate of requests for that particular operation. Request rates are expressed in tokens per second. Weights can be set for individual services and operations within the TWSS Administration Console. When determining whether to admit an operation for a particular service, a total token weighting for the given operation is calculated using the formula: (service weight * operation weight). This total weighting represents the number of tokens that will be consumed by admitting the request.
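As a worked illustration of the weighting formula above, the following minimal sketch computes the tokens consumed by a single request. The function name and the weight values are hypothetical and chosen only for illustration; they are not part of the TWSS Administration Console or its API.

```python
def token_cost(service_weight: float, operation_weight: float) -> float:
    """Tokens consumed by admitting one request: service weight * operation weight."""
    if service_weight < 0 or operation_weight < 0:
        raise ValueError("weights must be non-negative")
    return service_weight * operation_weight


# Hypothetical weights: a service weighted 2 with an operation weighted 3.
cost = token_cost(service_weight=2, operation_weight=3)
print(cost)  # tokens charged against both the operation limit and the service limit

# An operation weighted 0 costs no tokens, whatever the service weight is.
free = token_cost(service_weight=2, operation_weight=0)
print(free)
```

With these values, a limit of 60 tokens per second on the operation would allow at most ten such requests per second.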
Hierarchical limits in the Admission Control component Web service are enforced using a sliding-window, rate-limiting bucket algorithm. The algorithm works as follows: the first request with a positive weighting that is received in a quiesce state (idle, with no active sliding window) begins a sliding window over a one-second interval, which remains active until the next quiesce state. Requests may arrive at any rate within the window as long as the limit is not exceeded. This restricts the average rate of Web service requests, regardless of the characteristics of the incoming flow of traffic. Requests with a higher weighting consume more tokens against the limit than requests with a lower weighting. Requests with a weight of zero are automatically admitted, regardless of the limit.
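The behavior described above can be approximated with the following minimal sketch of a single sliding-window bucket. It is an illustration under stated assumptions, not the product's implementation: the class name, the method names, and the choice to track admitted requests in a time-ordered log are all hypothetical. The current time is passed in explicitly so that the example is deterministic.

```python
from collections import deque


class SlidingWindowBucket:
    """Hypothetical rate-limiting bucket over a one-second sliding window."""

    def __init__(self, limit_tokens_per_sec: float, window: float = 1.0):
        self.limit = limit_tokens_per_sec
        self.window = window
        self._events = deque()  # (timestamp, tokens) of admitted requests
        self._used = 0.0        # tokens consumed inside the current window

    def _expire(self, now: float) -> None:
        # Drop admitted requests that are at least one window old.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def try_admit(self, tokens: float, now: float) -> bool:
        if tokens == 0:
            return True  # zero-weight requests are always admitted
        self._expire(now)
        if self._used + tokens > self.limit:
            return False  # admitting would exceed the limit for this window
        self._events.append((now, tokens))
        self._used += tokens
        return True


bucket = SlidingWindowBucket(limit_tokens_per_sec=10)
print(bucket.try_admit(6, now=0.0))  # first request starts the window
print(bucket.try_admit(6, now=0.5))  # would bring the window total to 12
print(bucket.try_admit(4, now=0.5))  # brings the window total to exactly 10
```

When the log is empty the bucket is idle, which corresponds to the quiesce state; the next request with a positive weighting starts a new window.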
A reservation scheme is used to allocate rate amongst members of a cluster to enforce cluster-wide limits. A single coordinator is elected amongst the cluster members. The coordinator maintains reservation information for each cluster member against the cluster-wide limits. Cluster members send reservation requests as needed, for example when the rate currently allocated is insufficient to admit incoming requests. When allocated rate is no longer needed, a release request is sent to the coordinator to relinquish the rate. To determine when to release rate, a rate estimator is associated with each operation. A running average of current traffic is maintained and used to determine when rate is no longer needed. Rate is released gradually (every second) in chunks that are a percentage of the cluster-wide rate limit.
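The reservation scheme above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the class and function names are hypothetical, the running average is modeled as an exponentially weighted moving average (the text does not specify the averaging method), and the 10% chunk size is an arbitrary example value. Coordinator election and messaging are omitted.

```python
class Coordinator:
    """Hypothetical coordinator tracking per-member reservations against one limit."""

    def __init__(self, cluster_limit: float):
        self.cluster_limit = cluster_limit
        self.reserved = {}  # member id -> rate currently reserved

    def available(self) -> float:
        return self.cluster_limit - sum(self.reserved.values())

    def reserve(self, member: str, amount: float) -> bool:
        if amount > self.available():
            return False  # granting would exceed the cluster-wide limit
        self.reserved[member] = self.reserved.get(member, 0) + amount
        return True

    def release(self, member: str, amount: float) -> float:
        held = self.reserved.get(member, 0)
        released = min(amount, held)  # a member cannot release more than it holds
        self.reserved[member] = held - released
        return released


class RateEstimator:
    """Running average of observed traffic (assumed here to be an EWMA)."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.average = 0.0

    def update(self, observed_rate: float) -> float:
        self.average = self.alpha * observed_rate + (1 - self.alpha) * self.average
        return self.average


def release_amount(allocated, estimated, cluster_limit, chunk_fraction=0.1):
    """Rate to give back this second: surplus, capped at a chunk of the cluster limit."""
    surplus = allocated - estimated
    if surplus <= 0:
        return 0
    return min(surplus, chunk_fraction * cluster_limit)
```

In this sketch, a member would call `release_amount` once per second with its estimator's current average and pass the result to `Coordinator.release`, so unused rate drains back gradually rather than all at once.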
Each cluster member maintains a local hierarchy of rate limiting buckets that are used to calculate request admission. Reservations for additional rate capacity are made against each operation as more rate is needed. When additional rate has been reserved, the allocated rate is added to the operation rate limiting bucket and the parent service rate limiting bucket. As such, the rate limit for the service bucket is the sum of the rates of its child operation buckets. A reservation request may be denied at any time due to lack of sufficient rate at the service or operation level. When a cluster member receives a rejected reservation request, it enters a silence period. This silence period suppresses additional reservation requests, reducing inter-cluster traffic and allowing the network to settle before additional reservations are attempted. In steady state, this minimizes the inter-cluster messaging required.
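The member-side bookkeeping described above might look like the following minimal sketch. All names are hypothetical, the five-second silence period is an arbitrary example value, and treating the silence period as applying to the whole member rather than to a single operation is an assumption. The `reserve_fn` callable stands in for the reservation request sent to the coordinator.

```python
class MemberReservations:
    """Hypothetical per-member view of reserved rate for one service."""

    def __init__(self, reserve_fn, silence_seconds: float = 5.0):
        self.reserve_fn = reserve_fn          # callable(operation, amount) -> bool
        self.silence_seconds = silence_seconds
        self.operation_rate = {}              # operation name -> reserved rate
        self._silent_until = float("-inf")    # no silence period at start

    @property
    def service_rate(self) -> float:
        # The service bucket's rate is the sum of its child operation rates.
        return sum(self.operation_rate.values())

    def request_more(self, operation: str, amount: float, now: float) -> bool:
        if now < self._silent_until:
            return False  # suppressed: no message is sent during the silence period
        if self.reserve_fn(operation, amount):
            self.operation_rate[operation] = self.operation_rate.get(operation, 0) + amount
            return True
        # Reservation denied: enter the silence period.
        self._silent_until = now + self.silence_seconds
        return False


# Stand-in for the coordinator: grants at most 10 units per request.
sent = []

def fake_reserve(operation, amount):
    sent.append((operation, amount))
    return amount <= 10

member = MemberReservations(fake_reserve, silence_seconds=5.0)
member.request_more("opA", 10, now=0.0)  # granted
member.request_more("opB", 50, now=1.0)  # denied, silence period begins
member.request_more("opB", 5, now=2.0)   # suppressed, nothing is sent
print(len(sent))  # number of reservation requests actually sent
```

Only two of the three attempts reach the stand-in coordinator, which is the reduction in messaging that the silence period is intended to provide.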
Upon failover of the node housing the cluster coordinator, a new coordinator is elected. A message is then sent to all nodes, resetting their reservation status. This results in a "reset" of the algorithm without interrupting service. Inter-cluster traffic may increase temporarily while the algorithm rebuilds the state it uses to reduce inter-cluster messaging.
There are no restrictions on the choice of local and cluster limits. Local limits are typically assumed to be smaller than cluster limits, although this is not a hard requirement.