IBM WebSphere Telecom Web Services Server, Version 7.1

Administering the Traffic Shaping component Web service

The Traffic Shaping component Web service controls the flow of outbound traffic from Service Platform components towards network resources. The Traffic Shaping component Web service implements a distributed token bucket algorithm that allows for specifying the maximum burst and maximum sustained average rate of traffic emitted from the cluster.

Burst traffic refers to Web service requests that are separated by short intervals compared to the average rate of traffic; such requests are commonly considered spikes in the traffic. The sustained average rate refers to the traffic throughput allowed once all burst capacity has been exhausted, and is commonly thought of as the steady-state traffic rate.

Traffic shaping applies across the cluster only (not locally) and is tracked on a per-resource basis. Multiple service implementations may refer to the same network resource name and thus share the same traffic limits and consumption tracking. A distributed reservation algorithm allocates rate to individual cluster members as needed, while ensuring that the total limit across the cluster is not exceeded. This approach can handle different distributions of load across the cluster, which may arise as a result of the use of session affinity (directing all requests that are part of a session to the same application server) for converged HTTP/SIP applications.

The Traffic Shaping component Web service should be called by service implementations prior to the generation of traffic. Each invocation to traffic shaping takes a token weighting representing the amount of traffic (or cost) of the action that the service implementation is about to initiate towards the network resource. The calculation of the token weighting is service implementation-specific and may correspond to the number of messages generated, number of operation targets, and so forth. This weighting corresponds to the number of tokens consumed from the distributed token bucket in admitting the request. This implementation does not perform any request queueing; the traffic shaping component merely indicates whether sufficient rate capacity exists for the network resource to handle the request. The Traffic Shaping component Web service will return information about its local token bucket that may be used to drive service implementation queuing.

Reservation scheme

The Traffic Shaping component Web service implements a distributed token bucket. A token bucket algorithm uses the analogy of a bucket to control the rate of requests. A bucket has a size B, which corresponds to the number of tokens that may fit into the bucket. The size of the bucket also corresponds to the maximum burst size. Each request that comes in consumes some tokens from the bucket. When no more tokens are left in the bucket, the request cannot be processed currently and must either be rejected or queued. Tokens regenerate at a rate R, which corresponds to the maximum average sustained rate of traffic. In any interval t, a maximum of B + Rt tokens may be consumed. This provides shaping of traffic, allowing for bursty output behavior while constraining the average throughput.
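The single-node token bucket described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the class name `TokenBucket`, its method names, and the explicit `now` argument (used instead of a wall clock so that the example is deterministic) are assumptions made for this sketch.

```python
class TokenBucket:
    """Single-node token bucket with size B (burst) and refill rate R (tokens/second)."""

    def __init__(self, size, rate, now=0.0):
        self.size = float(size)      # B: maximum number of tokens the bucket can hold
        self.rate = float(rate)      # R: tokens regenerated per second
        self.tokens = float(size)    # the bucket starts full
        self.last = float(now)       # time of the last refill

    def _refill(self, now):
        # Add the tokens regenerated since the last refill, capped at the bucket size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.size, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)

    def try_consume(self, weight, now):
        """Admit the request if enough tokens remain; this sketch never queues."""
        self._refill(now)
        if weight <= self.tokens:
            self.tokens -= weight
            return True
        return False


bucket = TokenBucket(size=10, rate=2)
burst = [bucket.try_consume(1, now=0.0) for _ in range(12)]
print(burst.count(True))                 # 10: the burst is capped at the bucket size B
print(bucket.try_consume(1, now=0.5))    # True: one token has regenerated after 0.5 s
```

With B = 10 and R = 2, at most 10 + 2t tokens can be consumed in any interval of t seconds, which matches the B + Rt bound given above.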

The following scheme is used for partitioning network resource traffic among cluster members. The maximum burst size limit is divided evenly among all active members of the cluster. For example, given a burst limit of 30 tokens and three cluster members, each cluster member is allocated a bucket size (the burst size, in token bucket terms) of 10 tokens. Should one cluster member go offline, the burst limit is repartitioned among the remaining two cluster members for a bucket size of 15 tokens each. This conservative scheme ensures that the burst output across the cluster never exceeds the limit. It also works well with round-robin load balancing.

A reservation scheme is used to allocate rate among members of the cluster to enforce cluster-wide rate limits. A single coordinator is elected among the cluster members. The coordinator maintains reservation information for each cluster member against the cluster-wide limits. Cluster members send reservation requests as needed, that is, when insufficient rate is currently allocated, to attempt to allow additional traffic. When the allocated rate is no longer needed (as the bucket begins to fill up faster than tokens are being consumed, or becomes full), a release request is sent to the coordinator to relinquish rate. Rate is released gradually as the bucket level begins to approach the maximum, or fully when the bucket reaches its maximum. This method accommodates different distributions of burst. A running average of current traffic is maintained and used to determine when rate is no longer needed. Gradual release occurs every second, in chunks that are a percentage of the maximum sustained average rate.
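The even split of the burst limit can be illustrated with a few lines of Python. The function name `per_member_bucket_size` is hypothetical, and the use of integer (floor) division is an assumption: the text does not say how a remainder is handled, so this sketch rounds down to stay within the cluster-wide limit.

```python
def per_member_bucket_size(max_burst_size, active_members):
    """Split the cluster-wide burst limit evenly among the active members."""
    if active_members < 1:
        raise ValueError("at least one active cluster member is required")
    # Assumption: round down so the sum of all buckets never exceeds the limit.
    return max_burst_size // active_members


print(per_member_bucket_size(30, 3))   # 10 tokens per member with three members
print(per_member_bucket_size(30, 2))   # 15 tokens per member after one goes offline
```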

Reservation requests are made as soon as the token bucket starts to deplete. This attempts to regenerate burst in accordance with the current rate of token consumption. The coordinator may deny a reservation request for additional rate if there is not enough remaining rate across the cluster. When a cluster member's reservation request is rejected, the member enters a silence period. The silence period suppresses additional reservation requests, reducing inter-cluster traffic and allowing the network to settle before another reservation is attempted. In steady state, this minimizes the inter-cluster messaging.
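The interaction between a coordinator and a cluster member, including the silence period after a denial, might look like the sketch below. All names (`Coordinator`, `Member`, `request_rate`, `silence_seconds`) are hypothetical, and the all-or-nothing grant policy is an assumption; the actual message formats, silence period length, and release percentages are not specified here.

```python
class Coordinator:
    """Tracks the rate reserved by each cluster member against a cluster-wide limit."""

    def __init__(self, max_rate):
        self.max_rate = max_rate     # cluster-wide limit, in tokens per second
        self.reserved = {}           # member name -> rate currently reserved

    def remaining(self):
        return self.max_rate - sum(self.reserved.values())

    def reserve(self, member, amount):
        # Assumption: a request is granted in full or denied outright.
        if amount > self.remaining():
            return False
        self.reserved[member] = self.reserved.get(member, 0) + amount
        return True

    def release(self, member, amount):
        held = self.reserved.get(member, 0)
        self.reserved[member] = held - min(amount, held)


class Member:
    """Cluster member that stays silent for a while after a denied reservation."""

    def __init__(self, name, coordinator, silence_seconds):
        self.name = name
        self.coordinator = coordinator
        self.silence_seconds = silence_seconds
        self.silent_until = 0.0
        self.messages_sent = 0       # reservation requests actually sent

    def request_rate(self, amount, now):
        if now < self.silent_until:
            return False             # suppressed: nothing is sent to the coordinator
        self.messages_sent += 1
        if self.coordinator.reserve(self.name, amount):
            return True
        self.silent_until = now + self.silence_seconds
        return False


coordinator = Coordinator(max_rate=30)
a = Member("a", coordinator, silence_seconds=5.0)
b = Member("b", coordinator, silence_seconds=5.0)

print(a.request_rate(25, now=0.0))   # True: 25 of 30 tokens per second reserved
print(b.request_rate(10, now=0.0))   # False: only 5 remain, so b enters a silence period
print(b.request_rate(5, now=1.0))    # False: suppressed during the silence period
print(b.messages_sent)               # 1: only the first request reached the coordinator
coordinator.release("a", 10)
print(b.request_rate(10, now=6.0))   # True: silence period over and rate is available
```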

Upon failover of the node housing the cluster coordinator, a new coordinator is elected. A message is then sent to all nodes, resetting their reservation status. This results in a "reset" of the algorithm without interrupting service. Inter-cluster traffic may increase temporarily while the algorithm rebuilds the state that it uses to reduce inter-cluster messaging.

Configuration

Traffic shaping limits for network resources are configured in the Network Resources section of the TWSS Administration Console. The Network Resources component Web service must be installed in order for this function to be enabled. A network resource has a logical name that is associated with a resource specification, or a set of properties. Multiple service implementations may be configured to refer to the same network resource logical name.

The Traffic Shaping component Web service expects the following network resource specification properties to be defined:
  • MaxBurstSize: The maximum amount of burst traffic that can be handled by the network resource. This is measured in number of tokens, where a single service implementation traffic shaping request may consume multiple tokens. The actual constant is 1000.
    Note: In most cases it should be lower than 1000.
    This deployment time decision is dependent on the capacity of the backend system.
  • MaxAverageSustainedRate: The maximum average sustained rate of work, measured in tokens per second, that can be handled by the resource. This corresponds to the rate of token regeneration in the token bucket algorithm.
To define a network resource logical name and its properties, launch the TWSS Administration Console and click Network Resources > Network Resource Names.
  • Click New. Then specify a new network resource logical name and provide a description for it.
  • Select a network resource logical name in the list. Then click New to define its properties and their associated values.

Guidelines for choosing limits

The maximum burst size must be chosen such that, when the burst size is split evenly among all members of the cluster, each member's local token bucket has sufficient tokens to accommodate the largest request weight. Otherwise, there will never be sufficient tokens to satisfy such a request, and it will always be rejected. In addition, configuring some burst above the maximum request size allows the algorithm to adapt better to different distributions of traffic generated within the cluster. An example of such a configuration is a maximum burst of 10 requests when the maximum request weighting is 1.
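The first guideline can be checked before deployment with a small helper such as the one below. The function name `burst_limit_is_sufficient` is hypothetical, and the even split is again assumed to round down.

```python
def burst_limit_is_sufficient(max_burst_size, cluster_members, largest_request_weight):
    """Return True if every member's local bucket can hold the largest request."""
    # Assumption: the even split rounds down, as a conservative reading of the scheme.
    per_member = max_burst_size // cluster_members
    return per_member >= largest_request_weight


print(burst_limit_is_sufficient(30, 3, 10))   # True: each member holds 10 tokens
print(burst_limit_is_sufficient(30, 3, 12))   # False: a weight-12 request is always rejected
```

A value that only just passes this check leaves no headroom, so in practice the burst limit would be set somewhat higher, as the guideline above suggests.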

When running at capacity, the algorithm has a few limitations for request distributions that approach the specified rate limit for the network resource. Reservation requests for rate are made on an as-needed basis when processing incoming requests. Each request may result in a reservation request for enough rate to replenish the tokens consumed by that request. Clusters are typically fronted by a round-robin load balancer and thus may generate outbound traffic in a similar fashion. When running at capacity, it is possible that an individual node is allocated a chunk of rate during a round-robin spray while the remaining members are denied. This may not match expectations across the cluster.

For example, consider a cluster with three members, a rate limit for an operation of 30 tokens per second across the cluster, and each cluster member allocated 9 tokens per second. This leaves 3 tokens per second of rate to reserve. If an additional spray of requests arrives on top of the 27 tokens per second rate across the cluster, the first request in the spray results in the first server being allocated the 3 tokens per second, and the remaining servers being denied additional capacity. This may result in less token regeneration across the cluster than expected. This behavior typically manifests itself as a ping-pong effect, where the last few tokens are traded back and forth between members of the cluster. This can be avoided by running a rate that is less than: (number of cluster members * maximum token per request size). The algorithm will tolerate running within this threshold, but some premature rejections may occur in rare traffic patterns.
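The three-member example can be reproduced with a short simulation. This is only a sketch: the function name `round_robin_spray`, the server names, and the assumption that each member asks for the full 3 tokens per second and is granted all or nothing are illustrative choices, not documented behavior.

```python
def round_robin_spray(limit, allocated, request):
    """Each member in turn asks the coordinator for `request` more tokens per second.

    Assumption: a request is granted only if the whole amount still fits under
    the cluster-wide limit; otherwise it is denied outright.
    """
    allocated = dict(allocated)              # do not mutate the caller's mapping
    granted = {}
    for member in allocated:                 # insertion order models round-robin order
        remaining = limit - sum(allocated.values())
        ok = request <= remaining
        if ok:
            allocated[member] += request
        granted[member] = ok
    return granted, allocated


granted, final = round_robin_spray(
    limit=30,
    allocated={"server1": 9, "server2": 9, "server3": 9},
    request=3,
)
print(granted)   # {'server1': True, 'server2': False, 'server3': False}
print(final)     # {'server1': 12, 'server2': 9, 'server3': 9}
```

Only the first server in the spray obtains the last 3 tokens per second; the other two are denied, which is the uneven outcome described above.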




(C) Copyright IBM Corporation 2009. All Rights Reserved.