User's Guide

Optimizing Smalltalk code

Almost every project requires an optimization phase. This is especially true in the dynamic and highly productive world of Smalltalk. Often, ideas are rapidly prototyped and working code is hastily produced with little regard for efficiency. Even in production-quality code or finished subsystems, new features are added that can subtly affect performance.

Project managers and programming teams can leverage traditional Smalltalk productivity and produce high-quality, optimized code by the following actions:

If every project requires an optimization phase, time needs to be allocated for optimization. In new projects this usually happens toward the end of the cycle when a team discovers how their code is being used.

When you are developing code, you need to focus on functionality. When you are prototyping, care is needed to ensure that prototype-quality code does not masquerade as production-quality code. Prototype code is notoriously slow, obtuse, and unmaintainable. This does not mean that prototyping is a bad approach for exploring new ideas; it is another tool to get the job done. You need to recognize prototype code and ensure that it is discarded, cleaned up, or rewritten before it enters the production system. To make significant performance gains, it is almost always necessary to have a deep understanding of the code. Poor-quality code defies optimization.

As part of writing the code, programmers usually write test cases, although sometimes other members of the team are assigned this task. Using the Stats tool, test cases can easily be turned into benchmarks and these benchmarks used to find "the right 10 percent".

Finally, the rules for optimizing Smalltalk code should be followed.

The rules for optimizing Smalltalk code

The rules for optimizing Smalltalk code are:

Run less code

Here's what you can do to run less code.

Scaling and complexity

Running less code means understanding scaling and complexity. An O(n) operation scales much better than an O(n2) operation, but both perform poorly for large values of n.

It is also necessary to understand the minimum and fixed costs of an operation. Sometimes it is better to have a lower minimum cost for common cases in exchange for more complexity. This means that the operation does not scale well, but scaling may not be necessary. For example, a Dictionary does not scale well to huge databases, but it is faster than a BTree for smaller amounts of data.

Understanding complexity means also understanding the complexity of base classes. Arrays are very fast, OrderedCollections are somewhat slower, and Dictionaries and Sets are very slow by comparison. Seemingly unassuming methods such as asSet can be very expensive. Turning an arbitrary collection into a set is an O(n2) operation in the worst case. For this reason, unnecessary conversion between collection types should be avoided.

The Smalltalk virtual machine is optimized for SmallIntegers. Avoiding LargeIntegers, Floats, and Fractions makes mathematical computations faster.

Running less code means designing for the most common cases. When testing a set of conditions, test the most common condition first. Do not perform expensive computations or lookup operations when the user expects realtime behavior, such as when drawing in a window. By precomputing and caching the required information, less code runs during the expensive operation.

Finally, running less code means improving the algorithm; a better algorithm achieves the same results and runs fewer operations. If the code is a little too slow, basic optimization techniques help. If the code is a great deal too slow, a new architecture, algorithm, or some strategically placed user primitives may be necessary. Performance tuning usually gets a 10-30% improvement unless a tight loop is optimized, in which case the improvement might be from two to five times. Changes in architecture, algorithm, and representation hold the promise of greater wins. Many fast applications have been written without resorting to performance tricks, simply by devising a clever algorithm or representation for the data.

Basic optimization

Occasionally, basic optimization techniques contravene good fundamental design principles. For example, coding a message-send operation inline can improve performance, but puts a copy of the code elsewhere in the system. If the original code is optimized or a bug in it is fixed, the copy is not updated. Sometimes it is necessary to balance optimization and design. In the "10% of the code" that gets optimized, compromise may be necessary.

The following guidelines are useful when writing and optimizing code:

The following are known causes of performance problems:

Send fewer messages

Sending fewer messages means that Smalltalk can stay executing longer inside a method. But how can you send fewer messages when everything in Smalltalk is a message-send operation?

The Smalltalk virtual machine performs special processing at run time depending on the selector and the class of the argument. Some message-send operations can be resolved inside the virtual machine without executing a method. Using these selectors can make a huge difference in a tight loop. Other selectors (particularly control and looping selectors) are optimized out by the compiler.

The following selectors are normally resolved for low-level classes, such as SmallInteger, or are highly optimized by the compiler and virtual machine:

Virtual-machine optimizations can sometimes be defeated accidentally. For example, whileTrue: is treated specially when the receiver and both arguments are Blocks. The following two code fragments do the same work, but the second one is faster:

 "Slow.  A block is created and whileTrue: is sent."
  | x block |
  x := 1.
  block := [x < 12].
 	block whileTrue: [x := x + 1].  
 
"Fast.  No block is created and the #whileTrue: is not sent."
 	| x block |
 	x := 1. 
  block := [x < 12].
 	[block value] whileTrue: [x := x + 1].

In code that must be very fast, directly accessing instance variables improves performance. If the code is executed many times, directly accessing instance variables makes a huge difference because of the reduction in the number of message-send operations. Even though accessors are highly optimized by the virtual machine, they still must send the message.

In application code that reimplements doesNotUnderstand:, both the number of message-send operations and the time to process the original message-send are increased. This happens because the original message-send operation must first fail before doesNotUnderstand: runs. In doesNotUnderstand:, handlers that forward the original message, the overhead of the handler is incurred for each forwarded message.

Avoid expensive operations

Understanding which application program and system operations are expensive is key to writing fast applications. Using the highly optimized methods from the virtual machine and base class library improves the performance of Smalltalk code.

For example, String>>#indexOf: is a primitive in most images. It is much faster than writing a loop and testing the elements for equality or using another less-optimal method, such as findFirst:.

When in doubt about the speed of an operation, use the Stats tool to construct benchmarks and compare times.

Tips
Use CLDT methods instead of writing application program code to do the same work. CLDT has been highly optimized for most common operations.

Run your code--not the garbage collector

Even though garbage collectors are fast, they do use CPU power, which Smalltalk code could be using. It is better to create less garbage.

To find where the code is creating extra garbage, examine senders of new and new: in the code being optimized. Objects are also created by some primitives; for example, Float and LargeInteger operations (for example, +, -, /, and *).

Less well known is that the virtual machine creates objects to run certain methods. Methods that contain Blocks can result in object creation (for example, MethodContext and BlockContext). Exceptions to this rule are special control constructs that appear to use Blocks, such as whileTrue:, but in fact, often do not because the Blocks are often optimized out. In code that draws graphics, the point and rectangle operations are typical sources of excess garbage.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]