To observe speed improvements in code, first establish a stable baseline time. Getting a stable execution time for Smalltalk code with Time millisecondsToRun: [...] is difficult because raw time can vary dramatically depending on the initial and current state of the virtual machine. For example, the garbage collector may scavenge more often, or for longer, in one run than in another. This can give the appearance of improvement or degradation when no code has changed.
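The variation described above can be seen directly by timing the same block several times and comparing the fastest and slowest runs. The following is a minimal workspace sketch, not part of the Stats tool; the workload block and the run count of 10 are arbitrary choices made for illustration.

```smalltalk
| work times fastest slowest |

"Hypothetical workload: any block whose speed you want to baseline."
work := [1000 timesRepeat: [(1 to: 1000) inject: 0 into: [:sum :each | sum + each]]].

"Time the same block ten times without changing any code."
times := (1 to: 10) collect: [:run | Time millisecondsToRun: work].

"The spread between the fastest and slowest run shows how unstable the raw time is."
fastest := times inject: times first into: [:a :b | a min: b].
slowest := times inject: times first into: [:a :b | a max: b].

Transcript
    show: 'fastest: ', fastest printString, ' ms, slowest: ', slowest printString, ' ms';
    cr.
```

If the spread is large compared with the improvement you hope to measure, the baseline is not yet stable enough to draw conclusions from.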
To be unobtrusive, the Stats tool allocates the memory it needs before it runs and creates as little garbage as possible while it runs. Large allocation requests do not occur between runs. The Stats tool does not mask how the application uses memory.
The following menu items in the Bench menu are used to control the stability of raw execution time for a bench method:
To disallow asynchronous messages, uncheck this item. Use this setting with care: while asynchronous messages are disabled, Smalltalk cannot be interrupted, so you cannot break the execution of a running bench method.
The amount of available new space at the beginning of a run determines when a scavenge operation starts and how long it takes. When new space is emptied before each run, the raw time is stabilized because the number of scavenge operations and the time spent scavenging are more predictable for each run.
Very rarely, depending on the operation, Smalltalk must perform global garbage collection to satisfy an allocation request. When old space is compacted before each run, the raw time is stabilized because the number of global garbage collections and the time spent performing them are more predictable for each run.
Global garbage collection can take a long time; it is not necessary when stabilizing most results.
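Outside the Stats tool, the same idea of resetting memory state before each run can be approximated by hand. The sketch below runs a preparation block before every timed run. The selector used in the preparation block, Smalltalk garbageCollect, is the Squeak and Pharo way to request a full collection and is an assumption here; other dialects use different selectors, so substitute the one your system documents. The workload is hypothetical.

```smalltalk
| prepare work times |

"ASSUMPTION: Smalltalk garbageCollect is the Squeak/Pharo selector for a
 full collection. Replace it with your dialect's equivalent."
prepare := [Smalltalk garbageCollect].

"Hypothetical workload that creates a fair amount of short-lived garbage."
work := [(1 to: 100000) collect: [:i | i printString]].

"Reset memory state, then time the workload; only the workload is timed."
times := (1 to: 5) collect: [:run |
    prepare value.
    Time millisecondsToRun: work].

Transcript show: times printString; cr.
```

Because the preparation block runs outside the timed block, the cost of the collection itself is excluded from the raw time; only its effect on the starting state of memory is carried into each run.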
Smalltalk uses a compiled-method cache so that methods that are executed frequently are not looked up in the class each time. On virtual machines that support dynamic translation, translated methods are also cached. This means that the first run of a method can be slower than subsequent runs. Clearing the code cache before each run stabilizes raw time because all runs will include both the time spent translating and the time spent looking up methods.
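The effect of caching can sometimes be observed by timing the first run separately from later runs. This is only a rough workspace sketch with a hypothetical workload: it shows a difference only if the methods involved were not already cached or translated when the first run started, which cannot be guaranteed from a workspace.

```smalltalk
| work firstRun laterRuns |

"Hypothetical workload; any block that sends a variety of messages will do."
work := [(1 to: 50000) inject: 0 into: [:sum :each | sum + each printString size]].

"The first run may include method lookup and translation time."
firstRun := Time millisecondsToRun: work.

"Later runs can reuse whatever the first run left in the caches."
laterRuns := (1 to: 5) collect: [:run | Time millisecondsToRun: work].

Transcript
    show: 'first: ', firstRun printString, ' ms, later: ', laterRuns printString;
    cr.
```

Clearing the code cache before each run, as described above, removes this asymmetry by making every run pay the lookup and translation cost.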
Stability is important in a baseline. This means that run [R] benchmarks should always be assessed for stability. For consistency and repeatability of results, sampled [S] and traced [T] benchmarks should share the same initial virtual machine conditions as run [R] benchmarks.
Other factors affecting the stability of a benchmark must be considered when establishing a baseline. The following factors are beyond the control of the Stats tool and may cause execution time to vary: