A benchmark represents one or more invocations of a bench method on a Smalltalk virtual machine. When a bench method runs, data is gathered before, during, and after the execution of the code in the area of interest. The data is reduced and the results from multiple runs are merged to form a benchmark.
Executing a bench method requires selecting the number of runs and the number of iterations from the tool bar.
The total number of times that the operation executes is the product of the number of runs and the number of iterations. If either of these numbers is large, it may take a long time to execute the benchmark.
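The relationship is simple multiplication. A quick sketch (the numbers here are illustrative, not Stats tool defaults):

```python
# Illustrative numbers only -- not Stats tool defaults.
runs = 5
iterations = 1000
per_operation_ms = 0.5  # hypothetical cost of one operation

total_operations = runs * iterations               # the operation executes 5000 times
estimated_duration_ms = total_operations * per_operation_ms
print(total_operations, estimated_duration_ms)     # 5000 2500.0
```

Even a cheap operation adds up quickly when both numbers are large, which is why a benchmark can take a long time to execute.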
The Stats tool does not display progress information while a bench method is executing (to minimize garbage and be unobtrusive).
Executing a bench method builds a benchmark. The same bench method is used to establish a baseline and to optimize the operation of interest. This is achieved by specifying how the bench method should be executed. A bench method can be executed in one of the following ways:
After the benchmark is executed, the results are added to the benchmark list.
When you run a bench method (using the Run button), a new benchmark is built. The benchmark contains the raw execution time and the time spent collecting garbage. The operation of interest runs at full speed and no data is gathered while the method executes. Running a bench method accurately captures the raw time spent in the area of interest.
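The shape of a run-style measurement can be sketched in CPython, using `gc.callbacks` to accumulate garbage-collection time; this is an illustrative analogue of a run's two figures, not the Stats tool's implementation:

```python
import gc
import time

class GCTimer:
    """Accumulate wall time spent in garbage collection via gc.callbacks.
    An illustrative analogue of a run's 'time spent collecting garbage'
    figure -- not the Stats tool's mechanism."""
    def __init__(self):
        self.total = 0.0
        self._start = None

    def __call__(self, phase, info):
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.total += time.perf_counter() - self._start
            self._start = None

def run_bench(operation, iterations):
    """Run-style benchmark: time the whole loop at full speed, with no
    per-iteration instrumentation, and report raw time plus GC time."""
    timer = GCTimer()
    gc.callbacks.append(timer)
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    raw = time.perf_counter() - start
    gc.callbacks.remove(timer)
    return raw, timer.total

raw, gc_time = run_bench(lambda: [i * i for i in range(100)], 1000)
```

Note that nothing intrudes on the loop itself: all the measurement happens before and after it, which is why a run captures the raw time accurately.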
To observe the behavior of the code before attempting to optimize it, a programmer usually builds and deletes many benchmarks. (The Delete button, or the equivalent Delete item in the Bench menu, discards benchmarks.) During this process, each benchmark is assessed for stability. Usually a single benchmark is chosen as the baseline for future comparisons.
Baselines are built by running a bench method, never by sampling or tracing. You can vary the number of runs and iterations to achieve an acceptable mean.
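One way to assess stability is to look at how tightly the per-run times cluster around the mean. The times and the spread threshold below are hypothetical; the Stats tool's own stability criteria are not specified here:

```python
import statistics

def assess(run_times_ms):
    """Mean and relative spread of a benchmark's per-run times.
    Using spread as a stability signal is an illustrative assumption."""
    mean = statistics.mean(run_times_ms)
    spread = (max(run_times_ms) - min(run_times_ms)) / mean
    return mean, spread

# Four hypothetical runs of the same bench method:
mean, spread = assess([2510.0, 2495.0, 2502.0, 2508.0])
print(round(mean, 2), round(spread, 4))  # 2503.75 0.006
```

Runs that agree to within a fraction of a percent, with a mean in the recommended two-to-five-second range, make a plausible baseline candidate.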
A run benchmark is indicated by [R].
Note: Means between two and five seconds usually ensure stable and repeatable results. Means less than two seconds can be too short for sampling or tracing the operation. Depending on the operation, means less than 200 milliseconds can be unstable and unrepeatable.
When you sample a bench method (using the Sample button), the benchmark contains all the information of a run [R] as well as data that is gathered by sampling the execution stack. This means that sampling a bench method takes somewhat longer than running the same bench method, due to the overhead of gathering data. When a bench method is sampled, the time spent gathering data is automatically subtracted from the time spent in the operation of interest.
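The correction can be pictured as subtracting an estimated gathering cost from the measured time. The linear model below is an assumption for illustration, not the Stats tool's actual formula:

```python
def corrected_ms(measured_ms, samples_taken, cost_per_sample_ms):
    """Subtract the estimated data-gathering overhead from the measured
    time, leaving the time attributable to the operation of interest.
    All three parameters are hypothetical."""
    return measured_ms - samples_taken * cost_per_sample_ms

# 1000 samples at a hypothetical 0.2 ms each inflate a 3.2 s
# measurement by 0.2 s; the corrected figure is 3.0 s.
print(corrected_ms(3200.0, 1000, 0.2))  # 3000.0
```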
Methods that take a short time may not be recorded at all, because they happen never to be on the stack when a sample is taken. The probability that a method is recorded is a function of the time it spends on the stack. Therefore, a short method is more likely to be recorded as the number of iterations in the bench method and the number of runs are increased.
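Under the simplifying assumption that samples are independent, the effect can be quantified: a method on the stack for a fraction f of the time appears in at least one of n samples with probability 1 − (1 − f)^n.

```python
def p_recorded(fraction_on_stack, samples):
    """Probability that a method shows up in at least one sample,
    assuming independent samples (a simplification)."""
    return 1.0 - (1.0 - fraction_on_stack) ** samples

# A method on the stack 1% of the time:
print(round(p_recorded(0.01, 10), 3))   # 0.096 -- few samples, easily missed
print(round(p_recorded(0.01, 500), 3))  # 0.993 -- more iterations/runs, almost certain
```

This is why increasing iterations and runs makes short methods show up in sampled results: more total execution time means more samples in which the method can be caught.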
A sampled benchmark is indicated by [S].
When you trace a benchmark (using the Trace button), data is gathered for every message-send operation. Results from a traced benchmark are viewed in the same way as the results of a sampled benchmark. A traced benchmark is indicated by [T].
Tracing a bench method can take a very long time, which makes sampling a much more attractive way to optimize code.
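The cost comes from instrumenting every call rather than peeking periodically. An illustrative CPython analogue using `sys.setprofile` (not the Stats tool's mechanism), with each Python-level call standing in for a message send:

```python
import sys

def trace_bench(operation):
    """Count every Python-level call made while the operation runs.
    Paying the hook's cost on every single call is what makes
    trace-style measurement so much slower than periodic sampling."""
    counts = {}
    def hook(frame, event, arg):
        if event == "call":
            name = frame.f_code.co_name
            counts[name] = counts.get(name, 0) + 1
    sys.setprofile(hook)
    try:
        operation()
    finally:
        sys.setprofile(None)
    return counts

def helper():
    return 42

counts = trace_bench(lambda: [helper() for _ in range(3)])
print(counts.get("helper"))  # 3
```

Unlike a sampled benchmark, a traced one records even the shortest methods exactly, so the extra cost buys completeness.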