SimGrid
3.10
Versatile Simulation of Distributed Systems
Tracing is widely used to observe and understand the behavior of parallel applications and distributed algorithms. Usually, this is done in a two-step fashion: the user instruments the application, and the traces are analyzed after the end of the execution. The analysis can highlight unexpected behaviors and bottlenecks, and can sometimes be used to correct distributed algorithms. The SimGrid team has instrumented the library in order to let users trace their simulations and analyze them. This part of the user manual explains how the tracing-related features can be enabled and used during the development of simulators using the SimGrid library.
When building SimGrid from its sources, tracing is enabled by passing the parameter -Denable_tracing=ON to cmake. The sections Tracing categories functions, Tracing marks functions, and Tracing user variables functions describe all the functions available when this CMake option is activated. These functions have no effect if SimGrid is configured without this option (they are wiped out by the C preprocessor).
$ cmake -Denable_tracing=ON .
$ make
The SimGrid library is instrumented so users can trace the platform utilization using the MSG, SimDAG and SMPI interfaces. It registers how much power is used for each host and how much bandwidth is used for each link of the platform. The idea with this type of tracing is to first get an overall view of resource utilization, in particular to identify bottlenecks, load-balancing problems among hosts, and so on.
Another possibility is to trace resource utilization by categories. Categorized resource utilization tracing gives SimGrid users the possibility to classify MSG and SimDAG tasks by category, tracing resource utilization for each of the categories. The functions below let the user declare a category and apply it to tasks. Tasks that are not assigned to a category are not traced. Even if the user does not specify any category, the simulations can still be traced in terms of resource utilization by using a special parameter that is detailed below (see section Tracing configuration Options).
TRACE_category(const char *category)
TRACE_category_with_color(const char *category, const char *color)
MSG_task_set_category(msg_task_t task, const char *category)
MSG_task_get_category(msg_task_t task)
SD_task_set_category(SD_task_t task, const char *category)
SD_task_get_category(SD_task_t task)
For hosts:
TRACE_host_variable_declare(const char *variable)
TRACE_host_variable_declare_with_color(const char *variable, const char *color)
TRACE_host_variable_set(const char *host, const char *variable, double value)
TRACE_host_variable_add(const char *host, const char *variable, double value)
TRACE_host_variable_sub(const char *host, const char *variable, double value)
TRACE_host_variable_set_with_time(double time, const char *host, const char *variable, double value)
TRACE_host_variable_add_with_time(double time, const char *host, const char *variable, double value)
TRACE_host_variable_sub_with_time(double time, const char *host, const char *variable, double value)
For links:
TRACE_link_variable_declare(const char *variable)
TRACE_link_variable_declare_with_color(const char *variable, const char *color)
TRACE_link_variable_set(const char *link, const char *variable, double value)
TRACE_link_variable_add(const char *link, const char *variable, double value)
TRACE_link_variable_sub(const char *link, const char *variable, double value)
TRACE_link_variable_set_with_time(double time, const char *link, const char *variable, double value)
TRACE_link_variable_add_with_time(double time, const char *link, const char *variable, double value)
TRACE_link_variable_sub_with_time(double time, const char *link, const char *variable, double value)
For links, using a source and a destination to identify the route:
TRACE_link_srcdst_variable_set(const char *src, const char *dst, const char *variable, double value)
TRACE_link_srcdst_variable_add(const char *src, const char *dst, const char *variable, double value)
TRACE_link_srcdst_variable_sub(const char *src, const char *dst, const char *variable, double value)
TRACE_link_srcdst_variable_set_with_time(double time, const char *src, const char *dst, const char *variable, double value)
TRACE_link_srcdst_variable_add_with_time(double time, const char *src, const char *dst, const char *variable, double value)
TRACE_link_srcdst_variable_sub_with_time(double time, const char *src, const char *dst, const char *variable, double value)
To check which tracing options are available for your simulator, you can just run it with the option
--help-tracing
to get a very detailed and updated explanation of each tracing parameter. These are some of the options accepted by the tracing system of SimGrid; you can use them by running your simulator with the --cfg= switch:
tracing
: Safe switch. It activates (or deactivates) the tracing system. No other tracing options take effect if this one is not activated. --cfg=tracing:yes
tracing/categorized
: It activates the categorized resource utilization tracing. It should be enabled if tracing categories are used by this simulator. --cfg=tracing/categorized:yes
tracing/uncategorized
: It activates the uncategorized resource utilization tracing. Use it if this simulator does not use tracing categories and resource use has to be traced. --cfg=tracing/uncategorized:yes
tracing/filename
: A file with this name will be created to register the simulation. The file is in the Paje format and can be analyzed using the Viva or Paje visualization tools. More information can be found on these web pages: http://github.com/schnorr/viva/ http://github.com/schnorr/pajeng/ --cfg=tracing/filename:mytracefile.trace If you do not provide this parameter, the trace file will be named simgrid.trace.
tracing/onelink_only
: By default, the tracing system uses all routes in the platform file to re-create a "graph" of the platform and register it in the trace file. This option lets the user tell the tracing system to use only the routes that are composed of just one link. --cfg=tracing/onelink_only:yes
tracing/smpi
: This option only has effect if this simulator is SMPI-based. It traces the MPI interface and generates a trace that can be analyzed using Gantt-like visualizations. Every MPI function (implemented by SMPI) is transformed into a state, and point-to-point communications can be analyzed with arrows. --cfg=tracing/smpi:yes
tracing/smpi/group
: This option only has effect if this simulator is SMPI-based. The processes are grouped by the hosts where they were executed. --cfg=tracing/smpi/group:yes
tracing/smpi/computing
: This option only has effect if this simulator is SMPI-based. The parts external to SMPI are also output to the trace. This provides a better way to analyze the data automatically. --cfg=tracing/smpi/computing:yes
tracing/smpi/internals
: This option only has effect if this simulator is SMPI-based. Display internal communications happening during a collective MPI call. --cfg=tracing/smpi/internals:yes
tracing/smpi/display_sizes
: This option only has effect if this simulator is SMPI-based. Display the sizes of the messages exchanged in the trace, both in the links and on the states. For collectives, size means the global size of data sent by the process in general. --cfg=tracing/smpi/display_sizes:yes
tracing/msg/process
: This option only has effect if this simulator is MSG-based. It traces the behavior of all categorized MSG processes, grouping them by hosts. This option can be used to track process location if this simulator has process migration. --cfg=tracing/msg/process:yes
tracing/buffer
: This option puts some events in a time-ordered buffer using the insertion sort algorithm. Acquiring and releasing locks to access this buffer, together with the cost of the sorting algorithm, makes this process slow. Simulator performance can be severely impacted if this option is activated, but you are sure to get a trace file with sorted events. --cfg=tracing/buffer:yes
tracing/onelink_only
: This option changes the way SimGrid registers its platform in the trace file. Normally, the tracing considers all routes (no matter their size) in the platform file to re-create the resource topology. If this option is activated, only the routes with one link are used to register the topology within an AS. Routes among ASes continue to be traced as usual. --cfg=tracing/onelink_only:yes
tracing/disable_destroy
: Disable the destruction of containers at the end of simulation. This can be used with simulators that have a different notion of time (different from the simulated time). --cfg=tracing/disable_destroy:yes
tracing/basic
: Some visualization tools are not able to parse correctly the Paje file format. Use this option if you are using one of these tools to visualize the simulation trace. Keep in mind that the trace might be incomplete, without all the information that would be registered otherwise. --cfg=tracing/basic:yes
tracing/comment
: Use this to add a comment line to the top of the trace file. --cfg=tracing/comment:my_string
tracing/comment_file
: Use this to add the contents of a file to the top of the trace file as comment. --cfg=tracing/comment_file:textual_file.txt
viva/categorized
: This option generates a graph configuration file for Viva considering categorized resource utilization. --cfg=viva/categorized:graph_categorized.plist
viva/uncategorized
: This option generates a graph configuration file for Viva considering uncategorized resource utilization. --cfg=viva/uncategorized:graph_uncategorized.plist
Please pass
--help-tracing
to your simulator for the updated list of tracing options.
The following scenarios might help you decide which tracing options you should use to analyze your simulator.
./your_simulator \
  --cfg=tracing:yes \
  --cfg=tracing/uncategorized:yes \
  --cfg=tracing/filename:mytracefile.trace \
  --cfg=viva/uncategorized:uncat.plist
./your_simulator \
  --cfg=tracing:yes \
  --cfg=tracing/categorized:yes \
  --cfg=tracing/filename:mytracefile.trace \
  --cfg=viva/categorized:cat.plist
A simplified example using the mandatory tracing functions.
int main (int argc, char **argv)
{
  MSG_init (&argc, &argv);

  //(... after deployment ...)

  //note that category declaration must be called after MSG_create_environment
  TRACE_category_with_color ("request", "1 0 0");
  TRACE_category_with_color ("computation", "0.3 1 0.4");
  TRACE_category ("finalize");

  msg_task_t req1 = MSG_task_create("1st_request_task", 10, 10, NULL);
  msg_task_t req2 = MSG_task_create("2nd_request_task", 10, 10, NULL);
  msg_task_t req3 = MSG_task_create("3rd_request_task", 10, 10, NULL);
  msg_task_t req4 = MSG_task_create("4th_request_task", 10, 10, NULL);
  MSG_task_set_category (req1, "request");
  MSG_task_set_category (req2, "request");
  MSG_task_set_category (req3, "request");
  MSG_task_set_category (req4, "request");

  msg_task_t comp = MSG_task_create ("comp_task", 100, 100, NULL);
  MSG_task_set_category (comp, "computation");

  msg_task_t finalize = MSG_task_create ("finalize", 0, 0, NULL);
  MSG_task_set_category (finalize, "finalize");

  //(...)

  MSG_clean();
  return 0;
}
A SimGrid-based simulator, when executed with the correct parameters (see above) creates a trace file in the Paje file format holding the simulated behavior of the application or the platform. You have several options to analyze this trace file:
Dump its contents with pj_dump (see PajeNG's wiki on pj_dump and more generally the PajeNG suite) and use gnuplot to plot resource usage, time spent in blocking/executing functions, and so on. Filtering is available through grep, with the best regular expression you can provide, to get only parts of the trace (for instance, only a subset of resources or processes).
Use pj_dump on the contents of the SimGrid trace file to analyze them with R.
This subsection describes some of the concepts regarding the Viva Visualization Tool and its relation with SimGrid traces. You should refer to Viva's website for further details on all its visualization techniques.
The analysis of a trace file using the tool always takes into account the concept of the time-slice. This concept means that what is visualized on the screen is always calculated considering a specific time frame, with its beginning and end timestamps. The time-slice is configured by the user and can be changed dynamically through the window called Time Interval, which is opened whenever a trace file is being analyzed. Users can select the beginning and the size of the time-slice.
As stated above (see section Analyzing SimGrid Simulation Traces), one possibility to analyze SimGrid traces is to use Viva's graph view with a graph configuration to customize the graph according to the traces. A valid graph configuration (we are using the non-XML Property List Format to describe the configuration) can be created for any SimGrid-based simulator by passing the option --cfg=viva/uncategorized:graph_uncategorized.plist or --cfg=viva/categorized:graph_categorized.plist (if the simulator defines resource utilization categories) when executing the simulation.
The basic description of the configuration is as follows:
{
  node = (LINK, HOST, );
  edge = (HOST-LINK, LINK-HOST, LINK-LINK, );
The nodes of the graph will be created based on the node parameter, which in this case lists the "HOST"s and "LINK"s of the platform used in the simulation. The edge parameter indicates that the edges of the graph will be created based on the "HOST-LINK"s, "LINK-HOST"s, and "LINK-LINK"s of the platform. After the definition of these two parameters, the configuration must detail how the nodes (HOSTs and LINKs) should be drawn.
For that, the configuration must have an entry for each of the types used. For HOST, as basic configuration, we have:
  HOST = {
    type = square;
    size = power;
    values = (power_used);
  };
The parameter size indicates which variable from the trace file will be used to define the size of the node HOST in the visualization. If the simulation was executed with availability traces, the size of the nodes will be changed according to these traces. The parameter type indicates which geometrical shape will be used to represent HOST, and the values parameter indicates which values from the trace will be used to fill the shape.
For LINK we have:
  LINK = {
    type = rhombus;
    size = bandwidth;
    values = (bandwidth_used);
  };
}
The same configuration parameters are used here: the type (a rhombus), the size (whose value comes from the bandwidth variable of the trace) and the values.
Viva can handle a customized graph representation based on the variables present in the trace file. In the case of SimGrid, every time a category is created for tasks, two variables are defined in the trace file: one to indicate node utilization (how much power was used by that task category), and another to indicate link utilization (how much bandwidth was used by that category). For instance, if the user declares a category named request, there will be variables named prequest and brequest (p for power and b for bandwidth). Note that the variable prequest is only available for HOST, and brequest is only available for LINK. Example: suppose there are two categories for tasks, request and computation. To create a customized graph representation with a proportional separation of host and link utilization, use the following configuration for HOST and LINK:
HOST = {
  type = square;
  size = power;
  values = (prequest, pcomputation);
};
LINK = {
  type = rhombus;
  size = bandwidth;
  values = (brequest, bcomputation);
};
This configuration enables the analysis of resource utilization by MSG tasks through the identification of load-balancing issues and network bottlenecks, for instance.