Visualization and analysis of concurrent system activities

The TRACE tool helps to understand complicated system behavior over time. It offers domain-independent capabilities for visualizing and analyzing concurrent activities that are encoded in execution traces. TRACE supports claims on resources, events, dependencies, and continuous signals.

Figure 1 shows a typical TRACE Gantt chart of an application that iteratively executes activities A-G. The claims model executions of these system activities and are shown as colored rectangles with a start and end time. Data flow between the activities is modeled by dependencies between claims, shown as arrows between the claims. Events, such as the start and end of an iteration of the application, are shown at the bottom as vertical arrows. The signal, above the events and below the claims, shows the throughput of the modeled system.

Figure 1: Example visualization of system activity over time (x-axis).

Understanding system behavior over time

There are many reasons why a system’s behavior over time is difficult to understand or, worse, confusing – even when the system is performing as designed. An example is a situation in which many concurrent activities share resources. Unforeseen interactions may arise due to the specific timing of the activities. Moreover, if the timing of the activities changes (e.g. due to an upgrade to the computational platform), the interactions may also change, which could result in significantly different behavior. Insight into the hows and whys of a system’s behavior over time is of paramount importance for making effective (design) choices and trade-offs in all phases of the system lifecycle, from the design of a new system to the maintenance of an old legacy system. The TRACE tool can help with this.

Execution traces to capture behavior over time

The TRACE tool works with execution traces, each of which captures a single system behavior over time. An execution trace contains time-stamped data for claims, events and signals. TRACE extends this with concepts from the Y-chart paradigm and with user-defined attributes (e.g. the name of the activity), so that a trace can be tailored to a specific problem domain. This execution trace concept is generic, which makes TRACE widely applicable:

  • All levels of abstraction: the TRACE format can capture all levels of abstraction, from low-level embedded activities to system-level activities.

  • Domain-independent: the TRACE format is domain-independent but nevertheless has the means to be tailored to a specific domain via the user-defined attributes.

  • Source-independent: TRACE input can be created from any source, e.g. from the log files of legacy systems or from a discrete-event simulation model.
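To make the concepts above concrete, the following is a minimal sketch of how claims, events, dependencies and signals with user-defined attributes could be held in memory, and how claims could be derived from a log file. It is not the TRACE input format: all class names, field names and the log line layout ("start end resource activity") are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A resource claimed during [start, end]; models one activity execution."""
    claim_id: int
    resource: str
    start: float  # seconds
    end: float    # seconds
    attributes: dict = field(default_factory=dict)  # user-defined, e.g. name


@dataclass
class Event:
    """Something that happens at a single point in time."""
    timestamp: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Dependency:
    """Data flow from one claim to another, referenced by claim id."""
    src_claim: int
    dst_claim: int


@dataclass
class Signal:
    """A continuous quantity, stored as (timestamp, value) samples."""
    name: str
    samples: list = field(default_factory=list)


@dataclass
class ExecutionTrace:
    claims: list = field(default_factory=list)
    events: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)
    signals: list = field(default_factory=list)


def trace_from_log(lines):
    """Build a trace from hypothetical log lines 'start end resource activity'."""
    trace = ExecutionTrace()
    for claim_id, line in enumerate(lines):
        start, end, resource, name = line.split()
        trace.claims.append(
            Claim(claim_id, resource, float(start), float(end), {"name": name})
        )
    return trace


# Hypothetical log of two activities, A on the CPU followed by B on the GPU.
log = ["0.000 0.020 CPU A", "0.020 0.050 GPU B"]
trace = trace_from_log(log)
trace.dependencies.append(Dependency(src_claim=0, dst_claim=1))
trace.events.append(Event(0.0, {"name": "iteration start"}))
```

Keeping domain-specific information in a free-form attribute dictionary mirrors the idea that the trace concept itself stays generic while the attributes carry the domain tailoring.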

Figure 2: The Y-chart method

The Y-chart paradigm decomposes a system into an application, a platform, and the mapping between them, fostering reuse when one of these parts changes. Furthermore, it defines a feedback cycle to allow systematic design-space exploration (Figure 2). The Y-chart concepts of application, mapping and platform are realized in TRACE through the decomposition of an activity (e.g. an image-processing computation) into one or more claims on resources for a certain amount of time (e.g. two cores of the CPU and 20 MB of RAM for 50 ms). These are the main elements of the execution traces that capture system behavior (star-1 in Figure 2). To provide feedback on the system under analysis (star-2 in Figure 2), TRACE provides extensive visualization and analysis of execution traces.
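The decomposition of an activity into claims can be sketched as follows. The example keeps the application (which activity runs), the platform (which resources exist) and the mapping (what an activity needs, and for how long) as separate pieces of data, and uses the numbers from the example in the text. The names `PLATFORM`, `MAPPING`, `Claim` and `claims_for`, as well as the platform capacities, are hypothetical and not part of TRACE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    activity: str
    resource: str
    amount: float
    start_ms: float
    end_ms: float


# Platform: available resources and their capacities (assumed values).
PLATFORM = {"CPU_cores": 4, "RAM_MB": 512}

# Mapping: resources and time that each application activity needs.
MAPPING = {
    "image_processing": {
        "duration_ms": 50.0,
        "needs": {"CPU_cores": 2, "RAM_MB": 20},
    },
}


def claims_for(activity, start_ms):
    """Decompose one execution of an activity into one claim per resource."""
    entry = MAPPING[activity]
    end_ms = start_ms + entry["duration_ms"]
    claims = []
    for resource, amount in entry["needs"].items():
        if amount > PLATFORM[resource]:
            raise ValueError(f"{activity} needs more {resource} than available")
        claims.append(Claim(activity, resource, amount, start_ms, end_ms))
    return claims


claims = claims_for("image_processing", start_ms=100.0)
for claim in claims:
    print(claim)
```

Because the platform and the mapping are plain data, changing one of them (for example, a platform with fewer cores) does not require touching the description of the application, which is the kind of reuse the Y-chart paradigm aims for.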

Visualization and analysis of execution traces

The TRACE tool provides insights into the system dynamics of all kinds of systems through the visualization and analysis of execution traces (Figure 3). The TRACE Gantt chart view offers coloring, grouping and filtering options. This visualization alone can already give quick insights into the system dynamics. In addition, TRACE provides several analysis methods, which sets it apart from other Gantt-chart visualization tools:

  • Critical-path analysis can be used to detect activities and resources that are bottlenecks for performance.

  • Distance analysis can be used to compare execution traces with respect to structure, e.g. to check a model trace against an implementation trace.

  • Runtime verification provides a means to formally specify and verify the properties of execution traces using temporal logic. It is useful for expressing and checking performance properties, e.g., “the processing latency is at most 50 ms.”

  • Latency, throughput and work-in-progress analysis provides built-in methods to derive these properties from an execution trace.

  • Behavioral analysis can be used to visualize repetitive patterns that are often present in systems such as image-processing pipelines and manufacturing machines.

  • Resource-usage analysis can quickly give insights into the details of the resource usage.
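As an illustration of what latency, throughput and work-in-progress mean for a trace, the sketch below derives them from the start and end times of three iterations and checks the example property "the processing latency is at most 50 ms". The data is invented, the function names are hypothetical, and the property check is a plain comparison rather than the temporal-logic verification that TRACE offers.

```python
# Each tuple is (iteration number, start time in ms, end time in ms).
# The values are made up for illustration.
iterations = [(0, 0.0, 40.0), (1, 20.0, 65.0), (2, 40.0, 95.0)]


def latencies_ms(iters):
    """Latency of each iteration: time between its start and its end."""
    return [end - start for _, start, end in iters]


def throughput_per_s(iters):
    """Completed iterations per second over the span of the trace."""
    first_start = min(start for _, start, _ in iters)
    last_end = max(end for _, _, end in iters)
    return len(iters) / ((last_end - first_start) / 1000.0)


def max_work_in_progress(iters):
    """Largest number of iterations that are in flight at the same time."""
    points = []
    for _, start, end in iters:
        points.append((start, 1))
        points.append((end, -1))
    # At equal timestamps (t, -1) sorts before (t, 1), so an iteration that
    # ends at t is not counted as overlapping with one that starts at t.
    points.sort()
    current = peak = 0
    for _, delta in points:
        current += delta
        peak = max(peak, current)
    return peak


def latency_violations(iters, bound_ms):
    """Iterations whose latency exceeds the given bound."""
    return [number for number, start, end in iters if end - start > bound_ms]


print(latencies_ms(iterations))
print(throughput_per_s(iterations))
print(max_work_in_progress(iterations))
print(latency_violations(iterations, bound_ms=50.0))
```

In this example data, only the last iteration takes longer than 50 ms, so it is the only one reported as a violation.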

The TRACE tool and its underlying concepts are relatively easy to learn, the TRACE input is easy to use, and applying TRACE has great potential benefits.

Figure 3: A specialization of Figure 2 for TRACE

Publications

  1. M. Hendriks, T. Basten, “Performance engineering with TRACE,” Bits & Chips 5, 14 September 2018. https://bits-chips.nl/artikel/performance-engineering-with-trace/

  2. M. Hendriks, F.W. Vaandrager, “Reconstructing Critical Paths from Execution Traces,” CSE '12 Proceedings of the 2012 IEEE 15th International Conference on Computational Science and Engineering. IEEE Computer Society Washington, 2012. doi:10.1109/ICCSE.2012.78

  3. M. Hendriks, M. Geilen, A.R.B. Behrouzian, T. Basten, H. Alizadeh, and D. Goswami, “Checking metric temporal logic with TRACE,” in 16th International Conference on Application of Concurrency to System Design (ACSD 2016), Torun, Poland, 2016. doi:10.1109/ACSD.2016.13

  4. M. Hendriks, J. Verriet, T. Basten, B. Theelen, M. Brassé, and L. Somers, “Analyzing execution traces: critical-path analysis and distance analysis,” International Journal on Software Tools for Technology Transfer, 2016. doi:10.1007/s10009-016-0436-z

Learn more

To learn more, read the "Getting started" guide.