Abstract
SystemTap provides a command line interface and a scripting language to examine the activities of a running Linux system, particularly the kernel, in fine detail. SystemTap scripts are written in the SystemTap scripting language, are then compiled to C-code kernel modules and inserted into the kernel. The scripts can be designed to extract, filter and summarize data, thus allowing the diagnosis of complex performance problems or functional problems. SystemTap provides information similar to the output of tools like netstat, ps, top, and iostat. However, more filtering and analysis options can be used for the collected information.
There are two different setups for using SystemTap: You can either compile the SystemTap script and insert the resulting kernel module on the same machine (this requires the machine to have the kernel debugging information installed) or you can make use of the SystemTap client on one machine and the SystemTap compile server on another machine. This allows you to compile a SystemTap module on a machine other than the one on which it will be run, as long as client and server are compatible.
Depending on your setup, check the sections below for an overview of the
packages you need. To get access to the man pages and to a helpful collection
of example SystemTap scripts for various purposes, additionally install the
systemtap-doc package.
For a SystemTap compile server, you need the following SystemTap packages:
systemtap
systemtap-server
As SystemTap needs information about the kernel, some kernel-related
packages must be installed on the SystemTap compile server in addition to
the SystemTap packages. For each kernel you want to probe with SystemTap,
you need to install a set of the following packages that exactly matches the
kernel version and flavor (indicated by * in the list
below):
kernel-*-debuginfo
kernel-*-devel
kernel-source-*
gcc
Repository for Packages with Debugging Information
If you subscribed your system for online updates, you can find
“debuginfo” packages in the
To check if all needed packages are installed and if SystemTap is ready to use, perform the quick test run described in Section 5.2, “Initial Test”.
For a quick test run that probes the currently used kernel, execute the
following command as root:
stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
It runs a script and returns an output. If the output is similar to the following, SystemTap is successfully deployed and ready to use:
Pass 1: parsed user script and 59 library script(s) in 80usr/0sys/214real ms.
Pass 2: analyzed script: 1 probe(s), 11 function(s), 2 embed(s), 1 global(s) in 140usr/20sys/412real ms.
Pass 3: translated to C into "/tmp/stapDwEk76/stap_1856e21ea1c246da85ad8c66b4338349_4970.c" in 160usr/0sys/408real ms.
Pass 4: compiled C into "stap_1856e21ea1c246da85ad8c66b4338349_4970.ko" in 2030usr/360sys/10182real ms.
Pass 5: starting run.
read performed
Pass 5: run completed in 10usr/20sys/257real ms.
Checks the script against the existing tapset library.
Examines the script for its components.
Translates the script to C. Runs the system C compiler to create a
kernel module from it.
Loads the module and enables all the probes (events and handlers) in
the script by hooking into the kernel. The event being probed is a
Virtual File System (VFS) read. As the event occurs on any processor, a
valid handler is executed (it prints the text read performed).
After the SystemTap session is terminated, the probes are disabled, and the kernel module is unloaded.
In case any error messages appear during the test, check the output for hints about any missing packages and make sure they are installed correctly. Rebooting and loading the appropriate kernel may also be needed.
Each time you run a SystemTap script, a SystemTap session is started.
A number of passes are done on the script before it is allowed to run, at
which point the script is compiled into a kernel module and loaded. In case
the script has already been executed before and no changes regarding any
components have occurred (for example, regarding compiler version, kernel
version, library path, script contents), SystemTap does not compile the
script again, but uses the *.c and
*.ko data stored in the SystemTap cache
(~/.systemtap). The module is unloaded when the tap has
finished running. For an example, see the test run in Section 5.1, “Installation” and the respective explanation.
SystemTap usage is based on SystemTap scripts
(*.stp). They tell SystemTap which type of
information to collect, and what to do once that information is collected.
The scripts are written in the SystemTap scripting language that is similar
to AWK and C. For the language definition, see http://sourceware.org/systemtap/langref/.
The essential idea behind a SystemTap script is to name
events, and to give them handlers.
When SystemTap runs the script, it monitors for certain events. When an
event occurs, the Linux kernel runs the handler as a sub-routine, then
resumes. Thus, events serve as the triggers for
handlers to run. Handlers can record specified data and print it in a
certain manner.
The SystemTap language only uses a few data types (integers, strings, and associative arrays of these), and full control structures (blocks, conditionals, loops, functions). It has a lightweight punctuation (semicolons are optional) and does not need detailed declarations (types are inferred and checked automatically).
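The following minimal sketch illustrates these language features in one place: an integer, a string, an associative array, a loop, and a conditional. The variable names (squares, label) are made up for illustration and carry no special meaning.

```systemtap
global squares                   # associative array; key and value types are inferred

probe begin {
  label = "demo"                 # string, no declaration needed
  for (i = 1; i <= 5; i++)       # C-style loop over integers
    squares[i] = i * i
  foreach (n+ in squares) {      # iterate in ascending key order
    if (squares[n] % 2 == 0)     # conditional
      printf("%s: %d squared is even (%d)\n", label, n, squares[n])
  }
  exit()
}
```

Because the script only uses the begin event, it does not depend on any particular kernel function being available.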
For more information about SystemTap scripts and their syntax, refer
to Section 5.4, “Script Syntax” and to the
stapprobes and stapfuncs man pages,
which are available with the systemtap-doc package.
Tapsets are a library of pre-written probes and functions that can be
used in SystemTap scripts. When a user runs a SystemTap script, SystemTap checks
the script's probe events and handlers against the tapset library.
SystemTap then loads the corresponding probes and functions before
translating the script to C. Like SystemTap scripts themselves, tapsets use
the filename extension *.stp.
However, unlike SystemTap scripts, tapsets are not meant for direct execution—they constitute the library from which other scripts can pull definitions. Thus, the tapset library is an abstraction layer designed to make it easier for users to define events and functions. Tapsets provide useful aliases for functions that users may want to specify as an event (knowing the proper alias is mostly easier than remembering specific kernel functions that might vary between kernel versions).
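As a rough illustration of the abstraction that tapsets provide, the sketch below uses the vfs.read alias from the initial test. The commented-out alternative assumes that the installed tapset defines this alias on the kernel function vfs_read; this may differ between tapset and kernel versions.

```systemtap
# Using the tapset alias: no knowledge of kernel function names is needed.
probe vfs.read {
  printf("%s read through the VFS layer\n", execname())
  exit()
}

# Roughly equivalent raw probe point (assumption: the alias is defined
# on the kernel function vfs_read in the installed tapset):
# probe kernel.function("vfs_read") { ... }
```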
The main commands associated with SystemTap are
stap and staprun. To execute them, you
either need root privileges or must be a member of the stapdev or stapusr group.
stap
SystemTap front-end. Runs a SystemTap script (either from file, or from standard input). It translates the script into C code, compiles it, and loads the resulting kernel module into a running Linux kernel. Then, the requested system trace or probe functions are performed.
staprun
SystemTap back-end. Loads and unloads kernel modules produced by the SystemTap front-end.
For a list of options for each command, use --help.
For details, refer to the stap and the
staprun man pages.
There is a particular reason why SystemTap is split into a front-end and a back-end: This allows you to compile a SystemTap script on a (development) machine that has the kernel debugging information (needed to compile the script) and then transfer the resulting kernel module to a (production) machine that does not have any development tools or kernel debugging information installed.
To avoid giving root access to users just for running
SystemTap, you can make use of the following SystemTap groups.
They are
not available by default on SUSE Linux Enterprise, but you can create the groups and
modify the access rights accordingly.
stapdev
Members of this group can run SystemTap scripts with
stap, or run SystemTap instrumentation modules with
staprun. As running stap involves
compiling scripts into kernel modules and loading them into the kernel,
members of this group still have effective root access.
stapusr
Members of this group are only allowed to run SystemTap
instrumentation modules with staprun. In addition,
they can only run those modules from
/lib/modules/kernel_version/systemtap/.
This directory must be owned by root and must only be writable
for the root user.
The following list gives an overview of the SystemTap main files and directories.
/lib/modules/kernel_version/systemtap/
Holds the SystemTap instrumentation modules.
/usr/share/systemtap/tapset/
Holds the standard library of tapsets.
/usr/share/doc/packages/systemtap/examples
Holds a number of example SystemTap scripts for various purposes.
Only available if the systemtap-doc package is installed.
~/.systemtap
Data directory for cached SystemTap files.
/tmp/stap*
Temporary directory for SystemTap files, including translated C code and kernel object.
SystemTap scripts consist of the following two components:
Events (probe points)
Name the kernel events at which the associated handler should be executed. Examples for events are entering or exiting a certain function, a timer expiring, or starting or terminating a session.
Handlers (probe body)
Series of script language statements that specify the work to be done whenever a certain event occurs. This normally includes extracting data from the event context, storing them into internal variables, or printing results.
An event and its corresponding handler are collectively called a
probe. SystemTap events are also called probe
points. A probe's handler is also referred to as probe
body.
Comments can be inserted anywhere in the SystemTap script in various
styles: using either #, /* */, or
// as marker.
A SystemTap script can have multiple probes. They must be written in the following format:
probe event {statements}
Each probe has a corresponding statement block. This statement block
must be enclosed in { } and contains the statements to be
executed per event.
Example 5.1. Simple SystemTap Script
The following example shows a simple SystemTap script.
probe begin
{
  printf("hello world\n")
  exit()
}
The elements of the script, in order of appearance:
Start of the probe.
Event begin (the start of the SystemTap session).
Start of the handler definition, indicated by {.
First function defined in the handler: the printf function.
String to be printed by the printf function.
Second function defined in the handler: the exit() function.
End of the handler definition, indicated by }.
The event begin (the start of the SystemTap session) triggers the handler
enclosed in { }, in this case the printf function, which prints hello world
followed by a new line, then exits.
If your statement block holds several statements, SystemTap executes these statements in sequence; you do not need to insert special separators or terminators between multiple statements. A statement block can also be nested within another statement block. Generally, statement blocks in SystemTap scripts use the same syntax and semantics as in the C programming language.
SystemTap supports a number of built-in events.
The general event syntax is a dotted-symbol sequence. This allows a
breakdown of the event namespace into parts. Each component identifier may
be parametrized by a string or number literal, with a syntax like a function
call. A component may include a * character, to expand to
other matching probe points. A probe point may be followed by a
? character, to indicate that it is optional, and that
no error should result if it fails to expand.
Alternatively, a probe point may
be followed by a ! character to indicate that it is both
optional and sufficient.
SystemTap supports multiple events per probe; they need to
be separated by a comma (,). If multiple events are
specified in a single probe, SystemTap executes the handler when any of
the specified events occurs.
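The following sketch shows a handler shared by several events together with an optional probe point. The kernel function name in the second probe is hypothetical and chosen so that it cannot be resolved; pp() returns a string describing the probe point currently being handled.

```systemtap
# One handler for two events: it runs at session start and at session end.
probe begin, end {
  printf("handler triggered by: %s\n", pp())
}

# The first probe point is marked optional with "?". Its function name is
# hypothetical; if it cannot be resolved, SystemTap skips it without an error
# and the timer event still ends the session after one second.
probe kernel.function("hypothetical_missing_function") ?, timer.s(1) {
  exit()
}
```

Without the ? marker, an unresolvable probe point would make the whole script fail before it starts running.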
In general, events can be classified into the following categories:
Synchronous events: Occur when any process executes an instruction at a particular location in kernel code. This gives other events a reference point (instruction address) from which more contextual data may be available.
An example for a synchronous event is
vfs.file_operation: The
entry to the file_operation event for Virtual
File System (VFS). For example, in Section 5.2, “Initial Test”, read is the
file_operation event used for VFS.
Asynchronous events: Not tied to a particular instruction or location in code. This family of probe points consists mainly of counters, timers, and similar constructs.
Examples for asynchronous events are: begin (start
of a SystemTap session, as soon as a SystemTap script is run),
end (end of a SystemTap session), or timer events.
Timer events specify a handler to be executed periodically, for
example timer.s(seconds)
or
timer.ms(milliseconds).
When used in conjunction with other probes that collect information, timer events allow you to print out periodic updates and see how that information changes over time.
Example 5.2. Probe with Timer Event
For example, the following probe would print the text “hello world” every 4 seconds:
probe timer.s(4)
{
  printf("hello world\n")
}
For detailed information about supported events, refer to the stapprobes man page. The See Also section of the man page also contains links to other man pages that discuss supported events for specific subsystems and components.
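To illustrate how a timer event can report on data collected by another probe, the following sketch counts VFS read events and prints the count every two seconds. The variable name reads and the two-second interval are arbitrary choices for this example.

```systemtap
global reads                     # number of VFS reads since the last report

probe vfs.read {
  reads++
}

probe timer.s(2) {
  printf("%d VFS reads in the last 2 seconds\n", reads)
  reads = 0                      # start counting afresh for the next interval
}
```

The script has no exit() call, so it runs until it is stopped manually with Ctrl+C.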
Each SystemTap event is accompanied by a corresponding handler defined for that event, consisting of a statement block.
If you need the same set of statements in multiple probes, you can
place them in a function for easy reuse. Functions are defined by the
keyword function followed by a name. They take any
number of string or numeric arguments (by value) and may return a single
string or number.
function function_name(arguments) {statements}
probe event {function_name(arguments)}
The statements in function_name are
executed when the probe for event executes. The
arguments are optional values passed into the
function.
Functions can be defined anywhere in the script.
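A minimal sketch of a function definition and its use in a probe follows. The function name parity is made up for illustration; its argument and return types are inferred by SystemTap.

```systemtap
# Returns a string; the argument and return types are inferred.
function parity(n) {
  if (n % 2 == 0)
    return "even"
  return "odd"
}

probe begin {
  printf("3 is %s\n", parity(3))
  printf("4 is %s\n", parity(4))
  exit()
}
```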
One of the functions needed very often was already introduced in Example 5.1, “Simple SystemTap Script”: the printf
function for printing data in a formatted way. When using the
printf function, you can specify how arguments should
be printed by using a format string. The format string is included in
quotation marks and can contain further format specifiers, introduced by a
% character.
Which format strings to use depends on your list of arguments. Format strings can have multiple format specifiers—each matching a corresponding argument. Multiple arguments can be separated by a comma.
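A probe whose printf call uses two format specifiers could look like the following sketch. The syscall.open event is an assumption chosen for illustration; %s formats a string argument and %d an integer argument.

```systemtap
probe syscall.open
{
  printf("%s(%d) open\n", execname(), pid())
}
```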
The example above would print the current executable name
(execname()) as string and the process ID
(pid()) as integer in brackets, followed by a space,
then the word open and a line break:
[...] vmware-guestd(2206) open hald(2360) open [...]
Apart from the two functions execname() and
pid() used in Example 5.3, “printf Function with Format Specifiers”, a variety of other
functions can be used as printf arguments.
Among the most commonly used SystemTap functions are the following:
tid()
ID of the current thread.
pid()
Process ID of the current thread.
uid()
ID of the current user.
cpu()
Current CPU number.
execname()
Name of the current process.
gettimeofday_s()
Number of seconds since UNIX epoch (January 1, 1970).
ctime()
Convert time into a string.
pp()
String describing the probe point currently being handled.
thread_indent()
Useful function for organizing print results. It (internally)
stores an indentation counter for each thread
(tid()). The function takes one argument, an
indentation delta, indicating how many spaces to add or remove from the
thread's indentation counter. It returns a string with some generic
trace data along with an appropriate number of indentation spaces. The
generic data returned includes a timestamp (number of microseconds since
the initial indentation for the thread), a process name, and the thread
ID itself. This allows you to identify what functions were called, who
called them, and how long they took.
Call entries and exits often do not immediately precede each other
(otherwise it would be easy to match them). In between a first call
entry and its exit, usually a number of other call entries and exits are
made. The indentation counter helps you match an entry with its
corresponding exit as it indents the next function call in case it is
not the exit of the previous one. For an example
SystemTap script using thread_indent() and the
respective output, refer to the SystemTap
Tutorial: http://sourceware.org/systemtap/tutorial/Tracing.html#fig:socket-trace.
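A sketch in the spirit of that tutorial example is shown here. It assumes that the kernel source file net/socket.c is covered by the installed debugging information and that the probefunc() function (name of the function containing the probe point) is available in the installed tapset.

```systemtap
# On function entry, increase the indentation for the current thread.
probe kernel.function("*@net/socket.c").call {
  printf("%s -> %s\n", thread_indent(1), probefunc())
}

# On function return, decrease the indentation again.
probe kernel.function("*@net/socket.c").return {
  printf("%s <- %s\n", thread_indent(-1), probefunc())
}
```

Each output line starts with the generic trace data returned by thread_indent(), so nested calls appear indented below their callers.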
For more information about supported SystemTap functions, refer to the stapfuncs man page.
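The following sketch prints the values of several context functions once, at session start. It assumes the standard tapset function names tid(), pid(), uid(), cpu(), execname(), gettimeofday_s(), ctime(), and pp().

```systemtap
probe begin {
  printf("probe point: %s\n", pp())
  printf("process:     %s (pid %d, tid %d)\n", execname(), pid(), tid())
  printf("user ID:     %d, CPU: %d\n", uid(), cpu())
  printf("time:        %s\n", ctime(gettimeofday_s()))
  exit()
}
```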
If you have installed the systemtap-doc package, you can find a number of useful
SystemTap example scripts in
/usr/share/doc/packages/systemtap/examples.
This section describes a rather simple example script in more detail:
/usr/share/doc/packages/systemtap/examples/network/tcp_connections.stp.
Example 5.4. Monitoring Incoming TCP Connections with
tcp_connections.stp
#! /usr/bin/env stap
probe begin {
  printf("%6s %16s %6s %6s %16s\n",
         "UID", "CMD", "PID", "PORT", "IP_SOURCE")
}
probe kernel.function("tcp_accept").return?,
      kernel.function("inet_csk_accept").return? {
  sock = $return
  if (sock != 0)
    printf("%6d %16s %6d %6d %16s\n", uid(), execname(), pid(),
           inet_get_local_port(sock), inet_get_ip_source(sock))
}
This SystemTap script monitors the incoming TCP connections and helps to identify unauthorized or unwanted network access requests in real time. It shows the following information for each new incoming TCP connection accepted by the computer:
User ID (UID)
Command accepting the connection (CMD)
Process ID of the command (PID)
Port used by the connection (PORT)
IP address from which the TCP connection originated (IP_SOURCE)
To run the script, execute
stap /usr/share/doc/packages/systemtap/examples/network/tcp_connections.stp
and follow the output on the screen. To manually stop the script, press Ctrl+C.
This chapter only provides a short SystemTap overview. Refer to the following links for more information about SystemTap:
SystemTap project home page.
Huge collection of useful information about SystemTap, ranging from detailed user and developer documentation to reviews and comparisons with other tools, or Frequently Asked Questions and tips. Also contains collections of SystemTap scripts, examples and usage stories and lists recent talks and papers about SystemTap.
Features a SystemTap Tutorial, a SystemTap Beginner's Guide, a Tapset Developer's Guide, and a SystemTap Language Reference in PDF and HTML format. Also lists the relevant man pages.
You can also find the SystemTap language reference and SystemTap tutorial in
your installed system under
/usr/share/doc/packages/systemtap. Example SystemTap
scripts are available from the examples subdirectory.