Skip Navigation Links | |
Exit Print View | |
Oracle Solaris 11.1 Dynamic Tracing Guide Oracle Solaris 11.1 Information Library |
Co-existence With Existing Tools
cpucaps-sleep and cpucaps-wakeup
args[4] - ipv4info_t Structure
args[5] - ipv6info_t Structure
Remote Port Login/Logout Event Probes
13. Statically Defined Tracing for User Applications
This chapter describes the Function Boundary Tracing (FBT) provider, which provides probes associated with the entry to and return from most functions in the Oracle Solaris kernel. The function is the fundamental unit of program text. In a well-designed system, each function performs a discrete and well-defined operation on a specified object or series of like objects. Therefore, even on the smallest Oracle Solaris systems, FBT will provide on the order of 20,000 probes.
Similar to other DTrace providers, FBT has no probe effect when it is not explicitly enabled. When enabled, FBT only induces a probe effect in probed functions. While the FBT implementation is highly specific to the instruction set architecture, FBT has been implemented on both SPARC and x86 platforms. For each instruction set, there are a small number of functions that do not call other functions and are highly optimized by the compiler (so-called _leaf functions_) that cannot be instrumented by FBT. Probes for these functions are not present in DTrace.
Effective use of FBT probes requires knowledge of the operating system implementation. Therefore, it is recommended that you use FBT only when developing kernel software or when other providers are not sufficient. Other DTrace providers, including syscall, sched, proc, and io, can be used to answer most system analysis questions without requiring operating system implementation knowledge.
FBT provides a probe at the boundary of most functions in the kernel. The boundary of a function is crossed by entering the function and by returning from the function. FBT thus provides two functions for every function in the kernel: one upon entry to the function, and one upon return from the function. These probes are named entry and return, respectively. The function name, and module name are specified as part of the probe. All FBT probes specify a function name and module name.
The arguments to entry probes are the same as the arguments to the corresponding operating system kernel function. These arguments may be accessed in a typed fashion by using the args[] array. These arguments may be accessed as int64_t's by using the arg0 .. arg n variables.
While a given function only has a single point of entry, it may have many different points where it returns to its caller. You are usually interested in either the value that a function returned or the fact that the function returned at all rather than the specific return path taken. FBT therefore collects a function's multiple return sites into a single return probe. If the exact return path is of interest, you can examine the return probe args[0] value, which indicates the offset (in bytes) of the returning instruction in the function text.
If the function has a return value, the return value is stored in args[1]. If a function does not have a return value, args[1] is not defined.
You can use FBT to easily explore the kernel's implementation. The following example script records the first ioctl(2) from any xclock process and then follows the subsequent code path through the kernel:
/* * To make the output more readable, we want to indent every function entry * (and unindent every function return). This is done by setting the * "flowindent" option. */ #pragma D option flowindent syscall::ioctl:entry /execname == "xclock" && guard++ == 0/ { self->traceme = 1; printf("fd: %d", arg0); } fbt::: /self->traceme/ {} syscall::ioctl:return /self->traceme/ { self->traceme = 0; exit(0); }
Running this script results in output similar to the following example:
# dtrace -s ./xioctl.d dtrace: script './xioctl.d' matched 26254 probes CPU FUNCTION 0 => ioctl fd: 3 0 -> ioctl 0 -> getf 0 -> set_active_fd 0 <- set_active_fd 0 <- getf 0 -> fop_ioctl 0 -> sock_ioctl 0 -> strioctl 0 -> job_control_type 0 <- job_control_type 0 -> strcopyout 0 -> copyout 0 <- copyout 0 <- strcopyout 0 <- strioctl 0 <- sock_ioctl 0 <- fop_ioctl 0 -> releasef 0 -> clear_active_fd 0 <- clear_active_fd 0 -> cv_broadcast 0 <- cv_broadcast 0 <- releasef 0 <- ioctl 0 <= ioctl
The output shows that an xclock process called ioctl on a file descriptor that appears to be associated with a socket.
You can also use FBT when trying to understand kernel drivers. For example, the ssd(7D) driver has many code paths by which EIO may be returned. FBT can be easily used to determine the precise code path that resulted in an error condition, as shown in the following example:
fbt:ssd::return /arg1 == EIO/ { printf("%s+%x returned EIO.", probefunc, arg0); }
For more information on any one return of EIO, one may wish to speculatively trace all fbt probes, and then commit(or discard) based on the return value of a specific function. See Chapter 7, Speculative Tracing for details on speculative tracing.
Alternatively, you can use FBT to understand the functions called within a specified module. The following example lists all of the functions called in UFS:
# dtrace -n fbt:ufs::entry'{@a[probefunc] = count()}' dtrace: description 'fbt:ufs::entry' matched 353 probes ^C ufs_ioctl 1 ufs_statvfs 1 ufs_readlink 1 ufs_trans_touch 1 wrip 1 ufs_dirlook 1 bmap_write 1 ufs_fsync 1 ufs_iget 1 ufs_trans_push_inode 1 ufs_putpages 1 ufs_putpage 1 ufs_syncip 1 ufs_write 1 ufs_trans_write_resv 1 ufs_log_amt 1 ufs_getpage_miss 1 ufs_trans_syncip 1 getinoquota 1 ufs_inode_cache_constructor 1 ufs_alloc_inode 1 ufs_iget_alloced 1 ufs_iget_internal 2 ufs_reset_vnode 2 ufs_notclean 2 ufs_iupdat 2 blkatoff 3 ufs_close 5 ufs_open 5 ufs_access 6 ufs_map 8 ufs_seek 11 ufs_addmap 15 rdip 15 ufs_read 15 ufs_rwunlock 16 ufs_rwlock 16 ufs_delmap 18 ufs_getattr 19 ufs_getpage_ra 24 bmap_read 25 findextent 25 ufs_lockfs_begin 27 ufs_lookup 46 ufs_iaccess 51 ufs_imark 92 ufs_lockfs_begin_getpage 102 bmap_has_holes 102 ufs_getpage 102 ufs_itimes_nolock 107 ufs_lockfs_end 125 dirmangled 498 dirbadname 498
If you know the purpose or arguments of a kernel function, you can use FBT to understand how or why the function is being called. For example, putnext(9F) takes a pointer to a queue(9S) structure as its first member. The q_qinfo member of the queue structure is a pointer to a qinit(9S) structure. The qi_minfo member of the qinit structure has a pointer to a module_info(9S) structure, which contains the module name in its mi_idname member. The following example puts this information together by using the FBT probe in putnext to track putnext(9F) calls by module name:
fbt::putnext:entry { @calls[stringof(args[0]->q_qinfo->qi_minfo->mi_idname)] = count(); }
Running the above script results in output similar to the following example:
# dtrace -s ./putnext.d ^C iprb 1 rpcmod 1 pfmod 1 timod 2 vpnmod 2 pts 40 conskbd 42 kb8042 42 tl 58 arp 108 tcp 126 ptm 249 ip 313 ptem 340 vuid2ps2 361 ttcompat 412 ldterm 413 udp 569 strwhead 624 mouse8042 726
You can also use FBT to determine the time spent in a particular function. The following example shows how to determine the callers of the DDI delaying routines drv_usecwait(9F) and delay(9F).
fbt::delay:entry, fbt::drv_usecwait:entry { self->in = timestamp } fbt::delay:return, fbt::drv_usecwait:return /self->in/ { @snoozers[stack()] = quantize(timestamp - self->in); self->in = 0; }
This example script is particularly interesting to run during boot. Chapter 15, Anonymous Tracing describes the procedure for performing anonymous tracing during system boot. Upon reboot, you might see output similar to the following example:
# dtrace -ae ata`ata_wait+0x34 ata`ata_id_common+0xf5 ata`ata_disk_id+0x20 ata`ata_drive_type+0x9a ata`ata_init_drive+0xa2 ata`ata_attach+0x50 genunix`devi_attach+0x75 genunix`attach_node+0xb2 genunix`i_ndi_config_node+0x97 genunix`i_ddi_attachchild+0x4b genunix`devi_attach_node+0x3d genunix`devi_config_one+0x1d0 genunix`ndi_devi_config_one+0xb0 devfs`dv_find+0x125 devfs`devfs_lookup+0x40 genunix`fop_lookup+0x21 genunix`lookuppnvp+0x236 genunix`lookuppnat+0xe7 genunix`lookupnameat+0x87 genunix`cstatat_getvp+0x134 value ------------- Distribution ------------- count 2048 | 0 4096 |@@@@@@@@@@@@@@@@@@@@@ 4105 8192 |@@@@ 783 16384 |@@@@@@@@@@@@@@ 2793 32768 | 16 65536 | 0 kb8042`kb8042_wait_poweron+0x29 kb8042`kb8042_init+0x22 kb8042`kb8042_attach+0xd6 genunix`devi_attach+0x75 genunix`attach_node+0xb2 genunix`i_ndi_config_node+0x97 genunix`i_ddi_attachchild+0x4b genunix`devi_attach_node+0x3d genunix`devi_config_one+0x1d0 genunix`ndi_devi_config_one+0xb0 genunix`resolve_pathname+0xa5 genunix`ddi_pathname_to_dev_t+0x16 consconfig_dacf`consconfig_load_drivers+0x14 consconfig_dacf`dynamic_console_config+0x6c consconfig`consconfig+0x8 unix`stubs_common_code+0x3b value ------------- Distribution ------------- count 262144 | 0 524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 221 1048576 |@@@@ 29 2097152 | 0 usba`hubd_enable_all_port_power+0xed usba`hubd_check_ports+0x8e usba`usba_hubdi_attach+0x275 usba`usba_hubdi_bind_root_hub+0x168 uhci`uhci_attach+0x191 genunix`devi_attach+0x75 genunix`attach_node+0xb2 genunix`i_ndi_config_node+0x97 genunix`i_ddi_attachchild+0x4b genunix`i_ddi_attach_node_hierarchy+0x49 genunix`attach_driver_nodes+0x49 genunix`ddi_hold_installed_driver+0xe3 genunix`attach_drivers+0x28 value ------------- Distribution ------------- count 33554432 | 0 67108864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3 134217728 | 0
When one function ends by calling another function, the compiler can engage in tail-call optimization, in which the function being called reuses the caller's stack frame. This procedure is most commonly used in the SPARC architecture, where the compiler reuses the caller's register window in the function being called in order to minimize register window pressure.
The presence of this optimization causes the return probe of the calling function to fire before the entry probe of the called function. This ordering can lead to quite a bit of confusion. For example, if you wanted to record all functions called from a particular function and any functions that this function calls, you might use the following script:
fbt::foo:entry { self->traceme = 1; } fbt:::entry /self->traceme/ { printf("called %s", probefunc); } fbt::foo:return /self->traceme/ { self->traceme = 0; }
However, if foo ends in an optimized tail-call, the tail-called function, and therefore any functions that it calls, will not be captured. The kernel cannot be dynamically de-optimized on the fly, and DTrace does not wish to engage in a lie about how code is structured. Therefore, you should be aware of when tail-call optimization might be used.
Tail-call optimization is likely to be used in source code similar to the following example:
return (bar());
Or in source code similar to the following example:
(void) bar(); return;
Conversely, function source code that ends like the following example cannot have its call to bar optimized, because the call to bar is not a tail-call:
bar(); return (rval);
You can determine whether a call has been tail-call optimized using the following technique:
While running DTrace, trace arg0 of the return probe in question. arg0 contains the offset of the returning instruction in the function.
After DTrace has stopped, use mdb(1) to look at the function. If the traced offset contains a call to another function instead of an instruction to return from the function, the call has been tail-call optimized.
Due to the instruction set architecture, tail-call optimization is far more common on SPARC systems than on x86 systems. The following example uses mdb to discover tail-call optimization in the kernel's dup function:
# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'
While this command is running, run a program that performs a dup(2), such as a bash process. The above command should provide output similar to the following example:
dup+0x10 ^C
Now examine the function with mdb:
# echo "dup::dis" | mdb -k dup: sra %o0, 0, %o0 dup+4: mov %o7, %g1 dup+8: clr %o2 dup+0xc: clr %o1 dup+0x10: call -0x1278 <fcntl> dup+0x14: mov %g1, %o7
The output shows that dup+0x10 is a call to the fcntl function and not a ret instruction. Therefore, the call to fcntl is an example of tail-call optimization.
You might observe functions that seem to enter but never return or vice versa. Such rare functions are generally hand-coded assembly routines that branch to the middle of other hand-coded assembly functions. These functions should not impede analysis: the branched-to function must still return to the caller of the branched-from function. That is, if you enable all FBT probes, you should see the entry to one function and the return from another function at the same stack depth.
Some functions cannot be instrumented by FBT. The exact nature of uninstrumentable functions is specific to the instruction set architecture.
Functions that do not create a stack frame on x86 systems cannot be instrumented by FBT. Because the register set for x86 is extraordinarily small, most functions must put data on the stack and therefore create a stack frame. However, some x86 functions do not create a stack frame and therefore cannot be instrumented. Actual numbers vary, but typically fewer than five percent of functions cannot be instrumented on the x86 platform.
Leaf routines hand-coded in assembly language on SPARC systems cannot be instrumented by FBT. The majority of the kernel is written in C, and all functions written in C can be instrumented by FBT. Actual numbers vary, but typically fewer cannot be instrumented on the SPARC platform.
FBT works by dynamically modifying kernel text. Because kernel breakpoints also work by modifying kernel text, if a kernel breakpoint is placed at an entry or return site before loading DTrace, FBT will refuse to provide a probe for the function, even if the kernel breakpoint is subsequently removed. If the kernel breakpoint is placed after loading DTrace, both the kernel breakpoint and the DTrace probe will correspond to the same point in text. In this situation, the breakpoint will trigger first, and then the probe will fire when the debugger resumes the kernel. It is recommended that kernel breakpoints not be used concurrently with DTrace. If breakpoints are required, use the DTrace breakpoint action instead.
The Oracle Solaris kernel can dynamic load and unload kernel modules. When FBT is loaded and a module is dynamically loaded, FBT automatically provides new probes associated with the new module. If a loaded module has no enabled FBT probes, the module may be unloaded; the corresponding probes will be destroyed as the module is unloaded. If a loaded module has enabled FBT probes, the module is considered busy, and cannot be unloaded.
The FBT provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.
|
As FBT exposes the kernel implementation, nothing about it is Stable — and the Module and Function name and data stability are explicitly Private. The data stability for Provider and Name are Evolving, but all other data stabilities are Private: they are artifacts of the current implementation. The dependency class for FBT is ISA: while FBT is available on all current instruction set architectures, there is no guarantee that FBT will be available on arbitrary future instruction set architectures.