JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Oracle Solaris 11.1 Dynamic Tracing Guide     Oracle Solaris 11.1 Information Library
search filter icon
search icon

Document Information

Preface

1.  About DTrace

2.  D Programming Language

3.  Aggregations

4.  Actions and Subroutines

5.  Buffers and Buffering

6.  Output Formatting

7.  Speculative Tracing

8.  dtrace(1M) Utility

9.  Scripting

10.  Options and Tunables

11.  Providers

dtrace Provider

BEGIN Probe

END Probe

ERROR Probe

Stability

lockstat Provider

Overview

Adaptive Lock Probes

Spin Lock Probes

Thread Locks

Readers/Writer Lock Probes

Stability

profile Provider

profile- n probes

tick - n probes

Arguments

Timer Resolution

Probe Creation

Stability

cpc Provider

Probes

Arguments

Probe Availability

Probe Creation

Co-existence With Existing Tools

Examples

user-insts.d

kern-cycles.d

brendan-l2miss.d

brendan-generic-l2miss.d

off_core_event.d

l2miss.d

Stability

fbt Provider

Probes

Probe arguments

entry probes

return probes

Examples

Tail-call Optimization

Assembly Functions

Instruction Set Limitations

x86 Limitations

SPARC Limitations

Breakpoint Interaction

Module Loading

Stability

syscall Provider

Probes

System Call Anachronisms

Subcoded System Calls

New System Calls

Deleted System Calls

Large File System Calls

Private System Calls

Arguments

Stability

sdt Provider

Probes

Examples

Creating SDT Probes

Declaring Probes

Probe Arguments

Stability

mib Provider

Probes

Arguments

Stability

fpuinfo Provider

Probes

Arguments

Stability

pid Provider

Naming pid Probes

Function Boundary Probes

entry Probes

return Probes

Function Offset Probes

Stability

plockstat Provider

Overview

Mutex Probes

Reader/Writer Lock Probes

Stability

fasttrap Provider

Probes

Stability

sysinfo Provider

Probes

Arguments

Example

Stability

vminfo Provider

Probes

Arguments

Example

Stability

proc Provider

Probes

Arguments

lwpsinfo_t

psinfo_t

Examples

exec

start and exit

lwp-start and lwp-exit

signal-send

Stability

sched Provider

Probes

Arguments

cpuinfo_t

Examples

on-cpu and off-cpu

enqueue and dequeue

sleep and wakeup

preempt and remain-cpu

change-pri

tick

cpucaps-sleep and cpucaps-wakeup

Stability

io Provider

Probes

Arguments

bufinfo_t structure

devinfo_t

fileinfo_t

Examples

Stability

Protocols

ip Provider

Probes

Arguments

args[0] - pktinfo_t Structure

args[1] - csinfo_t Structure

args[2] - ipinfo_t Structure

args[3] - ifinfo_t Structure

args[4] - ipv4info_t Structure

args[5] - ipv6info_t Structure

Examples

Packets by host address

Sent size distribution

ipio.d

ipproto.d

Stability

iscsi Provider

Probes

Arguments

Types

Examples

One-liners

iscsiwho.d

iscsixfer.d

nfsv3 Provider

Arguments

Probes

Examples

nfsv3rwsnoop.d

nfsv3ops.d

nfsv3fileio.d

nfsv3rwtime.d

nfsv3io.d

nfsv4 Provider

Arguments

Probes

Examples

nfsv4rwsnoop.d

nfsv4ops.d

nfsv4fileio.d

nfsv4rwtime.d

nfsv4io.d

srp Provider

Probes

Probes Overview

Service up/down Event Probes

Remote Port Login/Logout Event Probes

SRP Command Event Probes

SCSI Command Event Probes

Data Transfer Probes

Types

scsicmd_t

conninfo_t

srp_portinfo_t

srp_logininfo_t

srp_taskinfo_t

xferinfo_t

Examples

service.d

srpwho.d

srpsnoop.d

tcp Provider

Probes

Arguments

pktinfo_t Structure

csinfo_t Structure

ipinfo_t Structure

tcpsinfo_t Structure

tcplsinfo_t Structure

tcpinfo_t Structure

Examples

Connections by Host Address

Connections by TCP Port

Who is Connecting to What

Who Isn't Connecting to What

Packets by Host Address

Packets by Local Port

Sent Size Distribution

tcpstate.d

tcpio.d

Stability

udp Provider

Probes

Arguments

pktinfo_t Structure

csinfo_t Structure

ipinfo_t Structure

udpsinfo_t Structure

udpsinfo_t Structure

Examples

Packets by Host Address

Packets by Local Port

Sent Size Distribution

Stability

12.  User Process Tracing

13.  Statically Defined Tracing for User Applications

14.  Security

15.  Anonymous Tracing

16.  Postmortem Tracing

17.  Performance Considerations

18.  Stability

19.  Translators

20.  Versioning

Index

Protocols

This section lists all the protocol providers.

ip Provider

The ip provider provides probes for tracing both IPv4 and IPv6 protocols.

Probes

The ip probes are described in the table below.

Table 11-33 ip Providers

Probe
Description
send
Probe that fires whenever the kernel network stack sends an ip packet.
receive
Probe that fires whenever the kernel network stack receives an ip packet.

These probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by ip. An IP packet must have a full IP header to be visible by these probes.


Note - Loopback tcp packets on Solaris may be processed by tcp fusion, a performance feature that by-passes the ip layer. These fused packets will not be visible using the ip:::send and ip:::receive probes. They are typically all loopback tcp packets after the tcp handshake.


Arguments

The argument types for the ip probes are listed in the table below. The arguments are described in the following section.

Table 11-34 ip Probe Arguments

Probe
args[0]
args[1]
args[2]
args[3]
args[4]
args[5]
send
pktinfo_t *
csinfo_t *
ipinfo_t *
ifinfo_t *
ipv4info_t *
ipv6info_t *
receive
pktinfo_t *
csinfo_t *
ipinfo_t *
ifinfo_t *
ipv4info_t *
ipv6info_t *
args[0] - pktinfo_t Structure

The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.

The pkt_addr member is currently always NULL.

typedef struct pktinfo {
        uintptr_t pkt_addr;             /* currently always NULL */
} pktinfo_t;
args[1] - csinfo_t Structure

The csinfo_t structure is where connection state info can be made available if connection IDs become supported by the kernel in the future.

The cs_addr member is currently always NULL.

typedef struct csinfo {
        uintptr_t cs_addr;              /* currently always NULL */
} csinfo_t;
args[2] - ipinfo_t Structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;

Table 11-35 ipinfo_t Members

ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr.
Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.
args[3] - ifinfo_t Structure

The ifinfo_t structure contains network interface info.

typedef struct ifinfo {
        string if_name;                /* interface name */
        int8_t if_local;               /* is delivered locally */
        netstackid_t if_ipstack;       /* ipstack ID */
        uintptr_t if_addr;             /* pointer to raw ill_t */
} ifinfo_t;

Table 11-36 ifinfo_t Members

if_name
Interface name as a string. For example, "eri0", "lo0", "ip.tun0", "<unknown>".
if_local
Is-local status. 1: is a local interface, 0: is not a local interface, -1: is unknown.
if_ipstack
ipstack ID, for associating ip stack instances, or NULL.
if_addr
Pointer to raw kernel structure for advanced debugging only.

The ifinfo_t details are provided for debugging convenience in the ip layer, if that information is available. There may be some types of traffic where some or all of that information is not available during the ip layer, for which the members may be: "<null>", -1, NULL, NULL.

args[4] - ipv4info_t Structure

The ipv4info_t structure is a DTrace translated version of the IPv4 header.

typedef struct ipv4info {
        uint8_t ipv4_ver;               /* IP version (4) */
        uint8_t ipv4_ihl;               /* header length, bytes */
        uint8_t ipv4_tos;               /* type of service field */
        uint16_t ipv4_length;           /* length (header + payload) */
        uint16_t ipv4_ident;            /* identification */
        uint8_t ipv4_flags;             /* IP flags */
        uint16_t ipv4_offset;           /* fragment offset */
        uint8_t ipv4_ttl;               /* time to live */
        uint8_t ipv4_protocol;          /* next level protocol */
        string ipv4_protostr;           /* next level protocol, as a string */
        uint16_t ipv4_checksum;         /* header checksum */
        ipaddr_t ipv4_src;              /* source address */
        ipaddr_t ipv4_dst;              /* destination address */
        string ipv4_saddr;              /* source address, string */
        string ipv4_daddr;              /* destination address, string */
        ipha_t *ipv4_hdr;               /* pointer to raw header */
} ipv4info_t;

Table 11-37 ipv4info_t Members

ipv4_ver
IP version (4).
ipv4_ihl
IPv4 header length, in bytes.
ipv4_tos
Contents of IPv4 type of service field.
ipv4_length
IPv4 packet length (header + payload) at time of tracing, in bytes.
ipv4_ident
IPv4 identification field.
ipv4_flags
IPv4 flags. See the ipv4_flags table below for bitwise values.
ipv4_offset
IPv4 fragment offset, in bytes.
ipv4_ttl
IPv4 time to live.
ipv4_protocol
IPv4 encapsulated protocol number. See /usr/include/netinet/in.h for the protocol list (IPPROTO_*).
ipv4_protostr
IPv4 encapsulated protocol, as a string. For example, "TCP".
ipv4_checksum
IPv4 header checksum, if available at time of tracing.
ipv4_src
IPv4 source address, as an ipaddr_t.
ipv4_dst
IPv4 destination address, as an ipaddr_t.
ipv4_saddr
IPv4 source address, as a dotted decimal quad string.
ipv4_daddr
IPv4 destination address, as a dotted decimal quad string.
ipv4_hdr
Pointer to raw IPv4 header at the time of tracing.

See RFC-791 for a detailed explanation for these IPv4 header fields. If the packet is IPv6, these members are either "<null>", 0, or NULL depending on type.

Table 11-38 ipv4_flags Values

IPH_DF
Don't fragment
IPH_MF
More fragments
args[5] - ipv6info_t Structure

The ipv6info_t structure is a DTrace translated version of the IPv6 header.

typedef struct ipv6info {
        uint8_t ipv6_ver;               /* IP version (6) */
        uint8_t ipv6_tclass;            /* traffic class */
        uint32_t ipv6_flow;             /* flow label */
        uint16_t ipv6_plen;             /* payload length */
        uint8_t ipv6_nexthdr;           /* next header protocol */
        string ipv6_nextstr;            /* next header protocol, as a string*/
        uint8_t ipv6_hlim;              /* hop limit */
        in6_addr_t *ipv6_src;           /* source address */
        in6_addr_t *ipv6_dst;           /* destination address */
        string ipv6_saddr;              /* source address, string */
        string ipv6_daddr;              /* destination address, string */
        ip6_t *ipv6_hdr;                /* pointer to raw header */
} ipv6info_t;

Table 11-39 ipv6info_t Members

ipv6_ver
IP version (6).
ipv6_tclass
IPv6 traffic class.
ipv6_plen
IPv6 payload length at time of tracing, in bytes.
ipv6_nexthdr
IPv6 next header protocol number. See /usr/include/netinet/in.h for the protocol list (IPPROTO_*).
ipv6_nextstr
IPv6 next header protocol, as a string. For example, "TCP".
ipv6_hlim
IPv6 hop limit.
ipv6_src
IPv6 source address, as an in6_addr_t.
ipv6_dst
IPv6 destination address, as an in6_addr_t.
ipv6_saddr
IPv6 source address, as an RFC-1884 convention 2 string with lower case hexadecimal digits.
ipv6_daddr
IPv6 destination address, as an RFC-1884 convention 2 string with lower case hexadecimal digits.
ipv6_hdr
Pointer to raw IPv6 header at the time of tracing.

See RFC-2460 for a detailed explanation for these IPv6 header fields. If the packet is IPv4, these members are either "<null>", 0, or NULL depending on type.

Examples

Some simple examples of ip provider usage follow.

Packets by host address

This DTrace one-liner counts received packets by host address:

# dtrace -n 'ip:::receive { @[args[2]->ip_saddr] = count(); }'
dtrace: description 'ip:::receive ' matched 4 probes
^C

  192.168.1.5                                                       1
  192.168.1.185                                                     4
  fe80::214:4fff:fe3b:76c8                                          9
  127.0.0.1                                                        14
  192.168.1.109                                                    28

The output above shows that 28 IP packets were received from 192.168.1.109, 14 IP packets from 127.0.0.1, and so on.

Sent size distribution

This DTrace one-liner prints distribution plots of sent payload size by destination:

# dtrace -n 'ip:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
dtrace: description 'ip:::send ' matched 11 probes
^C

  192.168.2.27                                      
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          7        
              32 |@@@@                                     1        
              64 |@@@@                                     1        
             128 |                                         0        

  192.168.1.109                                     
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@@@@@                                    5        
              32 |@@@                                      3        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@               24       
             128 |@                                        1        
             256 |@                                        1        
             512 |@@                                       2        
            1024 |@                                        1        
            2048 |                                         0       
ipio.d

The following DTrace script traces IP packets and prints various details:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf(" %3s %10s %15s    %15s %8s %6s\n", "CPU", "DELTA(us)",
            "SOURCE", "DEST", "INT", "BYTES");
        last = timestamp;
}

ip:::send
{
        this->elapsed = (timestamp - last) / 1000;
        printf(" %3d %10d %15s -> %15s %8s %6d\n", cpu, this->elapsed,
            args[2]->ip_saddr, args[2]->ip_daddr, args[3]->if_name,
            args[2]->ip_plength);
        last = timestamp;
}

ip:::receive
{
        this->elapsed = (timestamp - last) / 1000;
        printf(" %3d %10d %15s <- %15s %8s %6d\n", cpu, this->elapsed,
            args[2]->ip_daddr, args[2]->ip_saddr, args[3]->if_name,
            args[2]->ip_plength);
        last = timestamp;
}

This example output shows tracing packets as they pass in and out of tunnels:

# ./ipio.d
 CPU  DELTA(us)          SOURCE               DEST      INT  BYTES
   1     598913    10.1.100.123 ->   192.168.10.75  ip.tun0     68
   1         73   192.168.1.108 ->     192.168.5.1     nge0    140
   1      18325   192.168.1.108 <-     192.168.5.1     nge0    140
   1         69    10.1.100.123 <-   192.168.10.75  ip.tun0     68
   0     102921    10.1.100.123 ->   192.168.10.75  ip.tun0     20
   0         79   192.168.1.108 ->     192.168.5.1     nge0     92

The fields printed are:

Field
Description
CPU
CPU id that event occurred on
DELTA(us)
elapsed time since previous event
SOURCE
source IP address
DEST
destination IP address
INT
interface name
BYTES
payload bytes

Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; keep an eye on changes in the CPU column, or add a timestamp column and post sort.


ipproto.d

This DTrace script provides a neat summary for both send and receive IP traffic, including the next level protocol:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

ip:::send,
ip:::receive
{
        this->protostr = args[2]->ip_ver == 4 ?
            args[4]->ipv4_protostr : args[5]->ipv6_nextstr;
        @num[args[2]->ip_saddr, args[2]->ip_daddr, this->protostr] = count();
}

dtrace:::END
{
        printf("   %-28s %-28s %6s %8s\n", "SADDR", "DADDR", "PROTO", "COUNT");
        printa("   %-28s %-28s %6s %@8d\n", @num);
}

This script was run on a system with both IPv4 and IPv6 interfaces for several seconds:

# ./ipproto.d 
Tracing... Hit Ctrl-C to end.
^C
   SADDR                      DADDR                       PROTO    COUNT
   192.168.1.108              192.168.155.32                UDP        1
   192.168.1.108              192.168.17.55                 UDP        1
   192.168.1.108              192.168.228.54                UDP        1
   192.168.1.108              192.168.1.5                   UDP        1
   192.168.1.108              192.168.2.27                 ICMP        1
   192.168.1.200              192.168.3.255                 UDP        1
   192.168.1.5                192.168.1.108                 UDP        1
   192.168.2.27               192.168.1.108                ICMP        1
   fe80::214:4fff:fe3b:76c8   ff02::1                    ICMPV6        1
   fe80::2e0:81ff:fe5e:8308   fe80::214:4fff:fe3b:76c8   ICMPV6        1
   fe80::2e0:81ff:fe5e:8308   ff02::1:2                     UDP        1
   192.168.1.185              192.168.1.255                 UDP        2
   192.168.1.211              192.168.1.255                 UDP        3
   192.168.1.109              192.168.1.108                 TCP      428
   192.168.1.108              192.168.1.109                 TCP      789

The fields printed are:

Field
Description
SADDR
source IP address
DADDR
destination IP address
PROTO
IP next level protocol
COUNT
number of packets

The example output above provides a quick summary of network activity with host address details; we can see that both 192.168.1.109 and 192.168.1.108 are swapping many packets via TCP.

Stability

The ip provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.

Element
Name stability
Data stability
Dependency class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA

iscsi Provider

The iscsi provider provides probes for tracing iSCSI target activity.

This is a kernel provider built into the COMSTAR iSCSI target port provider. The COMSTAR iSCSI target and the user-land iSCSI target (/usr/sbin/iscsitgtd) are mutually exclusive. Only one of the targets can be enabled at a time. The COMSTAR iSCSI target DTrace provider provides all the probes that are provided by the user-land iSCSI provider, so that any DTrace script written for the userland provider built into the iSCSI target daemon (iscsitgtd) will work with the COMSTAR iSCSI target port provider as well without any modification. Since this provider instruments the iSCSI target activity, DTrace commands and scripts must be run on the iSCSI target server to observe these probes.

Probes

SCSI Event
Probes
SCSI command/response
iscsi:::scsi-commandiscsi:::scsi-response
Data out/in/request (rtt)
iscsi:::data-sendiscsi:::data-receiveiscsi:::data-request
Login and logout command/response
iscsi:::login-commandiscsi:::login-responseiscsi:::logout-commandiscsi:::logout-response
NOP out/in (pings)
iscsi:::nop-receiveiscsi:::nop-send
Text and task command/response
iscsi:::task-commandiscsi:::task-responseiscsi:::text-commandiscsi:::text-response
Asynchronous message from target
iscsi:::async-send
Buffer dispatch and completion (not available with the USDT provider)
iscsi:::xfer-startiscsi:::xfer-done

Arguments

Probes
Variable
Type
Description
*
args[0]
conninfo_t *
connection info
*
args[1]
iscsiinfo_t *
common iSCSI properties
scsi-command
args[2]
scsicmd_t *
SCSI command block (cdb)
xfer-startxfer-done
args[2]
xferinfo_t *
Buffer info

Types

All COMSTAR iSCSI target probes have the first and second argument in common:

args[0] conninfo_t * connection information

conninfo_t

typedef struct conninfo {
        string ci_local;        /* local host IP address */
        string ci_remote;       /* remote host IP address */
        string ci_protocol;     /* protocol ("ipv4", "ipv6") */
} conninfo_t;
The conninfo_t structure is used by NFSv4 provider, 
Fibre Channel provider and is intended for use by all
application protocol providers as the first argument 
to indicate some basic information about the connection.

args[1] iscsiinfo_t * common iSCSI properties

iscsiinfo_t

typedef struct iscsiinfo {
    string ii_target;    /* target iqn */
    string ii_initiator;    /* initiator iqn */
        string ii_isid;        /* initiator session identifier */
    string ii_tsih;        /* target session identifying handle */
    string ii_transport;    /* transport type ("iser-ib", "sockets") */

    uint64_t ii_lun;    /* target logical unit number */

    uint32_t ii_itt;    /* initiator task tag */
    uint32_t ii_ttt;    /* target transfer tag */

    uint32_t ii_cmdsn;    /* command sequence number */
    uint32_t ii_statsn;    /* status sequence number */
    uint32_t ii_datasn;    /* data sequence number */

    uint32_t ii_datalen;    /* length of data payload */
    uint32_t ii_flags;    /* probe-specific flags */
} iscsiinfo_t;
The iscsiinfo_t structure is used to provide identifying 
information about the target and the initiator and
also some PDU level information such as lun, data length and sequence numbers.

The third argument is only used for the SCSI command probe or the data transfer probe

args[2] scsicmd_t * SCSI command block (cdb)

scsicmd_t

typedef struct scsicmd {
        uint64_t ic_len;        /* CDB length */
        uint8_t *ic_cdb;        /* CDB data */
} scsicmd_t;
The scsicmd_t structure is used by the SCSI command probe 
and it contains information about the SCSI command
blocks and is intended for use by all the application 
protocols that deal with SCSI data.

Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over Remote DMA (RDMA), also known as iSER. Since the data transfer phases are mapped to Remote DMA (RDMA) operations in iSER, the data-send, data-receive and data-request probes cannot be used with iSER. Instead the xfer-start and xfer-done probes can be used to trace the data transfer irrespective of the transport used. The data-receive, data-request and data-send probes can be used when a user wants to track the SCSI Data-IN and Data-OUT PDUs specifically.

args[2] xferinfo_t * data transfer information

xferinfo_t

typedef struct xferinfo {
    uintptr_t xfer_laddr;    /* local buffer address */
    uint32_t xfer_loffset;    /* offset within the local buffer */
    uint32_t xfer_lkey;    /* access control to local memory */
    uintptr_t xfer_raddr;    /* remote virtual address */
    uint32_t xfer_roffset;    /* offset from the remote address */
    uint32_t xfer_rkey;    /* access control to remote virtual address */
    uint32_t xfer_len;    /* transfer length */
    uint32_t xfer_type;    /* Read or Write */
} xferinfo_t;
The xferinfo_t structure is used by the xfer-start 
and the xfer-done probes and contain information about the
data transfer. When the transport type is iSER, 
the remote buffer information is given by the xfer_raddr,
xfer_rkey and xfer_roffset fields. It is set to 0 when the transport type is sockets.

Examples

One-liners

Frequency of iSCSI command types:

# dtrace -n 'iscsi*::: { @[probename] = count(); }'

Frequency of iSCSI client IP addresses:

# dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'

Payload bytes by iSCSI command type:

# dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'

Payload byte distribution by iSCSI command type:

# dtrace -n 'iscsi*::: { @[probename] = quantize(args[1]->ii_datalen); }'
iscsiwho.d

This is a simple script to produce a report of the remote IP addresses and a count of iSCSI events. This is intended to provide a quick summary of iSCSI activity when run on the iSCSI target server:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

iscsi*:::
{
        @events[args[0]->ci_remote, probename] = count();
}

dtrace:::END
{
        printf("   %-26s %14s %8s\n", "REMOTE IP", "iSCSI EVENT", "COUNT");
        printa("   %-26s %14s %@8d\n", @events);
}

This output shows the host and the number of iSCSI operations:

# ./iscsiwho.d
Tracing... Hit Ctrl-C to end.
^C
   REMOTE IP                     iSCSI EVENT    COUNT
   192.168.1.109                 nop-receive        1
   192.168.1.109                    nop-send        1
   192.168.1.109               scsi-response      126
   192.168.1.109                scsi-command      208
   192.168.1.109                data-request     1207
   192.168.1.109                data-receive     1207

The fields are:

Field
Description
REMOTE IP
IP address of the client
iSCSI EVENT
iSCSI event type
COUNT
Number of events traced

The example output shows normal traffic. For this simple script, these event names are not translated beyond their iSCSI provider probe names, and require some thought to comprehend (they are from the perspective of the iSCSI target server).

iscsixfer.d

Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over Remote DMA (RDMA), also known as iSER. An iSER initiator should be able to read and write data from an iSER target at high data rates with relatively low CPU utilization compared to iSCSI using TCP/IP. In order to see the transport layer in use, display the ii_transport field from the iscsiinfo_t structure.

Since the data transfer phases are mapped to Remote DMA (RDMA) operations in iSER, the data-send, data-receive and data-request probes cannot be used with iSER. Instead here is a simple script to print an aggregation of all the data transferred between two points using the xfer-start probe. This can be used for iSCSI using TCP/IP and iSCSI over Remote DMA.

The data-receive, data-request and data-send probes can be used when a user wants to track the SCSI Data-IN and Data-OUT PDUs specifically (e.g. if the PDUs are received out of order, one might want to trace the ii_ttt, ii_datasn, ii_statsn etc.). To just get a trace of IO activity, the xfer-start/xfer-done probes should suffice.

#!/usr/sbin/dtrace -s

#pragma D option quiet

iscsi:::xfer-start
{
      @[args[0]->ci_remote, args[2]->xfer_type] = sum(args[2]->xfer_len);
}

END
{
      printf("%26s %10s %8s\n",  "REMOTE IP", "READ/WRITE", "BYTES");
      printa("%26s %10s %15@d\n", @);
}

This output shows the transfer of bytes:

# ./iscsixfer.d
Tracing... Hit Ctrl-C to end.
^C

   REMOTE IP                     READ/WRITE   BYTES
   192.168.1.109                 write        464
   192.168.1.109                 read          1024

The fields are:

Field
Description
REMOTE IP
IP address of the client
READ/WRITE
Read or write
BYTES
Number of bytes transferred

Now if a user is interested in just seeing the data move (read or write) as it happens, one could use this script

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
 printf(" %-26s %8s %10s\n", "REMOTE IP", "BYTES", "READ/WRITE");

}

iscsi:::xfer-start
{
  printf("%26s %%8d %108s\n", args[0]->ci_remote, 
args[2]->xfer_len, args[2]->xfer_type);
}

An interpretation for some of these events are:

iSCSI event
Interpretation
scsi-command
A SCSI command was issued, such as a read or a write. Use other scripts for a breakdown on the SCSI command type.
data-send
Data was sent from the iSCSI target server to the client; the client is performing a read.
data-receive
sData was received on the iSCSI target server from the client. The client is performing a write.

nfsv3 Provider

The nfsv3 provider provides probes for tracing NFS version 3 server activity.

Arguments

All NFS operation probes have the first argument in common:

    args[0]        conninfo_t *        socket connection information

The conninfo_t structure is already used by the iSCSI target provider (iscsi) and the NFS v4 provider (nfsv4), and is intended for use by all provider which are providing some higher level protocol (e.g. iscsi, nfs, http, ftp).

    typedef struct conninfo {
        string ci_local;        /* local host address */
        string ci_remote;       /* remote host address */
        string ci_protocol;     /* protocol (ipv4, ipv6, etc) */
    } conninfo_t;

Operation probes have their second argument in common:

    args[1]        nfsv3opinfo_t *        NFS v3 operation properties

    typedef struct nfsv3opinfo {
        string noi_curpath;    /* current file handle path (if any) */
        cred_t *noi_cred;    /* credentials */
        uint64_t noi_xid;    /* transaction ID */
    } nfsv4opinfo_t;

Probes

Below is a list of the probes along with the specific argument for each whose type is defined by the NFS v3 specification:

Probe Name
args[2]
nfsv3:::op-access-start
ACCESS3args *
nfsv3:::op-access-done
ACCESS3res *
nfsv3:::op-commit-start
COMMIT3args *
nfsv3:::op-commit-done
COMMIT3res *
nfsv3:::op-create-start
CREATE3args *
nfsv3:::op-create-done
CREATE3res *
nfsv3:::op-fsinfo-start
FSINFO3args *
nfsv3:::op-fsinfo-done
FSINFO3res *
nfsv3:::op-fsstat-start
FSSTAT3args *
nfsv3:::op-fsstat-done
FSSTAT3res *
nfsv3:::op-getattr-start
GETATTR3args *
nfsv3:::op-getattr-done
GETATTR3res *
nfsv3:::op-lookup-start
LOOKUP3args *
nfsv3:::op-lookup-done
LOOKUP3res *
nfsv3:::op-link-start
LINK3args *
nfsv3:::op-link-done
LINK3res *
nfsv3:::op-mkdir-start
MKDIR3args *
nfsv3:::op-mkdir-done
MKDIR3res *
nfsv3:::op-mknod-start
MKNOD3args *
nfsv3:::op-mknod-done-
MKNOD3res *
nfsv3:::op-null-start
-
nfsv3:::op-null-done
-
nfsv3:::op-pathconf-start
PATHCONF3args *
nfsv3:::op-pathconf-done
PATHCONF3res *
nfsv3:::op-read-start
READ3args *
nfsv3:::op-read-done
READ3res *
nfsv3:::op-readdir-start
READDIR3args *
nfsv3:::op-readdir-done
READDIR3res *
nfsv3:::op-readdirplus-start
READDIRPLUS3args *
nfsv3:::op-readdirplus-done
READDIRPLUS3res *
nfsv3:::op-readlink-start
READLINK3args *
nfsv3:::op-readlink-done
READLINK3res *
nfsv3:::op-remove-start
REMOVE3args *
nfsv3:::op-remove-done
REMOVE3res *
nfsv3:::op-renamestart
RENAME3args *
nfsv3:::op-rename-done
RENAME3res *
nfsv3:::op-rmdir-start
RMDIR3args *
nfsv3:::op-rmdir-done
RMDIR3res *
nfsv3:::op-setattr-start
SETATTR3args *
nfsv3:::op-setattr-done
SETATTR3res *
nfsv3:::op-symlink-start
SYMLINK3args *
nfsv3:::op-symlink-done
SYMLINK3res *
nfsv3:::op-write-start
WRITE3args *
nfsv3:::op-write-done
WRITE3res *

Note - The op-null-* probes have an undefined args[2].


Examples

Some simple examples of nfsv3 provider usage follow.

nfsv3rwsnoop.d

This DTrace scripts traces NFSv3 read and write requests, showing details of each operation:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)",
            "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME");
}

nfsv3:::op-read-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count,
            args[1]->noi_curpath);
}

nfsv3:::op-write-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "W", args[2]->offset / 1024,
            args[2]->data.data_len, args[1]->noi_curpath);
}

The following output shows a read of /export/stuff/bin/ghex2, then a read of /export/stuff/bin/gksu, and finally a write of /export/stuff/words12:

# ./nfsv3iosnoop.d
TIME(us)         CLIENT             OP OFFSET    BYTES PATHNAME 
4299383207       192.168.17.75       R 0          4096 /export/stuff/bin/ghex2
4299391813       192.168.17.75       R 4         28672 /export/stuff/bin/ghex2
4299395700       192.168.17.75       R 32        32768 /export/stuff/bin/ghex2
4299396038       192.168.17.75       R 96        32768 /export/stuff/bin/ghex2
4299396462       192.168.17.75       R 128        8192 /export/stuff/bin/ghex2
4299396550       192.168.17.75       R 64        32768 /export/stuff/bin/ghex2
4320233417       192.168.17.75       R 0          4096 /export/stuff/bin/gksu
4320240902       192.168.17.75       R 4         28672 /export/stuff/bin/gksu
4320242434       192.168.17.75       R 32        32768 /export/stuff/bin/gksu
4320242730       192.168.17.75       R 64        24576 /export/stuff/bin/gksu
4333460565       192.168.17.75       W 0         32768 /export/stuff/words12
4333461477       192.168.17.75       W 32        32768 /export/stuff/words12
4333463264       192.168.17.75       W 64        32768 /export/stuff/words12
4333463567       192.168.17.75       W 96        32768 /export/stuff/words12
4333463893       192.168.17.75       W 128       32768 /export/stuff/words12
4333464202       192.168.17.75       W 160       32768 /export/stuff/words12
4333464451       192.168.17.75       W 192       10055 /export/stuff/words12

The fields printed are:

Field
Description
TIME(us)
Time of event in microseconds
CLIENT
Remote client IP address
OP
R == read, W == write
OFFSET
File offset of I/O, in Kbytes
BYTES
TBytes of I/O
PATHNAME
Pathname of file, if known

Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; post sort the TIME column if needed.


nfsv3ops.d

This DTrace script counts NFSv3 operations by client, printing a summary every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Interval 5 secs.\n");
}

nfsv3:::op-*
{
        @ops[args[0]->ci_remote, probename] = count();
}

profile:::tick-5sec,
dtrace:::END
{
        printf("\n   %-32s %-28s %8s\n", "Client", "Operation", "Count");
        printa("   %-32s %-28s %@8d\n", @ops);
        trunc(@ops);
}

The following output shows which client is sending which NFSv3 operations:

# ./nfsv3ops.d
Tracing... Interval 5 secs.

   Client                           Operation                       Count
   192.168.17.75                    op-commit-done                      1
   192.168.17.75                    op-commit-start                     1
   192.168.17.75                    op-create-done                      1
   192.168.17.75                    op-create-start                     1
   192.168.17.75                    op-access-done                      6
   192.168.17.75                    op-access-start                     6
   192.168.17.75                    op-read-done                        6
   192.168.17.75                    op-read-start                       6
   192.168.17.75                    op-write-done                       7
   192.168.17.75                    op-write-start                      7
   192.168.17.75                    op-lookup-done                      8
   192.168.17.75                    op-lookup-start                     8
   192.168.17.75                    op-getattr-done                    10
   192.168.17.75                    op-getattr-start                   10
  
   Client                           Operation                       Count
   
   Client                           Operation                       Count
   192.168.17.75                    op-getattr-done                     1
   192.168.17.75                    op-getattr-start                    1

The fields printed are:

Field
Description
Client
Remote client IP address
Operation
NFSv3 operation, described using the nfsv3 provider probename
Count
Operations during this interval
nfsv3fileio.d

This DTrace script prints a summary of file read and write bytes:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Hit Ctrl-C to end.\n");
}

nfsv3:::op-read-done
{
        @readbytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.data.data_len);
}


nfsv3:::op-write-done
{
        @writebytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.count);
}

dtrace:::END
{
        printf("\n%12s %12s  %s\n", "Rbytes", "Wbytes", "Pathname");
        printa("%@12d %@12d  %s\n", @readbytes, @writebytes);
}

This output shows a few files were read, and one was written:

# ./nfsv3fileio.d
Tracing... Hit Ctrl-C to end.
^C

      Rbytes       Wbytes  Pathname
           0       206663  /export/stuff/words10
        8624            0  /export/stuff/bin/echo-client-2
       13228            0  /export/stuff/bin/echo
      496292            0  /export/stuff/bin/ecpg

The fields printed are:

Field
Description
Rbytes
Bytes read for this pathname
Wbytes
Bytes written to this pathname
Pathname
Pathname of NFS file
nfsv3rwtime.d

This DTrace script prints a summary NFSv3 read and write elapsed times (latencies), along with other details:

#!/usr/sbin/dtrace -s

#pragma D option quiet

inline int TOP_FILES = 10;

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

nfsv3:::op-read-start,
nfsv3:::op-write-start
{
        start[args[1]->noi_xid] = timestamp;
}

nfsv3:::op-read-done,
nfsv3:::op-write-done
/start[args[1]->noi_xid] != 0/
{
        this->elapsed = timestamp - start[args[1]->noi_xid];
        @rw[probename == "op-read-done" ? "read" : "write"] =
            quantize(this->elapsed / 1000);
        @host[args[0]->ci_remote] = sum(this->elapsed);
        @file[args[1]->noi_curpath] = sum(this->elapsed);
        start[args[1]->noi_xid] = 0;
}

dtrace:::END
{
        printf("NFSv3 read/write distributions (us):\n");
        printa(@rw);

        printf("\nNFSv3 read/write by host (total us):\n");
        normalize(@host, 1000);
        printa(@host);

        printf("\nNFSv3 read/write top %d files (total us):\n", TOP_FILES);
        normalize(@file, 1000);
        trunc(@file, TOP_FILES);
        printa(@file);
}

This output below shows a clear peak in the read time distribution plot in the 64 to 127 microsecond range, and a second smaller peak between 4 and 16 milliseconds:

# ./nfsv3rwtime.d
Tracing... Hit Ctrl-C to end.
^C
NFSv3 read/write distributions (us):

  write                                             
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
              64 |                                         0        

  read                                              
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@                                        1        
              32 |@                                        1        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           36       
             128 |@                                        1        
             256 |@                                        1        
             512 |                                         0        
            1024 |                                         0        
            2048 |@                                        1        
            4096 |@@@                                      3        
            8192 |@@@                                      4        
           16384 |                                         0        


NFSv3 read/write by host (total us):

  192.168.17.75                                                 81674

NFSv4 read/write top 10 files (total us):

  /export/stuff/motd                                               63
  /export/stuff/bin/daps                                         5876
  /export/stuff/bin/date                                         8150
  /export/stuff/bin/dasher                                      67584

Other details are printed, such as total read/write latency by host, and the top 10 files by latency.

The next example also shows a pair of peaks, the first around a fraction of a millisecond, the second at around 4 milliseconds:

# ./nfsv3rwtime.d
Tracing... Hit Ctrl-C to end.
^C
NFSv3 read/write distributions (us):

  read                                              
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@                                        4        
              32 |@                                        5        
              64 |@@@@@@                                   22       
             128 |@@@@                                     13       
             256 |@@@@@@@@@                                30       
             512 |@@                                       7        
            1024 |@                                        3        
            2048 |@@@                                      12       
            4096 |@@@@@@@                                  26       
            8192 |@@@@                                     15       
           16384 |@                                        2        
           32768 |                                         0        


NFSv3 read/write by host (total us):

  192.168.17.75                                                414458

NFSv3 read/write top 10 files (total us):

  /export/stuff/bin/cal                                         11225
  /export/stuff/bin/cjpeg                                       11947
  /export/stuff/bin/charmap                                     12347
  /export/stuff/bin/cdda2wav.bin                                13449
  /export/stuff/bin/chkey                                       13963
  /export/stuff/bin/cputrack                                    14533
  /export/stuff/bin/catman                                      15535
  /export/stuff/bin/csslint-0.6                                 18302
  /export/stuff/bin/col                                         19926
  /export/stuff/bin/cdrecord.bin                                40622

The first peak is likely to be NFS operations hitting the memory cache, and the second those that missed and went to disk. Further use of DTrace can confirm this theory.

The fields from the distribution plot are:

Field
Description
value
Minimum elapsed time for this event in microseconds
count
Number of events in this time range
nfsv3io.d

This is a simple DTrace script to provide basic I/O details by host every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        interval = 5;
        printf("Tracing... Interval %d secs.\n", interval);
        tick = interval;
}

nfsv3:::op-*
{
        @ops[args[0]->ci_remote] = count();
}

nfsv3:::op-read-done
{
        @reads[args[0]->ci_remote] = count();
        @readbytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.data.data_len);
}


nfsv3:::op-write-done
{
        @writes[args[0]->ci_remote] = count();
        @writebytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.count);
}

profile:::tick-1sec
/tick-- == 0/
{
        normalize(@ops, interval);
        normalize(@reads, interval);
        normalize(@writes, interval);
        normalize(@writebytes, 1024 * interval);
        normalize(@readbytes, 1024 * interval);
        printf("\n   %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s",
            "kr/s", "kw/s", "ops/s");
        printa("   %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes,
            @readbytes, @writebytes, @ops);
        trunc(@ops);
        trunc(@reads);
        trunc(@writes);
        trunc(@readbytes);
        trunc(@writebytes);
        tick = interval;
}

This output shows 192.168.17.75 calling NFSv3 reads and writes:

# ./nfsv3io.d
Tracing... Interval 5 secs.

   Client                              r/s    w/s   kr/s   kw/s    ops/s
   192.168.17.75                        27      1    686     40      100

   Client                              r/s    w/s   kr/s   kw/s    ops/s
   192.168.17.75                         0      0      0      0        8

   Client                              r/s    w/s   kr/s   kw/s    ops/s
   0.0.0.0                               0      0      0      0        0
   192.168.17.75                         2      0     28      0       18
^C

Other details can be calculated from the output, such as average read and write size (eg, 686(kr/s) / 27(r/s) = 25.4 average kr). These could also be added to the script to be printed as columns.

The fields printed are:

Field
Description
Client
Remote client IP address
r/s
reads per second
w/s
writes per second
kr/s
kilobytes read per second
kw/s
kilobytes written per second
ops/s
Total NFSv3 operations per second (including the reads and writes)

nfsv4 Provider

The nfsv4 provider provides probes for tracing NFS v4 server activity.

Arguments

All NFS operation probes have the first argument in common:

        args[0]        conninfo_t *        socket connection information

The conninfo_t structure is already used by the iSCSI target provider (iscsi), and is intended for use by all provider which are providing some higher level protocol (e.g. iscsi, nfs, http, ftp).

    typedef struct conninfo {
        string ci_local;        /* local host address */
        string ci_remote;       /* remote host address */
        string ci_protocol;     /* protocol (ipv4, ipv6, etc) */
    } conninfo_t;

Operation probes have their second argument in common:

    args[1]        nfsv4opinfo_t *        NFS v4 operation properties

    typedef struct nfsv4opinfo {
        string noi_curpath;    /* current file handle path (if any) */
        cred_t *noi_cred;    /* credentials */
        uint64_t noi_xid;    /* transaction ID */
    } nfsv4opinfo_t;

Callback operation probes have their second argument in common:

    args[1]        nfsv4cbinfo_t *        NFS v4 callback properties

    typedef struct nfsv4cbinfo {
        string nci_curpath;    /* file handle path (if any) */
    } nfsv4cbinfo_t;

Probes

Below is a list of the top level operation probes along with the specific argument for each whose type is defined by the NFS v4 specification:

Probe Name
args[2]
nfsv4:::compound-op-start
COMPOUND4args *
nfsv4:::compound-op-done
COMPOUND4res *

Below is a list of operation probes along with the specific argument for each whose type is defined by the NFS v4 specification:

Probe Name
args[2]
nfsv4:::op-access-start
ACCESS4args *
nfsv4:::op-access-done
ACCESS4res *
nfsv4:::op-close-start
CLOSE4args *
nfsv4:::op-close-done
CLOSE4res *
nfsv4:::op-commit-start
COMMIT4args *
nfsv4:::op-commit-done
COMMIT4res *
nfsv4:::op-create-start
CREATE4args *
nfsv4:::op-create-done
CREATE4res *
nfsv4:::op-delegpurge-start
DELEGPURGE4args *
nfsv4:::op-delegpurge-done
DELEGPURGE4res *
nfsv4:::op-delegreturn-start
DELEGRETURN4args *
nfsv4:::op-delegreturn-done
DELEGRETURN4res *
nfsv4:::op-getattr-start
GETATTR4args *
nfsv4:::op-getattr-done
GETATTR4res *
nfsv4:::op-getfh-start
GETFH4args *
nfsv4:::op-getfh-done
GETFH4res *
nfsv4:::op-link-start
LINK4args *
nfsv4:::op-link-done
LINK4res *
nfsv4:::op-lock-start
LOCK4args *
nfsv4:::op-lock-done
LOCK4res *
nfsv4:::op-lockt-start
LOCKT4args *
nfsv4:::op-lockt-done
LOCKT4res *
nfsv4:::op-locku-start
LOCKU4args *
nfsv4:::op-locku-done
LOCKU4res *
nfsv4:::op-lookup-start
LOOKUP4args *
nfsv4:::op-lookup-done
LOOKUP4res *
nfsv4:::op-lookupp-start
LOOKUPP4args *
nfsv4:::op-lookupp-done
LOOKUPP4res *
nfsv4:::op-nverify-start
NVERIFY4args *
nfsv4:::op-nverify-done
NVERIFY4res *
nfsv4:::op-open-start
OPEN4args *
nfsv4:::op-open-done
OPEN4res *
nfsv4:::op-open-confirm-start
OPEN_CONFIRM4args *
nfsv4:::op-open-confirm-done
OPEN_CONFIRM4res *
nfsv4:::op-open-downgrade-start
OPEN_DOWNGRADE4args *
nfsv4:::op-open-downgrade-done
OPEN_DOWNGRADE4args *
nfsv4:::op-openattr-start
OPENATTR4args *
nfsv4:::op-openattr-done
OPENATTR4res *
nfsv4:::op-putfh-start
PUTFH4args *
nfsv4:::op-putfh-done
PUTFH4res *
nfsv4:::op-putpubfh-start
PUTPUBFH4args *
nfsv4:::op-putpubfh-done
PUTPUBFH4res *
nfsv4:::op-putrootfh-start
PUTROOTFH4args *
nfsv4:::op-putrootfh-done
PUTROOTFH4res *
nfsv4:::op-read-start
READ4args *
nfsv4:::op-read-done
READ4res *
nfsv4:::op-readdir-start
READDIR4args *
nfsv4:::op-readdir-done
READDIR4res *
nfsv4:::op-readlink-start
READLINK4args *
nfsv4:::op-readlink-done
READLINK4res *
nfsv4:::op-release-lockowner-start
RELEASE_LOCKOWNER4args *
nfsv4:::op-release-lockowner-done
RELEASE_LOCKOWNER4res *
nfsv4:::op-remove-start
REMOVE4args *
nfsv4:::op-remove-don
REMOVE4res *
nfsv4:::op-rename-start
RENAME4args *
nfsv4:::op-rename-done
RENAME4res *
nfsv4:::op-renew-start
RENEW4args *
nfsv4:::op-renew-done
RENEW4res *
nfsv4:::op-restorefh-start
<none>
nfsv4:::op-restorefh-done
<none>
nfsv4:::op-savefh-start
SAVEFH4args *
nfsv4:::op-savefh-done
SAVEFH4res *
nfsv4:::op-secinfo-start
SECINFO4args *
nfsv4:::op-secinfo-done
SECINFO4res *
nfsv4:::op-setattr-start
SETATTR4args *
nfsv4:::op-setattr-done
SETATTR4res *
nfsv4:::op-setclientid-start
SETCLIENTID4args *
nfsv4:::op-setclientid-done
SETCLIENTID4res *
nfsv4:::op-setclientid-confirm-start
SETCLIENTID_CONFIRM4args *
nfsv4:::op-setclientid-confirm-done
SETCLIENTID_CONFIRM4res *
nfsv4:::op-verify-start
VERIFY4args *
nfsv4:::op-verify-done
VERIFY4res *
nfsv4:::op-write-start
WRITE4args *
nfsv4:::op-write-done
WRITE4res *

Callback compound probes have an undefined second argument; this slot is reserved for future use.

Below is a list of the top level callback probes along with the specific argument for each whose type is defined by the NFS v4 specification:

Probe Name
args[2]
nfsv4:::compound-cb-start
CB_COMPOUND4args *
nfsv4:::compound-cb-done
CB_COMPOUND4res *

Below is a list of callback probes along with the specific argument for each whose type is defined by the NFS v4 specification:

Probe Name
args[2]
nfsv4:::cb-getattr-start
CB_GETATTR4args*
nfsv4:::cb-getattr-done
CB_GETATTR4res *
nfsv4:::cb-recall-start
CB_RECALL4args *
nfsv4:::cb-recall-done
CB_RECALL4res *

Note - Since the Solaris NFS v4 implementation does not yet use the 'getattr' callback, the probe will not be implemented; it is noted here in anticipation of a future implementation.


Examples

Some simple examples of nfsv4 provider usage follow.

nfsv4rwsnoop.d

This DTrace script traces NFSv4 reads and writes:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)",
            "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME");
}

nfsv4:::op-read-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count,
            args[1]->noi_curpath);
}

nfsv4:::op-write-start
{
        printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000,
            args[0]->ci_remote, "W", args[2]->offset / 1024, args[2]->data_len,
            args[1]->noi_curpath);
}

This output shows a few files were read, and one was written:

# ./nfsv4rwsnoop.d 
TIME(us)         CLIENT             OP OFFSET    BYTES PATHNAME
156889725960     192.168.1.109       R 0          4096 /export/share/bin/nawk
156889735515     192.168.1.109       R 4         28672 /export/share/bin/nawk
156889736298     192.168.1.109       R 32        32768 /export/share/bin/nawk
156889736544     192.168.1.109       R 96        32768 /export/share/bin/nawk
156889736902     192.168.1.109       R 64        32768 /export/share/bin/nawk
156916061653     192.168.1.109       R 0          4096 /export/share/bin/ssh
156916069375     192.168.1.109       R 4         28672 /export/share/bin/ssh
156916070098     192.168.1.109       R 32        32768 /export/share/bin/ssh
156916070435     192.168.1.109       R 96        32768 /export/share/bin/ssh
156916070758     192.168.1.109       R 64        32768 /export/share/bin/ssh
156916071036     192.168.1.109       R 128       32768 /export/share/bin/ssh
156916071352     192.168.1.109       R 160       32768 /export/share/bin/ssh
156916071582     192.168.1.109       R 192       32768 /export/share/bin/ssh
156916071696     192.168.1.109       R 72         4096 /export/share/bin/ssh
156916080508     192.168.1.109       R 224        4096 /export/share/bin/ssh
156916080844     192.168.1.109       R 228       28672 /export/share/bin/ssh
156916081566     192.168.1.109       R 256       32768 /export/share/bin/ssh
156916081833     192.168.1.109       R 288       32768 /export/share/bin/ssh
156916082237     192.168.1.109       R 320       20480 /export/share/bin/ssh
156933373074     192.168.1.109       W 0         32768 /export/share/words
156933373351     192.168.1.109       W 32        32768 /export/share/words
156933373855     192.168.1.109       W 64        32768 /export/share/words
156933374185     192.168.1.109       W 96        32768 /export/share/words
156933375442     192.168.1.109       W 128       32768 /export/share/words
156933375864     192.168.1.109       W 160       32768 /export/share/words
156933376105     192.168.1.109       W 192       10055 /export/share/words

The fields printed are:

Field
Description
TIME(us)
Time of event in microseconds
CLIENT
Remote client IP address
OP
R == read, W == write
OFFSET
File offset of I/O, in Kbytes
BYTES
Bytes of I/O
PATHNAME
Pathname of file, if known
nfsv4ops.d

This DTrace script counts NFSv4 operations by client, printing a summary every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Interval 5 secs.\n");
}

nfsv4:::op-*
{
        @ops[args[0]->ci_remote, probename] = count();
}

profile:::tick-5sec,
dtrace:::END
{
        printf("\n   %-32s %-28s %8s\n", "Client", "Operation", "Count");
        printa("   %-32s %-28s %@8d\n", @ops);
        trunc(@ops);
}

The following output shows which client is sending which NFSv4 operations:

# ./nfsv4ops.d 
Tracing... Interval 5 secs.

   Client                           Operation                       Count
   192.168.1.109                    op-getattr-done                     1
   192.168.1.109                    op-getattr-start                    1
   192.168.1.109                    op-putfh-done                       1
   192.168.1.109                    op-putfh-start                      1

   Client                           Operation                       Count
   192.168.1.109                    op-access-done                      1
   192.168.1.109                    op-access-start                     1
   192.168.1.109                    op-close-done                       1
   192.168.1.109                    op-close-start                      1
   192.168.1.109                    op-getfh-done                       1
   192.168.1.109                    op-getfh-start                      1
   192.168.1.109                    op-open-done                        1
   192.168.1.109                    op-open-start                       1
   192.168.1.109                    op-getattr-done                     3
   192.168.1.109                    op-getattr-start                    3
   192.168.1.109                    op-read-done                        9
   192.168.1.109                    op-read-start                       9
   192.168.1.109                    op-putfh-done                      12
   192.168.1.109                    op-putfh-start                     12
^C

   Client                           Operation                       Count

The fields printed are:

Field
Description
Client
Remote client IP address
Operation
NFSv4 operation, described using the nfsv4 provider probename
Count
Operations during this interval
nfsv4fileio.d

This DTrace script prints a summary of file read and write bytes:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        trace("Tracing... Hit Ctrl-C to end.\n");
}

nfsv4:::op-read-done
{
        @readbytes[args[1]->noi_curpath] = sum(args[2]->data_len);
}


nfsv4:::op-write-done
{
        @writebytes[args[1]->noi_curpath] = sum(args[2]->count);
}

dtrace:::END
{
        printf("\n%12s %12s  %s\n", "Rbytes", "Wbytes", "Pathname");
        printa("%@12d %@12d  %s\n", @readbytes, @writebytes);
}

This output shows a few files were read, and one was written:

# ./nfsv4fileio.d 
Tracing... Hit Ctrl-C to end.
^C

      Rbytes       Wbytes  Pathname
           0       206663  /export/share/words1
       24528            0  /export/share/bin/xargs
       44864            0  /export/share/bin/ed
      232476            0  /export/share/bin/vi

The fields printed are:

Field
Description
Rbytes
Bytes read for this pathname
Wbytes
Bytes written to this pathname
Pathname
Pathname of NFS file
nfsv4rwtime.d

This DTrace script prints a summary NFSv4 read and write elapsed times (latencies), along with other details:

#!/usr/sbin/dtrace -s

#pragma D option quiet

inline int TOP_FILES = 10;

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

nfsv4:::op-read-start,
nfsv4:::op-write-start
{
        start[args[1]->noi_xid] = timestamp;
}

nfsv4:::op-read-done,
nfsv4:::op-write-done
{
        this->elapsed = timestamp - start[args[1]->noi_xid];
        @rw[probename == "op-read-done" ? "read" : "write"] =
            quantize(this->elapsed / 1000);
        @host[args[0]->ci_remote] = sum(this->elapsed);
        @file[args[1]->noi_curpath] = sum(this->elapsed);
        start[args[1]->noi_xid] = 0;
}

dtrace:::END
{
        printf("NFSv4 read/write distributions (us):\n");
        printa(@rw);

        printf("\nNFSv4 read/write by host (total us):\n");
        normalize(@host, 1000);
        printa(@host);

        printf("\nNFSv4 read/write top %d files (total us):\n", TOP_FILES);
        normalize(@file, 1000);
        trunc(@file, TOP_FILES);
        printa(@file);
}

This output below shows a peak in the read time distribution plot in the 64 to 127 microsecond range, and a second peak between 2 and 8 milliseconds:

# ./nfsv4rwtime.d 
Tracing... Hit Ctrl-C to end.
^C
NFSv4 read/write distributions (us):

  write                                             
           value  ------------- Distribution ------------- count    
              32 |                                         0        
              64 |@@@@@@                                   1        
             128 |@@@@@@@@@@@                              2        
             256 |@@@@@@@@@@@@@@@@@                        3        
             512 |@@@@@@                                   1        
            1024 |                                         0        

  read                                              
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@                                     6        
              64 |@@@@@@@@@@@@                             17       
             128 |@                                        1        
             256 |@@                                       3        
             512 |@                                        1        
            1024 |@@                                       3        
            2048 |@@@@@@@@                                 12       
            4096 |@@@@@@@@@@                               15       
            8192 |@                                        1        
           16384 |                                         0        


NFSv4 read/write by host (total us):

  192.168.1.109                                                148215

NFSv4 read/write top 10 files (total us):

  /export/share/bin/man                                          5020
  /export/share/bin/makeuuid                                     5132
  /export/share/bin/mc68030                                      5836
  /export/share/bin/m4                                           6446
  /export/share/bin/msgfmt                                       6669
  /export/share/bin/mkmsgs                                       6674
  /export/share/bin/mailstats                                    6935
  /export/share/bin/mkdir                                        7009
  /export/share/bin/mac                                          7693
  /export/share/bin/make                                        27903

Other details are printed, such as total read/write latency by host, and the top 10 files by latency.

The first peak in the read distribution is likely to be NFS operations hitting the memory cache, and the second those that missed and read from disk. The writes were all fast as they are likely to written to the memory cache and returned asynchronously. Further use of DTrace can confirm these theories.

The fields from the distribution plot are:

Field
Description
value
Minimum elapsed time for this event in microseconds
count
Number of events in this time range
nfsv4io.d

This is a simple DTrace script to provide basic I/O details by host every 5 seconds:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        interval = 5;
        printf("Tracing... Interval %d secs.\n", interval);
        tick = interval;
}

nfsv4:::op-*
{
        @ops[args[0]->ci_remote] = count();
}

nfsv4:::op-read-done
{
        @reads[args[0]->ci_remote] = count();
        @readbytes[args[0]->ci_remote] = sum(args[2]->data_len);
}


nfsv4:::op-write-done
{
        @writes[args[0]->ci_remote] = count();
        @writebytes[args[0]->ci_remote] = sum(args[2]->count);
}

profile:::tick-1sec
/tick-- == 0/
{
        normalize(@ops, interval);
        normalize(@reads, interval);
        normalize(@writes, interval);
        normalize(@writebytes, 1024 * interval);
        normalize(@readbytes, 1024 * interval);
        printf("\n   %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s",
            "kr/s", "kw/s", "ops/s");
        printa("   %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes,
            @readbytes, @writebytes, @ops);
        trunc(@ops);
        trunc(@reads);
        trunc(@writes);
        trunc(@readbytes);
        trunc(@writebytes);
        tick = interval;
}

This output shows 192.168.1.109 calling NFSv4 reads and writes:

# ./nfsv4io.d 
Tracing... Interval 5 secs.

   Client                              r/s    w/s   kr/s   kw/s    ops/s
   192.168.1.109                        17      1    331     40      290

   Client                              r/s    w/s   kr/s   kw/s    ops/s
   192.168.1.109                         9      0    197      0      152

   Client                              r/s    w/s   kr/s   kw/s    ops/s
   192.168.1.109                        16      0    269      0      363

   Client                              r/s    w/s   kr/s   kw/s    ops/s
^C

Other details can be calculated from the output, such as average read and write size (eg, 331(kr/s) / 17(r/s) = 19.5 average kr). These could also be added to the script to be printed as columns.

The fields printed are:

Field
Description
Client
Remote client IP address
r/s
Reads per second
w/s
Writes per second
kr/s
Kilobytes read per second
kw/s
Kilobytes written per second
ops/s
Total NFSv4 operations per second (including the reads and writes)

srp Provider

The srp provider provides probes for tracing srp port provider activity.

This is a kernel provider built into the COMSTAR srp target port provider.

Probes

Probes Overview
Header
Header
Service up/down
srp:::service-up, srp:::service-down
Remote Port login/logout
srp:::login-command, srp:::login-response, srp:::logout-command
SRP command/response
srp:::task-command, srp:::task-response
SCSI command/response
srp:::scsi-command, srp:::scsi-response
Data transfer
srp:::xfer-start, srp:::xfer-done

For all of the providers below, string fields which are not known contain the string "<unknown>". Integer fields which are not known contain 0.

Service up/down Event Probes

srp:::service-up and srp:::service-down trace SRP target online and offline events. Remote port information (ci_remote) is unavailable for both probes.

Probes
Variable
Type
Description
srp:::service-up

srp:::service-down

args[0]
conninfo_t *
connection information
srp:::service-up

srp:::service-down

args[1]
srp_portinfo_t *
local and remote port information
Remote Port Login/Logout Event Probes
Probes
Variable
Type
Description
srp:::login-command

srp:::login-response

srp:::logout-command

args[0]
conninfo_t *
connection information
srp:::login-command

srp:::login-response

srp:::logout-command

args[1]
srp_portinfo_t *
local and remote port information
srp:::login-command

srp:::login-response

args[2]
srp_logininfo_t *
login command/response information
SRP Command Event Probes
Probes
Variable
Type
Description
srp:::task-command

srp:::task-response

args[0]
conninfo_t *
connection information
srp:::task-command

srp:::task-response

args[1]
srp_portinfo_t *
local and remote port information
srp:::scsi-response

srp:::scsi-command

args[2]
srp_taskinfo_t *
srp task information
SCSI Command Event Probes
Probes
Variable
Type
Description
srp:::scsi-command

srp:::scsi-response

args[0]
conninfo_t *
connection information
srp:::scsi-command

srp:::scsi-response

args[1]
srp_portinfo_t *
local and remote port information
srp:::scsi-command
args[2]
scsicmd_t *
SCSI command block (cdb)
srp:::scsi-response
args[2]
srp_taskinfo_t *
srp task information
srp:::scsi-command
args[3]
srp_taskinfo_t *
srp task information
Data Transfer Probes
Probes
Variable
Type
Description
srp:::xfer-start

srp:::xfer-done

args[0]
conninfo_t *
connection information
srp:::xfer-start

srp:::xfer-done

args[1]
fc_port_info_t *
local port information
srp:::xfer-start

srp:::xfer-done

args[2]
xferinfo_t *
RDMA transfer information
srp:::xfer-start

srp:::xfer-done

args[3]
srp_taskinfo_t *
srp task information

Types

scsicmd_t, conninfo_t and xferinfo_t are common types which are used by other providers.

scsicmd_t
typedef struct scsicmd {
        uint64_t ic_len;        /* CDB length */
        uint8_t  *ic_cdb;       /* CDB data */
} scsicmd_t;
conninfo_t
typedef struct conninfo {
        string ci_local;        /* GID of the local HCA */
        string ci_remote;       /* GID of the remote HCA */
        string ci_protocol;     /* protocol ("ib") */
} conninfo_t;
srp_portinfo_t
typedef struct srp_portinfo {
        /* initiator */
        string  pi_initiator;   /* Initiator: eui.xxxxxxxxxxxxxxx */
        string  pi_i_sid;       /* Initiator seiion id */

        /* target */
        string  pi_target;      /* Target: eui.xxxxxxxxxxxxxxx */
        string  pi_t_sid;       /* Target session id */

        uintptr_t pi_chan_id;   /* Channel identifier */
} srp_portinfo_t;
srp_logininfo_t
typedef struct srp_logininfo {
        uint64_t li_task_tag;      /* SRP task tag */
        uint32_t li_max_it_iu_len; /* Maxium iu length that initiator can
        send to target */
        uint32_t li_max_ti_iu_len; /* Maxium iu length that target can
        send to initiator */
        uint32_t li_request_limit; /* Maximun number of SRP requests
        that initiator can send on a channel */
        uint32_t reason_code;      /* Reason code */
} srp_logininfo_t;
srp_taskinfo_t
typedef struct srp_taskinfo {
        uint64_t ti_task_tag;   /* SRP task tag */
        uint64_t ti_lun;        /* Target logical unit number */
        uint8_t  ti_function;   /* Task management function */
        uint32_t ti_req_limit_delta; /* Increment of channel's request limit */
        uint8_t  ti_flag;            /* bit 2:DOOVER 3:DOUNDER 4:DIOVER 5:DIUNDER */
        uint32_t ti_do_resid_cnt;    /* Data-out residual count */
        uint32_t ti_di_resid_cnt;    /* Data-in residual count */
        uint8_t  ti_status;     /* Status of this task */
} srp_taskinfo_t;
xferinfo_t
typedef struct xferinfo {
        uintptr_t xfer_laddr;        /* Local buffer address */
        uint32_t  xfer_loffset;      /* Relative offset from local buffer */
        uint32_t  xfer_lkey;         /* Access control to local memory */
        uintptr_t xfer_raddr;        /* Remote virtual address */
        uint32_t  xfer_roffset;      /* Offset from the remote address */
        uint32_t  xfer_rkey;         /* Access control to remote address */
        uint32_t  xfer_len;          /* Transfer length */
        uint8_t   xfer_type;         /* 0: read; 1: write; */
} xferinfo_t;

Examples

service.d

This is a simple script to produce a report of target online/offline events.

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n\n");
        printf("%-14s   %-35s %-20s\n", "SRP EVENT", \
"LOCAL PORT", "EUI NAME");
};

srp:::service-up
{
        printf("%-14s   %-35s %-20s\n", probename, \
args[0]->ci_local, args[1]->pi_target);
}

srp:::service-down
{
        printf("%-14s   %-35s %-20s\n", probename, \
args[0]->ci_local, args[1]->pi_target);
}

This output shows the host and the number of iSCSI operations:

thump1## dtrace -s ~/src/service.d
Tracing... Hit Ctrl-C to end.
^C
SRP EVENT        LOCAL PORT                          EUI NAME
service-down     fe80000000000000:0003ba0001004d31   eui.0003BA0001004D30
service-down     fe80000000000000:0003ba0001004d32   eui.0003BA0001004D30
service-up       fe80000000000000:0003ba0001004d31   eui.0003BA0001004D30
service-up       fe80000000000000:0003ba0001004d32   eui.0003BA0001004D30
thump1##

The fields are:

Field
Description
SRP EVENT
srp event type
LOCAL PORT
GID of the local port
EUI NAME
EUI name of the local port
srpwho.d

This is a simple script to produce a report of the remote HCA port and a count of srp events. This is intended to provide a quick summary of srp activity when run on the SRP target server:

#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

srp:::login-command,
srp:::login-response,
srp:::task-command,
srp:::task-response,
srp:::scsi-command,
srp:::scsi-response,
srp:::xfer-start,
srp:::xfer-done
{
        @events[args[0]->ci_remote, probename] = count();
}

dtrace:::END
{
        printf("   %-33s %14s %8s\n", "REMOTE GID", "iSCSI EVENT", "COUNT");
        printa("   %-33s %14s %@8d\n", @events);
}

This output shows the host and the number of iSCSI operations:

thump1## dtrace -s ./srpwho.d
Tracing... Hit Ctrl-C to end.
^C
   REMOTE GID                           iSCSI EVENT    COUNT
   fe80000000000000:0003ba000100386d  login-command        1
   fe80000000000000:0003ba000100386d login-response        1
   fe80000000000000:0003ba0001003851  login-command        2
   fe80000000000000:0003ba0001003851 login-response        2
   fe80000000000000:0003ba0001003852  login-command        2
   fe80000000000000:0003ba0001003852 login-response        2
   fe80000000000000:0003ba0001004d32      xfer-done        9
   fe80000000000000:0003ba0001004d32     xfer-start        9
   fe80000000000000:0003ba0001004d31      xfer-done       18
   fe80000000000000:0003ba0001004d31     xfer-start       18
   fe80000000000000:0003ba0001004d32   scsi-command       22
   fe80000000000000:0003ba0001004d32  scsi-response       22
   fe80000000000000:0003ba0001004d32   task-command       22
   fe80000000000000:0003ba0001004d32  task-response       22
   fe80000000000000:0003ba0001004d31   scsi-command       42
   fe80000000000000:0003ba0001004d31  scsi-response       42
   fe80000000000000:0003ba0001004d31   task-command       42
   fe80000000000000:0003ba0001004d31  task-response       42

The fields are:

Field
Description
REMOTE GID
GID of the client HCA port
SRP EVENT
srp event type
COUNT
Number of events traced
srpsnoop.d

This is a simple script to snoop srp events when run on a srp target server.

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10

dtrace:::BEGIN
{
        printf("%17s %3s  %-40s %-14s %6s %10s  %6s\n", "TIMESTAMP",
            "CPU", "REMOTE GID", "EVENT", "BYTES", "TAG", "SCSIOP");

        /*
         * SCSI opcode to string translation hash. This is from
         * /usrp/include/sys/scsi/generic/commands.h. If you would
         * rather all hex, comment this out.
         */
        scsiop[0x08] = "read";
        scsiop[0x0a] = "write";
        scsiop[0x0b] = "seek";
        scsiop[0x28] = "read(10)";
        scsiop[0x2a] = "write(10)";
        scsiop[0x2b] = "seek(10)";
}

srp:::login-*
{
        printf("%17d %3d  %-40s %-14s %17d  -\n", timestamp, cpu, 
                         args[0]->ci_remote,
            probename, args[2]->li_task_tag);
}
srp:::task-command,
srp:::task-response,
srp:::scsi-response
{
        printf("%17d %3d  %-40s %-14s %6d %10d  -\n", timestamp, cpu,  
                    args[0]->ci_remote,
            probename, 0, args[2]->ti_task_tag);
}

srp:::scsi-command
/scsiop[args[2]->ic_cdb[0]] != NULL/
{
        printf("%17d %3d  %-40s %-14s %6d %10d  %s\n", timestamp, cpu,  
                        args[0]->ci_remote,
            probename, 0, args[3]->ti_task_tag, scsiop[args[2]->ic_cdb[0]]);
}

srp:::scsi-command
/scsiop[args[2]->ic_cdb[0]] == NULL/
{
        printf("%17d %3d  %-40s %-14s %6d %10d  0x%x\n", timestamp, cpu,
                    args[0]->ci_remote,
            probename, 0, args[3]->ti_task_tag, args[2]->ic_cdb[0]);
}

srp:::xfer-start,
srp:::xfer-done
{
        printf("%17d %3d  %-40s %-14s %6d %10d   %s\n", timestamp, 
                    cpu,  args[0]->ci_remote,
            probename,args[2]->xfer_len, args[3]->ti_task_tag,
            args[2]->xfer_type > 0 ? "READ" : "WRITE");
}

This output shows the snoop on dd commands executed by the initiator.

thump1## dtrace -s ./srpsnoop.d
TIMESTAMP       CPU  REMOTE GID                          EVENT           BYTES   TAG  SCSIOP
22644410404019   3  fe80000000000000:0003ba0001004d31   task-command        0   26  -
22644410493068   3  fe80000000000000:0003ba0001004d31   scsi-command        0   26  read(10)
22644410511422   3  fe80000000000000:0003ba0001004d31   task-command        0   30  -
22644410541494   3  fe80000000000000:0003ba0001004d31   scsi-command        0   30  read(10)
22644410621049   0  fe80000000000000:0003ba0001004d31   xfer-start       2048   26   READ
22644410720486   1  fe80000000000000:0003ba0001004d31   xfer-start      49152   30   READ
22644410681390   3  fe80000000000000:0003ba0001004d31   xfer-done        2048   26   READ
22644410694719   3  fe80000000000000:0003ba0001004d31   scsi-response       0   26  -
22644410703358   3  fe80000000000000:0003ba0001004d31   task-response       0   26  -
22644410895424   3  fe80000000000000:0003ba0001004d31   xfer-done       49152   30   READ
22644410901576   3  fe80000000000000:0003ba0001004d31   scsi-response       0   30  -
22644410905717   3  fe80000000000000:0003ba0001004d31   task-response       0   30  -
22727363721107   3  fe80000000000000:0003ba0001004d31   task-command        0   59  -
22727363919179   0  fe80000000000000:0003ba0001004d31   xfer-start      10240   59   WRITE
22727364095164   0  fe80000000000000:0003ba0001004d31   scsi-response       0   59  -
22727364105406   0  fe80000000000000:0003ba0001004d31   task-response       0   59  -
22727363812953   3  fe80000000000000:0003ba0001004d31   scsi-command        0   59  write(10)
22727363986185   3  fe80000000000000:0003ba0001004d31   xfer-done       10240   59   WRITE

The fields are:

Field
Description
CPU
CPU event occurred on
REMOTE GID
GID of the client HCA port
EVENT
srp event type
BYTES
Data bytes
TAG
Initiator task tag
SCSIOP
SCSI opcode as a description, as hex, or '-'

tcp Provider

he tcp provider provides probes for tracing the TCP protocol.

Probes

The tcp probes are described in the table below.

Table 11-40 tcp Probes

Probe
Description
state-change
Probe that fires a TCP session changes its TCP state. Previous state is noted in the tcplsinfo_t * probe argument. The tcpinfo_t * and ipinfo_t * arguments are NULL.
send
Probe that fires whenever TCP sends a segment (either control or data).
receive
Probe that fires whenever TCP receives a segment (either control or data).
connect-request
Probe that fires when a TCP active open is initiated by sending an initial SYN segment. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the initial SYN segment sent.
connect-established
This probe fires when either of the following occurs: either a TCP active OPEN succeeds - the initial SYN has been sent and a valid SYN,ACK segment has been received in response. TCP enters the ESTABLISHED state, and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the SYN,ACK segment received; or a simultaneous active OPEN succeeds and a final ACK is received from the peer TCP. TCP has entered the ESTABLISHED state and the tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers of the final ACK received. The common thread in these cases is that an active-OPEN connection is established at this point, in contrast with tcp:::accept-established which fires on passive connection establishment. In both cases above, the TCP segment that is presented via the tcpinfo_t * is the segment that triggers the transition to ESTABLISHED - the received SYN,ACK in the first case and the final ACK segment in the second.
connect-refused
A TCP active OPEN connection attempt was refused by the peer - a RST segment was received in acknowledgment of the initial SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST,ACK segment received.
accept-established
A passive open has succeeded - an initial active OPEN initiation SYN has been received, TCP responded with a SYN,ACK and a final ACK has been received. TCP has entered the ESTABLISHED state. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the final ACK segment received.
accept-refused
An incoming SYN has arrived for a destination port with no listening connection, so the connection initiation request is rejected by sending a RST segment ACKing the SYN. The tcpinfo_t * and ipinfo_t * probe arguments represent the TCP and IP headers associated with the RST segment sent.

The send and receive probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by tcp. On Solaris, loopback TCP connections can bypass the TCP layer when transferring data packets - this is a performance feature called tcp fusion; these packets are also traced by the tcp provider.

Arguments

The argument types for the tcp probes are listed in the table below. The arguments are described in the following section. All probes expect state-change have 5 arguments - state-change has 6.

Probe
args[0]
args[1]
args[2]
args[3]
args[4]
args[5]
state-change
null
csinfo_t *
null
tcpsinfo_t *
null
tcplsinfo_t *
send
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
receive
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
connect-request
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
connect-established
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
connect-refused
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
accept-established
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
accept-refused
pktinfo_t *
csinfo_t *
ipinfo_t *
tcpsinfo_t *
tcpinfo_t *
pktinfo_t Structure

The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.

The pkt_addr member is currently always NULL.

typedef struct pktinfo {
        uintptr_t pkt_addr;             /* currently always NULL */
} pktinfo_t;
csinfo_t Structure

The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.

typedef struct csinfo {
        uintptr_t cs_addr;
    uint64_t cs_cid;
    pid_t cs_pid;
    zoneid_t cs_zoneid;
 } csinfo_t;
cs_addr
Address of translated ip_xmit_attr_t *.
cs_cid
Connection id. A unique per-connection identifier which identifies the connection during its lifetime.
cs_pid
Process ID associated with the connection.
cs_zoneid
Zone ID associated with the connection.
ipinfo_t Structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;

These values are read at the time the probe fired in TCP, and so ip_plength is the expected IP payload length - however the IP layer may add headers (such as AH and ESP) which will increase the actual payload length. To examine this, also trace packets using the ip provider.

Table 11-41 ipinfo_t Members

ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.
tcpsinfo_t Structure

The tcpsinfo_t structure contains tcp state info.

typedef struct tcpsinfo {
        uintptr tcps_addr;
        int tcps_local;       /* is delivered locally, boolean */
        int tcps_active;       /* active open (from here), boolean */
        uint16_t tcps_lport;      /* local port */
        uint16_t tcps_rport;      /* remote port */
string tcps_laddr;        /* local address, as a string */
string tcps_raddr;        /* remote address, as a string */
int32_t tcps_state;/* TCP state. Use inline tcp_state_string[]to convert to string */
        uint32_t tcps_iss;     /* initial sequence # sent */
        uint32_t tcps_suna;     /* sequence # sent but unacked */
        uint32_t tcps_snxt;     /* next sequence # to send */
        uint32_t tcps_rack;     /* sequence # we have acked */
        uint32_t tcps_rnxt;     /* next sequence # expected */
        uint32_t tcps_swnd;     /* send window size */
        uint32_t tcps_snd_ws;   /* send window scaling */
        uint32_t tcps_rwnd;     /* receive window size */
        uint32_t tcps_rcv_ws;   /* receive window scaling */
    uint32_t tcps_cwnd;        /* congestion window */
    uint32_t tcps_cwnd_ssthresh;    /* threshold for congestion avoidance */
    uint32_t tcps_sack_fack;    /* SACK sequence # we have acked */
    uint32_t tcps_sack_snxt;    /* next SACK seq # for retransmission */
        uint32_t tcps_rto;              /* round-trip timeout, msec */
    uint32_t tcps_mss;        /* max segment size */
        int tcps_retransmit;            /* retransmit send event, boolean */
} tcpsinfo_t;

It may seem redundant to supply the local and remote ports and addresses here as well as in the tcpinfo_t below, but the tcp:::state-change probes do not have associated tcpinfo_t data, so in order to map the state change to a specific port, we need this data here.

Table 11-42 tcpsinfo_t Members

tcps_addr
Address of translated tcp_t *.
tcps_local
is local, boolean. 0: is not delivered locally (uses a physical network interface), 1: is delivered locally (including loopback interfaces, eg lo0).
tcps_active
is an active open, boolean. 0: TCP connection was created from a remote host, 1: TCP connection was created from this host.
tcps_lport
local port associated with the TCP connection.
tcps_rport
remote port associated with the TCP connection.
tcps_laddr
local address associated with the TCP connection, as a string.
tcps_raddr
remote address associated with the TCP connection, as a string.
tcps_state
TCP state. Inline definitions are provided for the various TCP states: TCP_STATE_CLOSED, TCP_STATE_SYN_SENT, etc. Use inline tcp_state_string[] to convert state to a string.
tcps_iss
Initial sequence number sent.
tcps_suna
Lowest sequence number for which we have sent data but not received acknowledgement.
tcps_snxt
Next sequence number to send. tcps_snxt - tcps_suna gives the number of bytes pending acknowledgement for the TCP connection.
tcps_rack
Highest sequence number for which we have received and sent acknowledgement.
tcps_rnxt
Next sequence number expected on receive side. tcps_rnxt - tcps_rack gives the number of bytes we have received but not yet acknowledged for the TCP connection.
tcps_swnd
TCP send window size.
tcps_snd_ws
TCP send window scale. tcps_swnd << tcp_snd_ws gives the scaled window size if window scaling options are in use.
tcps_rwnd
TCP receive window size.
tcps_rcv_ws
TCP receive window scale. tcps_rwnd << tcp_rcv_ws gives the scaled window size if window scaling options are in use.
tcps_cwnd
TCP congestion window size. tcps_cwnd_ssthresh TCP congestion window threshold. When the congestion window is greater than ssthresh, congestion avoidance begins.
tcps_cwnd_ssthresh
TCP congestion window threshold. When the congestion window is greater than ssthresh, congestion avoidance begins.
tcps_sack_fack
Highest SACK-acked sequence number.
tcps_sack_snxt
Next sequence num to be retransmitted using SACK.
tcps_rto
Round-trip timeout. If we do not receive acknowledgement of data sent tcps_rto msec ago, retransmit is required.
tcps_mss
Maximum segment size.
tcps_retransmit
send is a retransmit, boolean. 1 for tcp:::send events that are retransmissions, 0 for tcp events that are not send events, and for send events that are not retransmissions.
tcplsinfo_t Structure

The tcplsinfo_t structure contains the previous tcp state during a state change.

typedef struct tcplsinfo {
        int32_t tcps_state;              /* TCP state */
} tcplsinfo_t;

Table 11-43 tcplsinfo_t Members

tcps_state
previous TCP state. Inline definitions are provided for the various TCP states: TCP_STATE_CLOSED, TCP_STATE_SYN_SENT, etc. Use inline tcp_state_string[] to convert state to a string.
tcpinfo_t Structure

The tcpinfo_t structure is a DTrace translated version of the TCP header.

typedef struct tcpinfo {
        uint16_t tcp_sport;             /* source port */
        uint16_t tcp_dport;             /* destination port */
        uint32_t tcp_seq;               /* sequence number */
        uint32_t tcp_ack;               /* acknowledgment number */
        uint8_t tcp_offset;             /* data offset, in bytes */
        uint8_t tcp_flags;              /* flags */
        uint16_t tcp_window;            /* window size */
        uint16_t tcp_checksum;          /* checksum */
        uint16_t tcp_urgent;            /* urgent data pointer */
        tcph_t *tcp_hdr;                /* raw TCP header */
} tcpinfo_t;

Table 11-44 tcpinfo_t Members

tcp_sport
TCP source port.
tcp_dport
TCP destination port.
tcp_seq
TCP sequence number.
tcp_ack
TCP acknowledgment number.
tcp_offset
Payload data offset, in bytes (not 32-bit words).
tcp_flags
TCP flags. See the tcp_flags table below for available macros.
tcp_window
TCP window size, bytes.
tcp_checksum
Checksum of TCP header and payload.
tcp_urgent
TCP urgent data pointer, bytes.
tcp_hdr
Pointer to raw TCP header at time of tracing.

Table 11-45 tcp_flags Values

TH_FIN
No more data from sender (finish).
TH_SYN
Synchronize sequence numbers (connect).
TH_RST
Reset the connection.
TH_PUSH
TCP push function.
TH_ACK
Acknowledgment field is set.
TH_URG
Urgent pointer field is set.
TH_ECE
Explicit congestion notification echo (see RFC-3168).
TH_CWR
Congestion window reduction.

See RFC-793 for a detailed explanation of the standard TCP header fields and flags.

Examples

Some simple examples of tcp provider usage follow.

Connections by Host Address

This DTrace one-liner counts inbound TCP connections by source IP address:

# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr] = count(); }'
dtrace: description 'tcp:::state-change' matched 1 probes
^C

  127.0.0.1                                                         1
  192.168.2.88                                                      1
  fe80::214:4fff:fe8d:59aa                                          1
  192.168.1.109                                                     3

The output above shows there were 3 TCP connections from 192.168.1.109, a single TCP connection from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.

Connections by TCP Port

This DTrace one-liner counts inbound TCP connections by local TCP port:

# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_lport] = count(); }'
dtrace: description 'tcp:::state-change' matched 1 probes
^C

 40648                1
    22                3

The output above shows there were 3 TCP connections for port 22 (ssh), a single TCP connection for port 40648 (an RPC port).

Who is Connecting to What

Combining the previous two examples produces a useful one liner, to quickly identify who is connecting to what:

# dtrace -n 
'tcp:::accept-established { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }' 
dtrace: description 'tcp:::state-change' matched 1 probes
^C

  192.168.2.88                                       40648                1
  fe80::214:4fff:fe8d:59aa                              22                1
  192.168.1.109                                         22                3

The output above shows there were 3 TCP connections from 192.168.1.109 to port 22 (ssh), etc.

Who Isn't Connecting to What

It may be useful when troubleshooting connection issues to see who is failing to connect to their requested ports. This is equivalent to seeing where incoming SYNs arrive when no listener is present, as per RFC793:

# dtrace -n 'tcp:::accept-refused 
{ @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }' 
dtrace: description 'tcp:::receive ' matched 1 probes
^C

  192.168.1.109                                         23                2

Here we traced two failed attempts by host 192.168.1.109 to connect to port 23 (telnet).

Packets by Host Address

This DTrace one-liner counts TCP received packets by host address:

# dtrace -n 'tcp:::receive { @[args[2]->ip_saddr] = count(); }'
dtrace: description 'tcp:::receive ' matched 5 probes
^C

  127.0.0.1                                                         7
  fe80::214:4fff:fe8d:59aa                                         14
  192.168.2.30                                                     43
  192.168.1.109                                                    44
  192.168.2.88                                                   3722

The output above shows that 7 TCP packets were received from 127.0.0.1, 14 TCP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.

Packets by Local Port

This DTrace one-liner counts TCP received packets by the local TCP port:

# dtrace -n 'tcp:::receive { @[args[4]->tcp_dport] = count(); }'
dtrace: description 'tcp:::receive ' matched 5 probes
^C

 42303                3
 42634                3
  2049               27
 40648               36
    22              162

The output above shows that 162 packets were received for port 22 (ssh), 36 packets were received for port 40648 (an RPC port), 27 packets for 2049 (NFS), and a few packets to high numbered client ports.

Sent Size Distribution

This DTrace one-liner prints distribution plots of IP payload size by destination, for TCP sends:

# dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
dtrace: description 'tcp:::send ' matched 3 probes
^C

  192.168.1.109                                     
           value  ------------- Distribution ------------- count    
              32 |                                         0        
              64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    14       
             128 |@@@                                      1        
             256 |                                         0        

  192.168.2.30                                      
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@@                     7        
              64 |@@@@@@@@@                                3        
             128 |@@@                                      1        
             256 |@@@@@@                                   2        
             512 |@@@                                      1        
            1024 |                                         0        
tcpstate.d

This DTrace script demonstrates the capability to trace TCP state changes:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10

int last[int];

dtrace:::BEGIN
{
        printf(" %3s %12s  %-20s    %-20s\n", "CPU", "DELTA(us)", "OLD", "NEW");
}

tcp:::state-change
/ last[args[1]->cs_cid] /
{
        this->elapsed = (timestamp - last[args[1]->cs_cid]) / 1000;
        printf(" %3d %12d  %-20s -> %-20s\n", cpu, this->elapsed,
            tcp_state_string[args[5]->tcps_state], 
                     tcp_state_string[args[3]->tcps_state]);
        last[args[1]->cs_cid] = timestamp;
}

tcp:::state-change
/ last[args[1]->cs_cid] == 0 /
{
        printf(" %3d %12s  %-20s -> %-20s\n", cpu, "-",
            tcp_state_string[args[5]->tcps_state],
            tcp_state_string[args[3]->tcps_state]);
        last[args[1]->cs_cid] = timestamp;

This script was run on a system for a couple of minutes:

# ./tcpstate.d 

 CPU    DELTA(us)  OLD                     NEW                 
   0            -  state-listen         -> state-syn-received  
   0          613  state-syn-received   -> state-established   
   0            -  state-idle           -> state-bound         
   0           63  state-bound          -> state-syn-sent      
   0          685  state-syn-sent       -> state-bound         
   0           22  state-bound          -> state-idle          
   0          114  state-idle           -> state-closed  

In the above example output, an inbound connection is traced, It takes 613 us to go from syn-received to established. An outbound connection attempt is also made to a closed port. It takes 63us to go from bound to syn-sent, 685 us to go from syn-sent to bound etc.

The fields printed are:

Field
Description
CPU
CPU id for the event
DELTA(us)
time since previous event for that connection, microseconds
OLD
old TCP state
NEW
new TCP state
tcpio.d

The following DTrace script traces TCP packets and prints various details:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

dtrace:::BEGIN
{
        printf(" %3s %15s:%-5s      %15s:%-5s %6s  %s\n", "CPU",
            "LADDR", "LPORT", "RADDR", "RPORT", "BYTES", "FLAGS");
}

tcp:::send
{
        this->length = args[2]->ip_plength - args[4]->tcp_offset;
        printf(" %3d %16s:%-5d -> %16s:%-5d %6d  (", cpu,
            args[2]->ip_saddr, args[4]->tcp_sport,
            args[2]->ip_daddr, args[4]->tcp_dport, this->length);
}

tcp:::receive
{
        this->length = args[2]->ip_plength - args[4]->tcp_offset;
        printf(" %3d %16s:%-5d <- %16s:%-5d %6d  (", cpu,
            args[2]->ip_daddr, args[4]->tcp_dport,
            args[2]->ip_saddr, args[4]->tcp_sport, this->length);
}

tcp:::send,
tcp:::receive
{
        printf("%s", args[4]->tcp_flags & TH_FIN ? "FIN|" : "");
        printf("%s", args[4]->tcp_flags & TH_SYN ? "SYN|" : "");
        printf("%s", args[4]->tcp_flags & TH_RST ? "RST|" : "");
        printf("%s", args[4]->tcp_flags & TH_PUSH ? "PUSH|" : "");
        printf("%s", args[4]->tcp_flags & TH_ACK ? "ACK|" : "");
        printf("%s", args[4]->tcp_flags & TH_URG ? "URG|" : "");
        printf("%s", args[4]->tcp_flags & TH_ECE ? "ECE|" : "");
        printf("%s", args[4]->tcp_flags & TH_CWR ? "CWR|" : "");
        printf("%s", args[4]->tcp_flags == 0 ? "null " : "");
        printf("\b)\n");
}

This example output has captured a TCP handshake:

# ./tcpio.d
 CPU           LADDR:LPORT                RADDR:RPORT  BYTES  FLAGS
   1     192.168.2.80:22    ->    192.168.1.109:60337    464  (PUSH|ACK)
   1     192.168.2.80:22    ->    192.168.1.109:60337     48  (PUSH|ACK)
   2     192.168.2.80:22    ->    192.168.1.109:60337     20  (PUSH|ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (SYN)
   3     192.168.2.80:22    ->    192.168.1.109:60337      0  (SYN|ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337     20  (PUSH|ACK)
   3     192.168.2.80:22    ->    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337    376  (PUSH|ACK)
   3     192.168.2.80:22    ->    192.168.1.109:60337      0  (ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337     24  (PUSH|ACK)
   2     192.168.2.80:22    ->    192.168.1.109:60337    736  (PUSH|ACK)
   3     192.168.2.80:22    <-    192.168.1.109:60337      0  (ACK)

The fields printed are:

Field
Description
CPU
CPU id that event occurred on
LADDR
local IP address
LPORT
local TCP port
RADDR
remote IP address
RPORT
remote TCP port
BYTES
TCP payload bytes
FLAGS
TCP flags

Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering, and events such as the TCP handshake can be printed out of order. Keep an eye on changes in the CPU column, or add a timestamp column to this script and post sort.


Stability

The tcp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.

Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA

udp Provider

The udp provider provides probes for tracing the UDP protocol.

Probes

The udp probes are described in the table below.

Table 11-46 udp Probes

Probe
Description
send
Probe that fires whenever UDP sends a datagram.
receive
Probe that fires whenever UDP receives a datagram.

The send and receive probes trace datagrams on physical interfaces and also packets on loopback interfaces that are processed by udp.

Arguments

The argument types for the udp probes are listed in the table below. The arguments are described in the following section.

Table 11-47 udp Probe Arguments

Probe
args[0]
args[1]
args[2]
args[3]
args[4]
send
pktinfo_t *
csinfo_t *
ipinfo_t *
udpinfo_t *
udpinfo_t *
receive
pktinfo_t *
csinfo_t *
ipinfo_t *
udpinfo_t *
udpinfo_t *
pktinfo_t Structure

The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.

The pkt_addr member is currently always NULL.

typedef struct pktinfo {
        uintptr_t pkt_addr;             /* currently always NULL */
} pktinfo_t;
csinfo_t Structure

The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.

typedef struct csinfo {
        uintptr_t cs_addr;
    uint64_t cs_cid;
    pid_t cs_pid;
    zoneid_t cs_zoneid;
 } csinfo_t;

Table 11-48 csinfo_t Members

cs_addr
Address of translated ip_xmit_attr_t *.
cs_cid
Connection id. A unique per-connection identifier which identifies the connection during its lifetime.
cs_pid
Process ID associated with the connection.
cs_zoneid
Zone ID associated with the connection.
ipinfo_t Structure

The ipinfo_t structure contains common IP info for both IPv4 and IPv6.

typedef struct ipinfo {
        uint8_t ip_ver;                 /* IP version (4, 6) */
        uint16_t ip_plength;            /* payload length */
        string ip_saddr;                /* source address */
        string ip_daddr;                /* destination address */
} ipinfo_t;

These values are read at the time the probe fired in UDP, and so ip_plength is the expected IP payload length - however the IP layer may add headers (such as AH and ESP) which will increase the actual payload length. To examine this, also trace packets using the ip provider.

Table 11-49 ipinfo_t Members

ip_ver
IP version number. Currently either 4 or 6.
ip_plength
Payload length in bytes. This is the length of the packet at the time of tracing, excluding the IP header.
ip_saddr
Source IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.
ip_daddr
Destination IP address, as a string. For IPv4 this is a dotted decimal quad, IPv6 follows RFC-1884 convention 2 with lower case hexadecimal digits.
udpsinfo_t Structure

The udpsinfo_t structure contains udp state info.

typedef struct udpsinfo {
        uintptr_t       udps_addr;
    uint16_t    upds_lport;    /* local port */
    uint16_t    udps_fport;    /* remote port */
    string            udps_laddr;    /* local address, as a string */
    string            udps_faddr;    /* remote address, as a string */
} udpsinfo_t;

Table 11-50 udpsinfo_t Members

udps_addr
Address of translated udp_t *.
udps_lport
local port associated with the UDP connection.
udps_fport
remote port associated with the UDP connection.
udps_laddr
local address associated with the UDP connection, as a string
udps_fport
remote address associated with the UDP connection, as a string
udpsinfo_t Structure

The udpinfo_t structure is a DTrace translated version of the UDP header.

typedef struct udpinfo {
    uint16_t udp_sport;             /* source port */
    uint16_t udp_dport;             /* destination port */
    uint16_t udp_length;            /* total length */
    uint16_t udp_checksum;          /* headers + data checksum */
    udpha_t *udp_hdr;               /* raw UDP header */
} udpinfo_t;

Table 11-51 udpinfo_t Members

udp_sport
UDP source port.
udp_dport
UDP destination port.
udp_length
Payload length in bytes.
udp_checksum
Checksum of UDP header and payload.
udp_hdr
Pointer to raw UDP header at time of tracing.

See RFC-768 for a detailed explanation of the standard UDP header fields and flags.

Examples

Some simple examples of udp provider usage follow.

Packets by Host Address

This DTrace one-liner counts UDP received packets by host address:

# dtrace -n 'udp:::receive { @[args[2]->ip_saddr] = count(); }'
dtrace: description 'udp:::receive ' matched 5 probes
^C

  127.0.0.1                                                         7
  fe80::214:4fff:fe8d:59aa                                         14
  192.168.2.30                                                     43
  192.168.1.109                                                    44
  192.168.2.88                                                   3722

The output above shows that 7 UDP packets were received from 127.0.0.1, 14 UDP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.

Packets by Local Port

This DTrace one-liner counts UDP received packets by the local UDP port:

# dtrace -n 'udp:::receive { @[args[4]->udp_dport] = count(); }'
dtrace: description 'udp:::receive ' matched 1 probe
^C

 33294                1
 33822                1
 38961                1
 44433                1
 46258                1
 46317                1
 47511                1
 50581                1
 54685                1
 56491                1
 59056                1
 62171                1
 62769                1
 64231                1

The output above shows that 1 packet was received for port 33294, 1 packet was received for port 33822, etc.

Sent Size Distribution

This DTrace one-liner prints distribution plots of IP payload size by destination, for UDP sends:

# dtrace -n 'udp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }'
dtrace: description 'udp:::send ' matched 6 probes
^C

  129.156.86.11                                     
           value  ------------- Distribution ------------- count    
              16 |                                         0        
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14       
              64 |                                         0        

Stability

The udp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.

Element
Name Stability
Data Stability
Dependency Class
Provider
Evolving
Evolving
ISA
Module
Private
Private
Unknown
Function
Private
Private
Unknown
Name
Evolving
Evolving
ISA
Arguments
Evolving
Evolving
ISA