This section lists all the protocol providers.
The ip provider provides probes for tracing both IPv4 and IPv6 protocols.
The ip probes are described in the table below.
Table 11-33 ip Providers
These probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by ip. An IP packet must have a full IP header to be visible to these probes.
Note - Loopback tcp packets on Solaris may be processed by tcp fusion, a performance feature that bypasses the ip layer. These fused packets are not visible to the ip:::send and ip:::receive probes; after the tcp handshake, typically all loopback tcp packets are fused.
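Fused loopback TCP traffic can still be observed with the tcp provider, described later in this chapter, which also traces fused packets. For example, a one-liner along these lines (an illustrative sketch, not one of the examples below) counts tcp sends to the loopback address by local port:
# dtrace -n 'tcp:::send /args[2]->ip_daddr == "127.0.0.1"/ { @[args[3]->tcps_lport] = count(); }'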
The argument types for the ip probes are listed in the table below. The arguments are described in the following section.
Table 11-34 ip Probe Arguments
The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.
typedef struct pktinfo { uintptr_t pkt_addr; /* currently always NULL */ } pktinfo_t;
The csinfo_t structure is where connection state info can be made available if connection IDs become supported by the kernel in the future.
The cs_addr member is currently always NULL.
typedef struct csinfo { uintptr_t cs_addr; /* currently always NULL */ } csinfo_t;
The ipinfo_t structure contains common IP info for both IPv4 and IPv6.
typedef struct ipinfo { uint8_t ip_ver; /* IP version (4, 6) */ uint16_t ip_plength; /* payload length */ string ip_saddr; /* source address */ string ip_daddr; /* destination address */ } ipinfo_t;
Table 11-35 ipinfo_t Members
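For example, a one-liner such as the following (an illustrative sketch using the ipinfo_t members above) counts received packets by IP version and source address:
# dtrace -n 'ip:::receive { @[args[2]->ip_ver, args[2]->ip_saddr] = count(); }'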
The ifinfo_t structure contains network interface info.
typedef struct ifinfo { string if_name; /* interface name */ int8_t if_local; /* is delivered locally */ netstackid_t if_ipstack; /* ipstack ID */ uintptr_t if_addr; /* pointer to raw ill_t */ } ifinfo_t;
Table 11-36 ifinfo_t Members
The ifinfo_t details are provided for debugging convenience when that information is available in the ip layer. For some types of traffic, some or all of that information may not be available at the time the probe fires, in which case the members may be "<null>", -1, NULL, and NULL, respectively.
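When the interface information is available, if_name can be used to break traffic down by interface, for example:
# dtrace -n 'ip:::receive { @[args[3]->if_name] = count(); }'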
The ipv4info_t structure is a DTrace translated version of the IPv4 header.
typedef struct ipv4info { uint8_t ipv4_ver; /* IP version (4) */ uint8_t ipv4_ihl; /* header length, bytes */ uint8_t ipv4_tos; /* type of service field */ uint16_t ipv4_length; /* length (header + payload) */ uint16_t ipv4_ident; /* identification */ uint8_t ipv4_flags; /* IP flags */ uint16_t ipv4_offset; /* fragment offset */ uint8_t ipv4_ttl; /* time to live */ uint8_t ipv4_protocol; /* next level protocol */ string ipv4_protostr; /* next level protocol, as a string */ uint16_t ipv4_checksum; /* header checksum */ ipaddr_t ipv4_src; /* source address */ ipaddr_t ipv4_dst; /* destination address */ string ipv4_saddr; /* source address, string */ string ipv4_daddr; /* destination address, string */ ipha_t *ipv4_hdr; /* pointer to raw header */ } ipv4info_t;
Table 11-37 ipv4info_t Members
See RFC-791 for a detailed explanation of these IPv4 header fields. If the packet is IPv6, these members are either "<null>", 0, or NULL depending on type.
Table 11-38 ipv4_flags Values
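As a simple illustration of the ipv4info_t members, a one-liner such as the following counts received IPv4 packets by next level protocol name:
# dtrace -n 'ip:::receive /args[2]->ip_ver == 4/ { @[args[4]->ipv4_protostr] = count(); }'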
The ipv6info_t structure is a DTrace translated version of the IPv6 header.
typedef struct ipv6info { uint8_t ipv6_ver; /* IP version (6) */ uint8_t ipv6_tclass; /* traffic class */ uint32_t ipv6_flow; /* flow label */ uint16_t ipv6_plen; /* payload length */ uint8_t ipv6_nexthdr; /* next header protocol */ string ipv6_nextstr; /* next header protocol, as a string*/ uint8_t ipv6_hlim; /* hop limit */ in6_addr_t *ipv6_src; /* source address */ in6_addr_t *ipv6_dst; /* destination address */ string ipv6_saddr; /* source address, string */ string ipv6_daddr; /* destination address, string */ ip6_t *ipv6_hdr; /* pointer to raw header */ } ipv6info_t;
Table 11-39 ipv6info_t Members
See RFC-2460 for a detailed explanation of these IPv6 header fields. If the packet is IPv4, these members are either "<null>", 0, or NULL depending on type.
Some simple examples of ip provider usage follow.
This DTrace one-liner counts received packets by host address:
# dtrace -n 'ip:::receive { @[args[2]->ip_saddr] = count(); }' dtrace: description 'ip:::receive ' matched 4 probes ^C 192.168.1.5 1 192.168.1.185 4 fe80::214:4fff:fe3b:76c8 9 127.0.0.1 14 192.168.1.109 28
The output above shows that 28 IP packets were received from 192.168.1.109, 14 IP packets from 127.0.0.1, and so on.
This DTrace one-liner prints distribution plots of sent payload size by destination:
# dtrace -n 'ip:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }' dtrace: description 'ip:::send ' matched 11 probes ^C 192.168.2.27 value ------------- Distribution ------------- count 8 | 0 16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7 32 |@@@@ 1 64 |@@@@ 1 128 | 0 192.168.1.109 value ------------- Distribution ------------- count 8 | 0 16 |@@@@@ 5 32 |@@@ 3 64 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 24 128 |@ 1 256 |@ 1 512 |@@ 2 1024 |@ 1 2048 | 0
The following DTrace script traces IP packets and prints various details:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf(" %3s %10s %15s %15s %8s %6s\n", "CPU", "DELTA(us)", "SOURCE", "DEST", "INT", "BYTES"); last = timestamp; } ip:::send { this->elapsed = (timestamp - last) / 1000; printf(" %3d %10d %15s -> %15s %8s %6d\n", cpu, this->elapsed, args[2]->ip_saddr, args[2]->ip_daddr, args[3]->if_name, args[2]->ip_plength); last = timestamp; } ip:::receive { this->elapsed = (timestamp - last) / 1000; printf(" %3d %10d %15s <- %15s %8s %6d\n", cpu, this->elapsed, args[2]->ip_daddr, args[2]->ip_saddr, args[3]->if_name, args[2]->ip_plength); last = timestamp; }
This example output shows tracing packets as they pass in and out of tunnels:
# ./ipio.d CPU DELTA(us) SOURCE DEST INT BYTES 1 598913 10.1.100.123 -> 192.168.10.75 ip.tun0 68 1 73 192.168.1.108 -> 192.168.5.1 nge0 140 1 18325 192.168.1.108 <- 192.168.5.1 nge0 140 1 69 10.1.100.123 <- 192.168.10.75 ip.tun0 68 0 102921 10.1.100.123 -> 192.168.10.75 ip.tun0 20 0 79 192.168.1.108 -> 192.168.5.1 nge0 92
The fields printed are:
Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; keep an eye on changes in the CPU column, or add a timestamp column and post sort.
This DTrace script provides a neat summary for both send and receive IP traffic, including the next level protocol:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } ip:::send, ip:::receive { this->protostr = args[2]->ip_ver == 4 ? args[4]->ipv4_protostr : args[5]->ipv6_nextstr; @num[args[2]->ip_saddr, args[2]->ip_daddr, this->protostr] = count(); } dtrace:::END { printf(" %-28s %-28s %6s %8s\n", "SADDR", "DADDR", "PROTO", "COUNT"); printa(" %-28s %-28s %6s %@8d\n", @num); }
This script was run on a system with both IPv4 and IPv6 interfaces for several seconds:
# ./ipproto.d Tracing... Hit Ctrl-C to end. ^C SADDR DADDR PROTO COUNT 192.168.1.108 192.168.155.32 UDP 1 192.168.1.108 192.168.17.55 UDP 1 192.168.1.108 192.168.228.54 UDP 1 192.168.1.108 192.168.1.5 UDP 1 192.168.1.108 192.168.2.27 ICMP 1 192.168.1.200 192.168.3.255 UDP 1 192.168.1.5 192.168.1.108 UDP 1 192.168.2.27 192.168.1.108 ICMP 1 fe80::214:4fff:fe3b:76c8 ff02::1 ICMPV6 1 fe80::2e0:81ff:fe5e:8308 fe80::214:4fff:fe3b:76c8 ICMPV6 1 fe80::2e0:81ff:fe5e:8308 ff02::1:2 UDP 1 192.168.1.185 192.168.1.255 UDP 2 192.168.1.211 192.168.1.255 UDP 3 192.168.1.109 192.168.1.108 TCP 428 192.168.1.108 192.168.1.109 TCP 789
The fields printed are:
The example output above provides a quick summary of network activity with host address details; we can see that both 192.168.1.109 and 192.168.1.108 are swapping many packets via TCP.
The ip provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.
The iscsi provider provides probes for tracing iSCSI target activity.
This is a kernel provider built into the COMSTAR iSCSI target port provider. The COMSTAR iSCSI target and the user-land iSCSI target (/usr/sbin/iscsitgtd) are mutually exclusive: only one of the targets can be enabled at a time. The COMSTAR iSCSI target DTrace provider provides all the probes of the user-land iSCSI provider built into the iSCSI target daemon (iscsitgtd), so any DTrace script written for the user-land provider will also work with the COMSTAR iSCSI target port provider without modification. Since this provider instruments iSCSI target activity, DTrace commands and scripts must be run on the iSCSI target server to observe these probes.
All COMSTAR iSCSI target probes have the first and second argument in common:
args[0] conninfo_t * connection information
conninfo_t
typedef struct conninfo { string ci_local; /* local host IP address */ string ci_remote; /* remote host IP address */ string ci_protocol; /* protocol ("ipv4", "ipv6") */ } conninfo_t; The conninfo_t structure is used by the NFSv4 and Fibre Channel providers, and is intended for use by all application protocol providers as the first argument, to indicate some basic information about the connection.
args[1] iscsiinfo_t * common iSCSI properties
iscsiinfo_t
typedef struct iscsiinfo { string ii_target; /* target iqn */ string ii_initiator; /* initiator iqn */ string ii_isid; /* initiator session identifier */ string ii_tsih; /* target session identifying handle */ string ii_transport; /* transport type ("iser-ib", "sockets") */ uint64_t ii_lun; /* target logical unit number */ uint32_t ii_itt; /* initiator task tag */ uint32_t ii_ttt; /* target transfer tag */ uint32_t ii_cmdsn; /* command sequence number */ uint32_t ii_statsn; /* status sequence number */ uint32_t ii_datasn; /* data sequence number */ uint32_t ii_datalen; /* length of data payload */ uint32_t ii_flags; /* probe-specific flags */ } iscsiinfo_t; The iscsiinfo_t structure is used to provide identifying information about the target and the initiator and also some PDU level information such as lun, data length and sequence numbers.
The third argument is used only by the SCSI command probe and the data transfer probes.
args[2] scsicmd_t * SCSI command block (cdb)
scsicmd_t
typedef struct scsicmd { uint64_t ic_len; /* CDB length */ uint8_t *ic_cdb; /* CDB data */ } scsicmd_t; The scsicmd_t structure is used by the SCSI command probe. It contains information about the SCSI command block (CDB) and is intended for use by all application protocols that deal with SCSI data.
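For example, the first CDB byte (the SCSI opcode) can be aggregated with a one-liner such as the following (an illustrative sketch):
# dtrace -n 'iscsi*:::scsi-command { @[args[2]->ic_cdb[0]] = count(); }'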
Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over Remote DMA (RDMA), also known as iSER. Since the data transfer phases are mapped to Remote DMA (RDMA) operations in iSER, the data-send, data-receive and data-request probes cannot be used with iSER. Instead the xfer-start and xfer-done probes can be used to trace the data transfer irrespective of the transport used. The data-receive, data-request and data-send probes can be used when a user wants to track the SCSI Data-IN and Data-OUT PDUs specifically.
args[2] xferinfo_t * data transfer information
xferinfo_t
typedef struct xferinfo { uintptr_t xfer_laddr; /* local buffer address */ uint32_t xfer_loffset; /* offset within the local buffer */ uint32_t xfer_lkey; /* access control to local memory */ uintptr_t xfer_raddr; /* remote virtual address */ uint32_t xfer_roffset; /* offset from the remote address */ uint32_t xfer_rkey; /* access control to remote virtual address */ uint32_t xfer_len; /* transfer length */ uint32_t xfer_type; /* Read or Write */ } xferinfo_t; The xferinfo_t structure is used by the xfer-start and xfer-done probes and contains information about the data transfer. When the transport type is iSER, the remote buffer information is given by the xfer_raddr, xfer_rkey, and xfer_roffset fields; these fields are set to 0 when the transport type is sockets.
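For example, a one-liner such as the following (an illustrative sketch) sums the bytes transferred to or from each remote host, regardless of transport:
# dtrace -n 'iscsi*:::xfer-done { @[args[0]->ci_remote] = sum(args[2]->xfer_len); }'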
Frequency of iSCSI command types:
# dtrace -n 'iscsi*::: { @[probename] = count(); }'
Frequency of iSCSI client IP addresses:
# dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'
Payload bytes by iSCSI command type:
# dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'
Payload byte distribution by iSCSI command type:
# dtrace -n 'iscsi*::: { @[probename] = quantize(args[1]->ii_datalen); }'
This is a simple script to produce a report of the remote IP addresses and a count of iSCSI events. This is intended to provide a quick summary of iSCSI activity when run on the iSCSI target server:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } iscsi*::: { @events[args[0]->ci_remote, probename] = count(); } dtrace:::END { printf(" %-26s %14s %8s\n", "REMOTE IP", "iSCSI EVENT", "COUNT"); printa(" %-26s %14s %@8d\n", @events); }
This output shows the host and the number of iSCSI operations:
# ./iscsiwho.d Tracing... Hit Ctrl-C to end. ^C REMOTE IP iSCSI EVENT COUNT 192.168.1.109 nop-receive 1 192.168.1.109 nop-send 1 192.168.1.109 scsi-response 126 192.168.1.109 scsi-command 208 192.168.1.109 data-request 1207 192.168.1.109 data-receive 1207
The fields are:
The example output shows normal traffic. For this simple script, these event names are not translated beyond their iSCSI provider probe names, and require some thought to comprehend (they are from the perspective of the iSCSI target server).
Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over Remote DMA (RDMA), also known as iSER. An iSER initiator should be able to read and write data from an iSER target at high data rates with relatively low CPU utilization compared to iSCSI using TCP/IP. In order to see the transport layer in use, display the ii_transport field from the iscsiinfo_t structure.
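For example, a one-liner such as the following (an illustrative sketch) counts data transfers by transport type:
# dtrace -n 'iscsi*:::xfer-start { @[args[1]->ii_transport] = count(); }'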
Since the data transfer phases are mapped to Remote DMA (RDMA) operations in iSER, the data-send, data-receive and data-request probes cannot be used with iSER. Instead, here is a simple script to print an aggregation of all the data transferred between two points using the xfer-start probe. This can be used for iSCSI using TCP/IP and for iSCSI over Remote DMA.
The data-receive, data-request and data-send probes can be used when a user wants to track the SCSI Data-IN and Data-OUT PDUs specifically (e.g. if the PDUs are received out of order, one might want to trace the ii_ttt, ii_datasn, ii_statsn etc.). To just get a trace of IO activity, the xfer-start/xfer-done probes should suffice.
#!/usr/sbin/dtrace -s #pragma D option quiet BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } iscsi:::xfer-start { @[args[0]->ci_remote, args[2]->xfer_type] = sum(args[2]->xfer_len); } END { printf("%26s %10s %15s\n", "REMOTE IP", "READ/WRITE", "BYTES"); printa("%26s %10s %@15d\n", @); }
This output shows the transfer of bytes:
# ./iscsixfer.d Tracing... Hit Ctrl-C to end. ^C REMOTE IP READ/WRITE BYTES 192.168.1.109 write 464 192.168.1.109 read 1024
The fields are:
Now if a user is interested in just seeing the data move (read or write) as it happens, one could use this script:
#!/usr/sbin/dtrace -s #pragma D option quiet BEGIN { printf(" %-26s %8s %10s\n", "REMOTE IP", "BYTES", "READ/WRITE"); } iscsi:::xfer-start { printf(" %-26s %8d %10d\n", args[0]->ci_remote, args[2]->xfer_len, args[2]->xfer_type); }
An interpretation of some of these events is:
The nfsv3 provider provides probes for tracing NFS version 3 server activity.
All NFS operation probes have the first argument in common:
args[0] conninfo_t * socket connection information
The conninfo_t structure is already used by the iSCSI target provider (iscsi) and the NFS v4 provider (nfsv4), and is intended for use by all providers that implement a higher-level protocol (e.g. iscsi, nfs, http, ftp).
typedef struct conninfo { string ci_local; /* local host address */ string ci_remote; /* remote host address */ string ci_protocol; /* protocol (ipv4, ipv6, etc) */ } conninfo_t;
Operation probes have their second argument in common:
args[1] nfsv3opinfo_t * NFS v3 operation properties typedef struct nfsv3opinfo { string noi_curpath; /* current file handle path (if any) */ cred_t *noi_cred; /* credentials */ uint64_t noi_xid; /* transaction ID */ } nfsv3opinfo_t;
Below is a list of the probes along with the specific argument for each whose type is defined by the NFS v3 specification:
Note - The op-null-* probes have an undefined args[2].
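As a simple illustration of the noi_curpath member, a one-liner such as the following counts NFSv3 read and write requests by file pathname:
# dtrace -n 'nfsv3:::op-read-start, nfsv3:::op-write-start { @[args[1]->noi_curpath] = count(); }'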
Some simple examples of nfsv3 provider usage follow.
This DTrace script traces NFSv3 read and write requests, showing details of each operation:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)", "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME"); } nfsv3:::op-read-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count, args[1]->noi_curpath); } nfsv3:::op-write-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "W", args[2]->offset / 1024, args[2]->data.data_len, args[1]->noi_curpath); }
The following output shows a read of /export/stuff/bin/ghex2, then a read of /export/stuff/bin/gksu, and finally a write of /export/stuff/words12:
# ./nfsv3iosnoop.d TIME(us) CLIENT OP OFFSET BYTES PATHNAME 4299383207 192.168.17.75 R 0 4096 /export/stuff/bin/ghex2 4299391813 192.168.17.75 R 4 28672 /export/stuff/bin/ghex2 4299395700 192.168.17.75 R 32 32768 /export/stuff/bin/ghex2 4299396038 192.168.17.75 R 96 32768 /export/stuff/bin/ghex2 4299396462 192.168.17.75 R 128 8192 /export/stuff/bin/ghex2 4299396550 192.168.17.75 R 64 32768 /export/stuff/bin/ghex2 4320233417 192.168.17.75 R 0 4096 /export/stuff/bin/gksu 4320240902 192.168.17.75 R 4 28672 /export/stuff/bin/gksu 4320242434 192.168.17.75 R 32 32768 /export/stuff/bin/gksu 4320242730 192.168.17.75 R 64 24576 /export/stuff/bin/gksu 4333460565 192.168.17.75 W 0 32768 /export/stuff/words12 4333461477 192.168.17.75 W 32 32768 /export/stuff/words12 4333463264 192.168.17.75 W 64 32768 /export/stuff/words12 4333463567 192.168.17.75 W 96 32768 /export/stuff/words12 4333463893 192.168.17.75 W 128 32768 /export/stuff/words12 4333464202 192.168.17.75 W 160 32768 /export/stuff/words12 4333464451 192.168.17.75 W 192 10055 /export/stuff/words12
The fields printed are:
Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; post sort the TIME column if needed.
This DTrace script counts NFSv3 operations by client, printing a summary every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Interval 5 secs.\n"); } nfsv3:::op-* { @ops[args[0]->ci_remote, probename] = count(); } profile:::tick-5sec, dtrace:::END { printf("\n %-32s %-28s %8s\n", "Client", "Operation", "Count"); printa(" %-32s %-28s %@8d\n", @ops); trunc(@ops); }
The following output shows which client is sending which NFSv3 operations:
# ./nfsv3ops.d Tracing... Interval 5 secs. Client Operation Count 192.168.17.75 op-commit-done 1 192.168.17.75 op-commit-start 1 192.168.17.75 op-create-done 1 192.168.17.75 op-create-start 1 192.168.17.75 op-access-done 6 192.168.17.75 op-access-start 6 192.168.17.75 op-read-done 6 192.168.17.75 op-read-start 6 192.168.17.75 op-write-done 7 192.168.17.75 op-write-start 7 192.168.17.75 op-lookup-done 8 192.168.17.75 op-lookup-start 8 192.168.17.75 op-getattr-done 10 192.168.17.75 op-getattr-start 10 Client Operation Count Client Operation Count 192.168.17.75 op-getattr-done 1 192.168.17.75 op-getattr-start 1
The fields printed are:
This DTrace script prints a summary of file read and write bytes:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Hit Ctrl-C to end.\n"); } nfsv3:::op-read-done { @readbytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.data.data_len); } nfsv3:::op-write-done { @writebytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.count); } dtrace:::END { printf("\n%12s %12s %s\n", "Rbytes", "Wbytes", "Pathname"); printa("%@12d %@12d %s\n", @readbytes, @writebytes); }
This output shows a few files were read, and one was written:
# ./nfsv3fileio.d Tracing... Hit Ctrl-C to end. ^C Rbytes Wbytes Pathname 0 206663 /export/stuff/words10 8624 0 /export/stuff/bin/echo-client-2 13228 0 /export/stuff/bin/echo 496292 0 /export/stuff/bin/ecpg
The fields printed are:
This DTrace script prints a summary of NFSv3 read and write elapsed times (latencies), along with other details:
#!/usr/sbin/dtrace -s #pragma D option quiet inline int TOP_FILES = 10; dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } nfsv3:::op-read-start, nfsv3:::op-write-start { start[args[1]->noi_xid] = timestamp; } nfsv3:::op-read-done, nfsv3:::op-write-done /start[args[1]->noi_xid] != 0/ { this->elapsed = timestamp - start[args[1]->noi_xid]; @rw[probename == "op-read-done" ? "read" : "write"] = quantize(this->elapsed / 1000); @host[args[0]->ci_remote] = sum(this->elapsed); @file[args[1]->noi_curpath] = sum(this->elapsed); start[args[1]->noi_xid] = 0; } dtrace:::END { printf("NFSv3 read/write distributions (us):\n"); printa(@rw); printf("\nNFSv3 read/write by host (total us):\n"); normalize(@host, 1000); printa(@host); printf("\nNFSv3 read/write top %d files (total us):\n", TOP_FILES); normalize(@file, 1000); trunc(@file, TOP_FILES); printa(@file); }
The output below shows a clear peak in the read time distribution plot in the 64 to 127 microsecond range, and a second smaller peak between 4 and 16 milliseconds:
# ./nfsv3rwtime.d Tracing... Hit Ctrl-C to end. ^C NFSv3 read/write distributions (us): write value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 64 | 0 read value ------------- Distribution ------------- count 8 | 0 16 |@ 1 32 |@ 1 64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 36 128 |@ 1 256 |@ 1 512 | 0 1024 | 0 2048 |@ 1 4096 |@@@ 3 8192 |@@@ 4 16384 | 0 NFSv3 read/write by host (total us): 192.168.17.75 81674 NFSv4 read/write top 10 files (total us): /export/stuff/motd 63 /export/stuff/bin/daps 5876 /export/stuff/bin/date 8150 /export/stuff/bin/dasher 67584
Other details are printed, such as total read/write latency by host, and the top 10 files by latency.
The next example also shows a pair of peaks, the first around a fraction of a millisecond, the second at around 4 milliseconds:
# ./nfsv3rwtime.d Tracing... Hit Ctrl-C to end. ^C NFSv3 read/write distributions (us): read value ------------- Distribution ------------- count 8 | 0 16 |@ 4 32 |@ 5 64 |@@@@@@ 22 128 |@@@@ 13 256 |@@@@@@@@@ 30 512 |@@ 7 1024 |@ 3 2048 |@@@ 12 4096 |@@@@@@@ 26 8192 |@@@@ 15 16384 |@ 2 32768 | 0 NFSv3 read/write by host (total us): 192.168.17.75 414458 NFSv3 read/write top 10 files (total us): /export/stuff/bin/cal 11225 /export/stuff/bin/cjpeg 11947 /export/stuff/bin/charmap 12347 /export/stuff/bin/cdda2wav.bin 13449 /export/stuff/bin/chkey 13963 /export/stuff/bin/cputrack 14533 /export/stuff/bin/catman 15535 /export/stuff/bin/csslint-0.6 18302 /export/stuff/bin/col 19926 /export/stuff/bin/cdrecord.bin 40622
The first peak is likely to be NFS operations hitting the memory cache, and the second those that missed and went to disk. Further use of DTrace can confirm this theory.
The fields from the distribution plot are:
This is a simple DTrace script to provide basic I/O details by host every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { interval = 5; printf("Tracing... Interval %d secs.\n", interval); tick = interval; } nfsv3:::op-* { @ops[args[0]->ci_remote] = count(); } nfsv3:::op-read-done { @reads[args[0]->ci_remote] = count(); @readbytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.data.data_len); } nfsv3:::op-write-done { @writes[args[0]->ci_remote] = count(); @writebytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.count); } profile:::tick-1sec /tick-- == 0/ { normalize(@ops, interval); normalize(@reads, interval); normalize(@writes, interval); normalize(@writebytes, 1024 * interval); normalize(@readbytes, 1024 * interval); printf("\n %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s", "kr/s", "kw/s", "ops/s"); printa(" %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes, @readbytes, @writebytes, @ops); trunc(@ops); trunc(@reads); trunc(@writes); trunc(@readbytes); trunc(@writebytes); tick = interval; }
This output shows 192.168.17.75 calling NFSv3 reads and writes:
# ./nfsv3io.d Tracing... Interval 5 secs. Client r/s w/s kr/s kw/s ops/s 192.168.17.75 27 1 686 40 100 Client r/s w/s kr/s kw/s ops/s 192.168.17.75 0 0 0 0 8 Client r/s w/s kr/s kw/s ops/s 0.0.0.0 0 0 0 0 0 192.168.17.75 2 0 28 0 18 ^C
Other details can be calculated from the output, such as the average read and write size (for example, 686 kr/s / 27 r/s = 25.4 Kbytes average read size). These could also be added to the script to be printed as columns.
The fields printed are:
The nfsv4 provider provides probes for tracing NFS v4 server activity.
All NFS operation probes have the first argument in common:
args[0] conninfo_t * socket connection information
The conninfo_t structure is already used by the iSCSI target provider (iscsi), and is intended for use by all providers that implement a higher-level protocol (e.g. iscsi, nfs, http, ftp).
typedef struct conninfo { string ci_local; /* local host address */ string ci_remote; /* remote host address */ string ci_protocol; /* protocol (ipv4, ipv6, etc) */ } conninfo_t;
Operation probes have their second argument in common:
args[1] nfsv4opinfo_t * NFS v4 operation properties typedef struct nfsv4opinfo { string noi_curpath; /* current file handle path (if any) */ cred_t *noi_cred; /* credentials */ uint64_t noi_xid; /* transaction ID */ } nfsv4opinfo_t;
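For example, a one-liner such as the following (an illustrative sketch) counts NFSv4 read and write requests by client address and file pathname:
# dtrace -n 'nfsv4:::op-read-start, nfsv4:::op-write-start { @[args[0]->ci_remote, args[1]->noi_curpath] = count(); }'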
Callback operation probes have their second argument in common:
args[1] nfsv4cbinfo_t * NFS v4 callback properties typedef struct nfsv4cbinfo { string nci_curpath; /* file handle path (if any) */ } nfsv4cbinfo_t;
Below is a list of the top level operation probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Below is a list of operation probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Callback compound probes have an undefined second argument; this slot is reserved for future use.
Below is a list of the top level callback probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Below is a list of callback probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Note - Since the Solaris NFS v4 implementation does not yet use the 'getattr' callback, the probe will not be implemented; it is noted here in anticipation of a future implementation.
Some simple examples of nfsv4 provider usage follow.
This DTrace script traces NFSv4 reads and writes:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)", "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME"); } nfsv4:::op-read-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count, args[1]->noi_curpath); } nfsv4:::op-write-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "W", args[2]->offset / 1024, args[2]->data_len, args[1]->noi_curpath); }
This output shows a few files were read, and one was written:
# ./nfsv4rwsnoop.d TIME(us) CLIENT OP OFFSET BYTES PATHNAME 156889725960 192.168.1.109 R 0 4096 /export/share/bin/nawk 156889735515 192.168.1.109 R 4 28672 /export/share/bin/nawk 156889736298 192.168.1.109 R 32 32768 /export/share/bin/nawk 156889736544 192.168.1.109 R 96 32768 /export/share/bin/nawk 156889736902 192.168.1.109 R 64 32768 /export/share/bin/nawk 156916061653 192.168.1.109 R 0 4096 /export/share/bin/ssh 156916069375 192.168.1.109 R 4 28672 /export/share/bin/ssh 156916070098 192.168.1.109 R 32 32768 /export/share/bin/ssh 156916070435 192.168.1.109 R 96 32768 /export/share/bin/ssh 156916070758 192.168.1.109 R 64 32768 /export/share/bin/ssh 156916071036 192.168.1.109 R 128 32768 /export/share/bin/ssh 156916071352 192.168.1.109 R 160 32768 /export/share/bin/ssh 156916071582 192.168.1.109 R 192 32768 /export/share/bin/ssh 156916071696 192.168.1.109 R 72 4096 /export/share/bin/ssh 156916080508 192.168.1.109 R 224 4096 /export/share/bin/ssh 156916080844 192.168.1.109 R 228 28672 /export/share/bin/ssh 156916081566 192.168.1.109 R 256 32768 /export/share/bin/ssh 156916081833 192.168.1.109 R 288 32768 /export/share/bin/ssh 156916082237 192.168.1.109 R 320 20480 /export/share/bin/ssh 156933373074 192.168.1.109 W 0 32768 /export/share/words 156933373351 192.168.1.109 W 32 32768 /export/share/words 156933373855 192.168.1.109 W 64 32768 /export/share/words 156933374185 192.168.1.109 W 96 32768 /export/share/words 156933375442 192.168.1.109 W 128 32768 /export/share/words 156933375864 192.168.1.109 W 160 32768 /export/share/words 156933376105 192.168.1.109 W 192 10055 /export/share/words
The fields printed are:
This DTrace script counts NFSv4 operations by client, printing a summary every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Interval 5 secs.\n"); } nfsv4:::op-* { @ops[args[0]->ci_remote, probename] = count(); } profile:::tick-5sec, dtrace:::END { printf("\n %-32s %-28s %8s\n", "Client", "Operation", "Count"); printa(" %-32s %-28s %@8d\n", @ops); trunc(@ops); }
The following output shows which client is sending which NFSv4 operations:
# ./nfsv4ops.d Tracing... Interval 5 secs. Client Operation Count 192.168.1.109 op-getattr-done 1 192.168.1.109 op-getattr-start 1 192.168.1.109 op-putfh-done 1 192.168.1.109 op-putfh-start 1 Client Operation Count 192.168.1.109 op-access-done 1 192.168.1.109 op-access-start 1 192.168.1.109 op-close-done 1 192.168.1.109 op-close-start 1 192.168.1.109 op-getfh-done 1 192.168.1.109 op-getfh-start 1 192.168.1.109 op-open-done 1 192.168.1.109 op-open-start 1 192.168.1.109 op-getattr-done 3 192.168.1.109 op-getattr-start 3 192.168.1.109 op-read-done 9 192.168.1.109 op-read-start 9 192.168.1.109 op-putfh-done 12 192.168.1.109 op-putfh-start 12 ^C Client Operation Count
The fields printed are:
This DTrace script prints a summary of file read and write bytes:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Hit Ctrl-C to end.\n"); } nfsv4:::op-read-done { @readbytes[args[1]->noi_curpath] = sum(args[2]->data_len); } nfsv4:::op-write-done { @writebytes[args[1]->noi_curpath] = sum(args[2]->count); } dtrace:::END { printf("\n%12s %12s %s\n", "Rbytes", "Wbytes", "Pathname"); printa("%@12d %@12d %s\n", @readbytes, @writebytes); }
This output shows a few files were read, and one was written:
# ./nfsv4fileio.d Tracing... Hit Ctrl-C to end. ^C Rbytes Wbytes Pathname 0 206663 /export/share/words1 24528 0 /export/share/bin/xargs 44864 0 /export/share/bin/ed 232476 0 /export/share/bin/vi
The fields printed are:
This DTrace script prints a summary of NFSv4 read and write elapsed times (latencies), along with other details:
#!/usr/sbin/dtrace -s #pragma D option quiet inline int TOP_FILES = 10; dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } nfsv4:::op-read-start, nfsv4:::op-write-start { start[args[1]->noi_xid] = timestamp; } nfsv4:::op-read-done, nfsv4:::op-write-done { this->elapsed = timestamp - start[args[1]->noi_xid]; @rw[probename == "op-read-done" ? "read" : "write"] = quantize(this->elapsed / 1000); @host[args[0]->ci_remote] = sum(this->elapsed); @file[args[1]->noi_curpath] = sum(this->elapsed); start[args[1]->noi_xid] = 0; } dtrace:::END { printf("NFSv4 read/write distributions (us):\n"); printa(@rw); printf("\nNFSv4 read/write by host (total us):\n"); normalize(@host, 1000); printa(@host); printf("\nNFSv4 read/write top %d files (total us):\n", TOP_FILES); normalize(@file, 1000); trunc(@file, TOP_FILES); printa(@file); }
The output below shows a peak in the read time distribution plot in the 64 to 127 microsecond range, and a second peak between 2 and 8 milliseconds:
# ./nfsv4rwtime.d Tracing... Hit Ctrl-C to end. ^C NFSv4 read/write distributions (us): write value ------------- Distribution ------------- count 32 | 0 64 |@@@@@@ 1 128 |@@@@@@@@@@@ 2 256 |@@@@@@@@@@@@@@@@@ 3 512 |@@@@@@ 1 1024 | 0 read value ------------- Distribution ------------- count 16 | 0 32 |@@@@ 6 64 |@@@@@@@@@@@@ 17 128 |@ 1 256 |@@ 3 512 |@ 1 1024 |@@ 3 2048 |@@@@@@@@ 12 4096 |@@@@@@@@@@ 15 8192 |@ 1 16384 | 0 NFSv4 read/write by host (total us): 192.168.1.109 148215 NFSv4 read/write top 10 files (total us): /export/share/bin/man 5020 /export/share/bin/makeuuid 5132 /export/share/bin/mc68030 5836 /export/share/bin/m4 6446 /export/share/bin/msgfmt 6669 /export/share/bin/mkmsgs 6674 /export/share/bin/mailstats 6935 /export/share/bin/mkdir 7009 /export/share/bin/mac 7693 /export/share/bin/make 27903
Other details are printed, such as total read/write latency by host, and the top 10 files by latency.
The first peak in the read distribution is likely to be NFS operations hitting the memory cache, and the second those that missed and read from disk. The writes were all fast, as they are likely to be written to the memory cache and returned asynchronously. Further use of DTrace can confirm these theories.
The fields from the distribution plot are:
This is a simple DTrace script to provide basic I/O details by host every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { interval = 5; printf("Tracing... Interval %d secs.\n", interval); tick = interval; } nfsv4:::op-* { @ops[args[0]->ci_remote] = count(); } nfsv4:::op-read-done { @reads[args[0]->ci_remote] = count(); @readbytes[args[0]->ci_remote] = sum(args[2]->data_len); } nfsv4:::op-write-done { @writes[args[0]->ci_remote] = count(); @writebytes[args[0]->ci_remote] = sum(args[2]->count); } profile:::tick-1sec /tick-- == 0/ { normalize(@ops, interval); normalize(@reads, interval); normalize(@writes, interval); normalize(@writebytes, 1024 * interval); normalize(@readbytes, 1024 * interval); printf("\n %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s", "kr/s", "kw/s", "ops/s"); printa(" %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes, @readbytes, @writebytes, @ops); trunc(@ops); trunc(@reads); trunc(@writes); trunc(@readbytes); trunc(@writebytes); tick = interval; }
This output shows 192.168.1.109 calling NFSv4 reads and writes:
# ./nfsv4io.d Tracing... Interval 5 secs. Client r/s w/s kr/s kw/s ops/s 192.168.1.109 17 1 331 40 290 Client r/s w/s kr/s kw/s ops/s 192.168.1.109 9 0 197 0 152 Client r/s w/s kr/s kw/s ops/s 192.168.1.109 16 0 269 0 363 Client r/s w/s kr/s kw/s ops/s ^C
Other details can be calculated from the output, such as the average read and write size (for example, 331 kr/s / 17 r/s = 19.5 Kbytes average read size). These could also be added to the script to be printed as columns.
The fields printed are:
The srp provider provides probes for tracing SRP target port provider activity.
This is a kernel provider built into the COMSTAR SRP target port provider.
For all of the probes below, string fields whose values are not known contain the string "<unknown>", and integer fields whose values are not known contain 0.
srp:::service-up and srp:::service-down trace SRP target online and offline events. Remote port information (ci_remote) is unavailable for both probes.
scsicmd_t, conninfo_t and xferinfo_t are common types which are used by other providers.
typedef struct scsicmd { uint64_t ic_len; /* CDB length */ uint8_t *ic_cdb; /* CDB data */ } scsicmd_t;
typedef struct conninfo { string ci_local; /* GID of the local HCA */ string ci_remote; /* GID of the remote HCA */ string ci_protocol; /* protocol ("ib") */ } conninfo_t;
typedef struct srp_portinfo { /* initiator */ string pi_initiator; /* Initiator: eui.xxxxxxxxxxxxxxx */ string pi_i_sid; /* Initiator session id */ /* target */ string pi_target; /* Target: eui.xxxxxxxxxxxxxxx */ string pi_t_sid; /* Target session id */ uintptr_t pi_chan_id; /* Channel identifier */ } srp_portinfo_t;
typedef struct srp_logininfo { uint64_t li_task_tag; /* SRP task tag */ uint32_t li_max_it_iu_len; /* Maximum iu length that initiator can send to target */ uint32_t li_max_ti_iu_len; /* Maximum iu length that target can send to initiator */ uint32_t li_request_limit; /* Maximum number of SRP requests that initiator can send on a channel */ uint32_t reason_code; /* Reason code */ } srp_logininfo_t;
typedef struct srp_taskinfo { uint64_t ti_task_tag; /* SRP task tag */ uint64_t ti_lun; /* Target logical unit number */ uint8_t ti_function; /* Task management function */ uint32_t ti_req_limit_delta; /* Increment of channel's request limit */ uint8_t ti_flag; /* bit 2:DOOVER 3:DOUNDER 4:DIOVER 5:DIUNDER */ uint32_t ti_do_resid_cnt; /* Data-out residual count */ uint32_t ti_di_resid_cnt; /* Data-in residual count */ uint8_t ti_status; /* Status of this task */ } srp_taskinfo_t;
typedef struct xferinfo { uintptr_t xfer_laddr; /* Local buffer address */ uint32_t xfer_loffset; /* Relative offset from local buffer */ uint32_t xfer_lkey; /* Access control to local memory */ uintptr_t xfer_raddr; /* Remote virtual address */ uint32_t xfer_roffset; /* Offset from the remote address */ uint32_t xfer_rkey; /* Access control to remote address */ uint32_t xfer_len; /* Transfer length */ uint8_t xfer_type; /* 0: read; 1: write; */ } xferinfo_t;
This is a simple script to produce a report of target online/offline events.
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n\n"); printf("%-14s %-35s %-20s\n", "SRP EVENT", \ "LOCAL PORT", "EUI NAME"); }; srp:::service-up { printf("%-14s %-35s %-20s\n", probename, \ args[0]->ci_local, args[1]->pi_target); } srp:::service-down { printf("%-14s %-35s %-20s\n", probename, \ args[0]->ci_local, args[1]->pi_target); }
This output shows SRP target online and offline events:
thump1## dtrace -s ~/src/service.d Tracing... Hit Ctrl-C to end. ^C SRP EVENT LOCAL PORT EUI NAME service-down fe80000000000000:0003ba0001004d31 eui.0003BA0001004D30 service-down fe80000000000000:0003ba0001004d32 eui.0003BA0001004D30 service-up fe80000000000000:0003ba0001004d31 eui.0003BA0001004D30 service-up fe80000000000000:0003ba0001004d32 eui.0003BA0001004D30 thump1##
The fields are:
This is a simple script to produce a report of the remote HCA port and a count of srp events. This is intended to provide a quick summary of srp activity when run on the SRP target server:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } srp:::login-command, srp:::login-response, srp:::task-command, srp:::task-response, srp:::scsi-command, srp:::scsi-response, srp:::xfer-start, srp:::xfer-done { @events[args[0]->ci_remote, probename] = count(); } dtrace:::END { printf(" %-33s %14s %8s\n", "REMOTE GID", "iSCSI EVENT", "COUNT"); printa(" %-33s %14s %@8d\n", @events); }
This output shows the remote HCA ports and the number of SRP events:
thump1## dtrace -s ./srpwho.d Tracing... Hit Ctrl-C to end. ^C REMOTE GID iSCSI EVENT COUNT fe80000000000000:0003ba000100386d login-command 1 fe80000000000000:0003ba000100386d login-response 1 fe80000000000000:0003ba0001003851 login-command 2 fe80000000000000:0003ba0001003851 login-response 2 fe80000000000000:0003ba0001003852 login-command 2 fe80000000000000:0003ba0001003852 login-response 2 fe80000000000000:0003ba0001004d32 xfer-done 9 fe80000000000000:0003ba0001004d32 xfer-start 9 fe80000000000000:0003ba0001004d31 xfer-done 18 fe80000000000000:0003ba0001004d31 xfer-start 18 fe80000000000000:0003ba0001004d32 scsi-command 22 fe80000000000000:0003ba0001004d32 scsi-response 22 fe80000000000000:0003ba0001004d32 task-command 22 fe80000000000000:0003ba0001004d32 task-response 22 fe80000000000000:0003ba0001004d31 scsi-command 42 fe80000000000000:0003ba0001004d31 scsi-response 42 fe80000000000000:0003ba0001004d31 task-command 42 fe80000000000000:0003ba0001004d31 task-response 42
The fields are:
This is a simple script to snoop srp events when run on an SRP target server.
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10 dtrace:::BEGIN { printf("%17s %3s %-40s %-14s %6s %10s %6s\n", "TIMESTAMP", "CPU", "REMOTE GID", "EVENT", "BYTES", "TAG", "SCSIOP"); /* * SCSI opcode to string translation hash. This is from * /usr/include/sys/scsi/generic/commands.h. If you would * rather see all hex, comment this out. */ scsiop[0x08] = "read"; scsiop[0x0a] = "write"; scsiop[0x0b] = "seek"; scsiop[0x28] = "read(10)"; scsiop[0x2a] = "write(10)"; scsiop[0x2b] = "seek(10)"; } srp:::login-* { printf("%17d %3d %-40s %-14s %17d -\n", timestamp, cpu, args[0]->ci_remote, probename, args[2]->li_task_tag); } srp:::task-command, srp:::task-response, srp:::scsi-response { printf("%17d %3d %-40s %-14s %6d %10d -\n", timestamp, cpu, args[0]->ci_remote, probename, 0, args[2]->ti_task_tag); } srp:::scsi-command /scsiop[args[2]->ic_cdb[0]] != NULL/ { printf("%17d %3d %-40s %-14s %6d %10d %s\n", timestamp, cpu, args[0]->ci_remote, probename, 0, args[3]->ti_task_tag, scsiop[args[2]->ic_cdb[0]]); } srp:::scsi-command /scsiop[args[2]->ic_cdb[0]] == NULL/ { printf("%17d %3d %-40s %-14s %6d %10d 0x%x\n", timestamp, cpu, args[0]->ci_remote, probename, 0, args[3]->ti_task_tag, args[2]->ic_cdb[0]); } srp:::xfer-start, srp:::xfer-done { printf("%17d %3d %-40s %-14s %6d %10d %s\n", timestamp, cpu, args[0]->ci_remote, probename, args[2]->xfer_len, args[3]->ti_task_tag, args[2]->xfer_type > 0 ? "READ" : "WRITE"); }
This output shows the snoop on dd commands executed by the initiator.
thump1## dtrace -s ./srpsnoop.d TIMESTAMP CPU REMOTE GID EVENT BYTES TAG SCSIOP 22644410404019 3 fe80000000000000:0003ba0001004d31 task-command 0 26 - 22644410493068 3 fe80000000000000:0003ba0001004d31 scsi-command 0 26 read(10) 22644410511422 3 fe80000000000000:0003ba0001004d31 task-command 0 30 - 22644410541494 3 fe80000000000000:0003ba0001004d31 scsi-command 0 30 read(10) 22644410621049 0 fe80000000000000:0003ba0001004d31 xfer-start 2048 26 READ 22644410720486 1 fe80000000000000:0003ba0001004d31 xfer-start 49152 30 READ 22644410681390 3 fe80000000000000:0003ba0001004d31 xfer-done 2048 26 READ 22644410694719 3 fe80000000000000:0003ba0001004d31 scsi-response 0 26 - 22644410703358 3 fe80000000000000:0003ba0001004d31 task-response 0 26 - 22644410895424 3 fe80000000000000:0003ba0001004d31 xfer-done 49152 30 READ 22644410901576 3 fe80000000000000:0003ba0001004d31 scsi-response 0 30 - 22644410905717 3 fe80000000000000:0003ba0001004d31 task-response 0 30 - 22727363721107 3 fe80000000000000:0003ba0001004d31 task-command 0 59 - 22727363919179 0 fe80000000000000:0003ba0001004d31 xfer-start 10240 59 WRITE 22727364095164 0 fe80000000000000:0003ba0001004d31 scsi-response 0 59 - 22727364105406 0 fe80000000000000:0003ba0001004d31 task-response 0 59 - 22727363812953 3 fe80000000000000:0003ba0001004d31 scsi-command 0 59 write(10) 22727363986185 3 fe80000000000000:0003ba0001004d31 xfer-done 10240 59 WRITE
The fields are:
The tcp provider provides probes for tracing the TCP protocol.
The tcp probes are described in the table below.
Table 11-40 tcp Probes
The send and receive probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by tcp. On Solaris, loopback TCP connections can bypass the TCP layer when transferring data packets - this is a performance feature called tcp fusion; these packets are also traced by the tcp provider.
The argument types for the tcp probes are listed in the table below. The arguments are described in the following section. All probes except state-change have 5 arguments; state-change has 6.
The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.
typedef struct pktinfo { uintptr_t pkt_addr; /* currently always NULL */ } pktinfo_t;
The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.
typedef struct csinfo { uintptr_t cs_addr; uint64_t cs_cid; pid_t cs_pid; zoneid_t cs_zoneid; } csinfo_t;
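For example, assuming csinfo_t is args[1] as listed in the tcp probe argument table, a one-liner such as the following attributes received TCP packets to the process and zone that own the connection:
# dtrace -n 'tcp:::receive { @[args[1]->cs_pid, args[1]->cs_zoneid] = count(); }'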
The ipinfo_t structure contains common IP info for both IPv4 and IPv6.
typedef struct ipinfo { uint8_t ip_ver; /* IP version (4, 6) */ uint16_t ip_plength; /* payload length */ string ip_saddr; /* source address */ string ip_daddr; /* destination address */ } ipinfo_t;
These values are read when the probe fires in TCP, and so ip_plength is the expected IP payload length. However, the IP layer may add headers (such as AH and ESP) that increase the actual payload length. To examine this, also trace packets using the ip provider.
Table 11-41 ipinfo_t Members
The tcpsinfo_t structure contains tcp state info.
typedef struct tcpsinfo { uintptr_t tcps_addr; int tcps_local; /* is delivered locally, boolean */ int tcps_active; /* active open (from here), boolean */ uint16_t tcps_lport; /* local port */ uint16_t tcps_rport; /* remote port */ string tcps_laddr; /* local address, as a string */ string tcps_raddr; /* remote address, as a string */ int32_t tcps_state; /* TCP state. Use inline tcp_state_string[] to convert to string */ uint32_t tcps_iss; /* initial sequence # sent */ uint32_t tcps_suna; /* sequence # sent but unacked */ uint32_t tcps_snxt; /* next sequence # to send */ uint32_t tcps_rack; /* sequence # we have acked */ uint32_t tcps_rnxt; /* next sequence # expected */ uint32_t tcps_swnd; /* send window size */ uint32_t tcps_snd_ws; /* send window scaling */ uint32_t tcps_rwnd; /* receive window size */ uint32_t tcps_rcv_ws; /* receive window scaling */ uint32_t tcps_cwnd; /* congestion window */ uint32_t tcps_cwnd_ssthresh; /* threshold for congestion avoidance */ uint32_t tcps_sack_fack; /* SACK sequence # we have acked */ uint32_t tcps_sack_snxt; /* next SACK seq # for retransmission */ uint32_t tcps_rto; /* round-trip timeout, msec */ uint32_t tcps_mss; /* max segment size */ int tcps_retransmit; /* retransmit send event, boolean */ } tcpsinfo_t;
It may seem redundant to supply the local and remote ports and addresses here as well as in the tcpinfo_t below, but the tcp:::state-change probes do not have associated tcpinfo_t data, so in order to map the state change to a specific port, we need this data here.
Table 11-42 tcpsinfo_t Members
The tcplsinfo_t structure contains the previous tcp state during a state change.
typedef struct tcplsinfo { int32_t tcps_state; /* TCP state */ } tcplsinfo_t;
Table 11-43 tcplsinfo_t Members
The tcpinfo_t structure is a DTrace translated version of the TCP header.
typedef struct tcpinfo { uint16_t tcp_sport; /* source port */ uint16_t tcp_dport; /* destination port */ uint32_t tcp_seq; /* sequence number */ uint32_t tcp_ack; /* acknowledgment number */ uint8_t tcp_offset; /* data offset, in bytes */ uint8_t tcp_flags; /* flags */ uint16_t tcp_window; /* window size */ uint16_t tcp_checksum; /* checksum */ uint16_t tcp_urgent; /* urgent data pointer */ tcph_t *tcp_hdr; /* raw TCP header */ } tcpinfo_t;
Table 11-44 tcpinfo_t Members
Table 11-45 tcp_flags Values
See RFC-793 for a detailed explanation of the standard TCP header fields and flags.
Some simple examples of tcp provider usage follow.
This DTrace one-liner counts inbound TCP connections by source IP address:
# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr] = count(); }' dtrace: description 'tcp:::state-change' matched 1 probes ^C 127.0.0.1 1 192.168.2.88 1 fe80::214:4fff:fe8d:59aa 1 192.168.1.109 3
The output above shows there were 3 TCP connections from 192.168.1.109, a single TCP connection from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
This DTrace one-liner counts inbound TCP connections by local TCP port:
# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_lport] = count(); }' dtrace: description 'tcp:::state-change' matched 1 probes ^C 40648 1 22 3
The output above shows there were 3 TCP connections for port 22 (ssh), a single TCP connection for port 40648 (an RPC port).
Combining the previous two examples produces a useful one liner, to quickly identify who is connecting to what:
# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }' dtrace: description 'tcp:::state-change' matched 1 probes ^C 192.168.2.88 40648 1 fe80::214:4fff:fe8d:59aa 22 1 192.168.1.109 22 3
The output above shows there were 3 TCP connections from 192.168.1.109 to port 22 (ssh), etc.
It may be useful when troubleshooting connection issues to see who is failing to connect to their requested ports. This is equivalent to seeing where incoming SYNs arrive when no listener is present, as per RFC793:
# dtrace -n 'tcp:::accept-refused { @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }' dtrace: description 'tcp:::receive ' matched 1 probes ^C 192.168.1.109 23 2
Here we traced two failed attempts by host 192.168.1.109 to connect to port 23 (telnet).
This DTrace one-liner counts TCP received packets by host address:
# dtrace -n 'tcp:::receive { @[args[2]->ip_saddr] = count(); }' dtrace: description 'tcp:::receive ' matched 5 probes ^C 127.0.0.1 7 fe80::214:4fff:fe8d:59aa 14 192.168.2.30 43 192.168.1.109 44 192.168.2.88 3722
The output above shows that 7 TCP packets were received from 127.0.0.1, 14 TCP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
This DTrace one-liner counts TCP received packets by the local TCP port:
# dtrace -n 'tcp:::receive { @[args[4]->tcp_dport] = count(); }' dtrace: description 'tcp:::receive ' matched 5 probes ^C 42303 3 42634 3 2049 27 40648 36 22 162
The output above shows that 162 packets were received for port 22 (ssh), 36 packets were received for port 40648 (an RPC port), 27 packets for 2049 (NFS), and a few packets to high numbered client ports.
This DTrace one-liner prints distribution plots of IP payload size by destination, for TCP sends:
# dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }' dtrace: description 'tcp:::send ' matched 3 probes ^C 192.168.1.109 value ------------- Distribution ------------- count 32 | 0 64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 128 |@@@ 1 256 | 0 192.168.2.30 value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@@@@@ 7 64 |@@@@@@@@@ 3 128 |@@@ 1 256 |@@@@@@ 2 512 |@@@ 1 1024 | 0
This DTrace script demonstrates the capability to trace TCP state changes:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10 int last[int]; dtrace:::BEGIN { printf(" %3s %12s %-20s %-20s\n", "CPU", "DELTA(us)", "OLD", "NEW"); } tcp:::state-change / last[args[1]->cs_cid] / { this->elapsed = (timestamp - last[args[1]->cs_cid]) / 1000; printf(" %3d %12d %-20s -> %-20s\n", cpu, this->elapsed, tcp_state_string[args[5]->tcps_state], tcp_state_string[args[3]->tcps_state]); last[args[1]->cs_cid] = timestamp; } tcp:::state-change / last[args[1]->cs_cid] == 0 / { printf(" %3d %12s %-20s -> %-20s\n", cpu, "-", tcp_state_string[args[5]->tcps_state], tcp_state_string[args[3]->tcps_state]); last[args[1]->cs_cid] = timestamp; }
This script was run on a system for a couple of minutes:
# ./tcpstate.d CPU DELTA(us) OLD NEW 0 - state-listen -> state-syn-received 0 613 state-syn-received -> state-established 0 - state-idle -> state-bound 0 63 state-bound -> state-syn-sent 0 685 state-syn-sent -> state-bound 0 22 state-bound -> state-idle 0 114 state-idle -> state-closed
In the above example output, an inbound connection is traced; it takes 613 us to go from syn-received to established. An outbound connection attempt is also made to a closed port: it takes 63 us to go from bound to syn-sent, 685 us to go from syn-sent to bound, and so on.
The fields printed are:
The following DTrace script traces TCP packets and prints various details:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf(" %3s %15s:%-5s %15s:%-5s %6s %s\n", "CPU", "LADDR", "LPORT", "RADDR", "RPORT", "BYTES", "FLAGS"); } tcp:::send { this->length = args[2]->ip_plength - args[4]->tcp_offset; printf(" %3d %16s:%-5d -> %16s:%-5d %6d (", cpu, args[2]->ip_saddr, args[4]->tcp_sport, args[2]->ip_daddr, args[4]->tcp_dport, this->length); } tcp:::receive { this->length = args[2]->ip_plength - args[4]->tcp_offset; printf(" %3d %16s:%-5d <- %16s:%-5d %6d (", cpu, args[2]->ip_daddr, args[4]->tcp_dport, args[2]->ip_saddr, args[4]->tcp_sport, this->length); } tcp:::send, tcp:::receive { printf("%s", args[4]->tcp_flags & TH_FIN ? "FIN|" : ""); printf("%s", args[4]->tcp_flags & TH_SYN ? "SYN|" : ""); printf("%s", args[4]->tcp_flags & TH_RST ? "RST|" : ""); printf("%s", args[4]->tcp_flags & TH_PUSH ? "PUSH|" : ""); printf("%s", args[4]->tcp_flags & TH_ACK ? "ACK|" : ""); printf("%s", args[4]->tcp_flags & TH_URG ? "URG|" : ""); printf("%s", args[4]->tcp_flags & TH_ECE ? "ECE|" : ""); printf("%s", args[4]->tcp_flags & TH_CWR ? "CWR|" : ""); printf("%s", args[4]->tcp_flags == 0 ? "null " : ""); printf("\b)\n"); }
This example output has captured a TCP handshake:
# ./tcpio.d CPU LADDR:LPORT RADDR:RPORT BYTES FLAGS 1 192.168.2.80:22 -> 192.168.1.109:60337 464 (PUSH|ACK) 1 192.168.2.80:22 -> 192.168.1.109:60337 48 (PUSH|ACK) 2 192.168.2.80:22 -> 192.168.1.109:60337 20 (PUSH|ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (SYN) 3 192.168.2.80:22 -> 192.168.1.109:60337 0 (SYN|ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 20 (PUSH|ACK) 3 192.168.2.80:22 -> 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 376 (PUSH|ACK) 3 192.168.2.80:22 -> 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 24 (PUSH|ACK) 2 192.168.2.80:22 -> 192.168.1.109:60337 736 (PUSH|ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK)
The fields printed are:
Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering, and events such as the TCP handshake can be printed out of order. Keep an eye on changes in the CPU column, or add a timestamp column to this script and post sort.
The tcp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.
The udp provider provides probes for tracing the UDP protocol.
The udp probes are described in the table below.
Table 11-46 udp Probes
The send and receive probes trace datagrams on physical interfaces and also packets on loopback interfaces that are processed by udp.
The argument types for the udp probes are listed in the table below. The arguments are described in the following section.
Table 11-47 udp Probe Arguments
The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.
typedef struct pktinfo { uintptr_t pkt_addr; /* currently always NULL */ } pktinfo_t;
The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.
typedef struct csinfo { uintptr_t cs_addr; uint64_t cs_cid; pid_t cs_pid; zoneid_t cs_zoneid; } csinfo_t;
Table 11-48 csinfo_t Members
The ipinfo_t structure contains common IP info for both IPv4 and IPv6.
typedef struct ipinfo { uint8_t ip_ver; /* IP version (4, 6) */ uint16_t ip_plength; /* payload length */ string ip_saddr; /* source address */ string ip_daddr; /* destination address */ } ipinfo_t;
These values are read when the probe fires in UDP, and so ip_plength is the expected IP payload length. However, the IP layer may add headers (such as AH and ESP) that increase the actual payload length. To examine this, also trace packets using the ip provider.
Table 11-49 ipinfo_t Members
The udpsinfo_t structure contains udp state info.
typedef struct udpsinfo { uintptr_t udps_addr; uint16_t udps_lport; /* local port */ uint16_t udps_fport; /* remote port */ string udps_laddr; /* local address, as a string */ string udps_faddr; /* remote address, as a string */ } udpsinfo_t;
Table 11-50 udpsinfo_t Members
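For example, assuming udpsinfo_t is args[3] as listed in Table 11-47, a one-liner such as the following counts received datagrams by local UDP port:
# dtrace -n 'udp:::receive { @[args[3]->udps_lport] = count(); }'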
The udpinfo_t structure is a DTrace translated version of the UDP header.
typedef struct udpinfo { uint16_t udp_sport; /* source port */ uint16_t udp_dport; /* destination port */ uint16_t udp_length; /* total length */ uint16_t udp_checksum; /* headers + data checksum */ udpha_t *udp_hdr; /* raw UDP header */ } udpinfo_t;
Table 11-51 udpinfo_t Members
See RFC-768 for a detailed explanation of the standard UDP header fields and flags.
Some simple examples of udp provider usage follow.
This DTrace one-liner counts UDP received packets by host address:
# dtrace -n 'udp:::receive { @[args[2]->ip_saddr] = count(); }' dtrace: description 'udp:::receive ' matched 5 probes ^C 127.0.0.1 7 fe80::214:4fff:fe8d:59aa 14 192.168.2.30 43 192.168.1.109 44 192.168.2.88 3722
The output above shows that 7 UDP packets were received from 127.0.0.1, 14 UDP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
This DTrace one-liner counts UDP received packets by the local UDP port:
# dtrace -n 'udp:::receive { @[args[4]->udp_dport] = count(); }' dtrace: description 'udp:::receive ' matched 1 probe ^C 33294 1 33822 1 38961 1 44433 1 46258 1 46317 1 47511 1 50581 1 54685 1 56491 1 59056 1 62171 1 62769 1 64231 1
The output above shows that 1 packet was received for port 33294, 1 packet was received for port 33822, etc.
This DTrace one-liner prints distribution plots of IP payload size by destination, for UDP sends:
# dtrace -n 'udp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }' dtrace: description 'udp:::send ' matched 6 probes ^C 129.156.86.11 value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 64 | 0
The udp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.