This section lists all the protocol providers.
The ip provider provides probes for tracing both IPv4 and IPv6 protocols.
The ip probes are described in the table below.
Table 11-33 ip Providers
These probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by ip. An IP packet must have a full IP header to be visible to these probes.
Note - Loopback tcp packets on Solaris may be processed by tcp fusion, a performance feature that bypasses the ip layer. These fused packets are not visible to the ip:::send and ip:::receive probes; after the tcp handshake, typically all loopback tcp packets are fused.
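Fused loopback TCP traffic can still be observed with the tcp provider, described later in this chapter, which also traces fused packets. For example, a one-liner along these lines (an illustrative sketch, not one of the examples below) counts tcp sends to the loopback address by local port:
# dtrace -n 'tcp:::send /args[2]->ip_daddr == "127.0.0.1"/ { @[args[3]->tcps_lport] = count(); }'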
The argument types for the ip probes are listed in the table below. The arguments are described in the following section.
Table 11-34 ip Probe Arguments
The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.
typedef struct pktinfo { uintptr_t pkt_addr; /* currently always NULL */ } pktinfo_t;
The csinfo_t structure is where connection state info can be made available if connection IDs become supported by the kernel in the future.
The cs_addr member is currently always NULL.
typedef struct csinfo { uintptr_t cs_addr; /* currently always NULL */ } csinfo_t;
The ipinfo_t structure contains common IP info for both IPv4 and IPv6.
typedef struct ipinfo { uint8_t ip_ver; /* IP version (4, 6) */ uint16_t ip_plength; /* payload length */ string ip_saddr; /* source address */ string ip_daddr; /* destination address */ } ipinfo_t;
Table 11-35 ipinfo_t Members
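For example, a one-liner such as the following (an illustrative sketch using the ipinfo_t members above) counts received packets by IP version and source address:
# dtrace -n 'ip:::receive { @[args[2]->ip_ver, args[2]->ip_saddr] = count(); }'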
The ifinfo_t structure contains network interface info.
typedef struct ifinfo { string if_name; /* interface name */ int8_t if_local; /* is delivered locally */ netstackid_t if_ipstack; /* ipstack ID */ uintptr_t if_addr; /* pointer to raw ill_t */ } ifinfo_t;
Table 11-36 ifinfo_t Members
The ifinfo_t details are provided for debugging convenience when that information is available in the ip layer. For some types of traffic, some or all of that information may not be available at the time the probe fires, in which case the members may be "<null>", -1, NULL, and NULL, respectively.
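When the interface information is available, if_name can be used to break traffic down by interface, for example:
# dtrace -n 'ip:::receive { @[args[3]->if_name] = count(); }'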
The ipv4info_t structure is a DTrace translated version of the IPv4 header.
typedef struct ipv4info { uint8_t ipv4_ver; /* IP version (4) */ uint8_t ipv4_ihl; /* header length, bytes */ uint8_t ipv4_tos; /* type of service field */ uint16_t ipv4_length; /* length (header + payload) */ uint16_t ipv4_ident; /* identification */ uint8_t ipv4_flags; /* IP flags */ uint16_t ipv4_offset; /* fragment offset */ uint8_t ipv4_ttl; /* time to live */ uint8_t ipv4_protocol; /* next level protocol */ string ipv4_protostr; /* next level protocol, as a string */ uint16_t ipv4_checksum; /* header checksum */ ipaddr_t ipv4_src; /* source address */ ipaddr_t ipv4_dst; /* destination address */ string ipv4_saddr; /* source address, string */ string ipv4_daddr; /* destination address, string */ ipha_t *ipv4_hdr; /* pointer to raw header */ } ipv4info_t;
Table 11-37 ipv4info_t Members
See RFC-791 for a detailed explanation of these IPv4 header fields. If the packet is IPv6, these members are either "<null>", 0, or NULL depending on type.
Table 11-38 ipv4_flags Values
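As a simple illustration of the ipv4info_t members, a one-liner such as the following counts received IPv4 packets by next level protocol name:
# dtrace -n 'ip:::receive /args[2]->ip_ver == 4/ { @[args[4]->ipv4_protostr] = count(); }'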
The ipv6info_t structure is a DTrace translated version of the IPv6 header.
typedef struct ipv6info { uint8_t ipv6_ver; /* IP version (6) */ uint8_t ipv6_tclass; /* traffic class */ uint32_t ipv6_flow; /* flow label */ uint16_t ipv6_plen; /* payload length */ uint8_t ipv6_nexthdr; /* next header protocol */ string ipv6_nextstr; /* next header protocol, as a string*/ uint8_t ipv6_hlim; /* hop limit */ in6_addr_t *ipv6_src; /* source address */ in6_addr_t *ipv6_dst; /* destination address */ string ipv6_saddr; /* source address, string */ string ipv6_daddr; /* destination address, string */ ip6_t *ipv6_hdr; /* pointer to raw header */ } ipv6info_t;
Table 11-39 ipv6info_t Members
See RFC-2460 for a detailed explanation of these IPv6 header fields. If the packet is IPv4, these members are either "<null>", 0, or NULL depending on type.
Some simple examples of ip provider usage follow.
This DTrace one-liner counts received packets by host address:
# dtrace -n 'ip:::receive { @[args[2]->ip_saddr] = count(); }' dtrace: description 'ip:::receive ' matched 4 probes ^C 192.168.1.5 1 192.168.1.185 4 fe80::214:4fff:fe3b:76c8 9 127.0.0.1 14 192.168.1.109 28
The output above shows that 28 IP packets were received from 192.168.1.109, 14 IP packets from 127.0.0.1, and so on.
This DTrace one-liner prints distribution plots of sent payload size by destination:
# dtrace -n 'ip:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }' dtrace: description 'ip:::send ' matched 11 probes ^C 192.168.2.27 value ------------- Distribution ------------- count 8 | 0 16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7 32 |@@@@ 1 64 |@@@@ 1 128 | 0 192.168.1.109 value ------------- Distribution ------------- count 8 | 0 16 |@@@@@ 5 32 |@@@ 3 64 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 24 128 |@ 1 256 |@ 1 512 |@@ 2 1024 |@ 1 2048 | 0
The following DTrace script traces IP packets and prints various details:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf(" %3s %10s %15s %15s %8s %6s\n", "CPU", "DELTA(us)", "SOURCE", "DEST", "INT", "BYTES"); last = timestamp; } ip:::send { this->elapsed = (timestamp - last) / 1000; printf(" %3d %10d %15s -> %15s %8s %6d\n", cpu, this->elapsed, args[2]->ip_saddr, args[2]->ip_daddr, args[3]->if_name, args[2]->ip_plength); last = timestamp; } ip:::receive { this->elapsed = (timestamp - last) / 1000; printf(" %3d %10d %15s <- %15s %8s %6d\n", cpu, this->elapsed, args[2]->ip_daddr, args[2]->ip_saddr, args[3]->if_name, args[2]->ip_plength); last = timestamp; }
This example output shows tracing packets as they pass in and out of tunnels:
# ./ipio.d CPU DELTA(us) SOURCE DEST INT BYTES 1 598913 10.1.100.123 -> 192.168.10.75 ip.tun0 68 1 73 192.168.1.108 -> 192.168.5.1 nge0 140 1 18325 192.168.1.108 <- 192.168.5.1 nge0 140 1 69 10.1.100.123 <- 192.168.10.75 ip.tun0 68 0 102921 10.1.100.123 -> 192.168.10.75 ip.tun0 20 0 79 192.168.1.108 -> 192.168.5.1 nge0 92
The fields printed are:
Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; keep an eye on changes in the CPU column, or add a timestamp column and post sort.
This DTrace script provides a neat summary for both send and receive IP traffic, including the next level protocol:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } ip:::send, ip:::receive { this->protostr = args[2]->ip_ver == 4 ? args[4]->ipv4_protostr : args[5]->ipv6_nextstr; @num[args[2]->ip_saddr, args[2]->ip_daddr, this->protostr] = count(); } dtrace:::END { printf(" %-28s %-28s %6s %8s\n", "SADDR", "DADDR", "PROTO", "COUNT"); printa(" %-28s %-28s %6s %@8d\n", @num); }
This script was run on a system with both IPv4 and IPv6 interfaces for several seconds:
# ./ipproto.d Tracing... Hit Ctrl-C to end. ^C SADDR DADDR PROTO COUNT 192.168.1.108 192.168.155.32 UDP 1 192.168.1.108 192.168.17.55 UDP 1 192.168.1.108 192.168.228.54 UDP 1 192.168.1.108 192.168.1.5 UDP 1 192.168.1.108 192.168.2.27 ICMP 1 192.168.1.200 192.168.3.255 UDP 1 192.168.1.5 192.168.1.108 UDP 1 192.168.2.27 192.168.1.108 ICMP 1 fe80::214:4fff:fe3b:76c8 ff02::1 ICMPV6 1 fe80::2e0:81ff:fe5e:8308 fe80::214:4fff:fe3b:76c8 ICMPV6 1 fe80::2e0:81ff:fe5e:8308 ff02::1:2 UDP 1 192.168.1.185 192.168.1.255 UDP 2 192.168.1.211 192.168.1.255 UDP 3 192.168.1.109 192.168.1.108 TCP 428 192.168.1.108 192.168.1.109 TCP 789
The fields printed are:
The example output above provides a quick summary of network activity with host address details; we can see that both 192.168.1.109 and 192.168.1.108 are swapping many packets via TCP.
The ip provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.
The iscsi provider provides probes for tracing iSCSI target activity.
This is a kernel provider built into the COMSTAR iSCSI target port provider. The COMSTAR iSCSI target and the user-land iSCSI target (/usr/sbin/iscsitgtd) are mutually exclusive: only one of the targets can be enabled at a time. The COMSTAR iSCSI target DTrace provider provides all the probes of the user-land iSCSI provider built into the iSCSI target daemon (iscsitgtd), so any DTrace script written for the user-land provider will also work with the COMSTAR iSCSI target port provider without modification. Since this provider instruments iSCSI target activity, DTrace commands and scripts must be run on the iSCSI target server to observe these probes.
All COMSTAR iSCSI target probes have the first and second argument in common:
args[0] conninfo_t * connection information
conninfo_t
typedef struct conninfo { string ci_local; /* local host IP address */ string ci_remote; /* remote host IP address */ string ci_protocol; /* protocol ("ipv4", "ipv6") */ } conninfo_t; The conninfo_t structure is used by the NFSv4 and Fibre Channel providers, and is intended for use by all application protocol providers as the first argument, to indicate some basic information about the connection.
args[1] iscsiinfo_t * common iSCSI properties
iscsiinfo_t
typedef struct iscsiinfo { string ii_target; /* target iqn */ string ii_initiator; /* initiator iqn */ string ii_isid; /* initiator session identifier */ string ii_tsih; /* target session identifying handle */ string ii_transport; /* transport type ("iser-ib", "sockets") */ uint64_t ii_lun; /* target logical unit number */ uint32_t ii_itt; /* initiator task tag */ uint32_t ii_ttt; /* target transfer tag */ uint32_t ii_cmdsn; /* command sequence number */ uint32_t ii_statsn; /* status sequence number */ uint32_t ii_datasn; /* data sequence number */ uint32_t ii_datalen; /* length of data payload */ uint32_t ii_flags; /* probe-specific flags */ } iscsiinfo_t; The iscsiinfo_t structure is used to provide identifying information about the target and the initiator and also some PDU level information such as lun, data length and sequence numbers.
The third argument is used only by the SCSI command probe and the data transfer probes.
args[2] scsicmd_t * SCSI command block (cdb)
scsicmd_t
typedef struct scsicmd { uint64_t ic_len; /* CDB length */ uint8_t *ic_cdb; /* CDB data */ } scsicmd_t; The scsicmd_t structure is used by the SCSI command probe. It contains information about the SCSI command block (CDB) and is intended for use by all application protocols that deal with SCSI data.
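For example, the first CDB byte (the SCSI opcode) can be aggregated with a one-liner such as the following (an illustrative sketch):
# dtrace -n 'iscsi*:::scsi-command { @[args[2]->ic_cdb[0]] = count(); }'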
Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over Remote DMA (RDMA), also known as iSER. Since the data transfer phases are mapped to Remote DMA (RDMA) operations in iSER, the data-send, data-receive and data-request probes cannot be used with iSER. Instead the xfer-start and xfer-done probes can be used to trace the data transfer irrespective of the transport used. The data-receive, data-request and data-send probes can be used when a user wants to track the SCSI Data-IN and Data-OUT PDUs specifically.
args[2] xferinfo_t * data transfer information
xferinfo_t
typedef struct xferinfo { uintptr_t xfer_laddr; /* local buffer address */ uint32_t xfer_loffset; /* offset within the local buffer */ uint32_t xfer_lkey; /* access control to local memory */ uintptr_t xfer_raddr; /* remote virtual address */ uint32_t xfer_roffset; /* offset from the remote address */ uint32_t xfer_rkey; /* access control to remote virtual address */ uint32_t xfer_len; /* transfer length */ uint32_t xfer_type; /* Read or Write */ } xferinfo_t; The xferinfo_t structure is used by the xfer-start and xfer-done probes and contains information about the data transfer. When the transport type is iSER, the remote buffer information is given by the xfer_raddr, xfer_rkey, and xfer_roffset fields; these fields are set to 0 when the transport type is sockets.
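For example, a one-liner such as the following (an illustrative sketch) sums the bytes transferred to or from each remote host, regardless of transport:
# dtrace -n 'iscsi*:::xfer-done { @[args[0]->ci_remote] = sum(args[2]->xfer_len); }'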
Frequency of iSCSI command types:
# dtrace -n 'iscsi*::: { @[probename] = count(); }'
Frequency of iSCSI client IP addresses:
# dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'
Payload bytes by iSCSI command type:
# dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'
Payload byte distribution by iSCSI command type:
# dtrace -n 'iscsi*::: { @[probename] = quantize(args[1]->ii_datalen); }'
This is a simple script to produce a report of the remote IP addresses and a count of iSCSI events. This is intended to provide a quick summary of iSCSI activity when run on the iSCSI target server:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } iscsi*::: { @events[args[0]->ci_remote, probename] = count(); } dtrace:::END { printf(" %-26s %14s %8s\n", "REMOTE IP", "iSCSI EVENT", "COUNT"); printa(" %-26s %14s %@8d\n", @events); }
This output shows the host and the number of iSCSI operations:
# ./iscsiwho.d Tracing... Hit Ctrl-C to end. ^C REMOTE IP iSCSI EVENT COUNT 192.168.1.109 nop-receive 1 192.168.1.109 nop-send 1 192.168.1.109 scsi-response 126 192.168.1.109 scsi-command 208 192.168.1.109 data-request 1207 192.168.1.109 data-receive 1207
The fields are:
The example output shows normal traffic. For this simple script, these event names are not translated beyond their iSCSI provider probe names, and require some thought to comprehend (they are from the perspective of the iSCSI target server).
Although the transport layer is transparent to the user, the COMSTAR iSCSI target also supports iSCSI over Remote DMA (RDMA), also known as iSER. An iSER initiator should be able to read and write data from an iSER target at high data rates with relatively low CPU utilization compared to iSCSI using TCP/IP. In order to see the transport layer in use, display the ii_transport field from the iscsiinfo_t structure.
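For example, a one-liner such as the following (an illustrative sketch) counts data transfers by transport type:
# dtrace -n 'iscsi*:::xfer-start { @[args[1]->ii_transport] = count(); }'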
Since the data transfer phases are mapped to Remote DMA (RDMA) operations in iSER, the data-send, data-receive and data-request probes cannot be used with iSER. Instead, here is a simple script to print an aggregation of all the data transferred between two points using the xfer-start probe. This can be used for iSCSI using TCP/IP and for iSCSI over Remote DMA.
The data-receive, data-request and data-send probes can be used when a user wants to track the SCSI Data-IN and Data-OUT PDUs specifically (e.g. if the PDUs are received out of order, one might want to trace the ii_ttt, ii_datasn, ii_statsn etc.). To just get a trace of IO activity, the xfer-start/xfer-done probes should suffice.
#!/usr/sbin/dtrace -s #pragma D option quiet BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } iscsi:::xfer-start { @[args[0]->ci_remote, args[2]->xfer_type] = sum(args[2]->xfer_len); } END { printf("%26s %10s %15s\n", "REMOTE IP", "READ/WRITE", "BYTES"); printa("%26s %10s %@15d\n", @); }
This output shows the transfer of bytes:
# ./iscsixfer.d Tracing... Hit Ctrl-C to end. ^C REMOTE IP READ/WRITE BYTES 192.168.1.109 write 464 192.168.1.109 read 1024
The fields are:
Now if a user is interested in just seeing the data move (read or write) as it happens, one could use this script:
#!/usr/sbin/dtrace -s #pragma D option quiet BEGIN { printf(" %-26s %8s %10s\n", "REMOTE IP", "BYTES", "READ/WRITE"); } iscsi:::xfer-start { printf(" %-26s %8d %10d\n", args[0]->ci_remote, args[2]->xfer_len, args[2]->xfer_type); }
An interpretation of some of these events is:
The nfsv3 provider provides probes for tracing NFS version 3 server activity.
All NFS operation probes have the first argument in common:
args[0] conninfo_t * socket connection information
The conninfo_t structure is already used by the iSCSI target provider (iscsi) and the NFS v4 provider (nfsv4), and is intended for use by all providers that implement a higher-level protocol (e.g. iscsi, nfs, http, ftp).
typedef struct conninfo { string ci_local; /* local host address */ string ci_remote; /* remote host address */ string ci_protocol; /* protocol (ipv4, ipv6, etc) */ } conninfo_t;
Operation probes have their second argument in common:
args[1] nfsv3opinfo_t * NFS v3 operation properties typedef struct nfsv3opinfo { string noi_curpath; /* current file handle path (if any) */ cred_t *noi_cred; /* credentials */ uint64_t noi_xid; /* transaction ID */ } nfsv3opinfo_t;
Below is a list of the probes along with the specific argument for each whose type is defined by the NFS v3 specification:
Note - The op-null-* probes have an undefined args[2].
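As a simple illustration of the noi_curpath member, a one-liner such as the following counts NFSv3 read and write requests by file pathname:
# dtrace -n 'nfsv3:::op-read-start, nfsv3:::op-write-start { @[args[1]->noi_curpath] = count(); }'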
Some simple examples of nfsv3 provider usage follow.
This DTrace script traces NFSv3 read and write requests, showing details of each operation:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)", "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME"); } nfsv3:::op-read-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count, args[1]->noi_curpath); } nfsv3:::op-write-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "W", args[2]->offset / 1024, args[2]->data.data_len, args[1]->noi_curpath); }
The following output shows a read of /export/stuff/bin/ghex2, then a read of /export/stuff/bin/gksu, and finally a write of /export/stuff/words12:
# ./nfsv3iosnoop.d TIME(us) CLIENT OP OFFSET BYTES PATHNAME 4299383207 192.168.17.75 R 0 4096 /export/stuff/bin/ghex2 4299391813 192.168.17.75 R 4 28672 /export/stuff/bin/ghex2 4299395700 192.168.17.75 R 32 32768 /export/stuff/bin/ghex2 4299396038 192.168.17.75 R 96 32768 /export/stuff/bin/ghex2 4299396462 192.168.17.75 R 128 8192 /export/stuff/bin/ghex2 4299396550 192.168.17.75 R 64 32768 /export/stuff/bin/ghex2 4320233417 192.168.17.75 R 0 4096 /export/stuff/bin/gksu 4320240902 192.168.17.75 R 4 28672 /export/stuff/bin/gksu 4320242434 192.168.17.75 R 32 32768 /export/stuff/bin/gksu 4320242730 192.168.17.75 R 64 24576 /export/stuff/bin/gksu 4333460565 192.168.17.75 W 0 32768 /export/stuff/words12 4333461477 192.168.17.75 W 32 32768 /export/stuff/words12 4333463264 192.168.17.75 W 64 32768 /export/stuff/words12 4333463567 192.168.17.75 W 96 32768 /export/stuff/words12 4333463893 192.168.17.75 W 128 32768 /export/stuff/words12 4333464202 192.168.17.75 W 160 32768 /export/stuff/words12 4333464451 192.168.17.75 W 192 10055 /export/stuff/words12
The fields printed are:
Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering; post sort the TIME column if needed.
This DTrace script counts NFSv3 operations by client, printing a summary every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Interval 5 secs.\n"); } nfsv3:::op-* { @ops[args[0]->ci_remote, probename] = count(); } profile:::tick-5sec, dtrace:::END { printf("\n %-32s %-28s %8s\n", "Client", "Operation", "Count"); printa(" %-32s %-28s %@8d\n", @ops); trunc(@ops); }
The following output shows which client is sending which NFSv3 operations:
# ./nfsv3ops.d Tracing... Interval 5 secs. Client Operation Count 192.168.17.75 op-commit-done 1 192.168.17.75 op-commit-start 1 192.168.17.75 op-create-done 1 192.168.17.75 op-create-start 1 192.168.17.75 op-access-done 6 192.168.17.75 op-access-start 6 192.168.17.75 op-read-done 6 192.168.17.75 op-read-start 6 192.168.17.75 op-write-done 7 192.168.17.75 op-write-start 7 192.168.17.75 op-lookup-done 8 192.168.17.75 op-lookup-start 8 192.168.17.75 op-getattr-done 10 192.168.17.75 op-getattr-start 10 Client Operation Count Client Operation Count 192.168.17.75 op-getattr-done 1 192.168.17.75 op-getattr-start 1
The fields printed are:
This DTrace script prints a summary of file read and write bytes:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Hit Ctrl-C to end.\n"); } nfsv3:::op-read-done { @readbytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.data.data_len); } nfsv3:::op-write-done { @writebytes[args[1]->noi_curpath] = sum(args[2]->res_u.ok.count); } dtrace:::END { printf("\n%12s %12s %s\n", "Rbytes", "Wbytes", "Pathname"); printa("%@12d %@12d %s\n", @readbytes, @writebytes); }
This output shows a few files were read, and one was written:
# ./nfsv3fileio.d Tracing... Hit Ctrl-C to end. ^C Rbytes Wbytes Pathname 0 206663 /export/stuff/words10 8624 0 /export/stuff/bin/echo-client-2 13228 0 /export/stuff/bin/echo 496292 0 /export/stuff/bin/ecpg
The fields printed are:
This DTrace script prints a summary of NFSv3 read and write elapsed times (latencies), along with other details:
#!/usr/sbin/dtrace -s #pragma D option quiet inline int TOP_FILES = 10; dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } nfsv3:::op-read-start, nfsv3:::op-write-start { start[args[1]->noi_xid] = timestamp; } nfsv3:::op-read-done, nfsv3:::op-write-done /start[args[1]->noi_xid] != 0/ { this->elapsed = timestamp - start[args[1]->noi_xid]; @rw[probename == "op-read-done" ? "read" : "write"] = quantize(this->elapsed / 1000); @host[args[0]->ci_remote] = sum(this->elapsed); @file[args[1]->noi_curpath] = sum(this->elapsed); start[args[1]->noi_xid] = 0; } dtrace:::END { printf("NFSv3 read/write distributions (us):\n"); printa(@rw); printf("\nNFSv3 read/write by host (total us):\n"); normalize(@host, 1000); printa(@host); printf("\nNFSv3 read/write top %d files (total us):\n", TOP_FILES); normalize(@file, 1000); trunc(@file, TOP_FILES); printa(@file); }
The output below shows a clear peak in the read time distribution plot in the 64 to 127 microsecond range, and a second smaller peak between 4 and 16 milliseconds:
# ./nfsv3rwtime.d Tracing... Hit Ctrl-C to end. ^C NFSv3 read/write distributions (us): write value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 64 | 0 read value ------------- Distribution ------------- count 8 | 0 16 |@ 1 32 |@ 1 64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 36 128 |@ 1 256 |@ 1 512 | 0 1024 | 0 2048 |@ 1 4096 |@@@ 3 8192 |@@@ 4 16384 | 0 NFSv3 read/write by host (total us): 192.168.17.75 81674 NFSv4 read/write top 10 files (total us): /export/stuff/motd 63 /export/stuff/bin/daps 5876 /export/stuff/bin/date 8150 /export/stuff/bin/dasher 67584
Other details are printed, such as total read/write latency by host, and the top 10 files by latency.
The next example also shows a pair of peaks, the first around a fraction of a millisecond, the second at around 4 milliseconds:
# ./nfsv3rwtime.d Tracing... Hit Ctrl-C to end. ^C NFSv3 read/write distributions (us): read value ------------- Distribution ------------- count 8 | 0 16 |@ 4 32 |@ 5 64 |@@@@@@ 22 128 |@@@@ 13 256 |@@@@@@@@@ 30 512 |@@ 7 1024 |@ 3 2048 |@@@ 12 4096 |@@@@@@@ 26 8192 |@@@@ 15 16384 |@ 2 32768 | 0 NFSv3 read/write by host (total us): 192.168.17.75 414458 NFSv3 read/write top 10 files (total us): /export/stuff/bin/cal 11225 /export/stuff/bin/cjpeg 11947 /export/stuff/bin/charmap 12347 /export/stuff/bin/cdda2wav.bin 13449 /export/stuff/bin/chkey 13963 /export/stuff/bin/cputrack 14533 /export/stuff/bin/catman 15535 /export/stuff/bin/csslint-0.6 18302 /export/stuff/bin/col 19926 /export/stuff/bin/cdrecord.bin 40622
The first peak is likely to be NFS operations hitting the memory cache, and the second those that missed and went to disk. Further use of DTrace can confirm this theory.
The fields from the distribution plot are:
This is a simple DTrace script to provide basic I/O details by host every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { interval = 5; printf("Tracing... Interval %d secs.\n", interval); tick = interval; } nfsv3:::op-* { @ops[args[0]->ci_remote] = count(); } nfsv3:::op-read-done { @reads[args[0]->ci_remote] = count(); @readbytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.data.data_len); } nfsv3:::op-write-done { @writes[args[0]->ci_remote] = count(); @writebytes[args[0]->ci_remote] = sum(args[2]->res_u.ok.count); } profile:::tick-1sec /tick-- == 0/ { normalize(@ops, interval); normalize(@reads, interval); normalize(@writes, interval); normalize(@writebytes, 1024 * interval); normalize(@readbytes, 1024 * interval); printf("\n %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s", "kr/s", "kw/s", "ops/s"); printa(" %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes, @readbytes, @writebytes, @ops); trunc(@ops); trunc(@reads); trunc(@writes); trunc(@readbytes); trunc(@writebytes); tick = interval; }
This output shows 192.168.17.75 calling NFSv3 reads and writes:
# ./nfsv3io.d Tracing... Interval 5 secs. Client r/s w/s kr/s kw/s ops/s 192.168.17.75 27 1 686 40 100 Client r/s w/s kr/s kw/s ops/s 192.168.17.75 0 0 0 0 8 Client r/s w/s kr/s kw/s ops/s 0.0.0.0 0 0 0 0 0 192.168.17.75 2 0 28 0 18 ^C
Other details can be calculated from the output, such as the average read and write size (for example, 686 kr/s / 27 r/s = 25.4 Kbytes average read size). These could also be added to the script to be printed as columns.
The fields printed are:
The nfsv4 provider provides probes for tracing NFS v4 server activity.
All NFS operation probes have the first argument in common:
args[0] conninfo_t * socket connection information
The conninfo_t structure is already used by the iSCSI target provider (iscsi), and is intended for use by all providers that implement a higher-level protocol (e.g. iscsi, nfs, http, ftp).
typedef struct conninfo { string ci_local; /* local host address */ string ci_remote; /* remote host address */ string ci_protocol; /* protocol (ipv4, ipv6, etc) */ } conninfo_t;
Operation probes have their second argument in common:
args[1] nfsv4opinfo_t * NFS v4 operation properties typedef struct nfsv4opinfo { string noi_curpath; /* current file handle path (if any) */ cred_t *noi_cred; /* credentials */ uint64_t noi_xid; /* transaction ID */ } nfsv4opinfo_t;
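For example, a one-liner such as the following (an illustrative sketch) counts NFSv4 read and write requests by client address and file pathname:
# dtrace -n 'nfsv4:::op-read-start, nfsv4:::op-write-start { @[args[0]->ci_remote, args[1]->noi_curpath] = count(); }'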
Callback operation probes have their second argument in common:
args[1] nfsv4cbinfo_t * NFS v4 callback properties typedef struct nfsv4cbinfo { string nci_curpath; /* file handle path (if any) */ } nfsv4cbinfo_t;
Below is a list of the top level operation probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Below is a list of operation probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Callback compound probes have an undefined second argument; this slot is reserved for future use.
Below is a list of the top level callback probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Below is a list of callback probes along with the specific argument for each whose type is defined by the NFS v4 specification:
Note - Since the Solaris NFS v4 implementation does not yet use the 'getattr' callback, the probe will not be implemented; it is noted here in anticipation of a future implementation.
Some simple examples of nfsv4 provider usage follow.
This DTrace script traces NFSv4 reads and writes:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf("%-16s %-18s %2s %-8s %6s %s\n", "TIME(us)", "CLIENT", "OP", "OFFSET", "BYTES", "PATHNAME"); } nfsv4:::op-read-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "R", args[2]->offset / 1024, args[2]->count, args[1]->noi_curpath); } nfsv4:::op-write-start { printf("%-16d %-18s %2s %-8d %6d %s\n", timestamp / 1000, args[0]->ci_remote, "W", args[2]->offset / 1024, args[2]->data_len, args[1]->noi_curpath); }
This output shows a few files were read, and one was written:
# ./nfsv4rwsnoop.d TIME(us) CLIENT OP OFFSET BYTES PATHNAME 156889725960 192.168.1.109 R 0 4096 /export/share/bin/nawk 156889735515 192.168.1.109 R 4 28672 /export/share/bin/nawk 156889736298 192.168.1.109 R 32 32768 /export/share/bin/nawk 156889736544 192.168.1.109 R 96 32768 /export/share/bin/nawk 156889736902 192.168.1.109 R 64 32768 /export/share/bin/nawk 156916061653 192.168.1.109 R 0 4096 /export/share/bin/ssh 156916069375 192.168.1.109 R 4 28672 /export/share/bin/ssh 156916070098 192.168.1.109 R 32 32768 /export/share/bin/ssh 156916070435 192.168.1.109 R 96 32768 /export/share/bin/ssh 156916070758 192.168.1.109 R 64 32768 /export/share/bin/ssh 156916071036 192.168.1.109 R 128 32768 /export/share/bin/ssh 156916071352 192.168.1.109 R 160 32768 /export/share/bin/ssh 156916071582 192.168.1.109 R 192 32768 /export/share/bin/ssh 156916071696 192.168.1.109 R 72 4096 /export/share/bin/ssh 156916080508 192.168.1.109 R 224 4096 /export/share/bin/ssh 156916080844 192.168.1.109 R 228 28672 /export/share/bin/ssh 156916081566 192.168.1.109 R 256 32768 /export/share/bin/ssh 156916081833 192.168.1.109 R 288 32768 /export/share/bin/ssh 156916082237 192.168.1.109 R 320 20480 /export/share/bin/ssh 156933373074 192.168.1.109 W 0 32768 /export/share/words 156933373351 192.168.1.109 W 32 32768 /export/share/words 156933373855 192.168.1.109 W 64 32768 /export/share/words 156933374185 192.168.1.109 W 96 32768 /export/share/words 156933375442 192.168.1.109 W 128 32768 /export/share/words 156933375864 192.168.1.109 W 160 32768 /export/share/words 156933376105 192.168.1.109 W 192 10055 /export/share/words
The fields printed are:
This DTrace script counts NFSv4 operations by client, printing a summary every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Interval 5 secs.\n"); } nfsv4:::op-* { @ops[args[0]->ci_remote, probename] = count(); } profile:::tick-5sec, dtrace:::END { printf("\n %-32s %-28s %8s\n", "Client", "Operation", "Count"); printa(" %-32s %-28s %@8d\n", @ops); trunc(@ops); }
The following output shows which client is sending which NFSv4 operations:
# ./nfsv4ops.d Tracing... Interval 5 secs. Client Operation Count 192.168.1.109 op-getattr-done 1 192.168.1.109 op-getattr-start 1 192.168.1.109 op-putfh-done 1 192.168.1.109 op-putfh-start 1 Client Operation Count 192.168.1.109 op-access-done 1 192.168.1.109 op-access-start 1 192.168.1.109 op-close-done 1 192.168.1.109 op-close-start 1 192.168.1.109 op-getfh-done 1 192.168.1.109 op-getfh-start 1 192.168.1.109 op-open-done 1 192.168.1.109 op-open-start 1 192.168.1.109 op-getattr-done 3 192.168.1.109 op-getattr-start 3 192.168.1.109 op-read-done 9 192.168.1.109 op-read-start 9 192.168.1.109 op-putfh-done 12 192.168.1.109 op-putfh-start 12 ^C Client Operation Count
The fields printed are:
This DTrace script prints a summary of file read and write bytes:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { trace("Tracing... Hit Ctrl-C to end.\n"); } nfsv4:::op-read-done { @readbytes[args[1]->noi_curpath] = sum(args[2]->data_len); } nfsv4:::op-write-done { @writebytes[args[1]->noi_curpath] = sum(args[2]->count); } dtrace:::END { printf("\n%12s %12s %s\n", "Rbytes", "Wbytes", "Pathname"); printa("%@12d %@12d %s\n", @readbytes, @writebytes); }
This output shows a few files were read, and one was written:
# ./nfsv4fileio.d Tracing... Hit Ctrl-C to end. ^C Rbytes Wbytes Pathname 0 206663 /export/share/words1 24528 0 /export/share/bin/xargs 44864 0 /export/share/bin/ed 232476 0 /export/share/bin/vi
The fields printed are:
This DTrace script prints a summary of NFSv4 read and write elapsed times (latencies), along with other details:
#!/usr/sbin/dtrace -s #pragma D option quiet inline int TOP_FILES = 10; dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } nfsv4:::op-read-start, nfsv4:::op-write-start { start[args[1]->noi_xid] = timestamp; } nfsv4:::op-read-done, nfsv4:::op-write-done { this->elapsed = timestamp - start[args[1]->noi_xid]; @rw[probename == "op-read-done" ? "read" : "write"] = quantize(this->elapsed / 1000); @host[args[0]->ci_remote] = sum(this->elapsed); @file[args[1]->noi_curpath] = sum(this->elapsed); start[args[1]->noi_xid] = 0; } dtrace:::END { printf("NFSv4 read/write distributions (us):\n"); printa(@rw); printf("\nNFSv4 read/write by host (total us):\n"); normalize(@host, 1000); printa(@host); printf("\nNFSv4 read/write top %d files (total us):\n", TOP_FILES); normalize(@file, 1000); trunc(@file, TOP_FILES); printa(@file); }
The output below shows a peak in the read time distribution plot in the 64 to 127 microsecond range, and a second peak between 2 and 8 milliseconds:
# ./nfsv4rwtime.d Tracing... Hit Ctrl-C to end. ^C NFSv4 read/write distributions (us): write value ------------- Distribution ------------- count 32 | 0 64 |@@@@@@ 1 128 |@@@@@@@@@@@ 2 256 |@@@@@@@@@@@@@@@@@ 3 512 |@@@@@@ 1 1024 | 0 read value ------------- Distribution ------------- count 16 | 0 32 |@@@@ 6 64 |@@@@@@@@@@@@ 17 128 |@ 1 256 |@@ 3 512 |@ 1 1024 |@@ 3 2048 |@@@@@@@@ 12 4096 |@@@@@@@@@@ 15 8192 |@ 1 16384 | 0 NFSv4 read/write by host (total us): 192.168.1.109 148215 NFSv4 read/write top 10 files (total us): /export/share/bin/man 5020 /export/share/bin/makeuuid 5132 /export/share/bin/mc68030 5836 /export/share/bin/m4 6446 /export/share/bin/msgfmt 6669 /export/share/bin/mkmsgs 6674 /export/share/bin/mailstats 6935 /export/share/bin/mkdir 7009 /export/share/bin/mac 7693 /export/share/bin/make 27903
Other details are printed, such as total read/write latency by host, and the top 10 files by latency.
The first peak in the read distribution is likely to be NFS operations hitting the memory cache, and the second those that missed and read from disk. The writes were all fast, as they are likely to be written to the memory cache and returned asynchronously. Further use of DTrace can confirm these theories.
The fields from the distribution plot are:
This is a simple DTrace script to provide basic I/O details by host every 5 seconds:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { interval = 5; printf("Tracing... Interval %d secs.\n", interval); tick = interval; } nfsv4:::op-* { @ops[args[0]->ci_remote] = count(); } nfsv4:::op-read-done { @reads[args[0]->ci_remote] = count(); @readbytes[args[0]->ci_remote] = sum(args[2]->data_len); } nfsv4:::op-write-done { @writes[args[0]->ci_remote] = count(); @writebytes[args[0]->ci_remote] = sum(args[2]->count); } profile:::tick-1sec /tick-- == 0/ { normalize(@ops, interval); normalize(@reads, interval); normalize(@writes, interval); normalize(@writebytes, 1024 * interval); normalize(@readbytes, 1024 * interval); printf("\n %-32s %6s %6s %6s %6s %8s\n", "Client", "r/s", "w/s", "kr/s", "kw/s", "ops/s"); printa(" %-32s %@6d %@6d %@6d %@6d %@8d\n", @reads, @writes, @readbytes, @writebytes, @ops); trunc(@ops); trunc(@reads); trunc(@writes); trunc(@readbytes); trunc(@writebytes); tick = interval; }
This output shows 192.168.1.109 calling NFSv4 reads and writes:
# ./nfsv4io.d Tracing... Interval 5 secs. Client r/s w/s kr/s kw/s ops/s 192.168.1.109 17 1 331 40 290 Client r/s w/s kr/s kw/s ops/s 192.168.1.109 9 0 197 0 152 Client r/s w/s kr/s kw/s ops/s 192.168.1.109 16 0 269 0 363 Client r/s w/s kr/s kw/s ops/s ^C
Other details can be calculated from the output, such as the average read and write size (for example, 331 kr/s / 17 r/s = 19.5 Kbytes average read size). These could also be added to the script to be printed as columns.
The fields printed are:
The srp provider provides probes for tracing SRP target port provider activity.
This is a kernel provider built into the COMSTAR SRP target port provider.
For all of the probes below, string fields whose values are not known contain the string "<unknown>", and integer fields whose values are not known contain 0.
srp:::service-up and srp:::service-down trace SRP target online and offline events. Remote port information (ci_remote) is unavailable for both probes.
scsicmd_t, conninfo_t and xferinfo_t are common types which are used by other providers.
typedef struct scsicmd { uint64_t ic_len; /* CDB length */ uint8_t *ic_cdb; /* CDB data */ } scsicmd_t;
typedef struct conninfo { string ci_local; /* GID of the local HCA */ string ci_remote; /* GID of the remote HCA */ string ci_protocol; /* protocol ("ib") */ } conninfo_t;
typedef struct srp_portinfo { /* initiator */ string pi_initiator; /* Initiator: eui.xxxxxxxxxxxxxxx */ string pi_i_sid; /* Initiator session id */ /* target */ string pi_target; /* Target: eui.xxxxxxxxxxxxxxx */ string pi_t_sid; /* Target session id */ uintptr_t pi_chan_id; /* Channel identifier */ } srp_portinfo_t;
typedef struct srp_logininfo { uint64_t li_task_tag; /* SRP task tag */ uint32_t li_max_it_iu_len; /* Maximum iu length that initiator can send to target */ uint32_t li_max_ti_iu_len; /* Maximum iu length that target can send to initiator */ uint32_t li_request_limit; /* Maximum number of SRP requests that initiator can send on a channel */ uint32_t reason_code; /* Reason code */ } srp_logininfo_t;
typedef struct srp_taskinfo { uint64_t ti_task_tag; /* SRP task tag */ uint64_t ti_lun; /* Target logical unit number */ uint8_t ti_function; /* Task management function */ uint32_t ti_req_limit_delta; /* Increment of channel's request limit */ uint8_t ti_flag; /* bit 2:DOOVER 3:DOUNDER 4:DIOVER 5:DIUNDER */ uint32_t ti_do_resid_cnt; /* Data-out residual count */ uint32_t ti_di_resid_cnt; /* Data-in residual count */ uint8_t ti_status; /* Status of this task */ } srp_taskinfo_t;
typedef struct xferinfo { uintptr_t xfer_laddr; /* Local buffer address */ uint32_t xfer_loffset; /* Relative offset from local buffer */ uint32_t xfer_lkey; /* Access control to local memory */ uintptr_t xfer_raddr; /* Remote virtual address */ uint32_t xfer_roffset; /* Offset from the remote address */ uint32_t xfer_rkey; /* Access control to remote address */ uint32_t xfer_len; /* Transfer length */ uint8_t xfer_type; /* 0: read; 1: write; */ } xferinfo_t;
This is a simple script to produce a report of target online/offline events.
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n\n"); printf("%-14s %-35s %-20s\n", "SRP EVENT", \ "LOCAL PORT", "EUI NAME"); }; srp:::service-up { printf("%-14s %-35s %-20s\n", probename, \ args[0]->ci_local, args[1]->pi_target); } srp:::service-down { printf("%-14s %-35s %-20s\n", probename, \ args[0]->ci_local, args[1]->pi_target); }
This output shows SRP target online and offline events:
thump1## dtrace -s ~/src/service.d Tracing... Hit Ctrl-C to end. ^C SRP EVENT LOCAL PORT EUI NAME service-down fe80000000000000:0003ba0001004d31 eui.0003BA0001004D30 service-down fe80000000000000:0003ba0001004d32 eui.0003BA0001004D30 service-up fe80000000000000:0003ba0001004d31 eui.0003BA0001004D30 service-up fe80000000000000:0003ba0001004d32 eui.0003BA0001004D30 thump1##
The fields are:
This is a simple script to produce a report of the remote HCA port and a count of srp events. This is intended to provide a quick summary of srp activity when run on the SRP target server:
#!/usr/sbin/dtrace -s #pragma D option quiet dtrace:::BEGIN { printf("Tracing... Hit Ctrl-C to end.\n"); } srp:::login-command, srp:::login-response, srp:::task-command, srp:::task-response, srp:::scsi-command, srp:::scsi-response, srp:::xfer-start, srp:::xfer-done { @events[args[0]->ci_remote, probename] = count(); } dtrace:::END { printf(" %-33s %14s %8s\n", "REMOTE GID", "iSCSI EVENT", "COUNT"); printa(" %-33s %14s %@8d\n", @events); }
This output shows the remote HCA ports and the number of SRP events:
thump1## dtrace -s ./srpwho.d Tracing... Hit Ctrl-C to end. ^C REMOTE GID iSCSI EVENT COUNT fe80000000000000:0003ba000100386d login-command 1 fe80000000000000:0003ba000100386d login-response 1 fe80000000000000:0003ba0001003851 login-command 2 fe80000000000000:0003ba0001003851 login-response 2 fe80000000000000:0003ba0001003852 login-command 2 fe80000000000000:0003ba0001003852 login-response 2 fe80000000000000:0003ba0001004d32 xfer-done 9 fe80000000000000:0003ba0001004d32 xfer-start 9 fe80000000000000:0003ba0001004d31 xfer-done 18 fe80000000000000:0003ba0001004d31 xfer-start 18 fe80000000000000:0003ba0001004d32 scsi-command 22 fe80000000000000:0003ba0001004d32 scsi-response 22 fe80000000000000:0003ba0001004d32 task-command 22 fe80000000000000:0003ba0001004d32 task-response 22 fe80000000000000:0003ba0001004d31 scsi-command 42 fe80000000000000:0003ba0001004d31 scsi-response 42 fe80000000000000:0003ba0001004d31 task-command 42 fe80000000000000:0003ba0001004d31 task-response 42
The fields are:
This is a simple script to snoop srp events when run on an SRP target server.
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10 dtrace:::BEGIN { printf("%17s %3s %-40s %-14s %6s %10s %6s\n", "TIMESTAMP", "CPU", "REMOTE GID", "EVENT", "BYTES", "TAG", "SCSIOP"); /* * SCSI opcode to string translation hash. This is from * /usr/include/sys/scsi/generic/commands.h. If you would * rather see all hex, comment this out. */ scsiop[0x08] = "read"; scsiop[0x0a] = "write"; scsiop[0x0b] = "seek"; scsiop[0x28] = "read(10)"; scsiop[0x2a] = "write(10)"; scsiop[0x2b] = "seek(10)"; } srp:::login-* { printf("%17d %3d %-40s %-14s %17d -\n", timestamp, cpu, args[0]->ci_remote, probename, args[2]->li_task_tag); } srp:::task-command, srp:::task-response, srp:::scsi-response { printf("%17d %3d %-40s %-14s %6d %10d -\n", timestamp, cpu, args[0]->ci_remote, probename, 0, args[2]->ti_task_tag); } srp:::scsi-command /scsiop[args[2]->ic_cdb[0]] != NULL/ { printf("%17d %3d %-40s %-14s %6d %10d %s\n", timestamp, cpu, args[0]->ci_remote, probename, 0, args[3]->ti_task_tag, scsiop[args[2]->ic_cdb[0]]); } srp:::scsi-command /scsiop[args[2]->ic_cdb[0]] == NULL/ { printf("%17d %3d %-40s %-14s %6d %10d 0x%x\n", timestamp, cpu, args[0]->ci_remote, probename, 0, args[3]->ti_task_tag, args[2]->ic_cdb[0]); } srp:::xfer-start, srp:::xfer-done { printf("%17d %3d %-40s %-14s %6d %10d %s\n", timestamp, cpu, args[0]->ci_remote, probename, args[2]->xfer_len, args[3]->ti_task_tag, args[2]->xfer_type > 0 ? "READ" : "WRITE"); }
This output shows the snoop on dd commands executed by the initiator.
thump1## dtrace -s ./srpsnoop.d TIMESTAMP CPU REMOTE GID EVENT BYTES TAG SCSIOP 22644410404019 3 fe80000000000000:0003ba0001004d31 task-command 0 26 - 22644410493068 3 fe80000000000000:0003ba0001004d31 scsi-command 0 26 read(10) 22644410511422 3 fe80000000000000:0003ba0001004d31 task-command 0 30 - 22644410541494 3 fe80000000000000:0003ba0001004d31 scsi-command 0 30 read(10) 22644410621049 0 fe80000000000000:0003ba0001004d31 xfer-start 2048 26 READ 22644410720486 1 fe80000000000000:0003ba0001004d31 xfer-start 49152 30 READ 22644410681390 3 fe80000000000000:0003ba0001004d31 xfer-done 2048 26 READ 22644410694719 3 fe80000000000000:0003ba0001004d31 scsi-response 0 26 - 22644410703358 3 fe80000000000000:0003ba0001004d31 task-response 0 26 - 22644410895424 3 fe80000000000000:0003ba0001004d31 xfer-done 49152 30 READ 22644410901576 3 fe80000000000000:0003ba0001004d31 scsi-response 0 30 - 22644410905717 3 fe80000000000000:0003ba0001004d31 task-response 0 30 - 22727363721107 3 fe80000000000000:0003ba0001004d31 task-command 0 59 - 22727363919179 0 fe80000000000000:0003ba0001004d31 xfer-start 10240 59 WRITE 22727364095164 0 fe80000000000000:0003ba0001004d31 scsi-response 0 59 - 22727364105406 0 fe80000000000000:0003ba0001004d31 task-response 0 59 - 22727363812953 3 fe80000000000000:0003ba0001004d31 scsi-command 0 59 write(10) 22727363986185 3 fe80000000000000:0003ba0001004d31 xfer-done 10240 59 WRITE
The fields are:
The tcp provider provides probes for tracing the TCP protocol.
The tcp probes are described in the table below.
Table 11-40 tcp Probes
The send and receive probes trace packets on physical interfaces and also packets on loopback interfaces that are processed by tcp. On Solaris, loopback TCP connections can bypass the TCP layer when transferring data packets - this is a performance feature called tcp fusion; these packets are also traced by the tcp provider.
The argument types for the tcp probes are listed in the table below. The arguments are described in the following section. All probes except state-change have 5 arguments; state-change has 6.
The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.
typedef struct pktinfo { uintptr_t pkt_addr; /* currently always NULL */ } pktinfo_t;
The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.
typedef struct csinfo { uintptr_t cs_addr; uint64_t cs_cid; pid_t cs_pid; zoneid_t cs_zoneid; } csinfo_t;
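For example, assuming csinfo_t is args[1] as listed in the tcp probe argument table, a one-liner such as the following attributes received TCP packets to the process and zone that own the connection:
# dtrace -n 'tcp:::receive { @[args[1]->cs_pid, args[1]->cs_zoneid] = count(); }'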
The ipinfo_t structure contains common IP info for both IPv4 and IPv6.
typedef struct ipinfo { uint8_t ip_ver; /* IP version (4, 6) */ uint16_t ip_plength; /* payload length */ string ip_saddr; /* source address */ string ip_daddr; /* destination address */ } ipinfo_t;
These values are read when the probe fires in TCP, and so ip_plength is the expected IP payload length. However, the IP layer may add headers (such as AH and ESP) that increase the actual payload length. To examine this, also trace packets using the ip provider.
Table 11-41 ipinfo_t Members
The tcpsinfo_t structure contains tcp state info.
typedef struct tcpsinfo { uintptr_t tcps_addr; int tcps_local; /* is delivered locally, boolean */ int tcps_active; /* active open (from here), boolean */ uint16_t tcps_lport; /* local port */ uint16_t tcps_rport; /* remote port */ string tcps_laddr; /* local address, as a string */ string tcps_raddr; /* remote address, as a string */ int32_t tcps_state; /* TCP state. Use inline tcp_state_string[] to convert to string */ uint32_t tcps_iss; /* initial sequence # sent */ uint32_t tcps_suna; /* sequence # sent but unacked */ uint32_t tcps_snxt; /* next sequence # to send */ uint32_t tcps_rack; /* sequence # we have acked */ uint32_t tcps_rnxt; /* next sequence # expected */ uint32_t tcps_swnd; /* send window size */ uint32_t tcps_snd_ws; /* send window scaling */ uint32_t tcps_rwnd; /* receive window size */ uint32_t tcps_rcv_ws; /* receive window scaling */ uint32_t tcps_cwnd; /* congestion window */ uint32_t tcps_cwnd_ssthresh; /* threshold for congestion avoidance */ uint32_t tcps_sack_fack; /* SACK sequence # we have acked */ uint32_t tcps_sack_snxt; /* next SACK seq # for retransmission */ uint32_t tcps_rto; /* round-trip timeout, msec */ uint32_t tcps_mss; /* max segment size */ int tcps_retransmit; /* retransmit send event, boolean */ } tcpsinfo_t;
It may seem redundant to supply the local and remote ports and addresses here as well as in the tcpinfo_t below, but the tcp:::state-change probes do not have associated tcpinfo_t data, so in order to map the state change to a specific port, we need this data here.
Table 11-42 tcpsinfo_t Members
The tcplsinfo_t structure contains the previous tcp state during a state change.
typedef struct tcplsinfo { int32_t tcps_state; /* TCP state */ } tcplsinfo_t;
Table 11-43 tcplsinfo_t Members
The tcpinfo_t structure is a DTrace translated version of the TCP header.
typedef struct tcpinfo { uint16_t tcp_sport; /* source port */ uint16_t tcp_dport; /* destination port */ uint32_t tcp_seq; /* sequence number */ uint32_t tcp_ack; /* acknowledgment number */ uint8_t tcp_offset; /* data offset, in bytes */ uint8_t tcp_flags; /* flags */ uint16_t tcp_window; /* window size */ uint16_t tcp_checksum; /* checksum */ uint16_t tcp_urgent; /* urgent data pointer */ tcph_t *tcp_hdr; /* raw TCP header */ } tcpinfo_t;
Table 11-44 tcpinfo_t Members
Table 11-45 tcp_flags Values
See RFC-793 for a detailed explanation of the standard TCP header fields and flags.
Some simple examples of tcp provider usage follow.
This DTrace one-liner counts inbound TCP connections by source IP address:
# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr] = count(); }' dtrace: description 'tcp:::state-change' matched 1 probes ^C 127.0.0.1 1 192.168.2.88 1 fe80::214:4fff:fe8d:59aa 1 192.168.1.109 3
The output above shows there were 3 TCP connections from 192.168.1.109, a single TCP connection from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
This DTrace one-liner counts inbound TCP connections by local TCP port:
# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_lport] = count(); }' dtrace: description 'tcp:::state-change' matched 1 probes ^C 40648 1 22 3
The output above shows there were 3 TCP connections for port 22 (ssh), a single TCP connection for port 40648 (an RPC port).
Combining the previous two examples produces a useful one liner, to quickly identify who is connecting to what:
# dtrace -n 'tcp:::accept-established { @[args[3]->tcps_raddr, args[3]->tcps_lport] = count(); }' dtrace: description 'tcp:::state-change' matched 1 probes ^C 192.168.2.88 40648 1 fe80::214:4fff:fe8d:59aa 22 1 192.168.1.109 22 3
The output above shows there were 3 TCP connections from 192.168.1.109 to port 22 (ssh), etc.
It may be useful when troubleshooting connection issues to see who is failing to connect to their requested ports. This is equivalent to seeing where incoming SYNs arrive when no listener is present, as per RFC793:
# dtrace -n 'tcp:::accept-refused { @[args[2]->ip_daddr, args[4]->tcp_sport] = count(); }' dtrace: description 'tcp:::receive ' matched 1 probes ^C 192.168.1.109 23 2
Here we traced two failed attempts by host 192.168.1.109 to connect to port 23 (telnet).
This DTrace one-liner counts TCP received packets by host address:
# dtrace -n 'tcp:::receive { @[args[2]->ip_saddr] = count(); }' dtrace: description 'tcp:::receive ' matched 5 probes ^C 127.0.0.1 7 fe80::214:4fff:fe8d:59aa 14 192.168.2.30 43 192.168.1.109 44 192.168.2.88 3722
The output above shows that 7 TCP packets were received from 127.0.0.1, 14 TCP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
This DTrace one-liner counts TCP received packets by the local TCP port:
# dtrace -n 'tcp:::receive { @[args[4]->tcp_dport] = count(); }' dtrace: description 'tcp:::receive ' matched 5 probes ^C 42303 3 42634 3 2049 27 40648 36 22 162
The output above shows that 162 packets were received for port 22 (ssh), 36 packets were received for port 40648 (an RPC port), 27 packets for 2049 (NFS), and a few packets to high numbered client ports.
This DTrace one-liner prints distribution plots of IP payload size by destination, for TCP sends:
# dtrace -n 'tcp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }' dtrace: description 'tcp:::send ' matched 3 probes ^C 192.168.1.109 value ------------- Distribution ------------- count 32 | 0 64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 128 |@@@ 1 256 | 0 192.168.2.30 value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@@@@@ 7 64 |@@@@@@@@@ 3 128 |@@@ 1 256 |@@@@@@ 2 512 |@@@ 1 1024 | 0
This DTrace script demonstrates the capability to trace TCP state changes:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10 int last[int]; dtrace:::BEGIN { printf(" %3s %12s %-20s %-20s\n", "CPU", "DELTA(us)", "OLD", "NEW"); } tcp:::state-change / last[args[1]->cs_cid] / { this->elapsed = (timestamp - last[args[1]->cs_cid]) / 1000; printf(" %3d %12d %-20s -> %-20s\n", cpu, this->elapsed, tcp_state_string[args[5]->tcps_state], tcp_state_string[args[3]->tcps_state]); last[args[1]->cs_cid] = timestamp; } tcp:::state-change / last[args[1]->cs_cid] == 0 / { printf(" %3d %12s %-20s -> %-20s\n", cpu, "-", tcp_state_string[args[5]->tcps_state], tcp_state_string[args[3]->tcps_state]); last[args[1]->cs_cid] = timestamp; }
This script was run on a system for a couple of minutes:
# ./tcpstate.d CPU DELTA(us) OLD NEW 0 - state-listen -> state-syn-received 0 613 state-syn-received -> state-established 0 - state-idle -> state-bound 0 63 state-bound -> state-syn-sent 0 685 state-syn-sent -> state-bound 0 22 state-bound -> state-idle 0 114 state-idle -> state-closed
In the above example output, an inbound connection is traced; it takes 613 us to go from syn-received to established. An outbound connection attempt is also made to a closed port: it takes 63 us to go from bound to syn-sent, 685 us to go from syn-sent to bound, and so on.
The fields printed are:
The following DTrace script traces TCP packets and prints various details:
#!/usr/sbin/dtrace -s #pragma D option quiet #pragma D option switchrate=10hz dtrace:::BEGIN { printf(" %3s %15s:%-5s %15s:%-5s %6s %s\n", "CPU", "LADDR", "LPORT", "RADDR", "RPORT", "BYTES", "FLAGS"); } tcp:::send { this->length = args[2]->ip_plength - args[4]->tcp_offset; printf(" %3d %16s:%-5d -> %16s:%-5d %6d (", cpu, args[2]->ip_saddr, args[4]->tcp_sport, args[2]->ip_daddr, args[4]->tcp_dport, this->length); } tcp:::receive { this->length = args[2]->ip_plength - args[4]->tcp_offset; printf(" %3d %16s:%-5d <- %16s:%-5d %6d (", cpu, args[2]->ip_daddr, args[4]->tcp_dport, args[2]->ip_saddr, args[4]->tcp_sport, this->length); } tcp:::send, tcp:::receive { printf("%s", args[4]->tcp_flags & TH_FIN ? "FIN|" : ""); printf("%s", args[4]->tcp_flags & TH_SYN ? "SYN|" : ""); printf("%s", args[4]->tcp_flags & TH_RST ? "RST|" : ""); printf("%s", args[4]->tcp_flags & TH_PUSH ? "PUSH|" : ""); printf("%s", args[4]->tcp_flags & TH_ACK ? "ACK|" : ""); printf("%s", args[4]->tcp_flags & TH_URG ? "URG|" : ""); printf("%s", args[4]->tcp_flags & TH_ECE ? "ECE|" : ""); printf("%s", args[4]->tcp_flags & TH_CWR ? "CWR|" : ""); printf("%s", args[4]->tcp_flags == 0 ? "null " : ""); printf("\b)\n"); }
This example output has captured a TCP handshake:
# ./tcpio.d CPU LADDR:LPORT RADDR:RPORT BYTES FLAGS 1 192.168.2.80:22 -> 192.168.1.109:60337 464 (PUSH|ACK) 1 192.168.2.80:22 -> 192.168.1.109:60337 48 (PUSH|ACK) 2 192.168.2.80:22 -> 192.168.1.109:60337 20 (PUSH|ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (SYN) 3 192.168.2.80:22 -> 192.168.1.109:60337 0 (SYN|ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 20 (PUSH|ACK) 3 192.168.2.80:22 -> 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 376 (PUSH|ACK) 3 192.168.2.80:22 -> 192.168.1.109:60337 0 (ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 24 (PUSH|ACK) 2 192.168.2.80:22 -> 192.168.1.109:60337 736 (PUSH|ACK) 3 192.168.2.80:22 <- 192.168.1.109:60337 0 (ACK)
The fields printed are:
Note - The output may be shuffled slightly on multi-CPU servers due to DTrace per-CPU buffering, and events such as the TCP handshake can be printed out of order. Keep an eye on changes in the CPU column, or add a timestamp column to this script and post sort.
The tcp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.
The udp provider provides probes for tracing the UDP protocol.
The udp probes are described in the table below.
Table 11-46 udp Probes
The send and receive probes trace datagrams on physical interfaces and also packets on loopback interfaces that are processed by udp.
The argument types for the udp probes are listed in the table below. The arguments are described in the following section.
Table 11-47 udp Probe Arguments
The pktinfo_t structure is where packet ID info can be made available for deeper analysis if packet IDs become supported by the kernel in the future.
The pkt_addr member is currently always NULL.
typedef struct pktinfo { uintptr_t pkt_addr; /* currently always NULL */ } pktinfo_t;
The csinfo_t structure is where connection state info is made available. It contains a unique (system-wide) connection ID, and the process ID and zone ID associated with the connection.
typedef struct csinfo { uintptr_t cs_addr; uint64_t cs_cid; pid_t cs_pid; zoneid_t cs_zoneid; } csinfo_t;
Table 11-48 csinfo_t Members
The ipinfo_t structure contains common IP info for both IPv4 and IPv6.
typedef struct ipinfo { uint8_t ip_ver; /* IP version (4, 6) */ uint16_t ip_plength; /* payload length */ string ip_saddr; /* source address */ string ip_daddr; /* destination address */ } ipinfo_t;
These values are read when the probe fires in UDP, and so ip_plength is the expected IP payload length. However, the IP layer may add headers (such as AH and ESP) that increase the actual payload length. To examine this, also trace packets using the ip provider.
Table 11-49 ipinfo_t Members
The udpsinfo_t structure contains udp state info.
typedef struct udpsinfo { uintptr_t udps_addr; uint16_t udps_lport; /* local port */ uint16_t udps_fport; /* remote port */ string udps_laddr; /* local address, as a string */ string udps_faddr; /* remote address, as a string */ } udpsinfo_t;
Table 11-50 udpsinfo_t Members
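For example, assuming udpsinfo_t is args[3] as listed in Table 11-47, a one-liner such as the following counts received datagrams by local UDP port:
# dtrace -n 'udp:::receive { @[args[3]->udps_lport] = count(); }'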
The udpinfo_t structure is a DTrace translated version of the UDP header.
typedef struct udpinfo { uint16_t udp_sport; /* source port */ uint16_t udp_dport; /* destination port */ uint16_t udp_length; /* total length */ uint16_t udp_checksum; /* headers + data checksum */ udpha_t *udp_hdr; /* raw UDP header */ } udpinfo_t;
Table 11-51 udpinfo_t Members
See RFC-768 for a detailed explanation of the standard UDP header fields and flags.
Some simple examples of udp provider usage follow.
This DTrace one-liner counts UDP received packets by host address:
# dtrace -n 'udp:::receive { @[args[2]->ip_saddr] = count(); }' dtrace: description 'udp:::receive ' matched 5 probes ^C 127.0.0.1 7 fe80::214:4fff:fe8d:59aa 14 192.168.2.30 43 192.168.1.109 44 192.168.2.88 3722
The output above shows that 7 UDP packets were received from 127.0.0.1, 14 UDP packets from the IPv6 host fe80::214:4fff:fe8d:59aa, etc.
This DTrace one-liner counts UDP received packets by the local UDP port:
# dtrace -n 'udp:::receive { @[args[4]->udp_dport] = count(); }' dtrace: description 'udp:::receive ' matched 1 probe ^C 33294 1 33822 1 38961 1 44433 1 46258 1 46317 1 47511 1 50581 1 54685 1 56491 1 59056 1 62171 1 62769 1 64231 1
The output above shows that 1 packet was received for port 33294, 1 packet was received for port 33822, etc.
This DTrace one-liner prints distribution plots of IP payload size by destination, for UDP sends:
# dtrace -n 'udp:::send { @[args[2]->ip_daddr] = quantize(args[2]->ip_plength); }' dtrace: description 'udp:::send ' matched 6 probes ^C 129.156.86.11 value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 64 | 0
The udp provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 18, Stability.