JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Writing Device Drivers     Oracle Solaris 11.1 Information Library
search filter icon
search icon

Document Information

Preface

Part I Designing Device Drivers for the Oracle Solaris Platform

1.  Overview of Oracle Solaris Device Drivers

2.  Oracle Solaris Kernel and Device Tree

3.  Multithreading

4.  Properties

5.  Managing Events and Queueing Tasks

6.  Driver Autoconfiguration

7.  Device Access: Programmed I/O

8.  Interrupt Handlers

9.  Direct Memory Access (DMA)

10.  Mapping Device and Kernel Memory

11.  Device Context Management

12.  Power Management

13.  Hardening Oracle Solaris Drivers

14.  Layered Driver Interface (LDI)

Part II Designing Specific Kinds of Device Drivers

15.  Drivers for Character Devices

16.  Drivers for Block Devices

17.  SCSI Target Drivers

18.  SCSI Host Bus Adapter Drivers

19.  Drivers for Network Devices

20.  USB Drivers

21.  SR-IOV Drivers

Part III Building a Device Driver

22.  Compiling, Loading, Packaging, and Testing Drivers

23.  Debugging, Testing, and Tuning Device Drivers

Testing Drivers

Enable the Deadman Feature to Avoid a Hard Hang

Testing With a Serial Connection

To Set Up the Host System for a tip Connection

Setting Up a Target System on the SPARC Platform

Setting Up a Target System on the x86 Platform

Setting Up Test Modules

Setting Kernel Variables

Loading and Unloading Test Modules

Setting kmem_flags Debugging Flags

Avoiding Data Loss on a Test System

Using an Alternate Boot Environment

Booting With an Alternate Kernel

Consider Alternative Back-Up Plans

Capture System Crash Dumps

Recovering the Device Directory

Debugging Tools

Postmortem Debugging

Using the kmdb Kernel Debugger

Booting kmdb With an Alternate Kernel on the SPARC Platform

Booting kmdb With an Alternate Kernel on the x86 Platform

Setting Breakpoints in kmdb

kmdb Macros for Driver Developers

Using the mdb Modular Debugger

Getting Started With the Modular Debugger

Useful Debugging Tasks With kmdb and mdb

Exploring System Registers With kmdb

Detecting Kernel Memory Leaks

Writing Debugger Commands With mdb

Obtaining Kernel Data Structure Information

Obtaining Device Tree Information

Retrieving Driver Soft State Information

Modifying Kernel Variables

Tuning Drivers

Kernel Statistics

Kernel Statistics Structure Members

Kernel Statistics Structures

Kernel Statistics Functions

Kernel Statistics for Oracle Solaris Ethernet Drivers

DTrace for Dynamic Instrumentation

24.  Recommended Coding Practices

Part IV Appendixes

A.  Hardware Overview

B.  Summary of Oracle Solaris DDI/DKI Services

C.  Making a Device Driver 64-Bit Ready

D.  Console Frame Buffer Drivers

E.  pci.conf File

Index

Testing Drivers

To avoid data loss and other problems, you should take special care when testing a new device driver. This section discusses various testing strategies. For example, setting up a separate system that you control through a serial connection is the safest way to test a new driver. You can load test modules with various kernel variable settings to test performance under different kernel conditions. Should your system crash, you should be prepared to restore backup data, analyze any crash dumps, and rebuild the device directory.

Enable the Deadman Feature to Avoid a Hard Hang

If your system is in a hard hang, then you cannot break into the debugger. If you enable the deadman feature, the system panics instead of hanging indefinitely. You can then use the kmdb(1) kernel debugger to analyze your problem.

The deadman feature checks every second whether the system clock is updating. If the system clock is not updating, then you are in an indefinite hang. If the system clock has not been updated for 50 seconds, the deadman feature induces a panic and puts you in the debugger.

Take the following steps to enable the deadman feature:

  1. Make sure you are capturing crash images with dumpadm(1M).

  2. Set the snooping variable in the /etc/system file. See the system(4) man page for information on the /etc/system file.

    set snooping=1
  3. Reboot the system so that the /etc/system file is read again and the snooping setting takes effect.

Note that any zones on your system inherit the deadman setting as well.

If your system hangs while the deadman feature is enabled, you should see output similar to the following example on your console:

panic[cpu1]/thread=30018dd6cc0: deadman: timed out after 9 seconds of
clock inactivity

panic: entering debugger (continue to save dump)

Inside the debugger, use the ::cpuinfo command to investigate why the clock interrupt was not able to fire and advance the system time.

Testing With a Serial Connection

Using a serial connection is a good way to test drivers. Use the tip(1) command to make a serial connection between a host system and a test system. With this approach, the tip window on the host console is used as the console of the test machine. See the tip(1) man page for additional information.

A tip window has the following advantages:


Note - Although using a tip connection and a second machine are not required to debug an Oracle Solaris device driver, this technique is still recommended.


To Set Up the Host System for a tip Connection

  1. Connect the host system to the test machine using serial port A on both machines.

    This connection must be made with a null modem cable.

  2. On the host system, make sure there is an entry in /etc/remote for the connection. See the remote(4) man page for details.

    The terminal entry must match the serial port that is used. The operating system comes with the correct entry for serial port B, but a terminal entry must be added for serial port A:

    debug:\
            :dv=/dev/term/a:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:

    Note - The baud rate must be set to 9600.


  3. In a shell window on the host, run tip(1) and specify the name of the entry:
    % tip debug
    connected

    The shell window is now a tip window with a connection to the console of the test machine.


    Caution

    Caution - Do not use STOP-A for SPARC machines or F1-A for x86 architecture machines on the host machine to stop the test machine. This action actually stops the host machine. To send a break to the test machine, type ~# in the tip window. Commands such as ~# are recognized only if these characters on first on the line. If the command has no effect, press either the Return key or Control-U.


Setting Up a Target System on the SPARC Platform

A quick way to set up the test machine on the SPARC platform is to unplug the keyboard before turning on the machine. The machine then automatically uses serial port A as the console.

Another way to set up the test machine is to use boot PROM commands to make serial port A the console. On the test machine, at the boot PROM ok prompt, direct console I/O to the serial line. To make the test machine always come up with serial port A as the console, set the environment variables: input-device and output-device.

Example 23-1 Setting input-device and output-device With Boot PROM Commands

ok setenv input-device ttya
ok setenv output-device ttya

The eeprom command can also be used to make serial port A the console. As superuser, execute the following commands to make the input-device and output-device parameters point to serial port A. The following example demonstrates the eeprom command.

Example 23-2 Setting input-device and output-device With the eeprom Command

# eeprom input-device=ttya
# eeprom output-device=ttya

The eeprom commands cause the console to be redirected to serial port A at each subsequent system boot.

Setting Up a Target System on the x86 Platform

On x86 platforms, use the eeprom command to make serial port A the console. This procedure is the same as the SPARC platform procedure. See Setting Up a Target System on the SPARC Platform. The eeprom command causes the console to switch to serial port A (COM1) during reboot.


Note - x86 machines do not transfer console control to the tip connection until an early stage in the boot process unless the BIOS supports console redirection to a serial port. In SPARC machines, the tip connection maintains console control throughout the boot process.


Setting Up Test Modules

The system(4) file in the /etc directory enables you to set the value of kernel variables at boot time. With kernel variables, you can toggle different behaviors in a driver and take advantage of debugging features that are provided by the kernel. The kernel variables moddebug and kmem_flags, which can be very useful in debugging, are discussed later in this section. See also Enable the Deadman Feature to Avoid a Hard Hang.

Changes to kernel variables after boot are unreliable, because /etc/system is read only once when the kernel boots. After this file is modified, the system must be rebooted for the changes to take effect. If a change in the file causes the system not to work, boot with the ask (-a) option. Then specify /dev/null as the system file.


Note - Kernel variables cannot be relied on to be present in subsequent releases.


Setting Kernel Variables

The set command changes the value of module or kernel variables. To set module variables, specify the module name and the variable:

set module_name:variable=value

For example, to set the variable test_debug in a driver that is named myTest, use set as follows:

% set myTest:test_debug=1

To set a variable that is exported by the kernel itself, omit the module name.

You can also use a bitwise OR operation to set a value, for example:

% set moddebug | 0x80000000

Loading and Unloading Test Modules

The commands modload(1M), modunload(1M), and modinfo(1M) can be used to add test modules, which is a useful technique for debugging and stress-testing drivers. These commands are generally not needed in normal operation, because the kernel automatically loads needed modules and unloads unused modules. The moddebug kernel variable works with these commands to provide information and set controls.

Using the modload() Function

Use modload(1M) to force a module into memory. The modload command verifies that the driver has no unresolved references when that driver is loaded. Loading a driver does not necessarily mean that the driver can attach. When a driver loads successfully, the driver's _info(9E) entry point is called. The attach() entry point is not necessarily called.

Using the modinfo() Function

Use modinfo(1M) to confirm that the driver is loaded.

Example 23-3 Using modinfo to Confirm a Loaded Driver

$ modinfo
 Id Loadaddr   Size Info Rev Module Name
  6 101b6000    732   -   1  obpsym (OBP symbol callbacks)
  7 101b65bd  1acd0 226   1  rpcmod (RPC syscall)
  7 101b65bd  1acd0 226   1  rpcmod (32-bit RPC syscall)
  7 101b65bd  1acd0   1   1  rpcmod (rpc interface str mod)
  8 101ce8dd  74600   0   1  ip (IP STREAMS module)
  8 101ce8dd  74600   3   1  ip (IP STREAMS device)
...
$ modinfo | grep mydriver
169 781a8d78   13fb   0   1  mydriver (Test Driver 1.5)

The number in the info field is the major number that has been chosen for the driver. The modunload(1M) command can be used to unload a module if the module ID is provided. The module ID is found in the left column of modinfo output.

Sometimes a driver does not unload as expected after a modunload is issued, because the driver is determined to be busy. This situation occurs when the driver fails detach(9E), either because the driver really is busy, or because the detach entry point is implemented incorrectly.

Using modunload()

To remove all of the currently unused modules from memory, run modunload(1M) with a module ID of 0:

# modunload -i 0
Setting the moddebug Kernel Variable

The moddebug kernel variable controls the module loading process. The possible values of moddebug are:

0x80000000

Prints messages to the console when loading or unloading modules.

0x40000000

Gives more detailed error messages.

0x20000000

Prints more detail when loading or unloading, such as including the address and size.

0x00001000

No auto-unloading drivers. The system does not attempt to unload the device driver when the system resources become low.

0x00000080

No auto-unloading streams. The system does not attempt to unload the STREAMS module when the system resources become low.

0x00000010

No auto-unloading of kernel modules of any type.

0x00000001

If running with kmdb, moddebug causes a breakpoint to be executed and a return to kmdb immediately before each module's _init() routine is called. This setting also generates additional debug messages when the module's _info() and _fini() routines are executed.

Setting kmem_flags Debugging Flags

The kmem_flags kernel variable enables debugging features in the kernel's memory allocator. Set kmem_flags to 0xf to enable the allocator's debugging features. These features include runtime checks to find the following code conditions:

The Oracle Solaris Modular Debugger Guide describes how to use the kernel memory allocator to analyze such problems.


Note - Testing and developing with kmem_flags set to 0xf can help detect latent memory corruption bugs. Because setting kmem_flags to 0xf changes the internal behavior of the kernel memory allocator, you should thoroughly test without kmem_flags as well.


Avoiding Data Loss on a Test System

A driver bug can sometimes render a system incapable of booting. By taking precautions, you can avoid system reinstallation in this event, as described in this section.

Using an Alternate Boot Environment

A number of driver-related system files are difficult, if not impossible, to reconstruct. Files such as /etc/name_to_major, /etc/driver_aliases, /etc/driver_classes, and /etc/minor_perm can be corrupted if the driver crashes the system during installation. See the add_drv(1M) man page.

To be safe, use the beadm(1M) command to make a backup copy of the root file system after the test machine is in the proper configuration. If you plan to modify the /etc/system file, make a backup copy of the file before making modifications.

Booting With an Alternate Kernel

See the Chapter 4, Administering Boot Environments, in Creating and Administering Oracle Solaris 11.1 Boot Environments and Booting From a ZFS Boot Environment (Task Map) in Oracle Solaris Administration: Common Tasks for detailed information.

Consider Alternative Back-Up Plans

If the system is attached to a network, the test machine can be added as a client of a server. If a problem occurs, the system can be booted from the network. The local disks can then be mounted, and any fixes can be made. Alternatively, the system can be booted directly from the Oracle Solaris system CD-ROM.

Another way to recover from disaster is to have another bootable root file system. Use format(1M) to make a partition that is the exact size of the original. Then use dd(1M) to copy the bootable root file system. After making a copy, run fsck(1M) on the new file system to ensure its integrity.

Subsequently, if the system cannot boot from the original root partition, boot the backup partition. Use dd(1M) to copy the backup partition onto the original partition. You might have a situation where the system cannot boot even though the root file system is undamaged. For example, the damage might be limited to the boot block or the boot program. In such a case, you can boot from the backup partition with the ask (-a) option. You can then specify the original file system as the root file system.

Capture System Crash Dumps

When a system panics, the system writes an image of kernel memory to the dump device. The dump device is by default the most suitable swap device. The dump is a system crash dump, similar to core dumps generated by applications. On rebooting after a panic, savecore(1M) checks the dump device for a crash dump. If a dump is found, savecore makes a copy of the kernel's symbol table, which is called unix.n. The savecore utility then dumps a core file that is called vmcore.n in the core image directory. By default, the core image directory is /var/crash/machine_name. If /var/crash has insufficient space for a core dump, the system displays the needed space but does not actually save the dump. The mdb(1) debugger can then be used on the core dump and the saved kernel.

In the Oracle Solaris operating system, crash dump is enabled by default. The dumpadm(1M) command is used to configure system crash dumps. Use the dumpadm command to verify that crash dumps are enabled and to determine the location of core files that have been saved.


Note - You can prevent the savecore utility from filling the file system. Add a file that is named minfree to the directory in which the dumps are to be saved. In this file, specify the number of kilobytes to remain free after savecore has run. If insufficient space is available, the core file is not saved.


Recovering the Device Directory

Damage to the /devices and /dev directories can occur if the driver crashes during attach(9E). If either directory is damaged, you can rebuild the directory by booting the system and running fsck(1M) to repair the damaged root file system. The root file system can then be mounted. Recreate the /devices and /dev directories by running devfsadm(1M) and specifying the /devices directory on the mounted disk.

The following example shows how to repair a damaged root file system on a SPARC system. In this example, the damaged disk is /dev/dsk/c0t3d0s0, and an alternate boot disk is /dev/dsk/c0t1d0s0.

Example 23-4 Recovering a Damaged Device Directory

ok boot disk1
...
Rebooting with command: boot kernel.test/sparcv9/unix
Boot device: /sbus@1f,0/espdma@e,8400000/esp@e,8800000/sd@31,0:a File and \
    args:
kernel.test/sparcv9/unix
...
# fsck /dev/dsk/c0t3d0s0** /dev/dsk/c0t3d0s0
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1478 files, 9922 used, 29261 free
     (141 frags, 3640 blocks, 0.4% fragmentation)
# mount /dev/dsk/c0t3d0s0 /mnt
# devfsadm -r /mnt

Note - A fix to the /devices and /dev directories can allow the system to boot while other parts of the system are still corrupted. Such repairs are only a temporary fix to save information, such as system crash dumps, before reinstalling the system.