Writing Device Drivers     Oracle Solaris 11.1 Information Library

GLDv3 Network Device Driver Framework

The GLDv3 framework is a function call-based interface that consists of MAC plugins and MAC driver service routines and structures. The GLDv3 framework implements the necessary STREAMS entry points on behalf of GLDv3-compliant drivers and handles DLPI compatibility.

This section discusses the following topics:

GLDv3 MAC Registration

GLDv3 defines a driver API for drivers that register with a plugin type of MAC_PLUGIN_IDENT_ETHER.

GLDv3 MAC Registration Process

A GLDv3 device driver must perform the following steps to register with the MAC layer:

GLDv3 MAC Registration Functions

The GLDv3 interface includes driver entry points that are advertised during registration with the MAC layer and MAC entry points that are invoked by drivers.

The mac_init_ops() and mac_fini_ops() Functions
void mac_init_ops(struct dev_ops *ops, const char *name);

A GLDv3 device driver must invoke the mac_init_ops(9F) function in its _init(9E) entry point before calling mod_install(9F).

void mac_fini_ops(struct dev_ops *ops);

A GLDv3 device driver must invoke the mac_fini_ops(9F) function in its _fini(9E) entry point after calling mod_remove(9F).

Example 19-1 The mac_init_ops() and mac_fini_ops() Functions

int
_init(void)
{
        int     rv;
        mac_init_ops(&xx_devops, "xx");
        if ((rv = mod_install(&xx_modlinkage)) != DDI_SUCCESS) {
                mac_fini_ops(&xx_devops);
        }
        return (rv);
}

int
_fini(void)
{
        int     rv;
        if ((rv = mod_remove(&xx_modlinkage)) == DDI_SUCCESS) {
                mac_fini_ops(&xx_devops);
        }
        return (rv);
}
The mac_alloc() and mac_free() Functions
mac_register_t *mac_alloc(uint_t version);

The mac_alloc(9F) function allocates a new mac_register structure and returns a pointer to it. Initialize the structure members before you pass the new structure to mac_register(). MAC-private elements are initialized by the MAC layer before mac_alloc() returns. The value of version must be MAC_VERSION_V1.

void mac_free(mac_register_t *mregp);

The mac_free(9F) function frees a mac_register structure that was previously allocated by mac_alloc().

The mac_register() and mac_unregister() Functions
int mac_register(mac_register_t *mregp, mac_handle_t *mhp);

To register a new instance with the MAC layer, a GLDv3 driver must invoke the mac_register(9F) function in its attach(9E) entry point. The mregp argument is a pointer to a mac_register registration information structure. On success, the mhp argument is a pointer to a MAC handle for the new MAC instance. This handle is needed by other routines such as mac_tx_update(), mac_link_update(), and mac_rx().

Example 19-2 The mac_alloc(), mac_register(), and mac_free() Functions and mac_register Structure

int
xx_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
        mac_register_t        *macp;

        /* ... */

        if ((macp = mac_alloc(MAC_VERSION)) == NULL) {
                xx_error(dip, "mac_alloc failed");
                goto failed;
        }

        macp->m_type_ident = MAC_PLUGIN_IDENT_ETHER;
        macp->m_driver = xxp;
        macp->m_dip = dip;
        macp->m_src_addr = xxp->xx_curraddr;
        macp->m_callbacks = &xx_m_callbacks;
        macp->m_min_sdu = 0;
        macp->m_max_sdu = ETHERMTU;
        macp->m_margin = VLAN_TAGSZ;

        if (mac_register(macp, &xxp->xx_mh) == DDI_SUCCESS) {
                mac_free(macp);
                return (DDI_SUCCESS);
        }

        /* failed to register with MAC */
        mac_free(macp);
failed:
        /* ... */
}
int mac_unregister(mac_handle_t mh);

The mac_unregister(9F) function unregisters a MAC instance that was previously registered with mac_register(). The mh argument is the MAC handle that was allocated by mac_register(). Invoke mac_unregister() from the detach(9E) entry point.

Example 19-3 The mac_unregister() Function

int
xx_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
        xx_t        *xxp; /* driver soft state */

        /* ... */

        switch (cmd) {
        case DDI_DETACH:

                if (mac_unregister(xxp->xx_mh) != 0) {
                        return (DDI_FAILURE);
                }
                /* ... */
        }
        /* ... */
}

GLDv3 MAC Registration Data Structures

The structures described in this section are defined in the sys/mac_provider.h header file. Include the sys/mac_ether.h and sys/mac_provider.h MAC header files in your GLDv3 driver. Do not include any other MAC-related header file.

The mac_register(9S) data structure is the MAC registration information structure that is allocated by mac_alloc() and passed to mac_register(). Initialize the structure members before you pass the new structure to mac_register(). MAC-private elements are initialized by the MAC layer before mac_alloc() returns. The m_version structure member is the MAC version. Do not modify the MAC version. The m_type_ident structure member is the MAC type identifier. Set the MAC type identifier to MAC_PLUGIN_IDENT_ETHER. The m_callbacks member of the mac_register structure is a pointer to an instance of the mac_callbacks structure.

The mac_callbacks(9S) data structure is the structure that your device driver uses to expose its entry points to the MAC layer. These entry points are used by the MAC layer to control the driver. These entry points are used to do tasks such as start and stop the adapters, manage multicast addresses, set promiscuous mode, query the capabilities of the adapter, and get and set properties. See Table 19-1 for a complete list of required and optional GLDv3 entry points. Provide a pointer to your mac_callbacks structure in the m_callbacks field of the mac_register structure.

The mc_callbacks member of the mac_callbacks structure is a bit mask that is a combination of the following flags that specify which of the optional entry points are implemented by the driver. Other members of the mac_callbacks structure are pointers to each of the entry points of the driver.

MC_IOCTL

The mc_ioctl() entry point is present.

MC_GETCAPAB

The mc_getcapab() entry point is present.

MC_SETPROP

The mc_setprop() entry point is present.

MC_GETPROP

The mc_getprop() entry point is present.

MC_PROPINFO

The mc_propinfo() entry point is present.

MC_PROPERTIES

All properties entry points are present. Setting MC_PROPERTIES is equivalent to setting all three flags: MC_SETPROP, MC_GETPROP, and MC_PROPINFO.

Example 19-4 The mac_callbacks Structure

#define XX_M_CALLBACK_FLAGS \
    (MC_IOCTL | MC_GETCAPAB | MC_PROPERTIES)

static mac_callbacks_t xx_m_callbacks = {
        XX_M_CALLBACK_FLAGS,
        xx_m_getstat,     /* mc_getstat() */
        xx_m_start,       /* mc_start() */
        xx_m_stop,        /* mc_stop() */
        xx_m_promisc,     /* mc_setpromisc() */
        xx_m_multicst,    /* mc_multicst() */
        xx_m_unicst,      /* mc_unicst() */
        xx_m_tx,          /* mc_tx() */
        NULL,             /* Reserved, do not use */
        xx_m_ioctl,       /* mc_ioctl() */
        xx_m_getcapab,    /* mc_getcapab() */
        NULL,             /* Reserved, do not use */
        NULL,             /* Reserved, do not use */
        xx_m_setprop,     /* mc_setprop() */
        xx_m_getprop,     /* mc_getprop() */
        xx_m_propinfo     /* mc_propinfo() */
};

GLDv3 Capabilities

GLDv3 implements a capability mechanism that allows the framework to query and enable capabilities that are supported by the GLDv3 driver. Use the mc_getcapab(9E) entry point to report capabilities. If a capability is supported by the driver, pass information about that capability, such as capability-specific entry points or flags, through mc_getcapab(). Pass a pointer to the mc_getcapab() entry point in the mac_callbacks structure. See GLDv3 MAC Registration Data Structures for more information about the mac_callbacks structure.

boolean_t mc_getcapab(void *driver_handle, mac_capab_t cap, void *cap_data);

The cap argument specifies the type of capability being queried. The value of cap can be MAC_CAPAB_HCKSUM (hardware checksum offload), MAC_CAPAB_LSO (large segment offload) or MAC_CAPAB_RINGS. Use the cap_data argument to return the capability data to the framework.

If the driver supports the cap capability, the mc_getcapab() entry point must return B_TRUE. If the driver does not support the cap capability, mc_getcapab() must return B_FALSE.

Example 19-5 The mc_getcapab() Entry Point

static boolean_t
xx_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
        switch (cap) {
        case MAC_CAPAB_HCKSUM: {
                uint32_t *txflags = cap_data;
                *txflags = HCKSUM_INET_FULL_V4 | HCKSUM_IPHDRCKSUM;
                break;
        }
        case MAC_CAPAB_LSO: {
                /* ... */
                break;
        }
        case MAC_CAPAB_RINGS: {
                /* ... */
                break;
        }
        default:
                return (B_FALSE);
        }
        return (B_TRUE);
}

MAC Rings Capability

The following sections describe the supported capabilities and the corresponding capability data to return.

Rings and Ring Groups Layer-2 Classification

Both transmit and receive hardware rings are DMA channels and can be exposed by device drivers. Rings are associated with ring groups. Receive ring groups are associated with one or more MAC addresses, and all network traffic matching any of the MAC addresses associated with a receive group must be delivered by the NIC through one of the rings of that group. The steering of traffic to the receive ring groups is enabled in hardware through layer-2 classification.

The mapping of receive rings to ring groups can be either dynamic or static. With dynamic ring groups, the framework can request that rings be moved between groups, which grows or shrinks the groups dynamically. With static ring groups, the rings are assigned to the groups permanently and this assignment cannot change.
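The following is a minimal userland sketch of the bookkeeping that a driver with dynamic ring groups might keep. The xx_group_t type, the bitmask representation, and the xx_ function names are hypothetical and are not part of GLDv3. A real driver would do this work in its mr_gaddring() and mr_gremring() entry points and would also reprogram the hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define XX_MAX_RINGS    32      /* hypothetical hardware limit */

/* Hypothetical per-group state: one bit per ring owned by the group. */
typedef struct xx_group {
        uint32_t        xg_ring_mask;
} xx_group_t;

/* Add a ring to a group. Returns 0 on success, -1 if invalid or present. */
static int
xx_group_add_ring(xx_group_t *gp, unsigned int ring)
{
        assert(gp != NULL);
        if (ring >= XX_MAX_RINGS ||
            (gp->xg_ring_mask & (UINT32_C(1) << ring)) != 0)
                return (-1);
        gp->xg_ring_mask |= UINT32_C(1) << ring;
        return (0);
}

/* Remove a ring from a group. Returns 0 on success, -1 if not a member. */
static int
xx_group_rem_ring(xx_group_t *gp, unsigned int ring)
{
        assert(gp != NULL);
        if (ring >= XX_MAX_RINGS ||
            (gp->xg_ring_mask & (UINT32_C(1) << ring)) == 0)
                return (-1);
        gp->xg_ring_mask &= ~(UINT32_C(1) << ring);
        return (0);
}

/* Move a ring: one group shrinks while the other grows. */
static int
xx_move_ring(xx_group_t *from, xx_group_t *to, unsigned int ring)
{
        if (xx_group_rem_ring(from, ring) != 0)
                return (-1);
        return (xx_group_add_ring(to, ring));
}
```

A driver with static ring groups would fix the membership at attach time and would not need the move operation.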

If a receive group contains more than one ring, the NIC must spread traffic across these rings by using a hashing mechanism such as RSS (Receive Side Scaling), so that different connections can be assigned to different rings.
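The idea behind hash-based spreading can be sketched in userland C as follows. This is only an illustration: the xx_flow_t key and both functions are hypothetical, and the mixing function is FNV-1a, not the Toeplitz hash that RSS hardware typically uses. The point is that every packet of one connection hashes to the same ring while different connections tend to land on different rings.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical flow key. Real hardware chooses which header fields to hash. */
typedef struct xx_flow {
        uint32_t        xf_src_ip;
        uint32_t        xf_dst_ip;
        uint16_t        xf_src_port;
        uint16_t        xf_dst_port;
} xx_flow_t;

/* Toy hash: FNV-1a over the key bytes. Stands in for the NIC's hash. */
static uint32_t
xx_flow_hash(const xx_flow_t *fp)
{
        uint32_t        words[3];
        uint32_t        h = UINT32_C(2166136261);
        int             i, b;

        words[0] = fp->xf_src_ip;
        words[1] = fp->xf_dst_ip;
        words[2] = ((uint32_t)fp->xf_src_port << 16) | fp->xf_dst_port;

        for (i = 0; i < 3; i++) {
                for (b = 0; b < 4; b++) {
                        h ^= (words[i] >> (8 * b)) & 0xff;
                        h *= UINT32_C(16777619);
                }
        }
        return (h);
}

/* Select a ring index within a group of nrings rings. */
static unsigned int
xx_pick_ring(const xx_flow_t *fp, unsigned int nrings)
{
        assert(fp != NULL && nrings > 0);
        return (xx_flow_hash(fp) % nrings);
}
```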

Exactly one of the receive groups must be designated as the default group (usually the first group at index 0). The following properties are associated with this receive group:

The following points are noteworthy with regard to the hardware implementation of receive rings and receive ring groups:

Registering Rings and Groups Process Overview

Registering rings with the framework involves a series of calls from the framework to the driver. The following steps describe the registration process:

  1. The framework queries the MAC_CAPAB_RINGS capability of the driver by calling the driver. One call is made for the transmit rings and one call for the receive rings. See MAC_CAPAB_RINGS Capability for more information.

  2. The framework uses the mr_rget(9E) and mr_gget(9E) entry points, which are obtained in the previous step, to retrieve information about a specific ring or ring group. See the mr_rget(9E) and mr_gget(9E) man pages for more information.

  3. When the framework wants to use a ring, it starts the ring group with the mgi_start(9E) entry point, and then starts the ring using the mri_start(9E) entry point as advertised in the previous step.

    Traffic can now flow through the rings until they are stopped through the mgi_stop(9E) and mri_stop(9E) entry points.

MAC_CAPAB_RINGS Capability

To obtain information about support for hardware transmit and receive rings, the framework sends MAC_CAPAB_RINGS in the cap argument and expects the information back in the cap_data field, which points to the mac_capab_rings structure.

The framework allocates the mac_capab_rings(9S) structure and sets the mr_type member to MAC_RING_TYPE_RX for receive rings, or MAC_RING_TYPE_TX for transmit rings. The driver then fills in the remaining members of the mac_capab_rings structure.

The following fields are defined in the mac_capab_rings structure:

mr_version

Must be set to MAC_RINGS_VERSION_1.

mr_rnum

Number of rings.

mr_gnum

Number of groups.

mr_group_type

The following values are defined:

  • MAC_GROUP_TYPE_DYNAMIC – The group is dynamic.

  • MAC_GROUP_TYPE_STATIC – The group is static.

See Rings and Ring Groups Layer-2 Classification for more information.

mr_gget()

Driver entry point to get more information about ring groups. See mr_gget() Entry Point for more information.

mr_rget()

Driver entry point to get more information about a ring. See mr_rget() Entry Point for more information.

mr_gaddring()

Driver entry point to add a ring to a group. See mr_gaddring(9E).

mr_gremring()

Driver entry point to remove a ring from a group. See mr_gremring(9E).
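The field descriptions above can be tied together with a short sketch of how a driver might answer the MAC_CAPAB_RINGS query. This is a userland mock-up under stated assumptions: the type definitions, constant values, and the simplified xx_entry_t function-pointer type are stand-ins for the real kernel definitions, and the ring and group counts are hypothetical. Only the member names come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel definitions. Values and layout are placeholders. */
typedef enum { MAC_RING_TYPE_RX = 1, MAC_RING_TYPE_TX } mac_ring_type_t;
typedef enum { MAC_GROUP_TYPE_STATIC = 1, MAC_GROUP_TYPE_DYNAMIC }
    mac_group_type_t;
#define MAC_RINGS_VERSION_1     1
typedef void (*xx_entry_t)(void);       /* simplified entry-point type */

typedef struct mac_capab_rings {
        unsigned int            mr_version;
        mac_ring_type_t         mr_type;        /* set by the framework */
        mac_group_type_t        mr_group_type;
        unsigned int            mr_rnum;
        unsigned int            mr_gnum;
        xx_entry_t              mr_rget;
        xx_entry_t              mr_gget;
        xx_entry_t              mr_gaddring;
        xx_entry_t              mr_gremring;
} mac_capab_rings_t;

/* Hypothetical hardware layout: 8 RX rings in 4 groups, 4 TX rings. */
#define XX_RX_RINGS     8
#define XX_RX_GROUPS    4
#define XX_TX_RINGS     4
#define XX_TX_GROUPS    1

static void xx_rget(void) { }
static void xx_gget(void) { }

/* Returns 1 (B_TRUE) if the capability is supported, 0 (B_FALSE) if not. */
static int
xx_fill_rings_capab(mac_capab_rings_t *cap)
{
        assert(cap != NULL);
        cap->mr_version = MAC_RINGS_VERSION_1;
        cap->mr_group_type = MAC_GROUP_TYPE_STATIC;
        cap->mr_rget = xx_rget;
        cap->mr_gget = xx_gget;
        /* Assumption: static groups never move rings, so no add/remove. */
        cap->mr_gaddring = NULL;
        cap->mr_gremring = NULL;

        switch (cap->mr_type) {
        case MAC_RING_TYPE_RX:
                cap->mr_rnum = XX_RX_RINGS;
                cap->mr_gnum = XX_RX_GROUPS;
                return (1);
        case MAC_RING_TYPE_TX:
                cap->mr_rnum = XX_TX_RINGS;
                cap->mr_gnum = XX_TX_GROUPS;
                return (1);
        default:
                return (0);
        }
}
```

Because the framework makes one call for receive rings and one for transmit rings, the sketch branches on the mr_type value that the framework set before the call.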

mr_gget() Entry Point

The framework invokes the mr_gget(9E) entry point once for each valid group index, where the number of groups is indicated by the mr_gnum member. See mr_gget(9E) for more information. After the call to mr_gget(), the group information is returned in the mac_group_info structure by the driver. The structure itself is pre-allocated by the framework and is filled in by the driver.

The following fields are defined in the mac_group_info structure:

mgi_driver

An opaque driver group handle which is used by the framework in future calls to group entry points.

mgi_count

Number of rings in the group.

mgi_flags

Group flags. The MAC_GROUP_DEFAULT flag identifies the group as the default group. See Rings and Ring Groups Layer-2 Classification for more information.

mgi_start

Group start entry point.

mgi_stop

Group stop entry point.

mgi_addmac

Add unicast MAC address entry point.

mgi_remmac

Remove unicast MAC address entry point.

mgi_addvlan

Entry point to add hardware VLAN filtering, tagging, and stripping of VLAN tags.

mgi_remvlan

Entry point to remove hardware VLAN filtering, tagging, and stripping of VLAN tags.

mgi_setmtu

Entry point to set the MTU of an RX group.

mgi_getsriov_info

Entry point to retrieve SR-IOV information for the group. See Ring Groups and SR-IOV for more information.

See mac_group_info(9S) and mac_group_info(9E) for detailed information.


Note - The mgi_addmac(9E) and mgi_remmac(9E) entry points are used only for receive groups. The mc_unicst(9E) entry point must be set to NULL whenever a device driver supports the rings capability.



Note - The mgi_addvlan() entry point performs the following actions:


mr_rget() Entry Point

The framework invokes the mr_rget(9E) entry point for each valid combination of group index and ring index. The number of groups is indicated by mr_gnum and the number of rings by mr_rnum, as the driver advertised in response to the MAC_CAPAB_RINGS query. See mr_rget(9E) for detailed information.

After the call to mr_rget() is completed, the ring information is returned in the mac_ring_info structure by the driver. The structure is pre-allocated by the framework and is filled in by the driver.

The following fields are defined in the mac_ring_info structure:

mri_driver

An opaque driver ring handle which is used by the framework in future calls to ring entry points.

mri_start

Ring start entry point.

mri_stop

Ring stop entry point.

mri_stat

Ring statistics entry point. See GLDv3 Network Statistics for more information.

mri_tx

Ring transmit entry point. See Transmit Data Path for more information.

mri_poll

Ring poll entry point. See Receive Data Path for more information.

mri_intr_ddi_handle

The DDI interrupt handle associated with the interrupt for this ring.

mri_intr_enable(9E)

Enable interrupts on RX rings. See Receive Data Path for more information.

mri_intr_disable(9E)

Disable interrupts on RX rings. See Receive Data Path for more information.

See the mac_group_info(9S) and mac_ring_info(9S) man pages for detailed information.


Note - mri_tx() must be set for transmit rings only and mri_poll() must be set only for receive rings.



Note - If a driver implements rings capability, then the mc_tx() entry point in the mac_callbacks structure must be set to NULL.


Ring Groups and SR-IOV

The device drivers that are SR-IOV capable use the MAC_CAPAB_RINGS capability to inform the framework that they are SR-IOV capable by implementing the mgi_getsriov_info(9E) group entry point. The PF driver is responsible for implementing this entry point.

After the call to mgi_getsriov_info(9E), the SR-IOV information is returned in the mac_sriov_info structure by the driver. The structure is pre-allocated by the framework and is filled in by the driver.

The PF (Physical Function) driver instance registers as many transmit and receive ring groups as the number of VFs (Virtual Functions). These ring groups advertised by the PF driver are special and are used to manage the VFs. The ring groups do not have any data flowing through them. They are used to configure unicast MAC address, set MTU, add VLAN filters, remove VLAN filters, remove VLAN hardware, and perform VLAN tagging and stripping for VFs.


Note - The VF driver programs the MAC multicast group that the driver wants to join. The PF driver does not control the programming of these addresses.


The msi_vf_index structure member, set by the PF driver, captures the VF index that corresponds to a ring group. This is the same index used by the device driver when the driver calls the pci_plist_getvf(9F) function.

See Chapter 21, SR-IOV Drivers for detailed information about SR-IOV drivers.

Hardware Checksum Offload

To get data about support for hardware checksum offload, the framework sends MAC_CAPAB_HCKSUM in the cap argument. See Hardware Checksum Offload Capability Information.

To query checksum offload metadata and retrieve the per-packet hardware checksumming metadata when hardware checksumming is enabled, use mac_hcksum_get(9F). See The mac_hcksum_get() Function Flags.

To set checksum offload metadata, use mac_hcksum_set(9F). See The mac_hcksum_set() Function Flags.

See Hardware Checksumming: Hardware and Hardware Checksumming: MAC Layer for more information.

Hardware Checksum Offload Capability Information

To pass information about the MAC_CAPAB_HCKSUM capability to the framework, the driver must set a combination of the following flags in cap_data, which points to a uint32_t. These flags indicate the level of hardware checksum offload that the driver is capable of performing for outbound packets.

HCKSUM_INET_PARTIAL

Partial 1's complement checksum ability

HCKSUM_INET_FULL_V4

Full 1's complement checksum ability for IPv4 packets

HCKSUM_INET_FULL_V6

Full 1's complement checksum ability for IPv6 packets

HCKSUM_IPHDRCKSUM

IPv4 Header checksum offload capability

The mac_hcksum_get() Function Flags

The flags argument of mac_hcksum_get() is a combination of the following values:

HCK_FULLCKSUM

Compute the full checksum for this packet.

HCK_FULLCKSUM_OK

The full checksum was verified in hardware and is correct.

HCK_PARTIALCKSUM

Compute the partial 1's complement checksum based on other parameters passed to mac_hcksum_get(). HCK_PARTIALCKSUM is mutually exclusive with HCK_FULLCKSUM.

HCK_IPV4_HDRCKSUM

Compute the IP header checksum.

HCK_IPV4_HDRCKSUM_OK

The IP header checksum was verified in hardware and is correct.
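On the transmit side, a driver typically inspects these flags to decide what to ask of the hardware. The following userland sketch shows one way to decode them, including the rule that HCK_PARTIALCKSUM and HCK_FULLCKSUM are mutually exclusive. The flag values are placeholders (the real constants come from the kernel headers), and the xx_tx_offload_t type and xx_decode_hcksum() function are hypothetical. In a real driver the flags value would come from mac_hcksum_get().

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Placeholder values for illustration only. */
#define HCK_IPV4_HDRCKSUM       0x01
#define HCK_PARTIALCKSUM        0x02
#define HCK_FULLCKSUM           0x04

typedef enum { XX_L4_NONE, XX_L4_PARTIAL, XX_L4_FULL } xx_l4_mode_t;

/* Hypothetical summary of what the hardware must do for one packet. */
typedef struct xx_tx_offload {
        xx_l4_mode_t    xo_l4;          /* transport checksum mode */
        int             xo_ip_hdr;      /* nonzero: compute IPv4 header sum */
} xx_tx_offload_t;

/* Returns 0 on success, -1 if the flags are contradictory. */
static int
xx_decode_hcksum(uint32_t flags, xx_tx_offload_t *out)
{
        assert(out != NULL);

        /* Full and partial checksum requests are mutually exclusive. */
        if ((flags & HCK_FULLCKSUM) != 0 && (flags & HCK_PARTIALCKSUM) != 0)
                return (-1);

        if ((flags & HCK_FULLCKSUM) != 0)
                out->xo_l4 = XX_L4_FULL;
        else if ((flags & HCK_PARTIALCKSUM) != 0)
                out->xo_l4 = XX_L4_PARTIAL;
        else
                out->xo_l4 = XX_L4_NONE;

        out->xo_ip_hdr = (flags & HCK_IPV4_HDRCKSUM) != 0;
        return (0);
}
```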

The mac_hcksum_set() Function Flags

The flags argument of mac_hcksum_set() is a combination of the following values:

HCK_FULLCKSUM

The full checksum was computed and passed through the value argument.

HCK_FULLCKSUM_OK

The full checksum was verified in hardware and is correct.

HCK_PARTIALCKSUM

The partial checksum was computed and passed through the value argument. HCK_PARTIALCKSUM is mutually exclusive with HCK_FULLCKSUM.

HCK_IPV4_HDRCKSUM

The IP header checksum was computed and passed through the value argument.

HCK_IPV4_HDRCKSUM_OK

The IP header checksum was verified in hardware and is correct.

Large Segment (or Send) Offload

To query support for large segment (or send) offload, the framework sends MAC_CAPAB_LSO in the cap argument and expects the information back in cap_data, which points to a mac_capab_lso(9S) structure. The framework allocates the mac_capab_lso structure and passes a pointer to this structure in cap_data. The mac_capab_lso structure consists of an lso_basic_tcp_ipv4(9S) structure and an lso_flags member. If the driver instance supports LSO for TCP on IPv4, set the LSO_TX_BASIC_TCP_IPV4 flag in lso_flags and set the lso_max member of the lso_basic_tcp_ipv4 structure to the maximum payload size supported by the driver instance.

Use mac_lso_get(9F) to obtain per-packet LSO metadata. If LSO is enabled for this packet, the HW_LSO flag is set in the mac_lso_get() flags argument. The maximum segment size (MSS) to be used during segmentation of the large segment is returned through the location pointed to by the mss argument. See Large Segment Offload for more information.
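As a rough illustration of the capability reply described above, the following userland sketch fills in a mac_capab_lso structure. The structure definitions and the flag value are stand-ins that only mirror the member names given in the text, and XX_LSO_MAXLEN and xx_fill_lso_capab() are hypothetical. In a driver, this logic would sit in the MAC_CAPAB_LSO case of mc_getcapab().

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the kernel definitions. Values are placeholders. */
#define LSO_TX_BASIC_TCP_IPV4   0x01

typedef struct lso_basic_tcp_ipv4 {
        uint32_t                lso_max;        /* maximum payload size */
} lso_basic_tcp_ipv4_t;

typedef struct mac_capab_lso {
        uint32_t                lso_flags;
        lso_basic_tcp_ipv4_t    lso_basic_tcp_ipv4;
} mac_capab_lso_t;

#define XX_LSO_MAXLEN   65535   /* hypothetical hardware limit */

/*
 * Returns 1 (B_TRUE) when this instance supports LSO for TCP on IPv4,
 * 0 (B_FALSE) otherwise. The structure is filled in only on success.
 */
static int
xx_fill_lso_capab(int lso_enabled, mac_capab_lso_t *cap)
{
        assert(cap != NULL);
        if (!lso_enabled)
                return (0);

        cap->lso_flags = LSO_TX_BASIC_TCP_IPV4;
        cap->lso_basic_tcp_ipv4.lso_max = XX_LSO_MAXLEN;
        return (1);
}
```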

GLDv3 Data Paths

Data-path entry points consist of the following components:


Note - If a driver implements the rings capability then all data sent and received by the driver is passed through ring-specific entry points.


Transmit Data Path

The transmit entry point that the GLDv3 framework invokes to pass a message block to the driver depends on whether the underlying driver supports MAC_CAPAB_RINGS. If the driver supports the MAC_CAPAB_RINGS capability, the framework invokes the mri_tx(9E) ring entry point. Otherwise, the framework invokes the mc_tx(9E) entry point.

Accordingly, the device driver has to provide a pointer to the transmit entry point in either mc_tx() or mri_tx(). See GLDv3 MAC Registration Data Structures and mr_rget() Entry Point for more information.

Example 19-6 The mc_tx() and mri_tx() Entry Points

mblk_t *
xx_m_tx(void *arg, mblk_t *mp)
{
        xx_t    *xxp = arg;
        mblk_t   *nmp;

        mutex_enter(&xxp->xx_xmtlock);

        if (xxp->xx_flags & XX_SUSPENDED) {
                while ((nmp = mp) != NULL) {
                        xxp->xx_carrier_errors++;
                        mp = mp->b_next;
                        freemsg(nmp);
                }
                mutex_exit(&xxp->xx_xmtlock);
                return (NULL);
        }

        while (mp != NULL) {
                nmp = mp->b_next;
                mp->b_next = NULL;

                if (!xx_send(xxp, mp)) {
                        mp->b_next = nmp;
                        break;
                }
                mp = nmp;
        }
        mutex_exit(&xxp->xx_xmtlock);

        return (mp);
}

The following sections discuss topics related to transmitting data to the hardware.

Flow Control

If the driver cannot send the packets because of insufficient hardware resources, the driver returns the sub-chain of packets that could not be sent. When more descriptors become available at a later time, the driver must notify the framework by invoking mac_tx_update(9F) or, if the driver implements the rings capability, mac_tx_ring_update(9F).
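The flow-control handshake can be modeled in userland as follows. This is a sketch under stated assumptions: mblk_t is reduced to its b_next link, the descriptor ring is reduced to a counter, and mac_tx_update_stub() merely counts calls in place of the real mac_tx_update(). The xx_ names and the xx_tx_blocked flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct mblk {           /* stand-in for the kernel mblk_t */
        struct mblk     *b_next;
} mblk_t;

typedef struct xx {
        int     xx_free_desc;   /* free transmit descriptors */
        int     xx_tx_blocked;  /* set when a chain was returned unsent */
        int     xx_updates;     /* counts stubbed mac_tx_update() calls */
} xx_t;

static void
mac_tx_update_stub(xx_t *xxp)
{
        xxp->xx_updates++;
}

/* Send as much of the chain as fits and return the unsent remainder. */
static mblk_t *
xx_tx(xx_t *xxp, mblk_t *mp)
{
        mblk_t  *next;

        assert(xxp != NULL);
        while (mp != NULL) {
                if (xxp->xx_free_desc == 0) {
                        xxp->xx_tx_blocked = 1;
                        break;
                }
                next = mp->b_next;
                mp->b_next = NULL;
                xxp->xx_free_desc--;    /* models handing mp to hardware */
                mp = next;
        }
        return (mp);
}

/* Called when the hardware reports completed descriptors. */
static void
xx_reclaim(xx_t *xxp, int completed)
{
        assert(xxp != NULL && completed >= 0);
        xxp->xx_free_desc += completed;
        if (xxp->xx_tx_blocked && xxp->xx_free_desc > 0) {
                xxp->xx_tx_blocked = 0;
                mac_tx_update_stub(xxp);        /* ask for a retry */
        }
}
```

Recording that a chain was returned unsent lets the reclaim path notify the framework only when a retry is actually pending.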

Hardware Checksumming: Hardware

If the driver specified hardware checksum support (see Hardware Checksum Offload), then the driver must do the following tasks:

Large Segment Offload

If the driver specified LSO capabilities (see Large Segment (or Send) Offload), then the driver must use mac_lso_get(9F) to query whether LSO must be performed on the packet.

Virtual LAN: Hardware

When the administrator configures VLANs, the MAC layer inserts the needed VLAN headers on the outbound packets before they are passed to the driver through the mc_tx() entry point. However, if the hardware supports VLAN tagging then the tagging is offloaded to the hardware. See mr_gget() Entry Point for more details.

Receive Data Path

The receive data-path can be interrupt-driven or poll-driven.

Receive Interrupt Data Path

Note: If the driver does not support the rings capability then call the mac_rx(9F) function in your driver's interrupt handler to pass a chain of one or more packets up the stack to the MAC layer. Avoid holding mutex or other locks during the call to mac_rx() or mac_rx_ring(). In particular, do not hold locks that could be taken by a transmit thread during a call to mac_rx() or mac_rx_ring().

In interrupt mode, the driver sends packet chains up to the framework as soon as the NIC has received them and they are available to the driver for pickup. A packet chain consists of one or more mblk_t structures linked through b_next; chaining reduces the per-packet processing overhead. Received packets are passed up to the framework in interrupt mode by calling the mac_rx_ring() function.

void mac_rx_ring(mac_handle_t mh, mac_ring_handle_t mrh, mblk_t *mp_chain, int64_t mr_gen_num);

The mh argument is the MAC handle that the device driver obtained when it registered with the kernel through the mac_register() function. The mrh argument is the framework ring handle that was passed to the driver as part of the mr_rget() call. The mr_gen_num argument must be set to the generation number specified by the framework when the receive ring was started through the mri_start() entry point. The framework compares the ring generation number provided by the driver with the generation number that the framework holds. If they do not match, the received packets are considered stale packets from an older assignment of the ring and are dropped.
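The generation-number check can be modeled with a small userland sketch. The framework internals are not documented here, so fw_ring_t and fw_rx_ring() are a hypothetical model of the described behavior and not the actual implementation. The xx_ names are likewise illustrative, and mblk_t is reduced to its b_next link.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef struct mblk {           /* stand-in for the kernel mblk_t */
        struct mblk     *b_next;
} mblk_t;

/* Hypothetical framework-side ring state. */
typedef struct fw_ring {
        int64_t         fr_gen_num;     /* current generation */
        unsigned int    fr_delivered;
        unsigned int    fr_dropped;
} fw_ring_t;

/* Hypothetical driver-side ring state. */
typedef struct xx_rx_ring {
        int64_t         xr_gen_num;     /* saved when the ring is started */
} xx_rx_ring_t;

/* Models mri_start(): the driver remembers the generation it was given. */
static void
xx_ring_start(xx_rx_ring_t *rp, int64_t gen_num)
{
        assert(rp != NULL);
        rp->xr_gen_num = gen_num;
}

/* Models the framework side of mac_rx_ring(): stale chains are dropped. */
static void
fw_rx_ring(fw_ring_t *fw, mblk_t *chain, int64_t gen_num)
{
        assert(fw != NULL);
        for (; chain != NULL; chain = chain->b_next) {
                if (gen_num == fw->fr_gen_num)
                        fw->fr_delivered++;
                else
                        fw->fr_dropped++;
        }
}
```

In this model, a driver that keeps passing an old generation number after the ring is reassigned has all of its packets counted as dropped, which mirrors the stale-packet rule.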

Receive Polling Data-Path

In addition to the interrupt-driven path, the framework also supports a polling-based data path. In polling mode, a kernel thread running in the stack fetches packets from the driver through a polling entry point. This allows the stack to control when packets are processed, and with which priority, while reducing the number of interrupts coming into the system based on actual load. In addition, polling allows the stack to more effectively enforce bandwidth limits on received traffic, which is especially critical in virtualization scenarios.

The host toggles between interrupt and polling mode on demand. While a ring is in polling mode, the driver should not deliver packets received on that ring by calling the mac_rx_ring() function. This is guaranteed because interrupts are disabled while the ring is in polling mode. Instead, the framework calls the mri_poll() entry point that the driver exposed as part of the mac_ring_info structure. See mr_rget() Entry Point for more information.

Switching Between Interrupt and Polling Mode

By default, a ring should be in interrupt mode after it is started. As long as a ring is in interrupt mode, it should pass up received packets in the form of chains by calling mac_rx_ring(). When the host switches a ring to polling mode, it disables the ring's interrupt by invoking the mri_intr_disable() entry point through the mac_intr structure, which was previously exposed through the mac_ring_info structure.
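The interplay between the interrupt toggles and the poll entry point can be sketched in userland as follows. Everything here is a simplified model: mblk_t carries only a link and a length, the hardware receive queue is a linked list, and the byte budget passed to xx_ring_poll() is an assumption of this sketch and not a statement of the mri_poll(9E) signature. The xx_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct mblk {           /* stand-in for the kernel mblk_t */
        struct mblk     *b_next;
        int             b_len;  /* simplified packet length */
} mblk_t;

typedef struct xx_rx_ring {
        mblk_t  *xr_hw_queue;   /* packets waiting in the "hardware" */
        int     xr_intr_enabled;
} xx_rx_ring_t;

/* Models mri_intr_disable(): the ring enters polling mode. */
static int
xx_ring_intr_disable(xx_rx_ring_t *rp)
{
        assert(rp != NULL);
        rp->xr_intr_enabled = 0;
        return (0);
}

/* Models mri_intr_enable(): the ring returns to interrupt mode. */
static int
xx_ring_intr_enable(xx_rx_ring_t *rp)
{
        assert(rp != NULL);
        rp->xr_intr_enabled = 1;
        return (0);
}

/* Models the poll entry point: return a chain within the byte budget. */
static mblk_t *
xx_ring_poll(xx_rx_ring_t *rp, int bytes)
{
        mblk_t  *head = NULL;
        mblk_t  **tailp = &head;
        mblk_t  *mp;

        assert(rp != NULL && !rp->xr_intr_enabled);
        while ((mp = rp->xr_hw_queue) != NULL && mp->b_len <= bytes) {
                rp->xr_hw_queue = mp->b_next;
                mp->b_next = NULL;
                bytes -= mp->b_len;
                *tailp = mp;
                tailp = &mp->b_next;
        }
        return (head);
}
```

The assertion in xx_ring_poll() encodes the rule that polling happens only while the ring's interrupt is disabled.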

Hardware Checksumming: MAC Layer

If the driver specified hardware checksum support (see Hardware Checksum Offload), then the driver must use the mac_hcksum_set(9F) function to associate hardware checksumming metadata with the packet.

Virtual LAN: MAC Layer

VLAN packets must be passed with their tags to the MAC layer. Do not strip the VLAN headers from the packets. However, if the hardware supports VLAN stripping and the framework has requested the hardware to strip VLAN tags, then the hardware can strip VLAN tags to improve performance. See mr_gget() Entry Point for more information.

GLDv3 State Change Notifications

A driver can call the following functions to notify the network stack that the driver's state has changed.

void mac_tx_update(mac_handle_t mh);
void mac_tx_ring_update(mac_handle_t mh, mac_ring_handle_t rh);

The mac_tx_update(9F) and mac_tx_ring_update(9F) functions notify the framework that more TX descriptors are available. If mc_tx() or mri_tx() returns a non-empty chain of packets, then the driver must call mac_tx_update() or mac_tx_ring_update() as soon as possible after resources become available, to inform the MAC layer to retry the packets that were returned as not sent. See Transmit Data Path for more information about the mc_tx() and mri_tx() entry points.

void mac_link_update(mac_handle_t mh, link_state_t new_state);

The mac_link_update(9F) function notifies the MAC layer that the state of the media link has changed. The new_state argument must be one of the following values:

LINK_STATE_UP

The media link is up.

LINK_STATE_DOWN

The media link is down.

LINK_STATE_UNKNOWN

The state of the media link is unknown.
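A driver usually derives the new state from its PHY and reports it only when it differs from the last reported state. The following userland sketch shows that pattern. The enum values are stand-ins, mac_link_update_stub() only counts calls in place of the real mac_link_update(), and the xx_ names and the two PHY inputs are hypothetical. Suppressing duplicate notifications is a design choice of this sketch, not a stated requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel link_state_t. Values are placeholders. */
typedef enum {
        LINK_STATE_UNKNOWN = -1,
        LINK_STATE_DOWN,
        LINK_STATE_UP
} link_state_t;

typedef struct xx {
        link_state_t    xx_link;        /* last state reported */
        int             xx_notified;    /* counts stubbed notifications */
} xx_t;

static void
mac_link_update_stub(xx_t *xxp, link_state_t state)
{
        (void) state;
        xxp->xx_notified++;
}

/*
 * phy_ok:   nonzero if the PHY status register could be read.
 * phy_link: nonzero if the PHY reports an established link.
 */
static void
xx_check_link(xx_t *xxp, int phy_ok, int phy_link)
{
        link_state_t    new_state;

        assert(xxp != NULL);
        if (!phy_ok)
                new_state = LINK_STATE_UNKNOWN;
        else
                new_state = phy_link ? LINK_STATE_UP : LINK_STATE_DOWN;

        if (new_state != xxp->xx_link) {
                xxp->xx_link = new_state;
                mac_link_update_stub(xxp, new_state);
        }
}
```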

GLDv3 Network Statistics

Device drivers maintain a set of statistics for the device instances they manage. The MAC layer queries these statistics through the mc_getstat(9E) entry point of the driver.

int mc_getstat(void *driver_handle, uint_t stat, uint64_t *stat_value);

The GLDv3 framework uses stat to specify the statistic being queried. The driver uses stat_value to return the value of the statistic specified by stat. If the value of the statistic is returned, mc_getstat() must return 0. If the stat statistic is not supported by the driver, mc_getstat() must return ENOTSUP.

The GLDv3 statistics that are supported are the union of generic MAC statistics and Ethernet-specific statistics. See the mc_getstat(9E) man page for a complete list of supported statistics.

Example 19-7 The mc_getstat() Entry Point

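int
xx_m_getstat(void *arg, uint_t stat, uint64_t *val)
{
        xx_t    *xxp = arg;

        /*
         * Reclaim completed transmit descriptors, but only while the
         * device is running and not suspended.
         */
        mutex_enter(&xxp->xx_xmtlock);
        if ((xxp->xx_flags & (XX_RUNNING|XX_SUSPENDED)) == XX_RUNNING)
                xx_reclaim(xxp);
        mutex_exit(&xxp->xx_xmtlock);

        switch (stat) {
        case MAC_STAT_MULTIRCV:         /* generic MAC statistic */
                *val = xxp->xx_multircv;
                break;
        /* ... */
        case ETHER_STAT_MACRCV_ERRORS:  /* Ethernet-specific statistic */
                *val = xxp->xx_macrcv_errors;
                break;
        /* ... */
        default:
                /* Statistic not maintained by this driver. */
                return (ENOTSUP);
        }
        return (0);
}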

The mri_stat() ring entry point is mandatory for every device driver that supports the rings capability. The framework uses this entry point to query the statistics that the driver maintains for each hardware transmit ring and receive ring.

For the hardware transmit rings, the framework queries the following statistics:

For the hardware receive rings, the framework queries the following statistics:

GLDv3 Properties

Use the mc_propinfo(9E) entry point to return immutable attributes of a property. This information includes permissions, default values, and allowed value ranges. Use mc_setprop(9E) to set the value of a property for this particular driver instance. Use mc_getprop(9E) to return the current value of a property.

See the mc_propinfo(9E) man page for a complete list of properties and their types.

The mc_propinfo() entry point should invoke the mac_prop_info_set_perm(), mac_prop_info_set_default_*(), and mac_prop_info_set_range_*() functions to associate specific attributes, such as permissions, default values, or allowed value ranges, with the property being queried.

The mac_prop_info_set_default_uint8(9F), mac_prop_info_set_default_str(9F), and mac_prop_info_set_default_link_flowctrl(9F) functions associate a default value with a specific property. The mac_prop_info_set_range_uint32(9F) function associates an allowed range of values with a specific property.

The mac_prop_info_set_perm(9F) function specifies the permission of the property. The permission can be one of the following values:

MAC_PROP_PERM_READ

The property is read-only.

MAC_PROP_PERM_WRITE

The property is write-only.

MAC_PROP_PERM_RW

The property can be read and written.

If the mc_propinfo() entry point does not call mac_prop_info_set_perm() for a particular property, the GLDv3 framework assumes that the property has read and write permissions, corresponding to MAC_PROP_PERM_RW.
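The following user-space sketch illustrates how an mc_propinfo() implementation might use these functions. The handle type, the permission constants, the property IDs, and the three mac_prop_info_set_*() functions are mocked here with placeholder values so that the example compiles outside the kernel. The xx_m_propinfo() name and the XX_MIN_MTU and XX_MAX_MTU limits are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; a real driver gets these from the MAC headers. */
#define MAC_PROP_PERM_READ      0x01
#define MAC_PROP_PERM_WRITE     0x02
#define MAC_PROP_PERM_RW        (MAC_PROP_PERM_READ | MAC_PROP_PERM_WRITE)

typedef enum {
        MAC_PROP_SPEED,
        MAC_PROP_AUTONEG,
        MAC_PROP_MTU,
        MAC_PROP_DUPLEX
} mac_prop_id_t;

/* Mock of the opaque handle: records what the driver reported. */
struct mac_prop_info {
        uint8_t         pi_perm;
        uint8_t         pi_default;
        uint32_t        pi_min;
        uint32_t        pi_max;
};
typedef struct mac_prop_info *mac_prop_info_handle_t;

/* Mocks of the framework functions named in the text. */
static void
mac_prop_info_set_perm(mac_prop_info_handle_t ph, uint8_t perm)
{
        ph->pi_perm = perm;
}

static void
mac_prop_info_set_default_uint8(mac_prop_info_handle_t ph, uint8_t val)
{
        ph->pi_default = val;
}

static void
mac_prop_info_set_range_uint32(mac_prop_info_handle_t ph, uint32_t min,
    uint32_t max)
{
        ph->pi_min = min;
        ph->pi_max = max;
}

/* Hypothetical MTU limits for the example device. */
#define XX_MIN_MTU      1500
#define XX_MAX_MTU      9000

static void
xx_m_propinfo(void *arg, const char *pr_name, mac_prop_id_t pr_num,
    mac_prop_info_handle_t prh)
{
        (void) arg;
        (void) pr_name;

        switch (pr_num) {
        case MAC_PROP_SPEED:
                /* Link speed is reported by hardware and cannot be set. */
                mac_prop_info_set_perm(prh, MAC_PROP_PERM_READ);
                break;
        case MAC_PROP_AUTONEG:
                /* Permission is left alone, so it stays read and write. */
                mac_prop_info_set_default_uint8(prh, 1);
                break;
        case MAC_PROP_MTU:
                mac_prop_info_set_range_uint32(prh, XX_MIN_MTU, XX_MAX_MTU);
                break;
        default:
                break;
        }
}
```

In this sketch, only MAC_PROP_SPEED sets a permission explicitly. The other properties rely on the read and write default that the framework assumes.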

In addition to the properties listed in the mc_propinfo(9E) man page, drivers can also expose driver-private properties. Use the m_priv_props field of the mac_register structure to specify driver-private properties supported by the driver. The framework passes the MAC_PROP_PRIVATE property ID in mc_setprop(), mc_getprop(), or mc_propinfo(). See the mc_propinfo(9E) man page for more information.
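The following user-space sketch shows how a driver might dispatch on a driver-private property in mc_getprop(). Several details are assumptions made for illustration: the property name _tx_copy_thresh, the xx_t structure, the placeholder values of the mac_prop_id_t enumeration, the NULL-terminated layout of the name array, and the choice to return the value as a string. The kernel's uint_t type is written as unsigned int so that the example compiles outside the kernel.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Placeholder IDs; a real driver gets these from the MAC headers. */
typedef enum {
        MAC_PROP_PRIVATE = -1,
        MAC_PROP_MTU = 1
} mac_prop_id_t;

/*
 * Names of the driver-private properties. A driver would assign this
 * array to the m_priv_props field of its mac_register structure.
 * The name and the NULL terminator are assumptions of this sketch.
 */
static char *xx_priv_props[] = {
        "_tx_copy_thresh",
        NULL
};

/* Hypothetical per-instance soft state. */
typedef struct xx {
        unsigned int    xx_tx_copy_thresh;
} xx_t;

static int
xx_m_getprop(void *arg, const char *pr_name, mac_prop_id_t pr_num,
    unsigned int pr_valsize, void *pr_val)
{
        xx_t    *xxp = arg;
        int     n;

        /* This sketch handles private properties only. */
        if (pr_num != MAC_PROP_PRIVATE)
                return (ENOTSUP);

        /* Private properties share one ID, so dispatch on the name. */
        if (strcmp(pr_name, "_tx_copy_thresh") == 0) {
                n = snprintf(pr_val, pr_valsize, "%u",
                    xxp->xx_tx_copy_thresh);
                if (n < 0 || (unsigned int)n >= pr_valsize)
                        return (ENOSPC);        /* buffer too small */
                return (0);
        }
        return (ENOTSUP);
}
```

Because every private property arrives with the same MAC_PROP_PRIVATE ID, the property name is the only way to tell private properties apart.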

Summary of GLDv3 Interfaces

The following table lists entry points, other DDI functions, and data structures that are part of the GLDv3 network device driver framework.

Table 19-1 GLDv3 Interfaces

Interface Name
Description
Required Entry Points
mc_getstat(9E)
Retrieve network statistics from the driver. See GLDv3 Network Statistics.
mc_start(9E)
Start a driver instance. The GLDv3 framework invokes the start entry point before any operation is attempted.
mc_stop(9E)
Stop a driver instance. The MAC layer invokes the stop entry point before the device is detached.
mc_setpromisc(9E)
Change the promiscuous mode of the device driver instance.
mc_multicst(9E)
Add or remove a multicast address.
mc_unicst(9E)
Set the primary unicast address. The device must start passing back through mac_rx() the packets with a destination MAC address that matches the new unicast address. See Receive Data Path for information about mac_rx().
mc_tx(9E)
Send one or more packets. See Transmit Data Path.
mr_rget()
Obtain transmit and receive ring information. See mr_rget() Entry Point for more information.
mr_gget()
Obtain transmit and receive ring information. See mr_gget() Entry Point for more information.
mr_gaddring()
Add a ring to a receive group. This is required only if dynamic ring grouping is supported. See MAC_CAPAB_RINGS Capability.
mr_gremring()
Remove a ring from a receive group. This is required only if dynamic ring grouping is supported. See MAC_CAPAB_RINGS Capability.
mri_tx(9E)
Transmit packets for TX rings. See mr_rget() Entry Point for more information.
mri_poll()
Poll RX ring for packets. See mr_rget() Entry Point for more information.
mri_stat()
Ring statistics. See mr_rget() Entry Point for more information.
mi_enable()
Enable interrupts on an RX ring. See mr_rget() Entry Point for more information.
mi_disable()
Disable interrupts on an RX ring. See mr_rget() Entry Point for more information.
mgi_addmac()
Program a MAC address into the driver's hardware for an RX ring group. See mr_gget() Entry Point for more information.
mgi_remmac()
Remove a previously programmed MAC address from the driver's hardware for an RX ring group. See mr_gget() Entry Point for more information.
Optional Entry Points
mc_ioctl(9E)
Optional ioctl driver interface. This facility is intended to be used only for debugging purposes.
mc_getcapab(9E)
Retrieve capabilities. See GLDv3 Capabilities.
mc_setprop(9E)
Set a property value. See GLDv3 Properties.
mc_getprop(9E)
Get a property value. See GLDv3 Properties.
mc_propinfo(9E)
Get information about a property. See GLDv3 Properties.
mri_start()
Start ring. See mr_rget() Entry Point for more information.
mri_stop()
Stop ring. See mr_rget() Entry Point for more information.
mgi_start()
Start ring group. See mr_gget() Entry Point for more information.
mgi_stop()
Stop ring group. See mr_gget() Entry Point for more information.
mgi_addvlan()
Enable VLAN filtering in hardware. See mr_gget() Entry Point for more information.
mgi_remvlan()
Remove previously programmed VLAN filter. See mr_gget() Entry Point for more information.
mgi_setmtu()
Set RX group MTU. See mr_gget() Entry Point for more information.
mgi_get_sriov_info()
Obtain SR-IOV information. See Ring Groups and SR-IOV for more information.
Data Structures
mac_register(9S)
Registration information. See GLDv3 MAC Registration Data Structures.
LSO metadata for TCP/IPv4. See Large Segment (or Send) Offload.
mac_capab_rings
See MAC Rings Capability for more information.
mac_group_info
See mr_gget() Entry Point for more information.
mac_ring_info
See mr_rget() Entry Point for more information.
mac_intr_t
mac_sriov_info
MAC Registration Functions
mac_alloc(9F)
Allocate a new mac_register structure. See GLDv3 MAC Registration.
mac_free(9F)
Free a mac_register structure.
mac_register(9F)
Register with the MAC layer.
mac_unregister(9F)
Unregister from the MAC layer.
mac_init_ops(9F)
Initialize the driver's dev_ops(9S) structure.
mac_fini_ops(9F)
Release the driver's dev_ops structure.
Data Transfer Functions
mac_rx(9F)
Pass up received packets. See Receive Data Path.
mac_rx_ring(9F)
Pass up received packets. See Receive Data Path.
mac_tx_update(9F)
TX resources are available. See GLDv3 State Change Notifications.
mac_tx_ring_update(9F)
TX resources are available. See GLDv3 State Change Notifications.
mac_link_update(9F)
Link state has changed. See GLDv3 State Change Notifications.
mac_hcksum_get(9F)
Retrieve hardware checksum information. See Hardware Checksum Offload and Transmit Data Path.
mac_hcksum_set(9F)
Attach hardware checksum information. See Hardware Checksum Offload and Receive Data Path.
mac_lso_get(9F)
Retrieve LSO information. See Large Segment (or Send) Offload.
Properties Functions
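mac_prop_info_set_perm(9F)
Set the permission of a property. See GLDv3 Properties.
mac_prop_info_set_default_uint8(9F), mac_prop_info_set_default_str(9F), mac_prop_info_set_default_link_flowctrl(9F)
Set the default value of a property.
mac_prop_info_set_range_uint32(9F)
Set the allowed value range of a property.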