Managing Services and Faults in Oracle Solaris 11.1     Oracle Solaris 11.1 Information Library

Displaying Information About Faults or Defects

The preferred method to display fault or defect information and determine the FRUs involved is the fmadm faulty command. However, the fmdump command is also supported. fmdump is often used to display a historical log of problems on the system, and fmadm faulty is used to display the active problems.


Caution - Base administrative action on the output of the fmadm faulty command, not on the output of the fmdump command. The log files that fmdump displays can contain error statements that are not faults or defects.


How to Display Information About Faulty Components

  1. Become an administrator.

    For more information, see How to Use Your Assigned Administrative Rights in Oracle Solaris 11.1 Administration: Security Services.

  2. Display information about the components.
    # fmadm faulty

    See the following examples for a description of the text generated.

Example 3-1 fmadm Output With One Faulty CPU

1    # fmadm faulty
2    --------------- ------------------------------------  -------------- ---------
3    TIME            EVENT-ID                              MSG-ID         SEVERITY
4    --------------- ------------------------------------  -------------- ---------
5    Aug 24 17:56:03 7b83c87c-78f6-6a8e-fa2b-d0cf16834049  SUN4V-8001-8H  Minor
6    
7    Host        : bur419-61
8    Platform    : SUNW,T5440        Chassis_id  : BEL07524BN
9    Product_sn  : BEL07524BN
10
11   Fault class : fault.cpu.ultraSPARC-T2plus.ireg
12   Affects     : cpu:///cpuid=0/serial=1F95806CD1421929
13                     faulted and taken out of service
14   FRU         : "MB/CPU0" (hc://:product-id=SUNW,T5440:server-id=bur419-61:\
15                 serial=3529:part=541255304/motherboard=0/cpuboard=0)
16                     faulty
17   Serial ID.  : 3529
18                 1F95806CD1421929
19   
20   Description : The number of integer register errors associated with this thread
21                 has exceeded acceptable levels.
22   
23   Response    : The fault manager will attempt to remove the affected thread from
24                 service.
25   
26   Impact      : System performance may be affected.
27   
28   Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
29                 Please refer to the associated reference document at
30                 http://support.oracle.com/msg/SUN4V-8001-8H for the latest service
31                 procedures and policies regarding this diagnosis.
 

Of primary interest is line 14, which shows the data for the impacted FRUs. The more human-readable location string is presented in quotation marks, "MB/CPU0". The quoted value is intended to match the label on the physical hardware. The FRU is also represented in a Fault Management Resource Identifier (FMRI) format, which includes descriptive properties about the system containing the fault, such as its host name and chassis serial number. On platforms that support it, the part number and serial number of the FRU are also included in the FRU's FMRI.

The Affects lines (lines 12 and 13) indicate the components that are affected by the fault and their relative state. In this example, a single CPU strand is affected. It is faulted and taken out of service.

Following the FRU description in the fmadm faulty command output, line 16 shows the state as faulty. The Action section might also include other specific actions instead of, or in addition to, the usual reference to the fmadm command.
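As an illustration of how the fields described above can be picked out of the output, the following Python sketch parses the text of Example 3-1. The function name parse_fault_summary and the regular expressions are assumptions made for this illustration and are not part of the fault management tools. The sketch works only on text that has the same layout as the example.

```python
import re

# Output copied from Example 3-1 with the documentation line numbers removed.
# A raw string keeps the trailing backslash of the wrapped FMRI line intact.
SAMPLE = r"""
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 24 17:56:03 7b83c87c-78f6-6a8e-fa2b-d0cf16834049  SUN4V-8001-8H  Minor

Host        : bur419-61
Platform    : SUNW,T5440        Chassis_id  : BEL07524BN
Product_sn  : BEL07524BN

Fault class : fault.cpu.ultraSPARC-T2plus.ireg
Affects     : cpu:///cpuid=0/serial=1F95806CD1421929
                  faulted and taken out of service
FRU         : "MB/CPU0" (hc://:product-id=SUNW,T5440:server-id=bur419-61:\
              serial=3529:part=541255304/motherboard=0/cpuboard=0)
                  faulty
"""

# Summary row: timestamp, 36-character event ID, message ID, severity.
SUMMARY_RE = re.compile(
    r"^(\w{3} +\d{1,2} \d\d:\d\d:\d\d) +([0-9a-f-]{36}) +(\S+) +(\S+) *$",
    re.MULTILINE,
)
# The quoted location string that follows "FRU :".
FRU_LABEL_RE = re.compile(r'^FRU\s*:\s*"([^"]+)"', re.MULTILINE)


def parse_fault_summary(text):
    """Return the summary fields and FRU label, or None if no summary row is found."""
    summary = SUMMARY_RE.search(text)
    if summary is None:
        return None
    time, event_id, msg_id, severity = summary.groups()
    label = FRU_LABEL_RE.search(text)
    return {
        "time": time,
        "event_id": event_id,
        "msg_id": msg_id,
        "severity": severity,
        "fru_label": label.group(1) if label else None,
    }
```

For the sample text, parse_fault_summary(SAMPLE) returns the event ID 7b83c87c-78f6-6a8e-fa2b-d0cf16834049, the message ID SUN4V-8001-8H, the severity Minor, and the FRU label MB/CPU0.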

Example 3-2 fmadm Output With Multiple Faults

1    # fmadm faulty
2    --------------- ------------------------------------  -------------- -------
3    TIME            EVENT-ID                              MSG-ID         SEVERITY
4    --------------- ------------------------------------  -------------- -------
5    Sep 21 10:01:36 d482f935-5c8f-e9ab-9f25-d0aaafec1e6c  PCIEX-8000-5Y  Major
6    
7    Fault class  : fault.io.pci.device-invreq
8    Affects      : dev:///pci@0,0/pci1022,7458@11/pci1000,3060@0
9                   dev:///pci@0,0/pci1022,7458@11/pci1000,3060@1
10                    ok and in service
11                  dev:///pci@0,0/pci1022,7458@11/pci1000,3060@2
12                  dev:///pci@0,0/pci1022,7458@11/pci1000,3060@3
13                    faulty and taken out of service
14   FRU          : "SLOT 2" (hc://.../pciexrc=3/pciexbus=4/pciexdev=0)
15                    repair attempted
16                  "SLOT 3" (hc://.../pciexrc=3/pciexbus=4/pciexdev=1)
17                    acquitted
18                  "SLOT 4" (hc://.../pciexrc=3/pciexbus=4/pciexdev=2)
19                    not present
20                  "SLOT 5" (hc://.../pciexrc=3/pciexbus=4/pciexdev=3)
21                    faulty
22   
23   Description  : The transmitting device sent an invalid request.
24   
25   Response     : One or more device instances may be disabled
26   
27   Impact       : Possible loss of services provided by the device instances
28                  associated with this fault
29   
30   Action       : Use 'fmadm faulty' to provide a more detailed view of this event.
31                  Please refer to the associated reference document at
32                  http://support.oracle.com/msg/PCIEX-8000-5Y for the latest service
33                  procedures and policies regarding this diagnosis.

Following the FRU description in the fmadm faulty command output, line 21 shows the state of SLOT 5 as faulty. Other state values that you might see include repair attempted, acquitted, and not present, as shown for SLOT 2, SLOT 3, and SLOT 4 in lines 15, 17, and 19.
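When several FRUs are listed, it can help to pair each location label with its state. The following Python sketch does this for the FRU section of Example 3-2. The function name fru_states is hypothetical, and the parsing rules (a quoted label line followed by an indented state line, with the section ending at a blank line or at the next field name) are assumptions drawn from the layout of the examples in this section.

```python
import re

# FRU-related part of Example 3-2 with the documentation line numbers removed.
SAMPLE = """
Fault class  : fault.io.pci.device-invreq
Affects      : dev:///pci@0,0/pci1022,7458@11/pci1000,3060@0
               dev:///pci@0,0/pci1022,7458@11/pci1000,3060@1
                 ok and in service
               dev:///pci@0,0/pci1022,7458@11/pci1000,3060@2
               dev:///pci@0,0/pci1022,7458@11/pci1000,3060@3
                 faulty and taken out of service
FRU          : "SLOT 2" (hc://.../pciexrc=3/pciexbus=4/pciexdev=0)
                 repair attempted
               "SLOT 3" (hc://.../pciexrc=3/pciexbus=4/pciexdev=1)
                 acquitted
               "SLOT 4" (hc://.../pciexrc=3/pciexbus=4/pciexdev=2)
                 not present
               "SLOT 5" (hc://.../pciexrc=3/pciexbus=4/pciexdev=3)
                 faulty

Description  : The transmitting device sent an invalid request.
"""

LABEL_RE = re.compile(r'"([^"]+)"')


def fru_states(text):
    """Map each quoted FRU label to the state line that follows it."""
    # Join FMRI lines that were wrapped with a trailing backslash.
    text = text.replace("\\\n", "")
    states = {}
    in_fru = False
    pending = None  # label that is still waiting for its state line
    for line in text.splitlines():
        stripped = line.strip()
        if line.startswith("FRU"):
            in_fru = True
            stripped = line.split(":", 1)[1].strip()
        elif in_fru and (not stripped or not line[0].isspace()):
            break  # blank line or next field name ends the FRU section
        if not in_fru:
            continue
        match = LABEL_RE.match(stripped)
        if match:
            pending = match.group(1)
        elif pending is not None:
            states[pending] = stripped
            pending = None
    return states


# Labels whose state is exactly "faulty".
faulty = [label for label, state in fru_states(SAMPLE).items() if state == "faulty"]
```

For the sample text, the resulting mapping pairs SLOT 2 with repair attempted, SLOT 3 with acquitted, SLOT 4 with not present, and SLOT 5 with faulty, so the faulty list contains only SLOT 5.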

Example 3-3 Showing Faults with the fmdump Command

Some console messages and knowledge articles might instruct you to use the older fmdump -v -u UUID command to display fault information. Although the fmadm faulty command is preferred, the fmdump command still operates, as shown in the following example:

1    % fmdump -v -u 7b83c87c-78f6-6a8e-fa2b-d0cf16834049
2    TIME                 UUID                                 SUNW-MSG-ID EVENT
3    Aug 24 17:56:03.4596 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H Diagnosed
4      100%  fault.cpu.ultraSPARC-T2plus.ireg
5
6            Problem in: -
7               Affects: cpu:///cpuid=0/serial=1F95806CD1421929
8                   FRU: hc://:product-id=SUNW,T5440:server-id=bur419-61:\
9                   serial=9999:part=541255304/motherboard=0/cpuboard=0
10              Location: MB/CPU0

The information about the affected FRUs is still present, although separated across three lines (lines 8 through 10). The Location line (line 10) presents the human-readable FRU string. The FRU lines (lines 8 and 9) present the formal FMRI. Note that the severity, descriptive text, and action are not shown with the fmdump command unless you use the -m option. See the fmdump(1M) man page for more information.
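Because the FMRI is wrapped across two lines in this output, a script that reads it needs to rejoin the wrapped line before extracting the fields. The following Python sketch shows one way to do that for the text of Example 3-3. The function name parse_fmdump_verbose is hypothetical, and the sketch assumes that a wrapped line always ends with a backslash, as it does in the example.

```python
import re

# Output copied from Example 3-3 with the documentation line numbers removed.
# A raw string keeps the trailing backslash of the wrapped FRU line intact.
SAMPLE = r"""
TIME                 UUID                                 SUNW-MSG-ID EVENT
Aug 24 17:56:03.4596 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H Diagnosed
  100%  fault.cpu.ultraSPARC-T2plus.ireg

        Problem in: -
           Affects: cpu:///cpuid=0/serial=1F95806CD1421929
               FRU: hc://:product-id=SUNW,T5440:server-id=bur419-61:\
               serial=9999:part=541255304/motherboard=0/cpuboard=0
          Location: MB/CPU0
"""

# Only the four field names that appear in the example are recognized.
FIELD_RE = re.compile(r"^\s*(Problem in|Affects|FRU|Location):\s*(.*)$")


def parse_fmdump_verbose(text):
    """Return the Problem in, Affects, FRU, and Location fields as a dict."""
    # Rejoin lines that end with a backslash, dropping the indentation
    # of the continuation line.
    joined = re.sub(r"\\\n\s*", "", text)
    fields = {}
    for line in joined.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields
```

For the sample text, the Location field is MB/CPU0 and the FRU field is the single rejoined FMRI string that begins with hc://:product-id=SUNW,T5440 and ends with /motherboard=0/cpuboard=0.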

How to Identify Which CPUs Are Offline

How to Display Information About Defective Services

  1. Become an administrator.

    For more information, see How to Use Your Assigned Administrative Rights in Oracle Solaris 11.1 Administration: Security Services.

  2. Display information about the defect.
    # fmadm faulty
    --------------- ------------------------------------  -------------- ---------
    TIME            EVENT-ID                              MSG-ID         SEVERITY
    --------------- ------------------------------------  -------------- ---------
    May 12 22:52:47 915cb64b-e16b-4f49-efe6-de81ff96fce7  SMF-8000-YX    major
    
    Host        : parity
    Platform    : Sun-Fire-V40z     Chassis_id  : XG051535088
    Product_sn  : XG051535088
    
    Fault class : defect.sunos.smf.svc.maintenance
    Affects     : svc:///system/intrd:default
                      faulted and taken out of service
    Problem in  : svc:///system/intrd:default
                      faulted and taken out of service
    
    Description : A service failed - it is restarting too quickly.
    
    Response    : The service has been placed into the maintenance state.
    
    Impact      : svc:/system/intrd:default is unavailable.
    
    Action      : Run 'svcs -xv svc:/system/intrd:default' to determine the
                  generic reason why the service failed, the location of any
                  logfiles, and a list of other services impacted. Please refer to
                  the associated reference document at
                  http://support.oracle.com/msg/SMF-8000-YX for the latest service procedures
                  and policies regarding this diagnosis.

  3. Display information about the defective service.

    Follow the instructions given in the Action section in the fmadm output.

    # svcs -xv svc:/system/intrd:default
    svc:/system/intrd:default (interrupt balancer)
     State: maintenance since Wed May 12 22:52:47 2010
    Reason: Restarting too quickly.
       See: http://support.oracle.com/msg/SMF-8000-YX
       See: man -M /usr/share/man -s 1M intrd
       See: /var/svc/log/system-intrd:default.log
    Impact: This service is not running.

    Refer to the knowledge article, SMF-8000-YX, for further instructions on fixing this problem.
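The svcs -xv output shown in step 3 has a regular layout: a first line with the service FMRI and its description, followed by labeled lines. The following Python sketch collects those lines into a dictionary so that, for example, the log file path can be picked out. The function name parse_svcs_x is hypothetical, and the sketch assumes output for a single service in the layout shown above.

```python
# Output copied from step 3 of the procedure above.
SAMPLE = """
svc:/system/intrd:default (interrupt balancer)
 State: maintenance since Wed May 12 22:52:47 2010
Reason: Restarting too quickly.
   See: http://support.oracle.com/msg/SMF-8000-YX
   See: man -M /usr/share/man -s 1M intrd
   See: /var/svc/log/system-intrd:default.log
Impact: This service is not running.
"""


def parse_svcs_x(text):
    """Parse svcs -xv output for one service; return None for empty input."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    # First line: service FMRI followed by a description in parentheses.
    fmri, _, description = lines[0].partition(" ")
    report = {"fmri": fmri, "description": description.strip("()"), "see": []}
    for line in lines[1:]:
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "Label: value" line
        key = key.lower()
        if key == "see":
            report["see"].append(value)  # "See" can appear several times
        else:
            report[key] = value
    return report


report = parse_svcs_x(SAMPLE)
# Assumption for this sketch: a "See" entry that starts with "/" is a file path.
log_files = [entry for entry in report["see"] if entry.startswith("/")]
```

For the sample text, the state entry begins with maintenance, and log_files contains the single path /var/svc/log/system-intrd:default.log.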