Displaying Information About Faults or Defects

The preferred method to display fault or defect information and determine the FRUs involved is the fmadm faulty command. However, the fmdump command is also supported. fmdump is often used to display a historical log of problems on the system, and fmadm faulty is used to display the active problems.

Caution - Do not base administrative action on the output of the fmdump command, but rather on the fmadm faulty output. The log files can contain error statements, which should not be considered faults or defects.

How to Display Information About Faulty Components

Become an administrator.
For more information, see How to Use Your Assigned Administrative Rights in Oracle Solaris 11.1 Administration: Security Services.
Display information about the components.
```
# fmadm faulty
```
See the following examples for a description of the text generated.

Example 3-1 fmadm Output With One Faulty CPU

1    # fmadm faulty
2    --------------- ------------------------------------  -------------- ---------
3    TIME            EVENT-ID                              MSG-ID         SEVERITY
4    --------------- ------------------------------------  -------------- ---------
5    Aug 24 17:56:03 7b83c87c-78f6-6a8e-fa2b-d0cf16834049  SUN4V-8001-8H  Minor
6    
7    Host        : bur419-61
8    Platform    : SUNW,T5440        Chassis_id  : BEL07524BN
9    Product_sn  : BEL07524BN
10
11   Fault class : fault.cpu.ultraSPARC-T2plus.ireg
12   Affects     : cpu:///cpuid=0/serial=1F95806CD1421929
13                     faulted and taken out of service
14   FRU         : "MB/CPU0" (hc://:product-id=SUNW,T5440:server-id=bur419-61:\
15                 serial=3529:part=541255304/motherboard=0/cpuboard=0)
16                     faulty
17   Serial ID.  : 3529
18                 1F95806CD1421929
19   
20   Description : The number of integer register errors associated with this thread
21                 has exceeded acceptable levels.
22   
23   Response    : The fault manager will attempt to remove the affected thread from
24                 service.
25   
26   Impact      : System performance may be affected.
27   
28   Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
29                 Please refer to the associated reference document at
30                 http://support.oracle.com/msg/SUN4V-8001-8H for the latest service
31                 procedures and policies regarding this diagnosis.

Of primary interest is line 14, which shows the data for the impacted FRUs. The more human-readable location string is presented in quotation marks, "MB/CPU0". The quoted value is intended to match the label on the physical hardware. The FRU is also represented in a Fault Management Resource Identifier (FMRI) format, which includes descriptive properties about the system containing the fault, such as its host name and chassis serial number. On platforms that support it, the part number and serial number of the FRU are also included in the FRU's FMRI.

The Affects lines (lines 12 and 13) indicate the components that are affected by the fault and their relative state. In this example, a single CPU strand is affected. It is faulted and taken out of service.

Following the FRU description in the fmadm faulty command output, line 16 shows the state as faulty. The Action section might also include other specific actions instead of, or in addition to, the usual reference to the fmadm command.
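When you need only the location labels, for example to record which parts to inspect, you can filter them out of the command output with standard text tools. The following is a minimal sketch, not part of the Fault Management tooling. The function name fru_labels is hypothetical, and the sketch assumes that each FRU label is a double-quoted string immediately followed by an hc:// FMRI in parentheses, as in the preceding example.

```shell
# fru_labels: hypothetical helper (not an Oracle Solaris command).
# Reads 'fmadm faulty' output on standard input and prints each quoted
# FRU location label on its own line.
# Assumption: a label is a double-quoted string followed by ' (hc://'.
fru_labels() {
    sed -n 's|.*"\([^"]*\)" (hc://.*|\1|p'
}

# Usage, as an administrator:
#   fmadm faulty | fru_labels
```

Applied to the output in Example 3-1, this filter would print MB/CPU0.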

Example 3-2 fmadm Output With Multiple Faults

1    # fmadm faulty
2    --------------- ------------------------------------  -------------- -------
3    TIME            EVENT-ID                              MSG-ID         SEVERITY
4    --------------- ------------------------------------  -------------- -------
5    Sep 21 10:01:36 d482f935-5c8f-e9ab-9f25-d0aaafec1e6c  PCIEX-8000-5Y  Major
6    
7    Fault class  : fault.io.pci.device-invreq
8    Affects      : dev:///pci@0,0/pci1022,7458@11/pci1000,3060@0
9                   dev:///pci@0,0/pci1022,7458@11/pci1000,3060@1
10                    ok and in service
11                  dev:///pci@0,0/pci1022,7458@11/pci1000,3060@2
12                  dev:///pci@0,0/pci1022,7458@11/pci1000,3060@3
13                    faulty and taken out of service
14   FRU          : "SLOT 2" (hc://.../pciexrc=3/pciexbus=4/pciexdev=0)
15                    repair attempted
16                  "SLOT 3" (hc://.../pciexrc=3/pciexbus=4/pciexdev=1)
17                    acquitted
18                  "SLOT 4" (hc://.../pciexrc=3/pciexbus=4/pciexdev=2)
19                    not present
20                  "SLOT 5" (hc://.../pciexrc=3/pciexbus=4/pciexdev=3)
21                    faulty
22   
23   Description  : The transmitting device sent an invalid request.
24   
25   Response     : One or more device instances may be disabled
26   
27   Impact       : Possible loss of services provided by the device instances
28                  associated with this fault
29   
30   Action       : Use 'fmadm faulty' to provide a more detailed view of this event.
31                  Please refer to the associated reference document at
32                  http://support.oracle.com/msg/PCIEX-8000-5Y for the latest service
33                  procedures and policies regarding this diagnosis.

Following the FRU description in the fmadm faulty command output, line 21 shows the state as faulty. Other state values that you might see in other situations include repair attempted and acquitted, as shown for SLOT 2 and SLOT 3 in lines 15 and 17, and not present, as shown for SLOT 4 in line 19.
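When several FRUs are listed, pairing each label with its state makes the output easier to scan. The following is a minimal sketch under stated assumptions, not a supported interface. The function name fru_states is hypothetical. The sketch assumes that the state appears on the line after the FRU's FMRI, that an FMRI wrapped onto a second line ends with a closing parenthesis, and that a POSIX-compatible awk (such as nawk) is available.

```shell
# fru_states: hypothetical helper (not an Oracle Solaris command).
# Reads 'fmadm faulty' output on standard input and prints one
# "LABEL: state" line for each FRU entry.
fru_states() {
    awk '
        # A quoted label followed by an hc:// FMRI starts a FRU entry.
        /"[^"]*" \(hc:\/\// {
            label = $0
            sub(/^[^"]*"/, "", label)   # drop text through the opening quote
            sub(/".*$/, "", label)      # drop the closing quote and the FMRI
            cont = ($0 !~ /\)/)         # FMRI wraps if no closing parenthesis yet
            want_state = 1
            next
        }
        # Skip wrapped FMRI lines until the closing parenthesis is seen.
        want_state && cont {
            if ($0 ~ /\)/) cont = 0
            next
        }
        # The next line holds the state of the FRU.
        want_state {
            state = $0
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            print label ": " state
            want_state = 0
        }
    '
}

# Usage, as an administrator:
#   fmadm faulty | fru_states
```

For the FRU entries in Example 3-2, this sketch would print SLOT 2: repair attempted, SLOT 3: acquitted, SLOT 4: not present, and SLOT 5: faulty, each on its own line.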

Example 3-3 Showing Faults with the fmdump Command

Some console messages and knowledge articles might instruct you to use the older fmdump -v -u UUID command to display fault information. Although the fmadm faulty command is preferred, the fmdump command still operates, as shown in the following example:

1    % fmdump -v -u 7b83c87c-78f6-6a8e-fa2b-d0cf16834049
2    TIME                 UUID                                 SUNW-MSG-ID EVENT
3    Aug 24 17:56:03.4596 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H Diagnosed
4      100%  fault.cpu.ultraSPARC-T2plus.ireg
5
6            Problem in: -
7               Affects: cpu:///cpuid=0/serial=1F95806CD1421929
8                   FRU: hc://:product-id=SUNW,T5440:server-id=bur419-61:\
9                   serial=9999:part=541255304/motherboard=0/cpuboard=0
10              Location: MB/CPU0

The information about the affected FRUs is still present, although it is spread across three lines (lines 8 through 10). The Location line presents the human-readable FRU string, and the FRU lines present the formal FMRI. Note that the severity, descriptive text, and action are not shown with the fmdump command unless you use the -m option. See the fmdump(1M) man page for more information.
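If you need only the human-readable location from fmdump output, you can filter for the Location line. The following is a minimal sketch. The function name fmdump_location is hypothetical, and the sketch assumes that the line is indented with spaces and labeled exactly Location:, as in the preceding example.

```shell
# fmdump_location: hypothetical helper (not an Oracle Solaris command).
# Reads 'fmdump -v -u UUID' output on standard input and prints the
# value of each Location line.
# Assumption: the line is indented with spaces only.
fmdump_location() {
    sed -n 's/^ *Location: *//p'
}

# Usage, with the event ID from Example 3-1:
#   fmdump -v -u 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 | fmdump_location
```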

How to Identify Which CPUs Are Offline

Display information about the CPUs.
```
% /usr/sbin/psrinfo 
0       faulted   since 05/13/2011 12:55:26 
1       on-line   since 05/12/2011 11:47:26 
```
The faulted state indicates that the CPU has been taken offline by a Fault Management response agent.
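To list only the processors that are in the faulted state, you can filter the psrinfo output on its second column. The following is a minimal sketch. The function name faulted_cpus is hypothetical, and the sketch assumes the default output layout shown above, with the processor ID in the first field and the state in the second.

```shell
# faulted_cpus: hypothetical helper (not an Oracle Solaris command).
# Reads default 'psrinfo' output on standard input and prints the ID
# of each processor whose state field is "faulted".
faulted_cpus() {
    awk '$2 == "faulted" { print $1 }'
}

# Usage:
#   /usr/sbin/psrinfo | faulted_cpus
```

For the sample output above, this filter would print 0.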

How to Display Information About Defective Services

Become an administrator.
For more information, see How to Use Your Assigned Administrative Rights in Oracle Solaris 11.1 Administration: Security Services.

Display information about the defect.

# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
May 12 22:52:47 915cb64b-e16b-4f49-efe6-de81ff96fce7  SMF-8000-YX    major

Host        : parity
Platform    : Sun-Fire-V40z     Chassis_id  : XG051535088
Product_sn  : XG051535088

Fault class : defect.sunos.smf.svc.maintenance
Affects     : svc:///system/intrd:default
                  faulted and taken out of service
Problem in  : svc:///system/intrd:default
                  faulted and taken out of service

Description : A service failed - it is restarting too quickly.

Response    : The service has been placed into the maintenance state.

Impact      : svc:/system/intrd:default is unavailable.

Action      : Run 'svcs -xv svc:/system/intrd:default' to determine the
              generic reason why the service failed, the location of any
              logfiles, and a list of other services impacted. Please refer to
              the associated reference document at
              http://support.oracle.com/msg/SMF-8000-YX for the latest service procedures
              and policies regarding this diagnosis.

Display information about the defective service.

Follow the instructions given in the Action section in the fmadm output.

# svcs -xv svc:/system/intrd:default
svc:/system/intrd:default (interrupt balancer)
 State: maintenance since Wed May 12 22:52:47 2010
Reason: Restarting too quickly.
   See: http://support.oracle.com/msg/SMF-8000-YX
   See: man -M /usr/share/man -s 1M intrd
   See: /var/svc/log/system-intrd:default.log
Impact: This service is not running.

Refer to the knowledge article, SMF-8000-YX, for further instructions on fixing this problem.
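The svcs output also names the service log file, which is often a useful next place to look. The following is a minimal sketch for extracting that path. The function name svc_logfiles is hypothetical, and the sketch assumes that log files appear on See: lines whose value is an absolute path, as in the preceding output.

```shell
# svc_logfiles: hypothetical helper (not an Oracle Solaris command).
# Reads 'svcs -xv' output on standard input and prints the value of
# each "See:" line that starts with "/", which is assumed to be a log
# file path. URLs and man page references are skipped.
svc_logfiles() {
    awk '$1 == "See:" && $2 ~ /^\// { print $2 }'
}

# Usage:
#   svcs -xv svc:/system/intrd:default | svc_logfiles
```

For the preceding output, this filter would print /var/svc/log/system-intrd:default.log.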
