Skip Navigation Links | |
Exit Print View | |
Oracle Solaris 11.1 Administration: ZFS File Systems Oracle Solaris 11.1 Information Library |
1. Oracle Solaris ZFS File System (Introduction)
2. Getting Started With Oracle Solaris ZFS
3. Managing Oracle Solaris ZFS Storage Pools
4. Managing ZFS Root Pool Components
5. Managing Oracle Solaris ZFS File Systems
6. Working With Oracle Solaris ZFS Snapshots and Clones
7. Using ACLs and Attributes to Protect Oracle Solaris ZFS Files
8. Oracle Solaris ZFS Delegated Administration
9. Oracle Solaris ZFS Advanced Topics
10. Oracle Solaris ZFS Troubleshooting and Pool Recovery
ZFS File System Space Reporting
ZFS Storage Pool Space Reporting
Missing Devices in a ZFS Storage Pool
Damaged Devices in a ZFS Storage Pool
Checking ZFS File System Integrity
Controlling ZFS Data Scrubbing
ZFS Data Scrubbing and Resilvering
Determining If Problems Exist in a ZFS Storage Pool
Overall Pool Status Information
Pool Configuration Information
System Reporting of ZFS Error Messages
Repairing a Damaged ZFS Configuration
Replacing or Repairing a Damaged Device
Determining the Type of Device Failure
Replacing a Device in a ZFS Storage Pool
Determining If a Device Can Be Replaced
Devices That Cannot be Replaced
Replacing a Device in a ZFS Storage Pool
Identifying the Type of Data Corruption
Repairing a Corrupted File or Directory
Repairing Corrupted Data With Multiple Block References
Repairing ZFS Storage Pool-Wide Damage
Repairing an Unbootable System
11. Archiving Snapshots and Root Pool Recovery
12. Recommended Oracle Solaris ZFS Practices
If a device cannot be opened, it displays the UNAVAIL state in the zpool status output. This state means that ZFS was unable to open the device when the pool was first accessed, or the device has since become unavailable. If the device causes a top-level virtual device to be unavailable, then nothing in the pool can be accessed. Otherwise, the fault tolerance of the pool might be compromised. In either case, the device just needs to be reattached to the system to restore normal operations. If you need to replace a device that is UNAVAIL because it has failed, see Replacing a Device in a ZFS Storage Pool.
If a device is UNAVAIL in a root pool or a mirrored root pool, see the following references:
Mirrored root pool disk failed – Booting From an Alternate Disk in a Mirrored ZFS Root Pool
Replacing a disk in a root pool
Full root pool disaster recovery – Chapter 11, Archiving Snapshots and Root Pool Recovery
For example, you might see a message similar to the following from fmd after a device failure:
SUNW-MSG-ID: ZFS-8000-QJ, TYPE: Fault, VER: 1, SEVERITY: Minor EVENT-TIME: Wed Jun 20 13:09:55 MDT 2012 PLATFORM: ORCL,SPARC-T3-4, CSN: 1120BDRCCD, HOSTNAME: tardis SOURCE: zfs-diagnosis, REV: 1.0 EVENT-ID: e13312e0-be0a-439b-d7d3-cddaefe717b0 DESC: Outstanding dtls on ZFS device 'id1,sd@n5000c500335dc60f/a' in pool 'pond'. AUTO-RESPONSE: No automated response will occur. IMPACT: None at this time. REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-QJ for the latest service procedures and policies regarding this diagnosis.
To view more detailed information about the device problem and the resolution, use the zpool status -v command. For example:
# zpool status -v pool: pond state: DEGRADED status: One or more devices are unavailable in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or 'fmadm repaired', or replace the device with 'zpool replace'. scan: scrub repaired 0 in 0h0m with 0 errors on Wed Jun 20 13:16:09 2012 config: NAME STATE READ WRITE CKSUM pond DEGRADED 0 0 0 mirror-0 ONLINE 0 0 0 c0t5000C500335F95E3d0 ONLINE 0 0 0 c0t5000C500335F907Fd0 ONLINE 0 0 0 mirror-1 DEGRADED 0 0 0 c0t5000C500335BD117d0 ONLINE 0 0 0 c0t5000C500335DC60Fd0 UNAVAIL 0 0 0 device details: c0t5000C500335DC60Fd0 UNAVAIL cannot open status: ZFS detected errors on this device. The device was missing. see: http://support.oracle.com/msg/ZFS-8000-LR for recovery
You can see from this output that the c0t5000C500335DC60Fd0 device is not functioning. If you determine that the device is faulty, replace it.
If necessary, use the zpool online command to bring the replaced device online. For example:
Let FMA know that the device has been replaced if the output of the fmadm faulty identifies the device error. For example:
# fmadm faulty --------------- ------------------------------------ -------------- --------- TIME EVENT-ID MSG-ID SEVERITY --------------- ------------------------------------ -------------- --------- Jun 20 13:15:41 3745f745-371c-c2d3-d940-93acbb881bd8 ZFS-8000-LR Major Problem Status : solved Diag Engine : zfs-diagnosis / 1.0 System Manufacturer : unknown Name : ORCL,SPARC-T3-4 Part_Number : unknown Serial_Number : 1120BDRCCD Host_ID : 84a02d28 ---------------------------------------- Suspect 1 of 1 : Fault class : fault.fs.zfs.open_failed Certainty : 100% Affects : zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/pool_name=pond/vdev_ name=id1,sd@n5000c500335dc60f/a Status : faulted and taken out of service FRU Name : "zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/pool_name=pond/vdev_ name=id1,sd@n5000c500335dc60f/a" Status : faulty Description : ZFS device 'id1,sd@n5000c500335dc60f/a' in pool 'pond' failed to open. Response : An attempt will be made to activate a hot spare if available. Impact : Fault tolerance of the pool may be compromised. Action : Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-LR for the latest service procedures and policies regarding this diagnosis.
Extract the string in the Affects: section of the fmadm faulty output and include it with the following command to let FMA know that the device is replaced:
# fmadm repaired zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/pool_name=pond/vdev_ name=id1,sd@n5000c500335dc60f/a fmadm: recorded repair to of zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/pool_name=pond/vdev_ name=id1,sd@n5000c500335dc60f/a
As a last step, confirm that the pool with the replaced device is healthy. For example:
# zpool status -x tank pool 'tank' is healthy
Exactly how a missing device is reattached depends on the device in question. If the device is a network-attached drive, connectivity to the network should be restored. If the device is a USB device or other removable media, it should be reattached to the system. If the device is a local disk, a controller might have failed such that the device is no longer visible to the system. In this case, the controller should be replaced, at which point the disks will again be available. Other problems can exist and depend on the type of hardware and its configuration. If a drive fails and it is no longer visible to the system, the device should be treated as a damaged device. Follow the procedures in Replacing or Repairing a Damaged Device.
A pool might be SUSPENDED if device connectivity is compromised. A SUSPENDED pool remains in the wait state until the device issue is resolved. For example:
# zpool status cybermen pool: cybermen state: SUSPENDED status: One or more devices are unavailable in response to IO failures. The pool is suspended. action: Make sure the affected devices are connected, then run 'zpool clear' or 'fmadm repaired'. Run 'zpool status -v' to see device specific details. see: http://support.oracle.com/msg/ZFS-8000-HC scan: none requested config: NAME STATE READ WRITE CKSUM cybermen UNAVAIL 0 16 0 c8t3d0 UNAVAIL 0 0 0 c8t1d0 UNAVAIL 0 0 0
After device connectivity is restored, clear the pool or device errors.
# zpool clear cybermen # fmadm repaired zfs://pool=name/vdev=guid
After a device is reattached to the system, ZFS might or might not automatically detect its availability. If the pool was previously UNAVAIL or SUSPENDED, or the system was rebooted as part of the attach procedure, then ZFS automatically rescans all devices when it tries to open the pool. If the pool was degraded and the device was replaced while the system was running, you must notify ZFS that the device is now available and ready to be reopened by using the zpool online command. For example:
# zpool online tank c0t1d0
For more information about bringing devices online, see Bringing a Device Online.