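Scenario
A disk in a long-running ZFS array has failed, and its SMART self-test reports a large number of errors. The drives were not labeled when they were installed, so with 12 drive bays there is a real risk of pulling the wrong one and taking down the whole array. The array is also busy serving reads and writes and resilvering onto a new disk, so shutting the machine down to inspect the drives is not practical. The failed disk therefore has to be located while the machine is running.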
Environment
Server: RH2288H V2
Disk backplane (SAS controller): SAS2308
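Operating system: ESXi 8 with the SAS2308 passed through to TrueNAS-SCALE-22.02.4
Procedure
1. Log in to TrueNAS SCALE over SSH
If `SAS2IRCU: MPTLib2 Error 1` appears at any point, it is usually a permissions problem: run the command with sudo or as the root account. This is what the error looks like for the non-root admin user: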
admin@truenas[/mnt]$ sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
SAS2IRCU: MPTLib2 Error 1
2. Check that sas2ircu detects the controller card, then display its details
root@truenas[~]# sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
Adapter Vendor Device SubSys SubSys
Index Type ID ID Pci Address Ven ID Dev ID
----- ------------ ------ ------ ----------------- ------ ------
0 SAS2308_2 1000h 87h 00h:0bh:00h:00h 1000h 0087h
SAS2IRCU: Utility Completed Successfully.
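The number in the Index column is what every later command takes as its first argument (the `0` in `sas2ircu 0 display`). As a rough sketch, the index and type can be pulled out of the listing with awk. The function name `list_adapters` is made up for this illustration and is not part of sas2ircu; it assumes adapter rows are the only lines whose first field is a plain integer, which holds for the output shown above.

```shell
#!/bin/sh
# list_adapters (hypothetical helper): read `sas2ircu list` output on stdin
# and print "INDEX TYPE" for each adapter row.
# Assumption: only adapter rows start with a purely numeric first field.
list_adapters() {
    awk '$1 ~ /^[0-9]+$/ { print $1, $2 }'
}
```

On the system above it would be used as `sas2ircu list | list_adapters` (as root), which should print `0 SAS2308_2`.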
root@truenas[~]# sas2ircu 0 display
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
Controller type : SAS2308_2
BIOS version : 7.25.00.00
Firmware version : 15.00.03.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 0
Maximum physical devices : 255
Concurrent commands supported : 3072
Slot : 0
Segment : 0
Bus : 11
Device : 0
Function : 0
RAID Support : Yes
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
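(remaining output omitted)
3. In the TrueNAS SCALE web UI, find the serial number of the failed disk (Storage -> Disks -> Serial). Use the serial number (Serial No), not the drive model (Model Number).
4. Find that disk's entry in the physical device information. `grep -B 8` also prints the eight lines before the matching Serial No line, which is enough to include Enclosure # and Slot #. grep matches substrings, so the search still succeeds when the controller reports the serial with an extra prefix, as it does here.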
root@truenas[~]# sas2ircu 0 display | grep -B 8 WCC4E3LJFF91
Enclosure # : 2
Slot # : 5
SAS Address : 500e004-a-aaaa-aa05
State : Ready (RDY)
Size (in MB)/(in sectors) : 3815447/7814037167
Manufacturer : ATA
Model Number : WDC WD40PURX-64G
Firmware Revision : 0A80
Serial No : WDWCC4E3LJFF91
5. Take the Enclosure # and Slot # from this output. Together they form the bay identifier Enclosure:Slot, which in this example is 2:5.
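Steps 4 and 5 can be combined so that the bay identifier does not have to be read off by eye. The following is a minimal sketch, not a tested tool: `find_slot` is a hypothetical helper name, and it assumes the field labels `Enclosure #`, `Slot #` and `Serial No` appear exactly as in the output above, with Enclosure and Slot listed before the serial for each device.

```shell
#!/bin/sh
# find_slot SERIAL (hypothetical helper): read `sas2ircu <index> display`
# output on stdin and print ENCLOSURE:SLOT for the first device whose
# "Serial No" line contains SERIAL. Returns non-zero if nothing matches.
find_slot() {
    awk -v serial="$1" '
        # Remember the most recent enclosure and slot numbers seen.
        /Enclosure #/ { split($0, a, ":"); gsub(/[ \t]/, "", a[2]); enc = a[2] }
        /Slot #/      { split($0, a, ":"); gsub(/[ \t]/, "", a[2]); slot = a[2] }
        # Substring match, so a vendor prefix on the serial does not matter.
        /Serial No/ && index($0, serial) { print enc ":" slot; found = 1; exit }
        END { if (!found) exit 1 }
    '
}
```

With the disk from this example, `sas2ircu 0 display | find_slot WCC4E3LJFF91` should print `2:5`.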
6. Use the locate command with the bay identifier to turn on that bay's indicator LED
root@truenas[~]# sas2ircu 0 locate 2:5 on
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
SAS2IRCU: LOCATE command completed successfully.
SAS2IRCU: Command LOCATE Completed Successfully.
SAS2IRCU: Utility Completed Successfully.
To turn the LED off again:
root@truenas[~]# sas2ircu 0 locate 2:5 off
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.
SAS2IRCU: LOCATE command completed successfully.
SAS2IRCU: Command LOCATE Completed Successfully.
SAS2IRCU: Utility Completed Successfully.
7. While locate is on, the LED for that bay on the chassis is lit (or blinking), which shows which disk to pull.
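Because it is easy to forget the `off` command after walking to the rack, the on/off pair from step 6 can be wrapped so the LED turns itself off after a delay. This is only a sketch under stated assumptions: `blink_bay` and the `SAS2IRCU` variable are names invented here, the 60-second default is arbitrary, and the LED stays on if the script is interrupted during the wait.

```shell
#!/bin/sh
# Path to the utility; overridable so the function can be dry-run with echo.
SAS2IRCU=${SAS2IRCU:-sas2ircu}

# blink_bay CONTROLLER ENCLOSURE:SLOT [SECONDS] (hypothetical helper):
# turn the locate LED on, wait, then turn it off again.
blink_bay() {
    ctrl=$1 bay=$2 secs=${3:-60}
    # Accept only the digits:digits form used by `locate`, e.g. 2:5.
    case $bay in
        *[!0-9:]* | *:*:* | :* | *:) echo "bad bay: $bay" >&2; return 2 ;;
        *:*) ;;
        *) echo "bad bay: $bay" >&2; return 2 ;;
    esac
    "$SAS2IRCU" "$ctrl" locate "$bay" on || return 1
    sleep "$secs"
    "$SAS2IRCU" "$ctrl" locate "$bay" off
}
```

For the disk in this example, `blink_bay 0 2:5 120` (run as root) would light bay 2:5 for two minutes. Setting `SAS2IRCU=echo` first prints the commands instead of running them.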