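During yesterday's routine inspection I found the RIGHT THERMAL FAULT LED lit on one of our V890s. That gave me a scare, so I logged in straight away: the system is still running normally, and prtdiag shows only a high-temperature alarm on CPU5. fmadm, on the other hand, shows just one alert, for memory. I have requested spare parts and plan to replace both the faulty CPU board and the memory, but I can't work out why only one CPU reports overheating while that CPU keeps working normally. Could anyone help me analyze this?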
root # uname -a
SunOS 5.10 Generic_141414-07 sun4u sparc SUNW,Sun-Fire-V890
root # prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Fire V890
System clock frequency: 150 MHz
Memory size: 32768 Megabytes
========================= CPUs ===============================================
          Run  E$   CPU     CPU
Brd CPU   MHz  MB   Impl.   Mask
--- ----- ---- ---- ------- ----
A   0, 16 1350 16.0 US-IV   3.1
B   1, 17 1350 16.0 US-IV   3.1
A   2, 18 1350 16.0 US-IV   3.1
B   3, 19 1350 16.0 US-IV   3.1
C   4, 20 1350 16.0 US-IV   3.1
D   5, 21 1350 16.0 US-IV   3.1
C   6, 22 1350 16.0 US-IV   3.1
D   7, 23 1350 16.0 US-IV   3.1
========================= Memory Configuration ===============================
         Logical Logical Logical
     MC  Bank    Bank    Bank        DIMM   Interleave Interleaved
Brd  ID  num     size    Status      Size   Factor     with
---- --- ------- ------- ----------- ------ ---------- -----------
A    0   0       1024MB  no_status   512MB  8-way      0
A    0   1       1024MB  no_status   512MB  8-way      0
A    0   2       1024MB  no_status   512MB  8-way      0
A    0   3       1024MB  no_status   512MB  8-way      0
B    1   0       1024MB  no_status   512MB  8-way      1
B    1   1       1024MB  no_status   512MB  8-way      1
B    1   2       1024MB  no_status   512MB  8-way      1
B    1   3       1024MB  no_status   512MB  8-way      1
A    2   0       1024MB  no_status   512MB  8-way      0
A    2   1       1024MB  no_status   512MB  8-way      0
A    2   2       1024MB  no_status   512MB  8-way      0
A    2   3       1024MB  no_status   512MB  8-way      0
B    3   0       1024MB  no_status   512MB  8-way      1
B    3   1       1024MB  no_status   512MB  8-way      1
B    3   2       1024MB  no_status   512MB  8-way      1
B    3   3       1024MB  no_status   512MB  8-way      1
C    4   0       1024MB  no_status   512MB  8-way      2
C    4   1       1024MB  no_status   512MB  8-way      2
C    4   2       1024MB  no_status   512MB  8-way      2
C    4   3       1024MB  no_status   512MB  8-way      2
D    5   0       1024MB  no_status   512MB  8-way      3
D    5   1       1024MB  no_status   512MB  8-way      3
D    5   2       1024MB  no_status   512MB  8-way      3
D    5   3       1024MB  no_status   512MB  8-way      3
C    6   0       1024MB  no_status   512MB  8-way      2
C    6   1       1024MB  no_status   512MB  8-way      2
C    6   2       1024MB  no_status   512MB  8-way      2
C    6   3       1024MB  no_status   512MB  8-way      2
D    7   0       1024MB  no_status   512MB  8-way      3
D    7   1       1024MB  no_status   512MB  8-way      3
D    7   2       1024MB  no_status   512MB  8-way      3
D    7   3       1024MB  no_status   512MB  8-way      3
========================= IO Cards =========================
                         Bus  Max
     IO   Port Bus       Freq Bus  Dev,
Brd  Type ID   Side Slot MHz  Freq Func State Name                              Model
---- ---- ---- ---- ---- ---- ---- ---- ----- --------------------------------- ----------------------
I/O  PCI  9    B    6    33   33   2,0  ok    pci-pci8086,b154.0/network (netw+ PCI-BRIDGE
I/O  PCI  9    B    6    33   33   0,0  ok    network-pci100b,35.30             SUNW,pci-ce/pci-bridge
I/O  PCI  9    B    6    33   33   1,0  ok    network-pci100b,35.30             SUNW,pci-ce/pci-bridge
I/O  PCI  9    B    6    33   33   2,0  ok    scsi-pci1000,b.7/disk (block)     device on pci-bridge
I/O  PCI  9    B    6    33   33   2,1  ok    scsi-pci1000,b.7/disk (block)     device on pci-bridge
I/O  PCI  9    A    8    66   66   1,0  ok    scsi-pci1000,30.1000.10c0.8/disk+ LSI,1030
I/O  PCI  9    A    8    66   66   1,1  ok    scsi-pci1000,30.1000.10c0.8/disk+ LSI,1030
No failures found in System
===========================
========================= Environmental Status =========================
System Temperatures (Celsius):
-------------------------------
Device Temperature Status
---------------------------------------
CPU0 72 OK
CPU1 70 OK
CPU2 70 OK
CPU3 79 OK
CPU4 76 OK
CPU5 104 ERROR
CPU6 77 OK
CPU7 72 OK
MB 37 OK
IOB 30 OK
DBP0 29 OK
=================================
Front Status Panel:
-------------------
Keyswitch position: NORMAL
System LED Status:
GEN FAULT            REMOVE
[ ON]                [OFF]
DISK FAULT           POWER FAULT
[OFF]                [OFF]
LEFT THERMAL FAULT   RIGHT THERMAL FAULT
[OFF]                [ ON]
LEFT DOOR            RIGHT DOOR
[OFF]                [OFF]
=================================
Disk Status:
         Presence   Fault LED  Remove LED
DISK 0:  [PRESENT]  [OFF]      [OFF]
DISK 1:  [PRESENT]  [OFF]      [OFF]
DISK 2:  [PRESENT]  [OFF]      [OFF]
DISK 3:  [PRESENT]  [OFF]      [OFF]
DISK 4:  [PRESENT]  [OFF]      [OFF]
DISK 5:  [PRESENT]  [OFF]      [OFF]
DISK 6:  [ EMPTY]
DISK 7:  [ EMPTY]
DISK 8:  [ EMPTY]
DISK 9:  [ EMPTY]
DISK 10: [ EMPTY]
DISK 11: [ EMPTY]
=================================
Fan Bank :
----------
Bank                Speed     Status      Fan State
                    ( RPMS )
------------------  --------  ----------  ---------
CPU0_PRIM_FAN           3191  [ENABLED]   OK
CPU1_PRIM_FAN           3333  [ENABLED]   OK
CPU0_SEC_FAN               0  [DISABLED]  OK
CPU1_SEC_FAN               0  [DISABLED]  OK
IO0_PRIM_FAN            3000  [ENABLED]   OK
IO1_PRIM_FAN            2912  [ENABLED]   OK
IO0_SEC_FAN                0  [DISABLED]  OK
IO1_SEC_FAN                0  [DISABLED]  OK
IO_BRIDGE_PRIM_FAN      3658  [ENABLED]   OK
IO_BRIDGE_SEC_FAN          0  [DISABLED]  OK
=================================
Power Supplies:
---------------
Current Drain:
Supply Status       Fan Fail Temp Fail CS Fail 3.3V 5V 12V 48V
------ ------------ -------- --------- ------- ---- -- --- ---
PS0    GOOD                                    6    4  3   10
PS1    GOOD                                    6    4  3   10
PS2    GOOD                                    6    4  3   10
========================= HW Revisions =======================================
System PROM revisions:
----------------------
OBP 4.18.11 2006/05/03 07:41
IO ASIC revisions:
------------------
         Port
Model    ID   Status Version
-------- ---- ------ -------
Schizo   8    ok     7
Schizo   9    ok     7
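The prtdiag output already holds what is needed to place the alarm. A minimal Python sketch, assuming the `prtdiag -v` text has been saved and that the sensor name `CPU<n>` refers to the processor listed as `n, n+16` in the CPU table (an assumption, not something prtdiag states), picks out every sensor whose status is not OK, looks up the board that CPU sits on, and compares it with the other CPUs. `PRTDIAG_EXCERPT`, `cpu_boards`, `temperatures` and `report` are hypothetical names, not part of any Solaris tool.

```python
import re

# Hypothetical input: lines copied from the `prtdiag -v` output above (the CPU
# table rows and the System Temperatures section), saved as plain text.
PRTDIAG_EXCERPT = """\
A 0, 16 1350 16.0 US-IV 3.1
B 1, 17 1350 16.0 US-IV 3.1
A 2, 18 1350 16.0 US-IV 3.1
B 3, 19 1350 16.0 US-IV 3.1
C 4, 20 1350 16.0 US-IV 3.1
D 5, 21 1350 16.0 US-IV 3.1
C 6, 22 1350 16.0 US-IV 3.1
D 7, 23 1350 16.0 US-IV 3.1
System Temperatures (Celsius):
-------------------------------
Device Temperature Status
---------------------------------------
CPU0 72 OK
CPU1 70 OK
CPU2 70 OK
CPU3 79 OK
CPU4 76 OK
CPU5 104 ERROR
CPU6 77 OK
CPU7 72 OK
MB 37 OK
IOB 30 OK
DBP0 29 OK
=================================
"""

# CPU table row: board letter, then the "n, n+16" core ids, then the clock in MHz.
CPU_ROW = re.compile(r"^\s*([A-Z])\s+(\d+),\s*(\d+)\s+\d+\s")
# Temperature row: sensor name, integer degrees Celsius, status word.
TEMP_ROW = re.compile(r"^(\S+)\s+(-?\d+)\s+(\S+)\s*$")


def cpu_boards(text):
    """Map each CPU number (first id of the pair) to its board letter."""
    boards = {}
    for line in text.splitlines():
        m = CPU_ROW.match(line)
        if m:
            boards[int(m.group(2))] = m.group(1)
    return boards


def temperatures(text):
    """Return {sensor: (celsius, status)} from the System Temperatures section."""
    readings = {}
    in_section = False
    for line in text.splitlines():
        if line.startswith("System Temperatures"):
            in_section = True
        elif in_section and line.startswith("="):
            break  # the next "====" rule ends the section
        elif in_section:
            m = TEMP_ROW.match(line)
            if m:
                readings[m.group(1)] = (int(m.group(2)), m.group(3))
    return readings


def report(text):
    """Describe every sensor whose status is not OK."""
    boards = cpu_boards(text)
    temps = temperatures(text)
    cpu_temps = {s: t for s, (t, _) in temps.items() if s.startswith("CPU")}
    findings = []
    for sensor, (celsius, status) in temps.items():
        if status == "OK":
            continue
        # ASSUMPTION: sensor "CPU<n>" is the processor listed as "n, n+16".
        board = boards.get(int(sensor[3:]), "?") if sensor.startswith("CPU") else "?"
        hottest_other = max((t for s, t in cpu_temps.items() if s != sensor),
                            default=celsius)
        findings.append(f"{sensor} on board {board}: {celsius} C ({status}), "
                        f"{celsius - hottest_other} C above the hottest other CPU")
    return findings


if __name__ == "__main__":
    for line in report(PRTDIAG_EXCERPT):
        print(line)
    # prints: CPU5 on board D: 104 C (ERROR), 25 C above the hottest other CPU
```

The temperature parser only reads between the `System Temperatures` header and the next `=` rule, so rows from other sections of the output are not mistaken for sensor readings.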
root # uptime
3:58pm up 680 day(s), 15:19, 1 user, load average: 0.56, 1.18, 1.38
root # psrinfo -v
Status of virtual processor 0 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 2 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 3 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 4 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 5 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 6 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 7 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:39.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 16 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 17 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 18 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 19 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 20 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 21 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 22 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 23 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
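As far as Solaris is concerned, the hot CPU is still in service. A minimal Python sketch, assuming the `psrinfo -v` text has been saved, collects each virtual processor's state and checks both cores of one chip. The 16-id offset between the two cores is read off the prtdiag CPU table (`5, 21`) rather than from any documented rule, and `PSRINFO_EXCERPT`, `processor_states` and `core_states` are hypothetical names.

```python
import re

# Hypothetical input: the `psrinfo -v` output above, trimmed to virtual
# processors 5, 7 and 21 to save space.
PSRINFO_EXCERPT = """\
Status of virtual processor 5 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 7 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:39.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
Status of virtual processor 21 as of: 10/08/2015 15:58:12
on-line since 11/27/2013 00:39:55.
The sparcv9 processor operates at 1350 MHz,
and has a sparcv9 floating point processor.
"""

# One match per processor: its id, its state word and the "since" timestamp.
RECORD = re.compile(
    r"Status of virtual processor (\d+) as of: [^\n]*\n"
    r"\s*(\S+) since (\d\d/\d\d/\d{4} \d\d:\d\d:\d\d)")


def processor_states(text):
    """Return {virtual_processor_id: (state, since)}."""
    return {int(pid): (state, since)
            for pid, state, since in RECORD.findall(text)}


def core_states(text, physical_cpu, core_offset=16):
    """States of both cores of one chip; core_offset is a hypothetical default."""
    states = processor_states(text)
    ids = (physical_cpu, physical_cpu + core_offset)
    return {pid: states[pid][0] for pid in ids if pid in states}


if __name__ == "__main__":
    states = processor_states(PSRINFO_EXCERPT)
    print("not on-line:", [p for p, (s, _) in states.items() if s != "on-line"])
    print("CPU5 cores:", core_states(PSRINFO_EXCERPT, 5))
```

On the excerpt this prints `not on-line: []` and `CPU5 cores: {5: 'on-line', 21: 'on-line'}`.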
root #
root # fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 23 14:09:43 e5b52aa4-22d5-4980-c5b3-94df75aed89e SUN4U-8000-2S  Major
Fault class : fault.memory.dimm 95%
Affects     : mem:///unum=Slot,C:J8000
                  faulted but still in service
FRU         : mem:///unum=Slot,C:J8000 95%
                  faulty
Serial ID.  :
Description : The number of errors associated with this memory module has
              exceeded acceptable levels. Refer to
              sun.com/msg/SUN4U-8000-2S for more information.
Response    : Pages of memory associated with this memory module are being
              removed from service as errors are reported.
Impact      : Total system memory capacity will be reduced as pages are
              retired.
Action      : Schedule a repair procedure to replace the affected memory
              module. Use fmdump -v -u <EVENT_ID> to identify the module.
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Apr 16 03:38:49 e5c4075d-8f5e-ee9e-a496-a449c96939e6 FMD-8000-0W    Minor
Fault class : defect.sunos.fmd.nosub
Description : The Solaris Fault Manager received an event from a component to
              which no automated diagnosis software is currently subscribed.
              Refer to sun.com/msg/FMD-8000-0W for more information.
Response    : Error reports from the component will be logged for examination
              by Sun.
Impact      : Automated diagnosis and response for these events will not occur.
Action      : Run pkgchk -n SUNWfmd to ensure that fault management software is
              installed properly. Contact Sun for support.
root #
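The Action text above suggests `fmdump -v -u <EVENT_ID>` for each fault. A small Python sketch can pull the event IDs, fault classes and FRUs out of saved `fmadm faulty` text and build those commands. It also splits the FRU string `Slot,C:J8000` into a board letter and a socket, on the assumption (not confirmed by the output itself) that `C` is the same board letter prtdiag prints in its `Brd` column. The hot CPU's board `D` is hard-coded from the prtdiag CPU table, and all function and variable names are hypothetical.

```python
import re

# Hypothetical input: the `fmadm faulty` output above, reduced to the lines
# this sketch reads (event header rows plus the Fault class and FRU fields).
FMADM_EXCERPT = """\
Jun 23 14:09:43 e5b52aa4-22d5-4980-c5b3-94df75aed89e SUN4U-8000-2S  Major
Fault class : fault.memory.dimm 95%
Affects     : mem:///unum=Slot,C:J8000
FRU         : mem:///unum=Slot,C:J8000 95%
Apr 16 03:38:49 e5c4075d-8f5e-ee9e-a496-a449c96939e6 FMD-8000-0W    Minor
Fault class : defect.sunos.fmd.nosub
"""

# Event header row: timestamp, 36-character event UUID, MSG-ID, severity.
EVENT_ROW = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d)\s+([0-9a-f-]{36})\s+(\S+)\s+(\S+)\s*$")
# The two labelled fields used here; only the first token after ":" is kept.
FIELD = re.compile(r"^(Fault class|FRU)\s*:\s*(\S+)")
# ASSUMPTION: "Slot,<X>:<socket>" = CPU/memory board <X>, DIMM socket <socket>.
UNUM_SLOT = re.compile(r"unum=Slot,([A-Z]):(\S+)")


def parse_faults(text):
    """Return one dict per fault event, in the order fmadm printed them."""
    events = []
    for line in text.splitlines():
        m = EVENT_ROW.match(line)
        if m:
            time, uuid, msg_id, severity = m.groups()
            events.append({"time": time, "uuid": uuid,
                           "msg_id": msg_id, "severity": severity})
            continue
        f = FIELD.match(line)
        if f and events:  # attach the field to the most recent event
            key = "fault_class" if f.group(1) == "Fault class" else "fru"
            events[-1][key] = f.group(2)
    return events


def fru_location(fru):
    """Split a 'Slot,<X>:<socket>' unum into (board, socket), or None."""
    m = UNUM_SLOT.search(fru or "")
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    hot_cpu_board = "D"  # from the prtdiag CPU table row "D 5, 21 ..."
    for ev in parse_faults(FMADM_EXCERPT):
        print(f"fmdump -v -u {ev['uuid']}  # {ev['msg_id']} "
              f"{ev.get('fault_class', '?')}")
        where = fru_location(ev.get("fru"))
        if where:
            board, socket = where
            relation = ("the same board as" if board == hot_cpu_board
                        else "a different board from")
            print(f"  FRU on board {board}, socket {socket}: "
                  f"{relation} CPU5 (board {hot_cpu_board})")
```

If the board-letter assumption holds, the faulty DIMM sits on board C while the overheating CPU5 sits on board D, so the two alarms would involve different boards.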