Scott,
Below is the log; any info is appreciated.
Eric
=====
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x47e88814.0xa220262 (0xafd46d33c7d)
PLATFORM: SUNW,Sun-Blade-T6320, CSN: -, HOSTNAME: gd65-19-1
SOURCE: SunOS, REV: 5.10 glendale-on10-nightly_nightly:08/07/2007
DESC: Errors have been detected that require a reboot to ensure system
integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved
ereport.io.fire.fabric ena=afd46c71b2e0a001 detector=[ version=0
scheme="dev"
device-path="/***@0/***@0/***@c/***@0" ] req_id=7a00 device_id=105e
vendor_id=8086 rev_id=6 dev_type=0 cap_off=e0 aer_off=100 sts_reg=4010
sts_sreg=0 pcix_sts_reg=0 pcix_bdg_sts_reg=0 dev_sts_reg=4 aer_ce=0 aer_ue=
40000 aer_sev=62011 aer_ctr=12 aer_h1=a002000 aer_h2=8200 aer_h3=7a010870
aer_h4=0 saer_ue=0 saer_sev=0 saer_ctr=0 saer_h1=0 saer_h2=0 saer_h3=0
saer_h4=0 remainder=2 severity=40
ereport.io.fire.fabric ena=afd46c71b2e0a001 detector=[ version=0
scheme="dev"
device-path="/***@0/***@0/***@c" ] req_id=360 device_id=8548 vendor_id=10b5
rev_id=aa dev_type=60 cap_off=68 aer_off=fb4 sts_reg=10 sts_sreg=0
pcix_sts_reg=0 pcix_bdg_sts_reg=0 dev_sts_reg=0 aer_ce=0 aer_ue=0 aer_sev=
62030 aer_ctr=1ff aer_h1=0 aer_h2=0 aer_h3=0 aer_h4=0 saer_ue=0 saer_sev=0
saer_ctr=0 saer_h1=0 saer_h2=0 saer_h3=0 saer_h4=0 remainder=1 severity=1
ereport.io.fire.fabric ena=afd46c71b2e0a001 detector=[ version=0
scheme="dev"
device-path="/***@0/***@0" ] req_id=200 device_id=8548 vendor_id=10b5
rev_id=aa dev_type=50 cap_off=68 aer_off=fb4 sts_reg=10 sts_sreg=0 pcix_sts_reg=0
pcix_bdg_sts_reg=0 dev_sts_reg=0 aer_ce=0 aer_ue=0 aer_sev=62030 aer_ctr=1ff
aer_h1=0 aer_h2=0 aer_h3=0 aer_h4=0 saer_ue=0 saer_sev=0 saer_ctr=0 saer_h1=0
saer_h2=0 saer_h3=0 saer_h4=0 remainder=0 severity=1
panic[cpu40]/thread=2a10c289cc0: Fatal error has occured in: PCIe fabric.
000002a10c2b1c40 px:px_err_panic+174 (0, 1337c00, 2a10c2b1cf0, 41,
2a10c2b1cf1, 0)
%l0-3: 0000000000000034 00000000018fe000 0000000000000000 0000000000000001
%l4-7: 00000000018fe000 0000000000000000 0000000001846c00 ffffffffffffffff
000002a10c2b1d50 px:px_err_fabric_intr+b8 (300008e9e00, 33, 7a00,
300008d0210, 300008e9f50, 7a00000000000000)
%l0-3: 0000000000000001 0000060005ac4000 0000000000000020 0000060005bdccc0
%l4-7: 0000000000000041 0000000000000000 0000000000000000 0000000000000001
000002a10c2b1e40 px:px_msiq_intr+1c0 (300008e7ce8, 300008d0210, 132c9bc,
0, 300008e14f0, 60001c2a3e0)
%l0-3: 0000000000000000 000002a10c2b1f10 0000000000000000 0000000000000003
%l4-7: 000002a10c2b1f40 00000600025fc000 0000000000000000 0000000000000033
000002a10c2b1f50 unix:current_thread+170 (16, 10000000000,
fffffefffffffeff, fffffefffffffeff, 0, 12)
%l0-3: 000000000100985c 000002a10c289021 000000000000000e 000000000000003a
%l4-7: ffffffffffffffff 0000000000000000 0000000000000000 000002a10c2898d0
000002a10c289970 unix:cpu_halt+114 (30001b32000, 10000000000, 184d100,
28, 30001b32000, 60005b0b35c)
%l0-3: 0000000000000016 00000300005afa80 00000300005afa80 0000000000000000
%l4-7: 0000000000000000 0000000000000000 0000000000000001 0000000000000001
000002a10c289a20 unix:idle+128 (1819c00, 0, 30001b32000,
ffffffffffffffff, 29, 1818c00)
%l0-3: 0000060005b0b338 000000000000001b 0000000000000000 ffffffffffffffff
%l4-7: 0000060005b0b338 ffffffffffffffff 000000000184d100 000000000103afbc
syncing file systems...
======
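For anyone reading the ereports above by hand, here is a minimal sketch of pulling the name=value pairs out of one ereport and decoding the aer_ue field. The function names are hypothetical, and the bit names are my reading of the PCIe AER Uncorrectable Error Status register layout, so treat them as an assumption and check them against the PCI Express Base Specification. Under that assumption, aer_ue=40000 from the first ereport decodes to a Malformed TLP.

```python
import re

# Bit positions in the PCIe AER Uncorrectable Error Status register.
# Assumed from the PCI Express Base Specification; verify before relying on it.
AER_UE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def parse_ereport_fields(text):
    """Return the hex name=value pairs of one ereport as a dict of ints."""
    # Rejoin values that were wrapped onto the next line right after '='.
    joined = re.sub(r"=\s*\n\s*", "=", text)
    fields = {}
    for name, value in re.findall(r"\b([a-z][a-z0-9_]*)=([0-9a-fA-F]+)\b", joined):
        fields[name] = int(value, 16)
    return fields


def decode_aer_ue(value):
    """List the names of the uncorrectable-error bits set in value."""
    return [name for bit, name in sorted(AER_UE_BITS.items()) if value & (1 << bit)]


# Excerpt of the first ereport, including the wrapped aer_ue value.
sample = (
    "req_id=7a00 device_id=105e\n"
    "vendor_id=8086 rev_id=6 dev_type=0 cap_off=e0 aer_off=100 sts_reg=4010\n"
    "sts_sreg=0 pcix_sts_reg=0 pcix_bdg_sts_reg=0 dev_sts_reg=4 aer_ce=0 aer_ue=\n"
    "40000 aer_sev=62011 aer_ctr=12\n"
)

fields = parse_ereport_fields(sample)
print(decode_aer_ue(fields["aer_ue"]))
```

The parser only keeps fields whose value is a bare hex number, so quoted fields such as scheme and device-path are skipped on purpose.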
Post by Scott Davenport
Post by Eric Sun
Hi,
Recently we got some errors from N2/VF systems. If anyone on this board
could give a pointer as to where to go for further FA, it would be appreciated.
1. On N2, Glendale, the system panicked due to "PCIe fabric". On this system,
the FEM (fabric express module) is not installed.
I don't know the specific layout of this system, but the T2
(Niagara-2) has a built-in PIU. And if it's connected to at
least one PLX switch, that constitutes a fabric.
Do you have more info you can share? The panic itself would produce
a general FMA message (SUNOS-8000-0G, IIRC) - basically saying the
system had to panic. But if the panic was due to a HW problem, I would
expect subsequent telemetry and another diagnosis. Look for FMA
message codes in the /var/adm/messages file, check 'fmadm faulty',
or run 'fmdump'.
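A minimal sketch of the first suggestion, scanning a messages file for FMA message codes: it assumes the codes appear on "SUNW-MSG-ID:" lines like the one in the log above, and that each code maps to a page under http://www.sun.com/msg/ as the two URLs in this thread suggest. The function names and the regular expression are hypothetical, not part of any Solaris tool.

```python
import re

# Matches codes such as SUNOS-8000-0G or SUN4V-8001-UH on a SUNW-MSG-ID line.
# The pattern is inferred from the two codes in this thread (an assumption).
MSG_ID_RE = re.compile(r"SUNW-MSG-ID:\s*([A-Z0-9]+-[0-9A-F]{4}-[0-9A-Z]+)")


def find_fma_message_ids(lines):
    """Return the distinct FMA message codes found, in order of appearance."""
    seen = []
    for line in lines:
        for msg_id in MSG_ID_RE.findall(line):
            if msg_id not in seen:
                seen.append(msg_id)
    return seen


def knowledge_article_url(msg_id):
    """Build the article URL for a message code (URL pattern assumed)."""
    return "http://www.sun.com/msg/" + msg_id


def scan_messages_file(path="/var/adm/messages"):
    """Print each message code found in the file with its article URL."""
    with open(path, errors="replace") as handle:
        for msg_id in find_fma_message_ids(handle):
            print(msg_id, knowledge_article_url(msg_id))
```

Calling scan_messages_file() on the affected host lists each code once. 'fmadm faulty' and 'fmdump' remain the authoritative sources for the diagnosis itself.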
Post by Eric Sun
2. On VF/Batoka, FMA logged
"fault.asic.ultraSPARC-T2plus.interconnect.lfu-f". Any suggestion as to
where FMA documentation on this error can be found?
Each fault has a corresponding message id. That is printed to
the console and /var/adm/messages when the diagnosis is
issued. This particular one is http://www.sun.com/msg/SUN4V-8001-UH.
But in short you've had a single lane failure on a VF coherency
channel.
-- scott
http://blogs.sun.com/sdaven
_______________________________________________
fm-discuss mailing list