PCI errors?
Matty
2008-05-10 04:42:35 UTC
After rebooting one of my V880s this morning, I noticed that the fault
manager detected some issues with a PCI bus [1][2][3][4]. Are there
any documents available that describe how to interpret the fmdump
output? Also, is there a reason fmadm doesn't actually list the bus /
motherboard as being faulty?

Thanks for any insight,
- Ryan
--
UNIX Administrator
http://prefetch.net

[1]

# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded dev:////***@8,700000/***@4
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------
degraded dev:////***@8,700000/***@5
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------
degraded dev:////***@8,700000/***@2
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------
degraded dev:////***@8,700000/***@3
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------
degraded dev:////***@8,700000/***@1
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=emlxs/mod-id=101
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=glm/mod-id=146
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=pci_pci/mod-id=132
a0461e5e-4356-ca7b-ee83-c66816b9caba
-------- ----------------------------------------------------------------------

[2]

# fmdump
TIME UUID SUNW-MSG-ID
Jul 17 10:27:50.5982 a0461e5e-4356-ca7b-ee83-c66816b9caba PCI-8000-42

[3]

# fmdump -v -u a0461e5e-4356-ca7b-ee83-c66816b9caba
TIME UUID SUNW-MSG-ID
Jul 17 10:27:50.5982 a0461e5e-4356-ca7b-ee83-c66816b9caba PCI-8000-42
13% defect.io.pci.driver

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=5/pcifn=0
Affects: mod:///mod-name=emlxs/mod-id=101
FRU: pkg:///SUNWemlxs

13% defect.io.pci.driver

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=3/pcifn=0
Affects: mod:///mod-name=pci_pci/mod-id=132
FRU: pkg:///SUNWckr

13% defect.io.pci.driver

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=1/pcifn=0
Affects: mod:///mod-name=glm/mod-id=146
FRU: pkg:///SUNWpd

13% fault.io.pci.device

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=4/pcifn=0
Affects: dev:////***@8,700000/***@4
FRU: hc:///component=PCI 1

13% fault.io.pci.device

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=5/pcifn=0
Affects: dev:////***@8,700000/***@5
FRU: hc:///component=PCI 0

13% fault.io.pci.device

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=2/pcifn=0
Affects: dev:////***@8,700000/***@2
FRU: hc:///component=PCI 3

13% fault.io.pci.device

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=3/pcifn=0
Affects: dev:////***@8,700000/***@3
FRU: hc:///component=PCI 2

13% fault.io.pci.device

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=1/pcifn=0
Affects: dev:////***@8,700000/***@1
FRU: hc:///component=MB

[4]

# fmdump -V -u a0461e5e-4356-ca7b-ee83-c66816b9caba |more
TIME UUID SUNW-MSG-ID
Jul 17 10:27:50.5982 a0461e5e-4356-ca7b-ee83-c66816b9caba PCI-8000-42

TIME CLASS ENA
Jul 17 10:16:57.5205 ereport.io.pci.sta 0x2e1960defd800801

nvlist version: 0
version = 0x0
class = list.suspect
uuid = a0461e5e-4356-ca7b-ee83-c66816b9caba
code = PCI-8000-42
diag-time = 1184682470 597218
de = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = SUNW,Sun-Fire-880
server-id = orange
(end authority)

mod-name = eft
mod-version = 1.16
(end de)

fault-list-sz = 0x8
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.io.pci.driver
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
scheme = mod
version = 0x0
mod-id = 101
mod-name = emlxs
mod-desc = SunFC emlxs FCA v20070518-2.20j
mod-pkg = (embedded nvlist)
nvlist version: 0
scheme = pkg
version = 0x0
pkg-basedir = /
pkg-inst = SUNWemlxs
pkg-version = 11.10.0,REV=2005.01.30.01.58
(end mod-pkg)

(end asru)

fru = (embedded nvlist)
nvlist version: 0
scheme = pkg
version = 0x0
pkg-basedir = /
pkg-inst = SUNWemlxs
pkg-version = 11.10.0,REV=2005.01.30.01.58
(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 5
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[0])
(start fault-list[1])
nvlist version: 0
version = 0x0
class = defect.io.pci.driver
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
scheme = mod
version = 0x0
mod-id = 132
mod-name = pci_pci
mod-desc = Standard PCI to PCI bridge nexu
mod-pkg = (embedded nvlist)
nvlist version: 0
scheme = pkg
version = 0x0
pkg-basedir = /
pkg-inst = SUNWckr
pkg-version = 11.10.0,REV=2005.01.21.15.53
(end mod-pkg)

(end asru)

fru = (embedded nvlist)
nvlist version: 0
scheme = pkg
version = 0x0
pkg-basedir = /
pkg-inst = SUNWckr
pkg-version = 11.10.0,REV=2005.01.21.15.53
(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 3
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[1])
(start fault-list[2])
nvlist version: 0
version = 0x0
class = defect.io.pci.driver
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
scheme = mod
version = 0x0
mod-id = 146
mod-name = glm
mod-desc = GLM SCSI HBA Driver 1.206.
mod-pkg = (embedded nvlist)
nvlist version: 0
scheme = pkg
version = 0x0
pkg-basedir = /
pkg-inst = SUNWpd
pkg-version = 11.10.0,REV=2005.01.21.15.53
(end mod-pkg)

(end asru)

fru = (embedded nvlist)
nvlist version: 0
scheme = pkg
version = 0x0
pkg-basedir = /
pkg-inst = SUNWpd
pkg-version = 11.10.0,REV=2005.01.21.15.53
(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 1
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[2])
(start fault-list[3])
nvlist version: 0
version = 0x0
class = fault.io.pci.device
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /***@8,700000/***@4
(end asru)

fru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x1
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = component
hc-id = PCI 1
(end hc-list[0])

(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 4
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[3])
(start fault-list[4])
nvlist version: 0
version = 0x0
class = fault.io.pci.device
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /***@8,700000/***@5
(end asru)

fru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x1
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = component
hc-id = PCI 0
(end hc-list[0])

(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 5
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[4])
(start fault-list[5])
nvlist version: 0
version = 0x0
class = fault.io.pci.device
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /***@8,700000/***@2
(end asru)

fru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x1
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = component
hc-id = PCI 3
(end hc-list[0])

(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 2
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[5])
(start fault-list[6])
nvlist version: 0
version = 0x0
class = fault.io.pci.device
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /***@8,700000/***@3
(end asru)

fru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x1
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = component
hc-id = PCI 2
(end hc-list[0])

(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 3
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[6])
(start fault-list[7])
nvlist version: 0
version = 0x0
class = fault.io.pci.device
certainty = 0xd
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /***@8,700000/***@1
(end asru)

fru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x1
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = component
hc-id = MB
(end hc-list[0])

(end fru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = hc
hc-root =
hc-list-sz = 0x5
hc-list = (array of embedded nvlists)
(start hc-list[0])
nvlist version: 0
hc-name = motherboard
hc-id = 0
(end hc-list[0])
(start hc-list[1])
nvlist version: 0
hc-name = hostbridge
hc-id = 0
(end hc-list[1])
(start hc-list[2])
nvlist version: 0
hc-name = pcibus
hc-id = 1
(end hc-list[2])
(start hc-list[3])
nvlist version: 0
hc-name = pcidev
hc-id = 1
(end hc-list[3])
(start hc-list[4])
nvlist version: 0
hc-name = pcifn
hc-id = 0
(end hc-list[4])

(end resource)

(end fault-list[7])

fault-status = 0x1 0x1 0x1 0x1 0x1 0x1 0x1 0x1
__ttl = 0x1
__tod = 0x469cd1e6 0x23a84fa8
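
One way to read the timestamps in the dump above: this is only a sketch, and it assumes that the first word of diag-time and of __tod is a count of seconds since the Unix epoch and that the second word of __tod is nanoseconds. Those assumptions are not stated in the output itself, but they are consistent with it, since the fractional part works out to the .5982 that fmdump prints. The helper name decode_tod is made up for illustration.

```python
from datetime import datetime, timezone


def decode_tod(seconds_hex: str) -> datetime:
    """Interpret a hex word as Unix epoch seconds (assumption) and return UTC."""
    return datetime.fromtimestamp(int(seconds_hex, 16), tz=timezone.utc)


# Values copied from the fmdump -V output above.
diag_time = datetime.fromtimestamp(1184682470, tz=timezone.utc)
tod = decode_tod("0x469cd1e6")
tod_nanoseconds = int("0x23a84fa8", 16)

print(diag_time.isoformat())   # 2007-07-17T14:27:50+00:00
print(tod == diag_time)        # True
print(tod_nanoseconds / 1e9)   # 0.598233
```

If that reading is right, the diagnosis was made on 17 July 2007 at 14:27:50 UTC, which matches the "Jul 17 10:27:50" shown by fmdump for a host four hours behind UTC. fmdump prints the log history, so an entry can predate the most recent reboot.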
Richard Elling
2008-05-10 23:55:25 UTC
Post by Matty
After rebooting one of my V880s this morning, I noticed that the fault
manager detected some issues with a PCI bus [1][2][3][4]. Are there
any documents available that describe how to interpret the fmdump
output? Also, is there a reason fmadm doesn't actually list the bus /
motherboard as being faulty?
Nit: the SCSI controller on the MB is identified as possibly being
faulty:

13% fault.io.pci.device

Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=1/pcifn=0
Affects: dev:////***@8,700000/***@1
FRU: hc:///component=MB


Note that all of the suspect parts share the same PCI bus (pcibus=1).
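
A quick way to confirm that from the fmdump -v output is to pull the pcibus and pcidev numbers out of the "Problem in:" lines and group them. This is just a sketch: the sample text is the eight "Problem in:" lines pasted from the original post, and group_by_bus is a made-up helper name, not part of any FMA tool.

```python
import re
from collections import defaultdict

# The eight "Problem in:" lines from the fmdump -v output in the original post.
SAMPLE = """
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=5/pcifn=0
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=3/pcifn=0
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=1/pcifn=0
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=4/pcifn=0
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=5/pcifn=0
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=2/pcifn=0
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=3/pcifn=0
Problem in: hc:///motherboard=0/hostbridge=0/pcibus=1/pcidev=1/pcifn=0
"""

PATTERN = re.compile(r"Problem in:\s+hc:///\S*?pcibus=(\d+)/pcidev=(\d+)")


def group_by_bus(text: str) -> dict[int, list[int]]:
    """Map each pcibus number to the sorted, de-duplicated pcidev numbers on it."""
    buses: dict[int, set[int]] = defaultdict(set)
    for match in PATTERN.finditer(text):
        buses[int(match.group(1))].add(int(match.group(2)))
    return {bus: sorted(devs) for bus, devs in buses.items()}


print(group_by_bus(SAMPLE))  # {1: [1, 2, 3, 4, 5]}
```

Eight suspects collapse to five devices on a single bus, because pcidev 1, 3 and 5 each appear twice: once as a possible driver defect and once as a possible device fault.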

One of the problems with shared buses like PCI (unlike PCI-Express)
is that if a device hangs the bus, it may be difficult to isolate the
culprit. Hence, the message ID (PCI-8000-42) indicates that we suspect
a number of devices but can't tell exactly which one is defective.
http://www.sun.com/msg/PCI-8000-42
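
The 13% figures point the same way. The sketch below only does the arithmetic on values copied from the fmdump -V output (certainty = 0xd, fault-list-sz = 0x8). Reading the result as an even split across suspects is my inference from the numbers, not something the output states.

```python
# Values copied from the fmdump -V output in the original post.
certainty = int("0xd", 16)   # per-suspect certainty, in percent
suspects = int("0x8", 16)    # fault-list-sz

even_share = 100 / suspects  # what an even split across all suspects would give

print(certainty)             # 13
print(even_share)            # 12.5
print(certainty * suspects)  # 104
```

Every suspect carries the same 13%, which is 12.5% rounded up, so no suspect is ranked above the others.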

I can't speak for the V880 diagnosis engine, but the system does support
hot-plug of the PCI cards. As such, each card slot has a set of diagnostic
LEDs that show the state of the card. You might take a peek.
http://docs.sun.com/app/docs/doc/806-6592-11

-- richard
