Discussion:
MC_CH_reserved_ERR and MC_CH_RD_ERR
Piotr Jasiukajtis
2009-12-03 12:50:11 UTC
Hi,

Where can I find more information regarding these error names?
--
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com
Gavin Maltby
2009-12-03 13:31:32 UTC
Hi
Post by Piotr Jasiukajtis
Hi,
Where can I find more information regarding these error names?
The Intel docs for your processor (which can usually be found
on the developer.intel.com site, but take some digging) should
include this string or something close to it.

In the "ereports" that Solaris raises for MCA errors we include
the raw MCi_STATUS etc info and some interpretation of that.
Part of that is the rendering of the strings like MC_CH_RD_ERR
which is what the Intel and AMD docs list as an interpretation of
the error type. But the docs essentially give you a format
string and then you have to fill in the template based on
the status values.
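
To illustrate "filling in the template", here is a minimal sketch in Python. It assumes the memory controller compound error code layout as I recall it from the Intel SDM machine-check chapter: the low 16 bits of MCi_STATUS have the form 000F 0000 1MMM CCCC, where MMM is the request type and CCCC the channel. The function name and the returned tuple are hypothetical, and this is not the Solaris rendering code; check the bit layout against the docs for your processor. A "reserved" request type in a string like MC_CH_reserved_ERR would presumably correspond to an MMM value outside the defined set.

```python
# Request-type mnemonics for the MMM field (assumed from the Intel SDM).
MMM_MNEMONICS = {
    0b000: "GEN",  # generic undefined request
    0b001: "RD",   # memory read error
    0b010: "WR",   # memory write error
    0b011: "AC",   # address/command error
    0b100: "MS",   # memory scrubbing error
}


def decode_memory_controller_error(mci_status):
    """Return (request_type, channel) for a memory controller error,
    or None if the MCA error code does not match 000F 0000 1MMM CCCC.

    channel is None when CCCC is 1111 (channel not specified).
    """
    code = mci_status & 0xFFFF           # MCA error code is bits 15:0
    # Ignore the F bit (bit 12); bits 15:13 and 11:8 must be 0, bit 7 must be 1.
    if (code & 0xEF80) != 0x0080:
        return None
    mmm = (code >> 4) & 0x7              # bits 6:4
    cccc = code & 0xF                    # bits 3:0
    request_type = MMM_MNEMONICS.get(mmm, "reserved")
    channel = None if cccc == 0xF else cccc
    return request_type, channel


if __name__ == "__main__":
    # Example values chosen for illustration, not taken from a real ereport.
    for status in (0x0091, 0x00DF, 0x0135):
        print(hex(status), decode_memory_controller_error(status))
```

The upper bits of MCi_STATUS (valid, uncorrected, overflow and so on) are masked off here, so a full 64-bit status value can be passed in directly.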

I suspect you must have gotten these strings from the fmdump -eV
output? The ereport class is (arguably) more descriptive.
These look like Quickpath memory ECC errors during a read
operation, and I think the class you'll see is
ereport.cpu.intel.quickpath.mem_ce. These events are fed
into a diagnosis engine which will tolerate a number of
memory ECC errors but may eventually diagnose a DIMM
as faulty. Unless you are seeing other memory related
errors (like resets) or are experiencing zillions of these
errors I wouldn't worry about them.
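
As a rough picture of what "tolerate a number of errors but eventually diagnose" can mean, here is a generic sliding-window counter in Python. It is only a sketch of the general idea (more than N events within a time window T trips a diagnosis); the class name and thresholds are made up, and it is not the actual logic or tuning of the Solaris diagnosis engine.

```python
from collections import deque


class ErrorRateWindow:
    """Trip once more than max_events occur within window_seconds.

    Hypothetical illustration only; thresholds are not Solaris values.
    """

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Record one correctable-error event at the given time (seconds).

        Returns True if the number of events inside the window now
        exceeds max_events, False otherwise.
        """
        self.timestamps.append(timestamp)
        # Drop events that have aged out of the window.
        while timestamp - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_events


if __name__ == "__main__":
    window = ErrorRateWindow(max_events=2, window_seconds=10)
    for t in (0, 5, 8, 100):
        print(t, window.record(t))
```

Occasional errors spread far apart never trip the counter, which matches the advice above: a handful of correctable errors is tolerated, a sustained burst is not.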

Cheers

Gavin
Piotr Jasiukajtis
2009-12-03 14:35:40 UTC
Hi Gavin,

Thanks for the reply.

Yes, I got these from 'fmdump -eV'.
I have a lot of errors in the class 'ereport.cpu.intel.quickpath.mem_ce',
and I realize the problem is with the memory or the memory controller.
However, 'fmadm faulty' doesn't report anything about it.
--
Piotr Jasiukajtis | estibi | SCA OS0072
http://estseg.blogspot.com