Discussion:
Updated: Solaris FMA Demo Kit
Scott Davenport
18 years ago
Hello,

I'm pleased to announce that the Solaris FMA Demo Kit has been updated
to include CPU demo support for the UltraSPARC-T1 and UltraSPARC-T2
processors. This update is for CPU support only...I'm still working on
memory support.

For those that missed Rob Johnston's earlier announcement:

The Solaris FMA Demo Kit consists of a set of Perl and Korn shell
scripts which implement an automated harness for executing FMA demos.
The Demo Kit also provides example demos which demonstrate Solaris'
ability to handle and diagnose CPU, Memory and PCI I/O errors. The
Solaris FMA Demo Kit is designed to run on stock Solaris systems (both
SPARC and x86), out-of-the-box - no custom error injection hardware or
drivers are required.

For more information on the demo kit, including download, installation
and usage instructions, please see:

http://www.opensolaris.org/os/community/fm/demokit/

Thanks,
-scott
Scott Davenport
18 years ago
Scott,
Are there any plans to make this work for OPL (SPARC64 VI/VII) someday?
[expanding to fm-discuss, seeking OPL knowledge/expertise]

Hi,

I don't know too much about OPL's FMA model...but I thought that
OPL did some of their diagnosis, such as CPUs, on the
service processor (XCSF?? or some acronym like that). I'm not sure
how that'd snap in with the demo as-is.

<brainstorming>
I haven't reviewed the OPL portfolio, so I'm not up on their telemetry
flow. But supposing I'm correct and diagnosis is in the SP, the demo
will fminject telemetry into the domain. The injected ereports would
need to flow to the SP for diagnosis, diagnosis would occur, and then
the fault events would flow back into the domain.

Such plumbing is there to allow event flow, but I'm not certain which events
OPL's ETM subscribes to on each side. Also, the demo might only be able
to run live. When I added in the T1/T2 support, I had to ensure that
in a simulation world, the ETM was disabled. Since that module opens
a communication path back to the SP, having more than one running was
a bad thing, and resulted in ereport.fm.fmd.module events.
</brainstorming>
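
To make the round trip in the brainstorm concrete, here is a toy sketch in
Python. It is not how fmd, fminject, or the ETM are implemented, and it
says nothing about what OPL actually does. Every class name, the event
class strings, and the subscription filter are made up for illustration.
It only models the idea that an ereport injected into the domain must be
forwarded to the SP, diagnosed there, and returned as a fault event, and
that an ereport the transport does not subscribe to never gets diagnosed.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    cls: str     # event class string; the names used here are hypothetical
    origin: str  # "domain" or "sp"


class ServiceProcessor:
    """Toy stand-in for an SP that performs diagnosis (assumption)."""

    def diagnose(self, ereport: Event) -> Event:
        # Purely illustrative mapping from an ereport class to a fault class.
        fault_cls = ereport.cls.replace("ereport.", "fault.", 1)
        return Event(cls=fault_cls, origin="sp")


@dataclass
class Domain:
    """Toy domain whose transport forwards only subscribed ereports."""

    sp: ServiceProcessor
    # Hypothetical subscription list for the domain-to-SP direction.
    forwarded_prefixes: tuple = ("ereport.cpu.",)
    faults: list = field(default_factory=list)

    def inject(self, ereport: Event):
        # An ereport outside the subscription never reaches the SP,
        # so no fault event comes back and the demo would stall here.
        if not ereport.cls.startswith(self.forwarded_prefixes):
            return None
        fault = self.sp.diagnose(ereport)  # ereport flows to the SP
        self.faults.append(fault)          # fault flows back to the domain
        return fault
```

In this sketch, injecting a hypothetical "ereport.cpu.example" yields a
fault event in the domain, while "ereport.io.example" is dropped. That
dropped case is the open question above: which events the real transport
subscribes to on each side.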

Hopefully someone else on fm-discuss has more insight into OPL's
fault model than I.

Thanks,
-scott