Unable to load disk-monitor plugin / how to change SES indicators?
Tom Lanyon
2009-11-05 05:02:20 UTC
Hi all,

I'm trying to discover two things regarding my test system which has a
bunch of SATA disks attached to a SAS expander:

* if I have a drive error, how do I know which cXtYdZ logical device
maps to which physical disk/bay?

* how can I read temperature information from the drives?

There seems to be some work done in this area by Eric Schrock and Rob
Johnston[1], which has led me to the disk-monitor FMA plugin. I am
assuming that this plugin will automatically handle temperature
monitoring and lighting the fault/locate LEDs but am not entirely sure
of this.

I attempted to load the module but received:

# fmadm load /usr/lib/fm/fmd/plugins/disk-monitor.so
fmadm: failed to load /usr/lib/fm/fmd/plugins/disk-monitor.so: module
failed to load (consult fmd(1M) log)


I checked the log as instructed, but no errors or warnings were
recorded. I know the log is working because when I accidentally tried
to load the plugin's .conf file instead of the .so, I *did* receive an
error in the fmd log:

Nov 05 2009 15:08:36.125443460 ereport.fm.fmd.mod_init
nvlist version: 0
version = 0x0
class = ereport.fm.fmd.mod_init
ena = 0x751b2ea5e2305401
msg = failed to load /usr/lib/fm/fmd/plugins/disk-monitor.conf: Operation not supported

__ttl = 0x1
__tod = 0x4af256cc 0x77a1d84
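
As a sanity check on the log entry above, the two `__tod` words can be decoded by hand. The sketch below assumes (this is not stated anywhere in the thread) that the pair is seconds and nanoseconds since the Unix epoch; `decode_tod` is a hypothetical helper name, not part of any FMA tool.

```python
from datetime import datetime, timezone


def decode_tod(seconds_hex, nanos_hex):
    """Decode an ereport __tod pair.

    Assumption: the first word is seconds since the Unix epoch and the
    second word is the nanosecond remainder.
    """
    seconds = int(seconds_hex, 16)
    nanos = int(nanos_hex, 16)
    # Use an explicit UTC timezone so the result does not depend on the
    # local timezone of the machine running this script.
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return when, nanos


when, nanos = decode_tod("0x4af256cc", "0x77a1d84")
print(when.isoformat(), nanos)
# prints: 2009-11-05T04:38:36+00:00 125443460
```

The nanosecond word matches the fractional part of the header timestamp (.125443460), and 04:38:36 UTC would correspond to the logged 15:08:36 if the system clock were set to UTC+10:30, which supports the assumed layout.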


Can anyone suggest whether this is indeed what I should be doing, and
if so, why can't I load this FMA plugin?

Additionally, even if I get this running - what methods are there to
manually identify a drive in the enclosure? i.e., how do I send a
command to the SES device? There needs to be some level of manual
control available for this as I can think of multiple scenarios where
I'd need to identify and extract a non-faulty drive from an enclosure.

Regards,
Tom

[1] - http://blogs.sun.com/eschrock/entry/ses_sensors
Rob Johnston
2009-11-05 07:08:28 UTC
Hi Tom,

The disk-monitor module is not actually used to detect or diagnose disk faults,
but rather is a response agent designed for the thumper/thor platforms. The
disk-monitor module subscribes to FMA diagnosis and repair events and monitors
changes in the disk topology (by listening to hotplug sysevents). In response
to these events, it will send requests to the service processor (via IPMI) to
update FRU information and flip the disk bay LEDs on/off, as appropriate.

In order for the disk-monitor module to operate, it needs to know the disk
topology of the system, including, as you alluded to, the mapping of Solaris
disk devices to physical disk bays. For internal SATA/SAS disks, the code that
constructs the disk topology currently relies on a set of xml files where we've
hard-coded the mapping of drive bays to device nodes for a subset of Sun X64
platforms. For many (but not all[1]) external storage enclosures which support
SES, we're able to dynamically derive the disk topology without the aid of any
hard-coded information.

In the absence of this disk topology, disk-monitor will bail out during
initialization, which is likely happening on your system.
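
To make the dependency concrete, here is a minimal sketch of the idea, not the actual disk-monitor code. The table contents, the bay labels, and the names `BAY_MAP`, `init_monitor` and `bay_for` are all hypothetical; the real mappings live in platform XML files whose format is not shown in this thread.

```python
# Hypothetical device-to-bay table standing in for the platform XML files.
BAY_MAP = {
    "c1t0d0": "HDD0",
    "c1t1d0": "HDD1",
}


class TopologyUnavailable(Exception):
    """Raised when no disk topology is known for the platform."""


def init_monitor(bay_map):
    # Mirrors the behaviour described above: without a topology the
    # module has nothing to act on, so initialization is abandoned.
    if not bay_map:
        raise TopologyUnavailable("no disk topology; cannot initialize")
    return dict(bay_map)


def bay_for(device, topology):
    # Returns the physical bay label for a cXtYdZ device, or None if
    # the device is not part of the known topology.
    return topology.get(device)


topology = init_monitor(BAY_MAP)
print(bay_for("c1t1d0", topology))
```

On a platform with no mapping, the equivalent of `init_monitor({})` fails straight away, which is consistent with the module refusing to load.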

That said, disk error telemetry is actually fed into the Fault Manager from two
sources:

1) The disk-transport module, which uses libdiskstatus to check for three
failure conditions via uSCSI interfaces:

over temperature
predictive failure
self-test failure

2) The sd driver, which will generate error telemetry for problems detected at
the target driver level.

Unfortunately, even though your system will be capable of generating error
telemetry for your disks, the system that diagnoses faults from the error
telemetry also needs to consume information in the disk topology, so you're
still probably out of luck.

Hope this helps,

rob


[1] The full answer as to why we can't derive the topology on all SES storage
enclosures is a bit too involved to dive into here, but it basically depends on
the complexity of the internal SAS topology of the array in question. If the
array presents a single root target at the top of the topology then libses will
do the right thing. However, if the topology uses SAS expanders to either
multi-attach the disks or to talk to different subsets of disks then SES will
present multiple targets at the top of the topology and to libses it may appear
as multiple storage enclosures, which causes us to generate an inaccurate topology.

There is a workaround for the latter case - libses provides a means of
overriding the interpretation by delivering a small plugin module to ses (either
a model-specific plugin for a specific array or a single vendor-specific plugin).

There are a couple projects underway