Discussion:
When do the FMD APIs become usable outside Sun?
Mark R. Bowyer
2008-12-05 16:40:24 UTC
Hi, My developer customer has asked the following question of me, but
Currently our event management uses the traditional syslog based
approach (monitoring the messages written to /dev/log) for OS/OEM
error reporting and clearance. It is planned that event management
should make use of the Fault Management facilities available in
Solaris 10 for the same because of its many advantages. The CR
addresses this change. For this purpose, we would require to write our
own modules which can register with fmd, subscribe to different
classes of events and report the received events as our own events.
Hence we need the support for external modules from fmd and also the
client interfaces to be exposed so that we can use them.
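As a rough illustration of what "subscribing to classes of events" amounts to: FMA event classes are dotted names, and a subscription can be thought of as a pattern over those names. The sketch below uses POSIX fnmatch(3) for the matching. It is not the fmd API and does not claim to reproduce fmd's own matching rules; class_subscribed() is a hypothetical helper invented for this example.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of fmd: returns 1 if event_class matches
 * any of the n subscription patterns, 0 otherwise.
 * Patterns are shell-style globs over dotted class names.
 */
int class_subscribed(const char *const *patterns, size_t n,
                     const char *event_class)
{
    assert(patterns != NULL && event_class != NULL);

    for (size_t i = 0; i < n; i++) {
        /* flags == 0: '*' may match across '.' separators */
        if (fnmatch(patterns[i], event_class, 0) == 0)
            return 1;
    }
    return 0;
}
```

With illustrative (made-up) subscriptions such as "ereport.io.*" and "fault.cpu.*", a class named "ereport.io.pci.fabric" would match the first pattern, while "ereport.cpu.generic" would match neither.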
The header file /usr/include/fm/fmd_api.h present on our nodes has a
comment saying that these interfaces should not be used outside Sun
until it is publicly documented.
The following link lists the client interfaces of fmd. The chapter
2.1.3 says external modules are not currently supported and will be
supported in future.
http://opensolaris.org/os/community/fm/files/FMDPRM.pdf
I have some queries related to Solaris 10 Fault Management Daemon
(fmd) client interfaces (present in the header file
1) Are these interfaces currently exposed to applications? If not, by
when does Sun expect to do this?
2) Does fmd currently support external modules (diagnosis engines
and/or response agents)? If not, by when will this support be provided?
So, how close to stable are these in OpenSolaris, and when will they
become "Public" in Solaris 10?

Ta,
Mark.
Gavin Maltby
2008-12-08 03:41:58 UTC
Hi,
Post by Mark R. Bowyer
Hi, My developer customer has asked the following question of me, but
Currently our event management uses the traditional syslog based
approach (monitoring the messages written to /dev/log) for OS/OEM
error reporting and clearance. It is planned that event management
should make use of the Fault Management facilities available in
Solaris 10 for the same because of its many advantages. The CR
addresses this change. For this purpose, we would require to write our
own modules which can register with fmd, subscribe to different
classes of events
Are we thinking of existing events here, or of new events to be generated
by customer-provided software? Is the intention to propagate
error report events themselves, or to propagate fault diagnoses that
may arise from that ereport telemetry stream?
Post by Mark R. Bowyer
and report the received events as our own events.
Hence we need the support for external modules from fmd and also the
client interfaces to be exposed so that we can use them.
I don't see that the above necessarily requires the use of external
modules. An external module (no support implemented as yet, as noted)
is one which is realized in a separate address space, i.e. a separate
process, from the fault management daemon fmd; all our modules today
are .so objects that are dlopen'd into the fmd address space.

The above could most likely be realized as a non-external module.
The advantage to having it as an external module would be that
it could fail reasonably independently of the fault manager itself,
e.g. a SIGSEGV in the external module would not harm the state of fmd.
Similarly, modules that deal with notoriously badly behaved hardware,
say writing to FRUID over I2C buses, would be better housed in
external modules to try to isolate fmd from those failures.
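
The isolation argument above can be sketched in plain POSIX C, assuming nothing about fmd itself. Code that runs in its own process can take a SIGSEGV without harming the parent, whereas the same crash in a dlopen'd .so would take the host process down with it. All names here (run_isolated, buggy_module, healthy_module) are hypothetical and exist only for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a badly behaved module: crashes on purpose. */
void buggy_module(void)
{
    raise(SIGSEGV);
}

/* Hypothetical stand-in for a well-behaved module: does nothing. */
void healthy_module(void)
{
}

/*
 * Run module() in a child process, as an "external" module would run
 * outside the daemon. Returns the number of the signal that killed the
 * child, 0 if the child exited on its own, or -1 if fork/waitpid failed.
 */
int run_isolated(void (*module)(void))
{
    assert(module != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL);   /* make sure the crash is fatal */
        module();
        _exit(0);
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;              /* real failure, not an interruption */
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_isolated(buggy_module) reports SIGSEGV back to the caller, which carries on running. The cost of this design is that every interaction with the module has to cross a process boundary.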

As I said, so far we've only used non-external modules. Partly that
is because (on systems that implement DFRUID updates) the DFRUID
updates are performed on a service processor, and the Solaris fmd
modules are for the most part pretty clean-living and not too
dangerous to fmd itself.
Post by Mark R. Bowyer
The header file /usr/include/fm/fmd_api.h present on our nodes has a
comment saying that these interfaces should not be used outside Sun
until it is publicly documented.
The following link lists the client interfaces of fmd. The chapter
2.1.3 says external modules are not currently supported and will be
supported in future.
Indeed. I am just spinning up a project that may itself have a good
reason to use external modules, so that *may* lead us to implement
them in Nevada (and OpenSolaris). If implemented, there's no guarantee
at all that they'd go back to Solaris 10 unless someone makes a strong
case for it (or some new platform requires it etc).
Post by Mark R. Bowyer
http://opensolaris.org/os/community/fm/files/FMDPRM.pdf
Yes, these interfaces are still all private in one form or another - Appendix A
lists the levels. They're private as in "subject to change without
notification", not as in "secret" of course. The FMD_API_VERSION
is only at 4 now (started at 1) so we're not making frequent
incompatible changes, so we could certainly look at raising the
commitment level soon. Alternatively they can of course be
used at your own risk today, but you may have to
redeliver for each S10 update etc.
Post by Mark R. Bowyer
I have some queries related to Solaris 10 Fault Management Daemon
(fmd) client interfaces (present in the header file
1) Are these interfaces currently exposed to applications? If not, by
when does Sun expect to do this?
No - as per Appendix A they're all private to the ON consolidation or
our project.

We have no schedule planned for exposing these interfaces. We're open
to raising the stability level, but we can't lock it down so tight that
future projects are restricted from making any changes. I'd
guess that we could raise it to a level that won't change incompatibly
in Solaris 10, but allows change in the next minor release (i.e., 5.11
aka Nevada & OpenSolaris).
Post by Mark R. Bowyer
2) Does fmd currently support external modules (diagnosis engines
and/or response agents)? If not, by when will this support be provided?
There is no schedule - nothing delivered in Solaris has really required
them as yet. As I said above I may have a need for them in a new
(as yet very vague) project.

Gavin
Post by Mark R. Bowyer
So, how close to stable are these in OpenSolaris, and when will they
become "Public" in Solaris 10?
Ta,
Mark.
_______________________________________________
fm-discuss mailing list
Mark R. Bowyer
2008-12-08 08:33:34 UTC
Many thanks, Gavin =O)

Mark.
Mark R. Bowyer
2008-12-17 15:33:59 UTC
Hi again,
Post by Gavin Maltby
Are we thinking of existing events here, or of new events to be
generated by customer-provided software? Is the intention to
propagate error report events themselves, or to propagate fault
diagnoses that may arise from that ereport telemetry stream?
<Nishanth> TSP event management wants to propagate all the error
telemetry events and fault diagnoses generated by the existing
components (which use Solaris 10 FMA) as well as the new components
which will be integrated in future (which will use Solaris 10 FMA).
Post by Gavin Maltby
The above could most likely be realized as a non-external module.
The advantage to having it as an external module would be that it
could fail reasonably independently of the fault manager itself, eg a
SIGSEGV in the external module would not harm the state of fmd.
Similarly, modules that deal with notoriously badly behaved hardware,
say writing to FRUID over I2C buses, would be better housed in
external modules to try to isolate fmd from those failures.
<Nishanth> This is exactly the reason why I mentioned that we need
external module support. I don’t think it is a good idea to write a TSP
module which registers with fmd as a plug-in (non-external) module. It is
not good to have internal errors of a TSP module affecting Solaris 10 fmd.
Post by Gavin Maltby
Yes these interfaces are still all private in one form or another -
Appendix A lists the levels. They're private as in "subject to change
without notification", not as in "secret" of course. The
FMD_API_VERSION is only at 4 now (started at 1) so we're not making
frequent incompatible changes, so we could certainly look at raising
the commitment level soon. Alternatively they can of course be used
at your own risk today, but you may have to redeliver for each S10
update etc.
<Nishanth> So are these interfaces currently in a state where we can start
using them outside Sun (e.g. in TSP) to develop a module, keeping this risk
in mind? We could not find any fmd client API that a client module can use
to interpret the different fields of a received event (error telemetry
event and/or fault event). Do such APIs not exist? If not, will they be
provided?
I think they're ready to start coding, but want us to say they can, as
long as they're aware of these risks? What do you think?

Ta,
--
| o o Software Support Engineering,
/v\ark R. Bowyer. SPARC House, Guillemont Park,
`-' Minley Rd, Blackwater,
Tel: +44 (0)1252 420691 Camberley, SURREY, GU17 9QG
Fax: +44 (0)1252 421658 United Kingdom __|
Mark R. Bowyer
2009-01-05 12:55:47 UTC
Hi,

Can anyone add to this?

Thanks,
Mark.