A Nagios plugin for Sun hardware failure detection

As I needed a script to parse Solaris' prtdiag output and couldn't find one, I just wrote my own.
Analyzing prtdiag output made easy...

The script is logically called check_prtdiag.
It is 100% configuration-file driven and makes heavy use of Perl regular expressions and logic, since prtdiag output varies greatly from one system to another.
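Because each platform prints different sections, a config-driven checker first has to work out which rule set applies. Here is a minimal Python sketch of that step (check_prtdiag itself is Perl, and its real configuration format is documented in check_prtdiag.txt). The `PLATFORM_CHECKS` table, its platform strings and the check names are hypothetical stand-ins for the configuration file; the sketch only assumes the host is identified on prtdiag's "System Configuration:" line.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for the configuration file: platform name as it
# appears on the "System Configuration:" line -> names of checks to run.
PLATFORM_CHECKS = {
    "Sun Enterprise 250": ["io_cards", "memory", "leds", "disks", "fans", "psu"],
    "Sun Fire V210": ["cpu", "fans", "leds", "temperatures", "voltages", "current", "fru"],
}

def detect_platform(prtdiag_text: str) -> Optional[str]:
    """Return the configured platform named on the 'System Configuration:'
    line, or None if the line is missing or no platform matches."""
    m = re.search(r"^System Configuration:(.*)$", prtdiag_text, re.MULTILINE)
    if m is None:
        return None
    for platform in PLATFORM_CHECKS:
        # Word boundaries keep "Sun Enterprise 250" from matching "... 2500".
        if re.search(rf"\b{re.escape(platform)}\b", m.group(1)):
            return platform
    return None

def checks_for(prtdiag_text: str) -> List[str]:
    """Checks to run for this host; empty when the platform is unknown."""
    platform = detect_platform(prtdiag_text)
    return PLATFORM_CHECKS[platform] if platform else []
```

Keeping the platform table in data rather than in code is what lets a new model be supported by editing the configuration file alone.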

Without any parameter, check_prtdiag runs the prtdiag -v command and parses its output.
For testing purposes, you can use the -v flag to get more verbose output (on STDERR), or the -f <file> option to use the content of the specified file as input instead.
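The input selection described above can be sketched in Python as follows. The function name `read_prtdiag` is hypothetical, and the sketch assumes prtdiag can be found on the PATH; it is an illustration, not the plugin's Perl code.

```python
import subprocess
import sys
from typing import Optional

def read_prtdiag(path: Optional[str] = None, verbose: bool = False) -> str:
    """Return prtdiag output: read from `path` when given (testing mode),
    otherwise captured from `prtdiag -v`."""
    if path is not None:
        if verbose:
            print(f"reading prtdiag output from {path}", file=sys.stderr)
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    if verbose:
        print("running prtdiag -v", file=sys.stderr)
    # No check=True: prtdiag may exit non-zero when it detects faults,
    # and that is exactly the output we want to parse.
    # Raises FileNotFoundError if prtdiag is not on the PATH.
    result = subprocess.run(["prtdiag", "-v"], capture_output=True, text=True)
    return result.stdout
```

Reading from a file makes it easy to replay a saved prtdiag output from a faulty machine on a healthy one.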

See check_prtdiag.txt for details about configuration file format.

Edited 2008-12-05 - release 1.10 :
You can now specify an alternate config file location using the -c <file> option.
The prtdiag command is no longer required when using the -f <file> option for testing purposes.
The "Unrecognized escape \s passed through" warnings should be gone.
Added tests for SunFire V210 in the sample configuration file.
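For illustration, the three options mentioned in this post (-v, -f <file>, -c <file>) could be parsed like this in Python. The destination names are hypothetical, and a missing -c is left as None because the default configuration file location is not stated here.

```python
import argparse
from typing import List, Optional

def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse -v, -f <file> and -c <file>; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(prog="check_prtdiag")
    parser.add_argument("-v", action="store_true", dest="verbose",
                        help="verbose output on STDERR")
    parser.add_argument("-f", metavar="FILE", dest="input_file",
                        help="parse FILE instead of running prtdiag -v")
    # None means "use the built-in default location".
    parser.add_argument("-c", metavar="FILE", dest="config_file",
                        help="alternate configuration file")
    return parser.parse_args(argv)
```

For example, `parse_args(["-v", "-f", "saved.txt"])` sets `verbose` and `input_file` while leaving `config_file` as None.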

Edited 2009-01-08 - release 1.11 :
Corrected exit codes on CRITICAL and WARNING statuses.
Thanks to Jonathon Weiss for finding this bug.
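For reference, Nagios reads a plugin's status from its exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. A minimal Python sketch of reporting a status that way follows; the "worst status wins" policy and the `PRTDIAG` output prefix are assumptions, not necessarily what check_prtdiag does.

```python
import sys
from typing import Iterable

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def overall_status(statuses: Iterable[int]) -> int:
    """Combine per-check statuses: CRITICAL beats WARNING, which beats OK.
    (Assumed policy; UNKNOWN is left for the plugin's own errors.)"""
    seen = set(statuses)
    if CRITICAL in seen:
        return CRITICAL
    if WARNING in seen:
        return WARNING
    return OK

def finish(status: int, message: str) -> None:
    """Print the one-line status shown by Nagios and exit with its code."""
    print(f"PRTDIAG {LABELS[status]} - {message}")
    sys.exit(status)
```

`finish(overall_status([OK, WARNING]), "1 fan not in 'OK' state")` prints `PRTDIAG WARNING - 1 fan not in 'OK' state` and exits with code 1.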

Edited 2009-01-08 - release 1.12 :
Release 1.11 was crappy.
Thanks to Eric Pearce for feedback.

The provided sample configuration file checks the following:

[Enterprise 150]

  • IO Cards : checks for the presence of the "No failures found" / "No System Faults" messages
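The IO Cards rule is a presence check: the machine is healthy only if a reassuring message shows up. A minimal Python sketch, assuming either message is enough (one reading of the "/" in the rule) and that a missing message should be CRITICAL; both choices are assumptions.

```python
from typing import Tuple

# The two messages named in the rule; either one counts as healthy here.
HEALTHY_MESSAGES: Tuple[str, ...] = ("No failures found", "No System Faults")

def io_cards_status(prtdiag_text: str) -> Tuple[int, str]:
    """Presence check: a healthy message must appear somewhere in the output.
    Returns a Nagios-style (exit code, message) pair: 0 = OK, 2 = CRITICAL."""
    for msg in HEALTHY_MESSAGES:
        if msg in prtdiag_text:
            return 0, f"found '{msg}'"
    return 2, "no 'No failures found' / 'No System Faults' message"
```

Because absence is the failure condition, an empty or truncated prtdiag output is reported as a problem instead of silently passing.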

[Enterprise 250]

  • IO Cards : checks for the presence of the "No failures found" / "No System Faults" messages
  • Memory : looks for memory modules not in 'OK' state
  • System LEDs : looks for lit 'ERROR' LEDs
  • Disks : looks for disks not in 'OK' or 'EMPTY' states
  • Fans : looks for fans not in 'OK' state
  • Power Supplies : looks for PSUs not in 'OK' state
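Most rules above have the form "look for items of one section not in an allowed state". A minimal Python sketch of that pattern; the section and entry regexes, and the blank-line section terminator, are hypothetical layout assumptions rather than real prtdiag formats or check_prtdiag.txt syntax.

```python
import re
from typing import Iterable, List

def bad_states(prtdiag_text: str, section_re: str, entry_re: str,
               ok_states: Iterable[str]) -> List[str]:
    """Return entry lines of one prtdiag section whose state is not allowed.

    Hypothetical layout assumptions: the section starts on the line matching
    `section_re` and ends at the next blank line; `entry_re` matches only real
    entries (not column headers) and captures the state in a group named 'state'.
    """
    allowed = set(ok_states)
    entry = re.compile(entry_re)
    inside = False
    failures = []
    for line in prtdiag_text.splitlines():
        if not inside:
            inside = re.search(section_re, line) is not None
            continue
        if not line.strip():
            break  # blank line closes the section (assumption)
        m = entry.search(line)
        if m and m.group("state") not in allowed:
            failures.append(line.strip())
    return failures
```

The same function covers the fan rule (`ok_states={"OK"}`) and the disk rule (`ok_states={"OK", "EMPTY"}`); only the regexes and allowed states change, which is what a configuration file would supply.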

[Enterprise 450]

  • IO Cards : checks for the presence of the "No failures found" / "No System Faults" messages
  • Memory : looks for memory modules not in 'OK' state
  • System LEDs : looks for lit 'ERROR' LEDs
  • Disks : looks for disks not in 'OK' or 'EMPTY' states
  • Fans : looks for fans not in 'OK' state
  • Power Supplies : looks for PSUs not in 'OK' state

[Enterprise 3000]

  • System LEDs : looks for a lit failure system LED
  • Fans : looks for fans not in 'OK' state
  • Temperatures : looks for temperature sensors not in 'stable' trend
  • Power Supplies : looks for PSUs not in 'OK' state
  • IO Cards : checks for the presence of the "No failures found" / "No System Faults" messages

[SunFire 280R]

  • System LEDs : looks for lit 'FAULT' LEDs
  • Fans : looks for fans not in 'NO_FAULT' state
  • Disks : looks for disks not in 'NO_FAULT' state
  • Power Supplies : looks for PSUs not in 'OK' state

[SunFire V120]

  • IO Cards : checks for the presence of the "No failures found" / "No System Faults" messages

[SunFire V210]

  • CPU : checks for CPUs not in 'on-line' state
  • Fans : checks for fans not in 'okay' state
  • System LEDs : looks for lit 'SERVICE' LEDs
  • Temperatures : looks for temperature sensors not in 'okay' state
  • Voltages : looks for voltage sensors not in 'okay' state
  • Current : looks for current sensors not in 'okay' state
  • Field Replaceable Units : looks for FRUs not in 'okay' (PSUs) or 'present' (disks) states
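The FRU rule differs from the others because the expected state depends on the FRU type ('okay' for PSUs, 'present' for disks). A minimal Python sketch, assuming the (name, state) pairs have already been extracted from prtdiag -v; the "PS0"/"HDD0" naming scheme is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical FRU naming: "PS0", "PS1"... for power supplies and "HDD0",
# "HDD1"... for disks. Each type is paired with the one state it must report.
EXPECTED_FRU_STATE = [
    (re.compile(r"PS\d+"), "okay"),      # power supplies
    (re.compile(r"HDD\d+"), "present"),  # disks
]

def fru_failures(frus: Iterable[Tuple[str, str]]) -> List[str]:
    """Given (FRU name, state) pairs, return the names whose state differs
    from the one expected for their type. FRUs matching no pattern are ignored."""
    failures = []
    for name, state in frus:
        for pattern, expected in EXPECTED_FRU_STATE:
            if pattern.fullmatch(name):
                if state != expected:
                    failures.append(name)
                break
    return failures
```

Ignoring unmatched FRUs keeps the rule from raising alarms on components it was never configured to understand.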

[SunFire V240]

  • Fans : checks for fans not in 'okay' state
  • System LEDs : looks for lit 'SERVICE' LEDs
  • Temperatures : looks for temperature sensors not in 'okay' state
  • Voltages : looks for voltage sensors not in 'okay' state
  • Current : looks for current sensors not in 'okay' state
  • Field Replaceable Units : looks for FRUs not in 'okay' (PSUs) or 'present' (disks) states

[SunFire V440]

  • Fans : checks for fans not in 'okay' state
  • System LEDs : looks for lit 'SERVICE' LEDs
  • Temperatures : looks for temperature sensors not in 'okay' state
  • Voltages : looks for voltage sensors not in 'okay' state
  • Current : looks for current sensors not in 'okay' state
  • Field Replaceable Units : looks for FRUs not in 'okay' (PSUs) or 'present' (disks) states

[SunFire V490]

  • Temperatures : looks for temperature sensors not in 'OK' state
  • System LEDs : looks for lit 'FAULT' LEDs
  • Disks : looks for disks not in 'NO_FAULT' state
  • Fans : looks for fans not in 'NO_FAULT' state
  • Power Supplies : looks for PSUs not in 'NO_FAULT' state

[SunFire 880]

  • Temperatures : looks for temperature sensors not in 'OK' state
  • System LEDs : looks for lit 'FAULT' LEDs
  • Disks : looks for disks with a lit 'FAULT' LED
  • Fans : looks for fans not in 'OK' state
  • Power Supplies : looks for PSUs not in 'GOOD' state
  • IO Cards : checks for the presence of the "No failures found" / "No System Faults" messages
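The LED rules ('ERROR' on the Enterprise 250/450, 'SERVICE' on the V210/V240/V440, 'FAULT' on the 280R, V490 and 880) all boil down to "find a named LED that is lit". A minimal Python sketch, under the hypothetical assumption that an LED line contains the LED name and ends with 'on' or 'off'; real prtdiag layouts differ per model.

```python
import re
from typing import List

def lit_leds(prtdiag_text: str, led_name: str = "FAULT") -> List[str]:
    """Return the lines reporting LED `led_name` as lit.

    Hypothetical layout assumption: an LED line contains the LED name and
    ends with its state, 'on' (lit) or 'off' (unlit); matching is
    case-insensitive.
    """
    pattern = re.compile(rf"\b{re.escape(led_name)}\b.*?\b(on|off)\s*$", re.IGNORECASE)
    lit = []
    for line in prtdiag_text.splitlines():
        m = pattern.search(line)
        if m and m.group(1).lower() == "on":
            lit.append(line.strip())
    return lit
```

Passing the LED name as a parameter lets one function serve every model; e.g. `lit_leds(text, "SERVICE")` for the V210-style output.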

[Ultra 10]

  • IO Cards : checks for the presence of the "No failures found" / "No System Faults" messages
