Bug 7238 - we should monitor more things such as servers' date
Summary: we should monitor more things such as servers' date
Status: RESOLVED FIXED
Alias: None
Product: Infrastructure
Classification: Unclassified
Component: Others
Version: unspecified
Hardware: All, OS: Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: Sysadmin Team
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 7228
Blocks:
 
Reported: 2012-08-29 10:41 CEST by Thierry Vignaud
Modified: 2012-08-29 11:17 CEST
CC List: 2 users

See Also:
Source RPM:
CVE:
Status comment:


Attachments

Description Thierry Vignaud 2012-08-29 10:41:34 CEST
As shown in bug #7228, we should monitor more things such as servers' date.

Also, as yesterday's mail failure showed, we should monitor more services and send mail alerts to the sysadmin ML on error.
Thierry Vignaud 2012-08-29 10:41:59 CEST

Depends on: (none) => 7228

Comment 1 Thomas Backlund 2012-08-29 11:03:06 CEST
We already do xymon monitoring, which alerts us on a separate list so as not to flood the sysadm list, and sympa also notified us that it died when it lost DB access.

But as there was server maintenance in progress in the DC, there was no point in restarting services just to have them fail again.

Status: NEW => RESOLVED
CC: (none) => tmb
Resolution: (none) => FIXED

Comment 2 Thierry Vignaud 2012-08-29 11:09:59 CEST
Do we monitor the servers' date too?
Comment 3 Thomas Backlund 2012-08-29 11:17:53 CEST
Yep. For example, here is the head of a mail from yesterday regarding valstar:

yellow Tue Aug 28 14:37:48 CEST 2012 up: 18:07, 1 users, 203 procs, load=0.16
&yellow System clock is -7202 seconds off (max 60)
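The check behind that alert amounts to comparing the local clock against a reference time and flagging any offset beyond a threshold. Below is a minimal Python sketch, assuming the 60-second limit shown in the alert and assuming that a negative offset means the local clock is behind. `clock_offset_status` is a hypothetical helper, not xymon's actual code, and obtaining the reference time (for example from an NTP server) is left out.

```python
def clock_offset_status(local_epoch: float, reference_epoch: float,
                        max_offset: int = 60) -> tuple[str, str]:
    """Return (color, message) for a clock-skew check.

    Hypothetical helper that only mimics the shape of the xymon line above.
    Assumed sign convention: offset = local - reference, so a negative value
    means the local clock is behind the reference.
    """
    offset = round(local_epoch - reference_epoch)
    message = f"System clock is {offset} seconds off (max {max_offset})"
    # Flag as yellow (warning) only when the skew exceeds the allowed maximum.
    if abs(offset) > max_offset:
        return "yellow", message
    return "green", message
```

For example, a host whose clock is two hours behind, `clock_offset_status(ref - 7202, ref)`, yields a yellow status with the same message text as the valstar alert, while a 30-second skew stays green.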
