7238 – we should monitor more things such as servers' date

Summary: we should monitor more things such as servers' date

Status:	RESOLVED FIXED

Alias:	None

Product:	Infrastructure
Classification:	Unclassified
Component:	Others
Version:	unspecified
Hardware:	All Linux

Priority:	Normal
Severity:	normal
Target Milestone:	---
Assignee:	Sysadmin Team
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:	7228
Blocks:

Reported:	2012-08-29 10:41 CEST by Thierry Vignaud
Modified:	2012-08-29 11:17 CEST
CC List:	2 users

See Also:
Source RPM:
CVE:
Status comment:


Description Thierry Vignaud 2012-08-29 10:41:34 CEST

As shown in bug #7228, we should monitor more things such as servers' date.

Also, as yesterday's mail failure showed, we should monitor more services and send mail alerts to the sysadmin mailing list on error.

Thierry Vignaud 2012-08-29 10:41:59 CEST

Depends on: (none) => 7228

Comment 1 Thomas Backlund 2012-08-29 11:03:06 CEST

We already do xymon monitoring, which alerts us on a separate list so as not to flood the sysadm list, and sympa also notified us that it died when it lost database access.
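For readers unfamiliar with this kind of setup, here is a minimal sketch of a service check that routes failures to a dedicated alert address rather than the main list. It is not how xymon is implemented; the function names and the addresses `alerts@example.org` / `monitor@example.org` are hypothetical, and "service is up" is simplified to "a TCP connection succeeds".

```python
import smtplib
import socket
from email.message import EmailMessage

# Hypothetical addresses: a dedicated alert list, kept separate from the
# main sysadmin list so routine alerts do not flood it.
ALERT_LIST = "alerts@example.org"
SENDER = "monitor@example.org"


def service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable host, DNS failure, ...
        return False


def build_alert(host: str, port: int,
                to_addr: str = ALERT_LIST,
                from_addr: str = SENDER) -> EmailMessage:
    """Build (but do not send) an alert mail for a failed check."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {host}:{port} is not answering"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(f"TCP connection to {host}:{port} failed.\n")
    return msg


def check_and_alert(host: str, port: int, smtp_host: str = "localhost") -> bool:
    """Check one service; mail the alert list if it is down. Returns status."""
    if service_up(host, port):
        return True
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(build_alert(host, port))
    return False
```

A real monitor would also avoid re-sending the same alert on every polling cycle, which is one reason to use a dedicated tool like xymon instead of a script like this.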

But as there was server maintenance in progress in the DC, there was no point in restarting services just to have them fail again.

Status: NEW => RESOLVED
CC: (none) => tmb
Resolution: (none) => FIXED

Comment 2 Thierry Vignaud 2012-08-29 11:09:59 CEST

Do we monitor the servers' dates too?

Comment 3 Thomas Backlund 2012-08-29 11:17:53 CEST

Yep. For example, here is the head of yesterday's mail about valstar:

yellow Tue Aug 28 14:37:48 CEST 2012 up: 18:07, 1 users, 203 procs, load=0.16
&yellow System clock is -7202 seconds off (max 60)
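The alert above flags a clock that is 7202 seconds (about two hours) off against a 60-second limit. As a rough sketch of the arithmetic behind such a check, the code below estimates the offset from the four timestamps of an NTP-style exchange (the on-wire formula from RFC 5905) and grades it green or yellow. The function names, the sign convention (local clock minus reference), and the message format copied from the alert are assumptions for illustration, not xymon's actual code.

```python
from __future__ import annotations


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate local clock minus reference clock, in seconds.

    t0: request sent (local clock)     t1: request received (reference clock)
    t2: reply sent (reference clock)   t3: reply received (local clock)
    """
    # RFC 5905 offset: how far the reference is ahead of the local clock.
    theta = ((t1 - t0) + (t2 - t3)) / 2.0
    # Negate it so a local clock that runs behind gives a negative value.
    return -theta


def clock_status(offset_s: float, max_offset_s: float = 60.0) -> tuple[str, str]:
    """Grade an offset against a threshold (hypothetical 60 s default)."""
    color = "yellow" if abs(offset_s) > max_offset_s else "green"
    msg = f"System clock is {offset_s:.0f} seconds off (max {max_offset_s:.0f})"
    return color, msg


if __name__ == "__main__":
    # Local clock about 7202 s behind the reference, ~0.05 s network delay each way.
    off = clock_offset(1000.0, 8202.05, 8202.06, 1000.11)
    print(clock_status(off))
```

Averaging the two legs of the exchange cancels symmetric network delay, so the estimate stays accurate even though each timestamp pair is separated by transit time.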
