As shown in bug #7228, we should monitor more things, such as the servers' dates (system clocks). Also, as yesterday's mail failure showed, we should monitor more services and send mail alerts to the sysadmin mailing list on error.
Depends on: (none) => 7228
We already do Xymon monitoring, which alerts us on a separate list so as not to flood the sysadmin list, and Sympa also notified us when it died after losing database access. But since server maintenance was in progress at the data center, there was no point in restarting services only to have them fail again.
Status: NEW => RESOLVED
CC: (none) => tmb
Resolution: (none) => FIXED
Do we monitor the servers' dates too?
Yep. For example, here is the head of yesterday's mail regarding valstar:
yellow Tue Aug 28 14:37:48 CEST 2012 up: 18:07, 1 users, 203 procs, load=0.16
&yellow System clock is -7202 seconds off (max 60)
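To make the clock check in that alert concrete, here is a minimal Python sketch. It assumes the offset has already been measured against some reference clock (for example an NTP server). The names `parse_clock_alert` and `clock_status` are hypothetical, and the rule "offset beyond the limit means yellow" is an assumption modeled on the quoted message, not Xymon's actual implementation.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern matching the alert text quoted above, e.g.
# "System clock is -7202 seconds off (max 60)".
ALERT_RE = re.compile(r"System clock is (-?\d+) seconds off \(max (\d+)\)")


def parse_clock_alert(line: str) -> Optional[Tuple[int, int]]:
    """Return (offset_seconds, max_seconds), or None if the line does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def clock_status(offset_seconds: int, max_seconds: int = 60) -> str:
    """Assumed rule: 'yellow' once the absolute drift exceeds the limit, else 'green'."""
    return "yellow" if abs(offset_seconds) > max_seconds else "green"


if __name__ == "__main__":
    line = "&yellow System clock is -7202 seconds off (max 60)"
    parsed = parse_clock_alert(line)
    if parsed is not None:
        offset, limit = parsed
        print(offset, limit, clock_status(offset, limit))  # -7202 60 yellow
```

Using the absolute value means a clock running ahead and one running behind are treated the same way, which matches the negative offset reported for valstar.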