Multiple Alerts Very Annoying
I have several checks that alert me multiple times when a check is down, and it seems to be random. All my checks are set to alert on each status change, yet just tonight I got back-to-back alerts for two checks that have been down all day. I also got several alerts today telling me they were down, but they never came up. I have to say, at midnight it is very annoying to get back-to-back alerts for checks that you know have been down all day. Has anyone else experienced this? The new machine has not helped this situation. This has been an issue for me for as long as I have been using SC.
Here is a sample of the check log:
Thu Sep 2 21:48:38 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 21:57:06 2004 DOWN - RTT:.16. - Connection to host timed out
Thu Sep 2 21:59:06 2004 DOWN - RTT:.16. - Connection to host timed out
Thu Sep 2 22:07:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:09:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:17:59 2004 DOWN - RTT:.15. - Connection to host timed out
Thu Sep 2 22:19:59 2004 DOWN - RTT:.15. - Connection to host timed out
Thu Sep 2 22:28:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:30:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:38:32 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:40:32 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:48:59 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:50:58 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:59:31 2004 DOWN - RTT:.15. - Connection to host timed out
Thu Sep 2 23:01:31 2004 DOWN - RTT:.15. - Connection to host timed out
And here you can see the SMTP alert log. The QC check is one example, but this is occurring on several checks.
Thu Sep 2 18:24:02 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:20:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:20:05 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:51:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:51:04 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
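For anyone who wants to check their own smtp.log for the same pattern, here is a minimal Python sketch that flags alerts repeating the previous status for the same check. It assumes the line layout shown in the sample above; the `is UP` form, the function name `find_repeated_alerts` and the `SAMPLE` list are assumptions for illustration, not something taken from the product.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines like:
#   Thu Sep 2 22:20:05 2004 QC Remote Host is DOWN -> Sending SMTP alert ...
# The "UP" alternative is an assumption; the sample log only shows DOWN.
ALERT_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"(?P<check>.+?) is (?P<status>UP|DOWN) -> Sending SMTP alert"
)


def find_repeated_alerts(lines):
    """Return {check name: [timestamps]} for alerts that repeat the prior status."""
    last_status = {}             # check name -> status of the last alert seen
    repeats = defaultdict(list)  # check name -> timestamps of repeated alerts
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # not an SMTP alert line
        ts = datetime.strptime(match["ts"], "%a %b %d %H:%M:%S %Y")
        check, status = match["check"], match["status"]
        if last_status.get(check) == status:
            repeats[check].append(ts)  # same status alerted twice in a row
        last_status[check] = status
    return dict(repeats)


# The five alert lines from the log above.
SAMPLE = [
    "Thu Sep 2 18:24:02 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:20:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:20:05 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:51:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:51:04 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
]

repeats = find_repeated_alerts(SAMPLE)
for check, stamps in repeats.items():
    print(check, "repeated", len(stamps), "time(s)")
```

Within this excerpt, a check only counts as repeated if an earlier alert for it appears in the same list, so the first SNMP alert is not flagged.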
Thanks
Comments
The issue is that the software sends an alert that a check is down, which is good, but then later sends it again and again. The timing seems random; it is as if it "forgets" that the alert was already sent, even though it is supposed to alert only on each status change.
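To make the expected behavior concrete, here is a minimal Python sketch of what "alert on each status change" would normally mean: remember the last status per check and alert only when it differs. This is an illustration only, not how the product is implemented; the class name `StatusChangeAlerter` and the callback signature are hypothetical.

```python
from typing import Callable, Dict, Optional


class StatusChangeAlerter:
    """Send an alert only when a check's status differs from the last one seen."""

    def __init__(self, send_alert: Callable[[str, str], None]) -> None:
        self._send_alert = send_alert
        self._last_status: Dict[str, str] = {}  # check name -> last known status

    def record(self, check: str, status: str) -> bool:
        """Record one check result; return True if an alert was sent."""
        previous: Optional[str] = self._last_status.get(check)
        self._last_status[check] = status
        if previous == status:
            return False  # still DOWN (or still UP): stay quiet
        if previous is None and status == "UP":
            return False  # first result and healthy: nothing to report
        self._send_alert(check, status)
        return True


sent = []
alerter = StatusChangeAlerter(lambda check, status: sent.append((check, status)))
for status in ["DOWN", "DOWN", "DOWN", "UP", "DOWN"]:
    alerter.record("QC Remote Host", status)
print(sent)  # one alert per transition: DOWN, UP, DOWN
```

In a scheme like this, a repeated DOWN alert can only happen if the remembered status is lost or overwritten between checks, which would match the "forgetting" described above. Whether that is what actually happens here is only a guess.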
If it were random then it would occur like that. Also it seems to happen at the same time.
Forum Administrator
No dependency.
No, I have it set to alert on each status change...
SNMP|X|SNMP - TRINITY AMBIENT|X|1|X|120|X||X||X||X|true|X|true|X|lt|X|1|X||X||X||X|trinity<X>public<X>161<X>1.3.6.1.4.1.674.10892.1.700.20.1.5.1.6|X||X|98|X|no|X|120|X||X||X|On each status change|X|General Alert Team|X||X||X|Hardware Group|X||X|<X>|X||X|yes|X|<X><X><X>|X|<X><X>|X||X||X|no|X||X||X|yes|X|
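The rule above is hard to read by eye, so here is a small Python sketch that splits it into fields. It assumes only what is visible in the line itself: `|X|` separates fields and `<X>` separates values inside a field. The meaning of each position is not documented in this thread, so the sketch does not name the fields; `split_record` is a hypothetical helper.

```python
FIELD_SEP = "|X|"     # separates top-level fields (as seen in the rule above)
SUBFIELD_SEP = "<X>"  # separates values inside one field (as seen above)


def split_record(record: str) -> list:
    """Split one rule line into fields; fields containing <X> become lists."""
    fields = record.rstrip("\n").split(FIELD_SEP)
    return [f.split(SUBFIELD_SEP) if SUBFIELD_SEP in f else f for f in fields]


# The rule line exactly as posted above.
RECORD = (
    "SNMP|X|SNMP - TRINITY AMBIENT|X|1|X|120|X||X||X||X|true|X|true|X|lt|X|1"
    "|X||X||X||X|trinity<X>public<X>161<X>1.3.6.1.4.1.674.10892.1.700.20.1.5.1.6"
    "|X||X|98|X|no|X|120|X||X||X|On each status change|X|General Alert Team"
    "|X||X||X|Hardware Group|X||X|<X>|X||X|yes|X|<X><X><X>|X|<X><X>"
    "|X||X||X|no|X||X||X|yes|X|"
)

fields = split_record(RECORD)
print(fields[0], "/", fields[1])
print("On each status change" in fields)
```

The second print confirms that the rule really does carry the "On each status change" setting as one of its fields.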
Can you post the contents of the "SNMP - Trinity Ambient" log file?
Also, what is the content of the log file in the logging directory that was created around the time of the error? (This file stores all monitoring activity so that we can compare it to the alerts.)
Please download it from
http://www.serverscheck.com/files/monitoring_rule.zip
Let it run, and when the multiple alerts occur again, please send the latest log file in the logging subdirectory to [email protected]
I will do that...
Thank you
I tried to get this file because I have several ping checks that send multiple alerts when they are down for hours, but the file is not there.
See the sample from the log below. This check should alert on every status change.
Fri Sep 17 05:24:48 2004 DOWN - RTT:.34. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:30:54 2004 DOWN - RTT:.31. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:33:54 2004 DOWN - RTT:.31. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:39:02 2004 DOWN - RTT:.33. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:42:02 2004 DOWN - RTT:.33. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:46:52 2004 DOWN - RTT:.50. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:49:52 2004 DOWN - RTT:.50. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:54:47 2004 DOWN - RTT:.58. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:57:47 2004 DOWN - RTT:.58. - IP_TTL_EXPIRED_TRANSIT
Thu Sep 16 08:45:14 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 16 13:40:11 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 16 13:50:35 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 16 23:16:33 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 16 23:16:36 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Fri Sep 17 05:34:23 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Fri Sep 17 05:34:26 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Fri Sep 17 05:50:02 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Fri Sep 17 05:50:05 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
PLEASE help me make this stop. If I set it to alert on every change in status, that is what I want it to do.
I should clarify that the check log above is for Pine Hill AL Gateway, followed by smtp.log. I did not include the check log for Phoenix; it is below.
Fri Sep 17 04:58:25 2004 DOWN - RTT:.8. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:01:25 2004 DOWN - RTT:.8. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:06:12 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:09:12 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:13:56 2004 DOWN - RTT:.0. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:17:01 2004 DOWN - RTT:.0. - Connection to host timed out
Fri Sep 17 05:21:48 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:24:48 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:30:53 2004 DOWN - RTT:.34. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:33:53 2004 DOWN - RTT:.34. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:39:02 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:42:02 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:46:51 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:49:51 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:54:47 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
Fri Sep 17 05:57:47 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
Regards,
Forum Administrator
FYI: it has been running all day with no duplicate alerts. Perhaps the new build fixed the issue. I will keep you posted.
Thank you for your support.