Multiple Alerts Very Annoying

[Deleted User]

I have several checks that alert me multiple times about a down check, and it seems to be random. All my checks are set to alert on each status change, yet just tonight I got back-to-back alerts for two checks that have been down all day. I also got several alerts today telling me they were down, but they never came back up. At midnight it is very annoying to get back-to-back alerts for checks you know have been down all day. Has anyone else experienced this? The new machine has not helped; this has been an issue for me for as long as I have been using SC.

Here is a sample of the check log:

Thu Sep 2 21:48:38 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 21:57:06 2004 DOWN - RTT:.16. - Connection to host timed out
Thu Sep 2 21:59:06 2004 DOWN - RTT:.16. - Connection to host timed out
Thu Sep 2 22:07:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:09:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:17:59 2004 DOWN - RTT:.15. - Connection to host timed out
Thu Sep 2 22:19:59 2004 DOWN - RTT:.15. - Connection to host timed out
Thu Sep 2 22:28:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:30:34 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:38:32 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:40:32 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:48:59 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:50:58 2004 DOWN - RTT:.14. - Connection to host timed out
Thu Sep 2 22:59:31 2004 DOWN - RTT:.15. - Connection to host timed out
Thu Sep 2 23:01:31 2004 DOWN - RTT:.15. - Connection to host timed out

And here are the SMTP alerts. The QC check is one example, but this is occurring on several checks:

Thu Sep 2 18:24:02 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:20:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:20:05 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:51:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
Thu Sep 2 22:51:04 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
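One way to confirm the pattern in these logs is to scan the alert log for alerts that repeat the previous alert status for the same check. The Python sketch below is not part of ServersCheck; the function name `find_repeated_alerts` is hypothetical, and it assumes every alert line follows the format shown above (timestamp, check name, `is DOWN` or `is UP`, then `->`), with recovery alerts reading `is UP`.

```python
import re
from datetime import datetime

# Assumed alert-log format, based on the excerpt above:
# "<timestamp> <check name> is DOWN|UP -> Sending SMTP alert ..."
# Recovery alerts are ASSUMED to say "is UP"; adjust if your log differs.
ALERT_RE = re.compile(
    r"^(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+(.+?) is (UP|DOWN) ->"
)

# The five alert lines quoted in the post.
SAMPLE = [
    "Thu Sep 2 18:24:02 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:20:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:20:05 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:51:02 2004 SNMP - GATES AMBIENT is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
    "Thu Sep 2 22:51:04 2004 QC Remote Host is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .",
]


def find_repeated_alerts(lines):
    """Return (check, timestamp) pairs for alerts that repeat the check's previous alert status."""
    last_status = {}
    repeats = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not alert entries
        stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        check, status = match.group(2), match.group(3)
        if last_status.get(check) == status:
            repeats.append((check, stamp))  # same status alerted twice in a row
        last_status[check] = status
    return repeats


if __name__ == "__main__":
    for check, stamp in find_repeated_alerts(SAMPLE):
        print(f"{stamp:%Y-%m-%d %H:%M:%S}  repeated alert for {check}")
```

Run against the five lines above, it flags the 22:20:05 and 22:51:04 QC Remote Host alerts and the 22:51:02 GATES AMBIENT alert, since no UP alert separates them from the earlier DOWN alerts, which matches the check log showing only DOWN results.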

Thanks

Comments

  • [Deleted User]

    The issue is that the software sends an alert that a check is down, which is good, but then it sends the alert again and again at random times. It is as if it "forgets" that the alert was already sent, even though it is supposed to alert only on each status change.

  • Administrator
    Do you have dependencies defined for your checks? Have you set up ServersCheck to alert you on each status change for those two rules, or "only when down"?

    If it were random, it would not occur like that: the repeated alerts for both checks arrive at the same time.

    Forum Administrator
  • [Deleted User]

    No dependency.

    No, I have it set to alert on each status change:

    SNMP|X|SNMP - TRINITY AMBIENT|X|1|X|120|X||X||X||X|true|X|true|X|lt|X|1|X||X||X||X|trinity<X>public<X>161<X>1.3.6.1.4.1.674.10892.1.700.20.1.5.1.6|X||X|98|X|no|X|120|X||X||X|On each status change|X|General Alert Team|X||X||X|Hardware Group|X||X|<X>|X||X|yes|X|<X><X><X>|X|<X><X>|X||X||X|no|X||X||X|yes|X|

  • Administrator
    The sample of the conf file you are showing is for an SNMP check; however, the check log seems to be from a PING check.

    Can you post the content of the "SNMP - Trinity Ambient" log file?

    Also, what is in the log file in the logging directory that was created around the time of the error? That file stores all monitoring activity, so we can compare it with the alerts.
  • Administrator
    There is a new build of monitoring_rule with additional logging so that we can help you track down the issue.

    Please download it from:

    http://www.serverscheck.com/files/monitoring_rule.zip

    Let it run, and when the multiple alerts occur again, please send the latest log file in the logging subdirectory to [email protected]
  • [Deleted User]

    I will do that.

    Thank you

  • [Deleted User]

    I tried to get this file, because I have several ping checks that send multiple alerts when they are down for hours, but the file is not there.

    See the log sample below. This check is set to alert on each status change:

    Fri Sep 17 05:24:48 2004 DOWN - RTT:.34. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:30:54 2004 DOWN - RTT:.31. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:33:54 2004 DOWN - RTT:.31. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:39:02 2004 DOWN - RTT:.33. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:42:02 2004 DOWN - RTT:.33. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:46:52 2004 DOWN - RTT:.50. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:49:52 2004 DOWN - RTT:.50. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:54:47 2004 DOWN - RTT:.58. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:57:47 2004 DOWN - RTT:.58. - IP_TTL_EXPIRED_TRANSIT

    Thu Sep 16 08:45:14 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Thu Sep 16 13:40:11 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Thu Sep 16 13:50:35 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Thu Sep 16 23:16:33 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Thu Sep 16 23:16:36 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Fri Sep 17 05:34:23 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Fri Sep 17 05:34:26 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Fri Sep 17 05:50:02 2004 Phoenix AZ Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .
    Fri Sep 17 05:50:05 2004 Pine Hill AL Gateway is DOWN -> Sending SMTP alert through external proc mail server 10.1.1.100. Returned: OK .

    PLEASE help me make this stop. If I set it to alert on every change in status, that is what I want it to do.

  • [Deleted User]

    I should say that the check log above is for Pine Hill AL Gateway; the smtp.log excerpt also includes Phoenix AZ Gateway, whose check log I did not include. Here it is:

    Fri Sep 17 04:58:25 2004 DOWN - RTT:.8. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:01:25 2004 DOWN - RTT:.8. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:06:12 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:09:12 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:13:56 2004 DOWN - RTT:.0. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:17:01 2004 DOWN - RTT:.0. - Connection to host timed out
    Fri Sep 17 05:21:48 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:24:48 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:30:53 2004 DOWN - RTT:.34. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:33:53 2004 DOWN - RTT:.34. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:39:02 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:42:02 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:46:51 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:49:51 2004 DOWN - RTT:.40. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:54:47 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT
    Fri Sep 17 05:57:47 2004 DOWN - RTT:.24. - IP_TTL_EXPIRED_TRANSIT

  • Administrator
    So you mean that the logging subdirectory of ServersCheck is empty?

    Regards,

    Forum Administrator
  • [Deleted User]
    I was never able to get the file. The web server told me the file did not exist when I followed the URL.
  • Administrator
    We uploaded it again for your convenience.
  • [Deleted User]
    I have it running now.
  • Administrator
    You will notice that the release you downloaded also includes a fix for the wait queue while checks are down. Can you confirm that the CPU load has dropped with the new version?
  • [Deleted User]

    FYI: it has been running all day with no duplicate alerts, so perhaps the new build fixed the issue. I will keep you posted.

    Thank you for your support.
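For comparison, "alert on each status change" is typically edge-triggered: an alert fires only when a check's status differs from the last status that was alerted on. The sketch below is a hypothetical Python illustration of that idea, not ServersCheck's actual code; `AlertTracker` and `should_alert` are invented names. It also shows one plausible way duplicates like the ones in this thread could arise (an assumption, not a diagnosis): if the remembered status is lost, for example because it lives only in memory and the process restarts, the next DOWN result looks like a new status change and triggers another alert.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AlertTracker:
    """Hypothetical edge-triggered alerting: alert only when a check's status changes."""

    # Last status we sent an alert for, keyed by check name (kept in memory only).
    last_alerted: Dict[str, str] = field(default_factory=dict)

    def should_alert(self, check: str, status: str) -> bool:
        """Return True (and remember the status) if it differs from the last alert."""
        if self.last_alerted.get(check) == status:
            return False  # same status as the last alert: suppress the duplicate
        self.last_alerted[check] = status
        return True


if __name__ == "__main__":
    tracker = AlertTracker()
    # Repeated DOWN results produce one alert; a recovery and a new outage produce two more.
    for status in ["DOWN", "DOWN", "DOWN", "UP", "DOWN"]:
        print(status, tracker.should_alert("Pine Hill AL Gateway", status))

    # A fresh tracker has no memory, so the next DOWN result alerts again.
    restarted = AlertTracker()
    print("after restart:", restarted.should_alert("Pine Hill AL Gateway", "DOWN"))
```

As written, the first result seen for a check always alerts, even if it is UP; a real implementation might treat an initial UP as silent and persist `last_alerted` so a restart does not re-send DOWN alerts.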

This discussion has been closed.