[j-nsp] Solarwinds Monitoring Problem

Paul Stewart paul at paulstewart.org
Mon Jun 7 10:53:13 EDT 2010


Thank you - appreciate the information...

We're looking at it now and yes, it only sends 1 ICMP message by
default. We'll adjust that and go from there.

Thanks again!

Paul


-----Original Message-----
From: Jensen Tyler [mailto:JTyler at fiberutilities.com] 
Sent: Monday, June 07, 2010 10:45 AM
To: Paul Stewart
Cc: 'juniper-nsp'
Subject: RE: [j-nsp] Solarwinds Monitoring Problem

I have seen the same issue with Solarwinds across many devices. I think
Solarwinds only sends 1 ICMP message. If that message is lost, it declares
the node down. Ours has come back up on the next polling interval, though. We
also run NSM Express and haven't seen an issue with false alarms.

On a side note, Solarwinds has a knob for tuning your polling settings. You
might look at your timeouts.
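
As a rough illustration of why a single lost ICMP message can raise a false
"node down" alarm, here is a minimal Python sketch of a poll that retries
before declaring a node down. The names ping_once, node_is_up and
false_down_probability are hypothetical and are not Solarwinds settings or
APIs. ping_once assumes the Linux iputils ping flags -c and -W, and the loss
estimate assumes packet loss is independent between probes.

```python
import subprocess
from typing import Callable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request; return True if a reply arrived.

    Assumption: Linux iputils ping, where -c is the packet count and
    -W is the reply timeout in seconds. Other platforms use other flags.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def node_is_up(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Report the node as up if any one of `attempts` probes succeeds."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(attempts):
        if probe():
            return True  # one reply is enough; stop probing
    return False  # every probe was lost or timed out


def false_down_probability(loss_rate: float, attempts: int) -> float:
    """Chance that all probes are lost while the node is actually up.

    Assumption: each probe is lost independently with probability loss_rate.
    """
    return loss_rate ** attempts
```

For example, node_is_up(lambda: ping_once("192.0.2.1"), attempts=3) reports
"down" only after three consecutive missed replies (192.0.2.1 is a placeholder
address). Under the independence assumption, 1% packet loss gives a false
alarm on roughly 1 poll in 100 with a single probe, and roughly 1 in a million
with three.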

Jensen Tyler
Network Engineer
Fiberutilities Group, LLC

-----Original Message-----
From: juniper-nsp-bounces at puck.nether.net
[mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of Paul Stewart
Sent: Sunday, June 06, 2010 7:43 AM
To: 'Jeff Cadwallader'
Cc: 'juniper-nsp'
Subject: Re: [j-nsp] Solarwinds Monitoring Problem

Great... and guess what we're getting ready to deploy? ;)  We have an NSM
Express system sitting in the box, ready to go soon...

Our problem, though, doesn't appear to be SNMP itself - just problems pinging
the hosts. During the time that Solarwinds says "site is down", you
can't ping the box; however, SNMP still functions.

Cheers,

Paul





From: Jeff Cadwallader [mailto:wompus at gmail.com]
Sent: June-05-10 8:24 PM
To: Paul Stewart
Cc: juniper-nsp
Subject: Re: [j-nsp] Solarwinds Monitoring Problem



Paul

We have seen the same thing on our EX series 3200 and 4200. We have not seen
it on our MX480s yet. Our logs showed that the SNMP daemon had stopped.
We opened a case with JTAC and they mentioned (after 2 months, I might add) that
if you use Juniper's NMS (which we were), that might cause those
symptoms due to excessive polling. We junked the NMS and it hasn't seemed to
happen since.

Jeff

On Sat, Jun 5, 2010 at 8:23 AM, Paul Stewart <paul at paulstewart.org> wrote:

Hi folks...

I'm starting here to see if anyone has seen this behaviour before, by
chance.



We're in a migration to Solarwinds for monitoring of our network resources.
On the network are several Juniper devices (and lots more coming soon).



Every so often (about once a month or so), the Solarwinds system triggers
a "node down" alarm.  When this occurs, it shows a Juniper device
(which one varies) as "down".  "Down" here simply means the device is not
pingable.



The behaviour we're seeing is that from the Solarwinds server we suddenly
cannot ping the remote Juniper device; however, we continue to poll that
device successfully over SNMP.  The affected Juniper devices have been MX480,
EX3200 and EX4200 to date.  During these outages I have been able to ping
these devices from any other location on our network except the Solarwinds
server.
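
The three observations above can be combined into a small decision function.
This is only a sketch of the reasoning, in Python. classify_outage and its
return labels are hypothetical and are not something Solarwinds or Junos
provides, and the three inputs are assumed to be gathered by hand or by
separate checks.

```python
def classify_outage(
    ping_from_monitor: bool,
    snmp_from_monitor: bool,
    ping_from_elsewhere: bool,
) -> str:
    """Label an alarm from three reachability observations.

    Returns one of the hypothetical labels:
      "ok"                - the monitor can ping the device
      "monitor-icmp-only" - the device answers SNMP or pings from other
                            hosts, so only ICMP from the monitor is failing
      "down"              - nothing reaches the device
    """
    if ping_from_monitor:
        return "ok"
    if snmp_from_monitor or ping_from_elsewhere:
        return "monitor-icmp-only"
    return "down"
```

The case described in this thread is classify_outage(False, True, True), which
falls into the "monitor-icmp-only" branch: the device is alive, and the
failure is specific to ICMP between the Solarwinds server and that device.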



If I reboot the Solarwinds server, the alarm clears, so I thought this was
clearly an issue with the monitoring system ... but ... recently I rebooted
one of the Juniper switches and the issue cleared as well....



Logs on the Juniper devices are clean - nothing indicating a problem.
The Solarwinds system doesn't show anything of interest either...



Thoughts? ;) I'm thinking of setting up another open source monitoring
solution just to further eliminate the Juniper side of this...



Paul







_______________________________________________
juniper-nsp mailing list juniper-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


