[j-nsp] Solarwinds Monitoring Problem

Mon Jun 7 10:45:01 EDT 2010

I have seen the same issue with Solarwinds across many devices. I think Solarwinds sends only a single ICMP echo request per poll; if that one packet is lost, it declares the node down. Ours have always come back up on the next polling interval, though. We also run NSM Express and haven't seen any false alarms from it.

On a side note, Solarwinds has a knob for tuning your polling settings. You might look at your timeouts.
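To make the single-probe point concrete: if a poll really is one ICMP echo request, any single dropped packet produces a "node down". The Python sketch below is an illustration only. It assumes packet losses are independent, and the function names (`false_alarm_probability`, `node_is_down`) are hypothetical. It models the general retry idea and does not reflect Solarwinds' actual polling logic or settings.

```python
# Hypothetical helpers for illustration; not Solarwinds code or settings.

def false_alarm_probability(loss_rate: float, probes: int) -> float:
    """Chance that a healthy node is reported down in one poll.

    Assumes each probe is lost independently with probability
    `loss_rate`, and the node is declared down only if all `probes`
    probes in the poll go unanswered: P = loss_rate ** probes.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if probes < 1:
        raise ValueError("probes must be at least 1")
    return loss_rate ** probes


def node_is_down(replies: list[bool], required_failures: int) -> bool:
    """Declare a node down only after `required_failures` consecutive
    unanswered probes at the end of `replies` (True = reply received)."""
    if required_failures < 1:
        raise ValueError("required_failures must be at least 1")
    recent = replies[-required_failures:]
    return len(recent) == required_failures and not any(recent)


if __name__ == "__main__":
    # 1% independent loss: one probe per poll vs. three probes per poll.
    print(false_alarm_probability(0.01, 1))
    print(false_alarm_probability(0.01, 3))
    # One good reply followed by one lost probe.
    print(node_is_down([True, False], required_failures=1))  # True
    print(node_is_down([True, False], required_failures=3))  # False
```

Under these assumptions, 1% loss gives a 1-in-100 chance of a false alarm per poll with a single probe, but roughly one in a million with three, which is why the retry and timeout settings are worth checking.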

Jensen Tyler
Network Engineer
Fiberutilities Group, LLC

-----Original Message-----
From: juniper-nsp-bounces at puck.nether.net [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of Paul Stewart
Sent: Sunday, June 06, 2010 7:43 AM
To: 'Jeff Cadwallader'
Cc: 'juniper-nsp'
Subject: Re: [j-nsp] Solarwinds Monitoring Problem

Great... and guess what we're getting ready to deploy? ;) We have an NSM
Express system sitting in the box, ready to go soon...

Our problem, though, doesn't appear to be SNMP itself, just problems pinging
the hosts. While Solarwinds says "site is down" you can't ping the box,
yet SNMP still functions...

Cheers,

Paul

From: Jeff Cadwallader [mailto:wompus at gmail.com]
Sent: June-05-10 8:24 PM
To: Paul Stewart
Cc: juniper-nsp
Subject: Re: [j-nsp] Solarwinds Monitoring Problem

Paul

We have seen the same thing on our EX3200 and EX4200 switches. We have not
seen it on our MX480s yet. Our logs showed that the SNMP daemon had stopped.
We opened a case with JTAC, and they mentioned (after two months, I might
add) that if you use Juniper's NMS (which we do), it might cause those
symptoms due to excessive polling. We junked the NMS, and it doesn't seem to
have happened since.

Jeff

On Sat, Jun 5, 2010 at 8:23 AM, Paul Stewart <paul at paulstewart.org> wrote:

Hi folks...

I'm starting here to see if anyone has seen this behaviour before by
chance....

We're in a migration to Solarwinds for monitoring of our network resources.
On the network are several Juniper devices (and lots more coming soon).

Every so often (about once a month), the Solarwinds system triggers a
"node down" alarm. When this occurs, it shows a Juniper device (which
varies) as "down". Here "down" simply means it's not pingable.

The behaviour we're seeing is that from the Solarwinds server we suddenly
cannot ping the remote Juniper device; however, we continue to monitor it
successfully via SNMP. These Juniper devices have been MX480, EX3200 and
EX4200 to date. During these outages I have been able to ping these devices
from every other location on our network except the Solarwinds server.

If I reboot the Solarwinds server, the alarm clears, so I thought this was
clearly an issue with the monitoring system... but recently I rebooted one
of the Juniper switches and the issue cleared as well...

Logs on the Juniper devices are clean, with nothing indicating a problem,
and the Solarwinds system doesn't show anything of interest either...

Thoughts? ;) I'm thinking of setting up another open-source monitoring
solution just to further rule out the Juniper side of this...

Paul

_______________________________________________
juniper-nsp mailing list juniper-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp