[j-nsp] strange problem on chassis cluster
Matthias Brumm
matthias at commy.de
Sat Sep 4 06:50:57 EDT 2010
Hi!
We have a very strange problem on two chassis clusters with 10.0R3.10
(will try updating to R4.7 today).
One chassis cluster (2x J6350) is our main system.
The other (2x J4350) is a system located at our customer's site.
The two clusters speak BGP with each other. For the customer
system, this is the only BGP session. Our main system has a full BGP
mesh to our other locations and edge systems. To understand the
problem, the setup can be reduced to three BGP sessions:
A) BGP session to AMS-IX over VLAN 1
B) BGP session to ECIX over VLAN 1
C) BGP session to ECIX over VLAN 2
Two switches are involved. VLAN 1 is configured on both switches to make
it available in Amsterdam and Düsseldorf. VLAN 2 is only configured on
the switch facing Düsseldorf, as a backup in case the first switch
dies.
The day before yesterday, I started two pings to the ECIX router: one
from my local workstation, the other from the main cluster.
If I configure something on the redundant interfaces, then as soon as I
commit, the first ping stays normal and the second jumps by +30 ms
(normal is around 6 ms). 2-3 minutes later, both pings stop and the BGP
session drops due to hold time expiration. This is the only BGP session
that drops. After a few minutes, the pings and the BGP session come
back. Every other BGP session, even the one to Düsseldorf over VLAN 2,
stays up.
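To line the ping behaviour up with the commit time, the RTT samples from
both pings could be classified roughly as below. This is only a sketch:
the 6 ms baseline and the +30 ms jump come from the observations above,
while the function names, the threshold (halfway between the two) and
the sample format (RTT in ms, None for a lost reply) are assumptions.

```python
from typing import List, Optional

BASELINE_MS = 6.0   # normal RTT to the ECIX router, as observed
JUMP_MS = 30.0      # RTT increase seen right after the commit


def classify(rtt_ms: Optional[float],
             baseline: float = BASELINE_MS,
             jump: float = JUMP_MS) -> str:
    """Label one ping sample as 'lost', 'slow' or 'ok'.

    rtt_ms is None when no reply was received.
    """
    if rtt_ms is None:
        return "lost"
    # Assumed threshold: halfway between baseline and baseline + jump.
    if rtt_ms >= baseline + jump / 2:
        return "slow"
    return "ok"


def summarize(samples: List[Optional[float]]) -> List[str]:
    """Classify a whole series of samples in order."""
    return [classify(s) for s in samples]
```

Running summarize() over the samples of each ping separately would show
whether the workstation ping and the cluster ping go "slow" and "lost"
at the same moments relative to the commit.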
I then moved the main load to Düsseldorf onto VLAN 2. This time, that BGP
session dropped, while the other stayed up. The session to Düsseldorf
that takes the main load carries around 260000 prefixes.
Matthias