[j-nsp] Cluster issue with SRX550
Ali Sumsam
ali+junipernsp at eintellego.net
Sat May 24 12:33:14 EDT 2014
Hi All,
*Scenario*
We have a cluster of two SRX550s and an MX5-T router.
An aggregated link (LACP) connects node0 (the primary node) of the SRX550
cluster to the MX5-T router.
The aggregate link consists of two copper Ethernet ports.
The devices are in the same rack and directly connected (in case someone
doubts the physical connectivity).
*Junos*
SRX550 Cluster 12.1X44-D30.4
MX5-T 11.4R7.5
*Problem*
Occasionally LACP goes down on the MX5-T router. When that happens, the
following logs are seen:
LACPD_TIMEOUT: ge-1/0/4: lacp current while timer expired current Receive State: CURRENT
/kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: new state is 0 cifd ge-1/0/4
/kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: new state is 0 cifd ge-1/0/5
/kernel: ae_bundlestate_ifd_change: bundle ae1: bundle IFD minimum links not met 0 < 1
At the same time, the SRX cluster attempts to fail over and the following logs appear:
jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 2 transitioned from 'secondary' to 'primary' state due to Remote node is in secondary hold
jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'primary' to 'secondary-hold' state due to Monitor failed: IF
jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 2 transitioned from 'primary' to 'secondary-hold' state due to Monitor failed: IF
jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'secondary-hold' to 'secondary' state due to Back to back failover interval expired
jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'secondary' to 'primary' state due to Remote node is in secondary hold
jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 2 transitioned from 'secondary-hold' to 'secondary' state due to Back to back failover interval expired
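For anyone who wants to line up the failover sequence, here is a minimal Python sketch that pulls the redundancy group, old state, new state and reason out of the jsrpd messages above. The function name parse_transitions and the SAMPLE text are only for illustration; the pattern is based solely on the message format shown in this post and may not cover other jsrpd messages.

```python
import re

# Matches the body of a JSRPD_RG_STATE_CHANGE message as it appears in this post.
PATTERN = re.compile(
    r"Redundancy-group (?P<rg>\d+) transitioned from "
    r"'(?P<old>[^']+)' to '(?P<new>[^']+)' state due to (?P<reason>.+)"
)

def parse_transitions(text):
    """Return (rg, old_state, new_state, reason) tuples in log order."""
    # Pasted logs may be wrapped across lines, so collapse all whitespace
    # and split on the message tag instead of on newlines.
    flat = " ".join(text.split())
    events = []
    for chunk in flat.split("JSRPD_RG_STATE_CHANGE:")[1:]:
        # Drop the process prefix of the next message, e.g. "jsrpd[1393]:".
        chunk = re.sub(r"\s*jsrpd\[\d+\]:\s*$", "", chunk).strip()
        m = PATTERN.match(chunk)
        if m:
            events.append((int(m["rg"]), m["old"], m["new"], m["reason"]))
    return events

# Two messages copied from the logs above (hypothetical variable name).
SAMPLE = """jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from
'primary' to 'secondary-hold' state due to Monitor failed: IF
jsrpd[1393]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from
'secondary-hold' to 'secondary' state due to Back to back failover interval
expired
"""

for event in parse_transitions(SAMPLE):
    print(event)
```

Run against the full log, this makes it easier to see that both redundancy groups leave 'primary' because of "Monitor failed: IF", i.e. interface monitoring, rather than a manual failover.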
Minutes later, node0 comes back as primary and service is restored.
In addition, the following logs keep appearing on the SRX550:
/kernel: Process with Most Children- 1:init - Children - 211
/kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
/kernel: nearing maxproc limit by uid 0, please see tuning(7) and login.conf(5).
/kernel: Process with Most Children- 1:init - Children - 211
/kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
/kernel: Process with Most Children- 1:init - Children - 211
/kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
/kernel: Process with Most Children- 1:init - Children - 211
/kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
/kernel: nearing maxproc limit by uid 0, please see tuning(7) and login.conf(5).
/kernel: Process with Most Children- 1:init - Children - 211
We have been advised to change the following:
set interfaces reth1 redundant-ether-options lacp periodic fast
to
set interfaces reth1 redundant-ether-options lacp periodic slow
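To put numbers on what that change means, here is a small Python sketch of the timer arithmetic. It assumes the standard IEEE 802.1AX LACP values (fast = 1 second between LACPDUs, slow = 30 seconds, receive timeout = three intervals); it does not model Junos behaviour, and it does not say which end's receive timer a given setting ends up influencing. The names are made up for the example.

```python
# Standard LACP periodic intervals in seconds (IEEE 802.1AX, assumed here).
PERIODIC_SECONDS = {"fast": 1, "slow": 30}

# The receive ("current while") timer corresponds to three periodic intervals.
TIMEOUT_MULTIPLIER = 3

def lacp_timeout_seconds(mode):
    """Seconds without a received LACPDU before the partner is considered expired."""
    return PERIODIC_SECONDS[mode] * TIMEOUT_MULTIPLIER

for mode in ("fast", "slow"):
    print(mode, lacp_timeout_seconds(mode))
```

So with fast timers roughly three seconds of missed LACPDUs is enough to expire the partner, while slow timers tolerate about ninety seconds. That would mask short stalls but also delays detection of a genuinely dead link.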
If anyone has had a similar experience, we would appreciate your help.
Regards,
*Ali Sumsam CCIE - *eintellego Networks Pty Ltd
Senior Network Engineer
ali at eintellegonetworks.com ; www.eintellegonetworks.com
Phone: 1300 239 038; Cell +61 (0)450 609 592 ; skype://sumsam.ali80
facebook.com/eintellegonetworks ; <http://twitter.com/networkceoau>
linkedin.com/in/alisumsam
The Experts Who The Experts Call
Juniper - Cisco - Cloud