[j-nsp] SSB/RE Problem

Elian Scrosoppi escrosoppi at ifxcorp.com
Thu Jun 15 14:10:40 EDT 2006


Hi guys,

Yesterday we had some problems with the SSB/RE of our M20 router. I have extracted the following logs and information. Can anyone help me determine the problem?

--
Model: m20
2 RE
2 SSB
JUNOS Base OS boot [7.0R2.7]
JUNOS Base OS Software Suite [7.0R2.7]
JUNOS Kernel Software Suite [7.0R2.7]
JUNOS Packet Forwarding Engine Support (M20/M40) [7.0R2.7]
JUNOS Routing Software Suite [7.0R2.7]
JUNOS Online Documentation [7.0R2.7]
JUNOS Crypto Software Suite [7.0R2.7]
--

content of /var/log/mastership :

Mar 16 14:30:17 event = E_NO_IPC, state = backup, param = 0x0x0
Mar 16 14:30:17 No response from the other routing engine for the last 2 seconds.

Mar 16 14:30:17 Currentstate backup NextState backup reason_code 0
Mar 16 14:30:17 new state = backup
Mar 16 14:30:17 Keepalive timeout of 2 seconds expired.  Assuming RE mastership.

Mar 16 14:30:17 event = E_CMD_F, state = backup, param = 0x0x0
Mar 16 14:30:20 The local RE becomes the master, retry = 0.
Mar 16 14:30:20 Currentstate backup NextState master reason_code 2
Mar 16 14:30:20 timestamp: Thu Mar 16 14:30:20 2006
Mar 16 14:30:20 new state = master

(a lot of these)
Mar 16 14:30:26 failed to send RE info/keepalive: errno=0, total=2 in the last 20 sec
Mar 16 14:30:26 failed to send RE info/keepalive: errno=65, total=2 in the last 20 sec
Mar 16 14:30:40 failed to receive keepalives from other RE for the last 20 sec

(then)

Mar 16 14:35:37 received version 1, "claim mastership" request
Mar 16 14:35:37 event = E_REQ_C, state = master, param = 0x0x0
Mar 16 14:35:37 send "claim mastership" negative acknowledgement
Mar 16 14:35:37 Currentstate master NextState master reason_code 1
Mar 16 14:35:37 new state = master
Mar 16 14:36:02 event = E_ORE_B, state = master, param = 0x0x835cca8
Mar 16 14:36:02 Currentstate master NextState master reason_code 1
Mar 16 14:36:02 new state = master
Jun 14 15:43:07 event = E_ORE_M, state = master, param = 0x0x835cca8
Jun 14 15:43:07 Duplicate Master Routing Engine
Jun 14 15:43:07 mcontrol_disabled_exit
Jun 14 15:43:07 mcontrol_shutdown
Jun 14 15:43:07 mcontrol_notmaster
Jun 14 15:43:10 *** mcontrol init V01 ***
Jun 14 15:43:10 soft-restart: is not a master
Jun 14 15:43:10 Socket = 0x00000011
Jun 14 15:43:10 event = E_CFG_B, state = init, param = 0x0x0
Jun 14 15:43:10 Currentstate init NextState backup reason_code 0
Jun 14 15:43:10 new state = backup
--
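
For anyone who wants to pull only the actual state changes out of a longer mastership log, here is a minimal sketch. It assumes the "Currentstate ... NextState ... reason_code" line format shown above; the function name state_changes is hypothetical and not part of JUNOS, and it makes no attempt to interpret the reason codes.

```python
import re

# Matches the "Currentstate X NextState Y reason_code N" lines from the
# mastership log excerpt (format assumed from the lines pasted above).
TRANSITION = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"Currentstate\s+(?P<cur>\w+)\s+NextState\s+(?P<nxt>\w+)\s+"
    r"reason_code\s+(?P<reason>\d+)"
)


def state_changes(lines):
    """Return (timestamp, old_state, new_state, reason_code) tuples,
    keeping only lines where the state really changes."""
    changes = []
    for line in lines:
        m = TRANSITION.match(line.strip())
        if m and m.group("cur") != m.group("nxt"):
            changes.append(
                (m.group("stamp"), m.group("cur"), m.group("nxt"),
                 int(m.group("reason")))
            )
    return changes


# A few lines copied from the log above.
sample = [
    "Mar 16 14:30:17 Currentstate backup NextState backup reason_code 0",
    "Mar 16 14:30:20 Currentstate backup NextState master reason_code 2",
    "Mar 16 14:35:37 Currentstate master NextState master reason_code 1",
    "Jun 14 15:43:10 Currentstate init NextState backup reason_code 0",
]

for change in state_changes(sample):
    print(change)
```

On the sample lines this leaves two entries: the backup-to-master change on Mar 16 and the init-to-backup change on Jun 14.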

content of /var/log/messages :

Jun 14 15:42:15  JM20 init: mib-process (PID 13778) terminated by signal number 15!
Jun 14 15:42:15  JM20 init: ntp (PID 13776) exited with status=0 Normal Exit
Jun 14 15:42:15  JM20 init: chassis-control (PID 2578) exited with status=6
Jun 14 15:42:15  JM20 init: chassis-control (PID 13816) started
Jun 14 15:42:15  JM20 init: failure target for routing set to target 1
Jun 14 15:42:15  JM20 init: routing (PID 13779) SIGTERM sent
Jun 14 15:42:15  JM20 init: failure target for routing set to target 1
Jun 14 15:42:15  JM20 init: routing (PID 13779) SIGTERM sent
Jun 14 15:42:15  JM20 chassisd[13816]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(0)
Jun 14 15:42:15  JM20 chassisd[13816]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(1)
Jun 14 15:42:15  JM20 rpd[13779]: RPD_SIGNAL_TERMINATE: second termination signal received
Jun 14 15:42:15  JM20 chassisd[13816]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(2)
Jun 14 15:42:15  JM20 chassisd[13816]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(3)
Jun 14 15:42:15  JM20 chassisd[13816]: CHASSISD_IFDEV_DETACH_ALL_PSEUDO: ifdev_detach(pseudo devices: all)
Jun 14 15:42:15  JM20 rpd[13779]: RPD_EXIT: Exit rpd[13779] version 7.0R2.7 built by builder on 2005-01-06 06:58:43 UTC, caller 80b14c3
Jun 14 15:42:16  JM20 alarmd[2579]: chassisd connection succeeded after 1 retries
Jun 14 15:42:16  JM20 craftd[2580]: chassisd connection succeeded after 1 retries
Jun 14 15:42:16  JM20 alarmd[2579]: resending alarm state
Jun 14 15:42:16  JM20 init: routing (PID 13779) exited with status=0 Normal Exit
Jun 14 15:42:17  JM20 syslogd: sendto: No route to host
Jun 14 15:42:17  JM20 craftd[2580]: attempt to delete alarm not in list
Jun 14 15:42:17  JM20 craftd[2580]: forwarding display request to chassisd: type = 4, subtype = 44
Jun 14 15:42:23  JM20 /kernel: mastership: routing engine 1 becoming master
Jun 14 15:42:23  JM20 /kernel: mastership: routing engine 1 becoming master
Jun 14 15:42:23  JM20 rshd[13894]: root at re0 as root: cmd='rcp -T -f /var/db/dcd.snmp_ix'
Jun 14 15:42:24  JM20 syslogd: sendto: No route to host
Jun 14 15:42:24  JM20 chassisd[13816]: CHASSISD_SNMP_TRAP10: SNMP trap generated: redundancy switchover (jnxRedundancyContentsIndex 6, jnxRedundancyL1Index 1, jnxRedundancyL2Index 0, jnxRedundancyL3Index 0, jnxRedundancyDescr SSB 0, jnxRedundancyConfig 2, jnxRedundancyState 2, jnxRedundancySwitchoverCount 1, jnxRedundancySwitchoverTime 777978144, jnxRedundancySwitchoverReason 2)
Jun 14 15:42:24  JM20 chassisd[13816]: CHASSISD_SNMP_TRAP10: SNMP trap generated: redundancy switchover (jnxRedundancyContentsIndex 6, jnxRedundancyL1Index 2, jnxRedundancyL2Index 0, jnxRedundancyL3Index 0, jnxRedundancyDescr SSB 1, jnxRedundancyConfig 3, jnxRedundancyState 3, jnxRedundancySwitchoverCount 1, jnxRedundancySwitchoverTime 777978144, jnxRedundancySwitchoverReason 2)
Jun 14 15:42:24  JM20 init: failure target for routing set to target 1
Jun 14 15:42:24  JM20 init: interface-control (PID 13814) terminate signal sent
Jun 14 15:42:24  JM20 init: ntp (PID 13897) started
Jun 14 15:42:24  JM20 init: snmp (PID 13898) started
Jun 14 15:42:24  JM20 init: mib-process (PID 13899) started
Jun 14 15:42:24  JM20 init: routing (PID 13900) started
Jun 14 15:42:24  JM20 init: sonet-aps (PID 13901) started
Jun 14 15:42:24  JM20 init: vrrp (PID 13902) started
Jun 14 15:42:24  JM20 init: sntpsync (PID 13810) SIGTERM sent
Jun 14 15:42:24  JM20 init: pfe (PID 13813) terminate signal sent
Jun 14 15:42:24  JM20 init: sampling (PID 13903) started
Jun 14 15:42:24  JM20 init: ilmi (PID 13904) started
Jun 14 15:42:24  JM20 init: remote-operations (PID 13905) started
Jun 14 15:42:24  JM20 init: class-of-service (PID 13906) started
Jun 14 15:42:24  JM20 init: network-access (PID 13907) started
Jun 14 15:42:24  JM20 init: ipsec-key-management (PID 13908) started
Jun 14 15:42:24  JM20 init: helper (PID 13909) started
Jun 14 15:42:24  JM20 init: remote-hello (PID 13910) started
Jun 14 15:42:24  JM20 init: link-management (PID 13911) started
Jun 14 15:42:24  JM20 init: kernel-replication (PID 13811) SIGTERM sent
Jun 14 15:42:24  JM20 init: firewall (PID 13815) terminate signal sent
Jun 14 15:42:24  JM20 init: internal-routing-service (PID 13912) started
Jun 14 15:42:24  JM20 init: routing-socket-proxy (PID 13913) started
Jun 14 15:42:24  JM20 init: pic-services-logging (PID 13914) started
Jun 14 15:42:24  JM20 init: adaptive-services (PID 13915) started
Jun 14 15:42:24  JM20 init: pgm (PID 13916) started
Jun 14 15:42:24  JM20 init: neighbor-liveness (PID 13917) started
Jun 14 15:42:24  JM20 init: service-deployment (PID 13918) started
Jun 14 15:42:24  JM20 init: failure target for routing set to target 1
Jun 14 15:42:24  JM20 init: interface-control (PID 13814) terminate signal sent
Jun 14 15:42:24  JM20 init: sntpsync (PID 13810) SIGTERM sent
Jun 14 15:42:24  JM20 init: pfe (PID 13813) terminate signal sent
Jun 14 15:42:24  JM20 init: kernel-replication (PID 13811) SIGTERM sent
Jun 14 15:42:24  JM20 init: firewall (PID 13815) terminate signal sent
Jun 14 15:42:24  JM20 init: firewall (PID 13815) exited with status=0 Normal Exit
Jun 14 15:42:24  JM20 init: firewall (PID 13920) started
Jun 14 15:42:24  JM20 init: pfe (PID 13813) terminated by signal number 15!
Jun 14 15:42:24  JM20 init: pfe (PID 13921) started
Jun 14 15:42:24  JM20 init: kernel-replication (PID 13811) exited with status=0 Normal Exit
Jun 14 15:42:24  JM20 init: sntpsync (PID 13810) terminated by signal number 15!
Jun 14 15:42:24  JM20 chassisd[13816]: snmp_ipc_try_connect: connect to master (unix sock) failed: Connection refused, retry in 1
--
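
To make the init lines above easier to scan, here is a rough sketch that lists the processes that did not exit with status=0. It assumes the "init: name (PID n) ..." format seen in this excerpt; the function name non_clean_exits is hypothetical, and a signal 15 entry may simply reflect a SIGTERM sent by init rather than a fault.

```python
import re

# Matches init lines of the form seen above:
#   init: <name> (PID <n>) exited with status=<s> ...
#   init: <name> (PID <n>) terminated by signal number <sig>!
INIT_EXIT = re.compile(
    r"init: (?P<name>[\w-]+) \(PID (?P<pid>\d+)\) "
    r"(?:exited with status=(?P<status>\d+)"
    r"|terminated by signal number (?P<sig>\d+))"
)


def non_clean_exits(lines):
    """Return (process, pid, detail) for every exit that is not status=0."""
    found = []
    for line in lines:
        m = INIT_EXIT.search(line)
        if not m:
            continue  # "started" and "SIGTERM sent" lines are skipped
        if m.group("sig") is not None:
            found.append((m.group("name"), int(m.group("pid")),
                          "signal=" + m.group("sig")))
        elif m.group("status") != "0":
            found.append((m.group("name"), int(m.group("pid")),
                          "status=" + m.group("status")))
    return found


# Lines copied from the /var/log/messages excerpt above.
sample_messages = [
    "Jun 14 15:42:15  JM20 init: mib-process (PID 13778) terminated by signal number 15!",
    "Jun 14 15:42:15  JM20 init: ntp (PID 13776) exited with status=0 Normal Exit",
    "Jun 14 15:42:15  JM20 init: chassis-control (PID 2578) exited with status=6",
    "Jun 14 15:42:15  JM20 init: chassis-control (PID 13816) started",
]

for entry in non_clean_exits(sample_messages):
    print(entry)
```

On these four lines it reports mib-process (signal 15) and chassis-control (status=6), and skips the normal ntp exit and the chassis-control start.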

The SSB showed an uptime of 1 minute, while the uptime of the REs was not affected. One hour later, the problem repeated.

If more information is needed, please tell me.

Thanks in advance,

--
Elian Scrosoppi
escrosoppi at ifxcorp.com



