[nsp] AS53192 woe

Cisco Geek Rotation cisco@peakpeak.com
Thu, 05 Sep 2002 14:56:41 -0600


OK, here is a mindbender.  Working on an AS53192 with 12.2.2XB7 code and the 
latest (2.9.4.0) portware. Things are not going too well, and this looks 
like a potential software problem.

Many users log in, transfer only 181 octets, and end up with a "zombie" 
session where they can't transfer any data.  They show up with show caller 
ip, but their IP address isn't pingable and isn't installed in the route 
table when you do a show ip route.  They disconnect, log back in, and 
voila, they are online.

We were pulling our hair out trying to track this to a client modem 
incompatibility or something, and it wasn't until we did a complicated 
RADIUS database query that we happened to find out that this phenomenon is 
tracking the modem/line itself, not the IP address or anything else.
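
For anyone who wants to run the same kind of check, here is a rough Python 
sketch of a per-line tally.  It is not our actual query: the record layout 
(a "port" and an "octets" field per accounting record), the function name, 
and the 181-octet cutoff used as the zombie test are all assumptions for 
illustration.

```python
from collections import Counter

def zombie_counts_by_port(records, zombie_octets=181):
    """Tally suspected zombie sessions per port.

    records: iterable of dicts with hypothetical keys "port" and "octets".
    Returns {port: (zombie_sessions, total_sessions)}.
    """
    totals = Counter()
    zombies = Counter()
    for rec in records:
        port = rec["port"]
        totals[port] += 1
        # Assumption: a session that moved no more than zombie_octets
        # is counted as a zombie.
        if rec["octets"] <= zombie_octets:
            zombies[port] += 1
    return {port: (zombies[port], totals[port]) for port in totals}
```

Sorting the result by the zombie count should make it obvious whether the 
problem follows particular lines rather than particular users.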

A show modem (abbreviated) shows this:

         Avg Hold     Inc calls     Out calls    Busied   Failed    No     Succ
   Mdm     Time      Succ   Fail   Succ   Fail    Out      Dial   Answer   Pct.
   1/0   00:08:37     382     17      0      0       0        0       0     96%
   1/1   00:30:41     201      9      0      0       0        0       0     96%

It turns out modem 1/0 (line 1) is bad, in that a high percentage of the 
calls it takes (according to RADIUS) result in these zombie sessions.  
According to the database, at least 45 of these calls were zombies.  Yet 
the show modem test log is clean for this modem.  If you look above, it 
took 382 calls whereas its neighbor took 201 calls.  If you look through 
the rest of the show modem output, it turns out that any modem that took a 
disproportionate number of calls relative to its neighbors also has this 
zombie problem.
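
A rough way to pick out such modems is to compare each modem's call count 
against the median for the box.  The Python sketch below does that.  The 
function name, the 1.5x factor, and the counts for 1/2 and 1/3 in the usage 
lines are made up for illustration; only 1/0 and 1/1 come from the output 
above.

```python
from statistics import median

def busy_modems(calls, factor=1.5):
    """Return modems whose call count exceeds factor * the median count."""
    if not calls:
        return []
    cutoff = factor * median(calls.values())
    return sorted(m for m, n in calls.items() if n > cutoff)

# 1/0 and 1/1 are from the show modem output; 1/2 and 1/3 are hypothetical.
calls = {"1/0": 382, "1/1": 201, "1/2": 198, "1/3": 205}
print(busy_modems(calls))  # prints ['1/0']
```

The median is used rather than the mean so that a handful of bad modems 
does not drag the baseline up with them.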

In all, there are 22 modems scattered across the 192 modems in the box that 
have this problem.  For now, I did something like a line 1 modem busyout to 
kill the modem entirely.

But what could be causing this? These modems aren't going into the classic 
B state to indicate they are bad, so modem recovery never kicks in to try 
to revive the modem with a download of the firmware.  Also, the show 
controller T1 output shows zero errors (which makes sense because the 
box is colocated with the telco; there are no miles of underground copper 
to contend with, it's all fiber to a mux to the 53192).

I need some ideas on things to look at.

Thanks,

Chris