[j-nsp] Slow performance of the KRT queue

Brad Fleming bdflemin at gmail.com
Fri Feb 5 11:15:15 EST 2016


Welcome to running a full table on the MX104. This is exactly what we found when lab testing the devices. After months of working with JTAC we never found a workaround; several software updates and major configuration changes did not resolve the issue either. During a major convergence event affecting a significant share of the routes in a full table, it took many minutes for the RIB and FIB to get back in sync. In the meantime, traffic was being blackholed. In the end we had to give up and roll bigger (and much more expensive) MX gear with much bigger REs.



> On Feb 3, 2016, at 3:21 PM, Vincent Bernat <bernat at luffy.cx> wrote:
> 
> Hey!
> 
> I have a pair of MX104. Each one is receiving a full view and a default
> through an external BGP session. They share an iBGP session. They
> redistribute the default in OSPF (with a higher metric when the default
> comes through the iBGP session). Nothing fancy.
> 
> If I shut the upstream port of one of the MX, the session goes down and
> the RIB is quickly updated. Unfortunately, the KRT is quite slow to be
> updated. A "show krt queue" shows there are many
> deletion/addition/changes queued and they take about 2 minutes to be
> processed.
> 
> Unfortunately, during this time, I have a lot of more specific routes
> still pointing to a non-existent hop:
> 
> vbe at net-edge004.dk2# run show route 138.231.136.1 extensive table public.inet.0 | no-more
> 
> public.inet.0: 571546 destinations, 996364 routes (425305 active, 321183 holddown, 571058 hidden)
> 138.231.0.0/16 (2 entries, 1 announced)
> TSI:
> KRT queued (pending) change
>  138.231.0.0/16 -> {1.1.1.1}=>{indirect(1048578)}
> Page 0 idx 1, (group v4-IBGP type Internal) Type 3 val 22b9ccb8 (grp rto)
>   Advertised metrics:
>     No metrics
>     (Queued)
>   Enqueued metrics 1: (for peers 00000001 3.3.3.3)
>     Flags: Nexthop Change
>     Nexthop: Self
>     MED: 10
>     Localpref: 100
>     AS path: [61098] 25091 2200 2426 I
>     Communities: 25091:22413 25091:24115
> [...]
> Path 138.231.0.0 from 159.100.255.231 Vector len 4.  Val: 1
>        *BGP    Preference: 140/-101
>                Next hop type: Indirect
>                Address: 0x177743a0
>                Next-hop reference count: 877603
>                Source: 3.3.3.3
>                Next hop type: Router, Next hop index: 1048577
>                Next hop: 2.2.2.2 via xe-2/0/3.100
>                Session Id: 0x18
>                Next hop: 2.2.2.0 via xe-2/0/2.100, selected
>                Session Id: 0x17
>                Protocol next hop: 3.3.3.3
>                Indirect next hop: 0x19ec4b2c 1048578 INH Session ID: 0x1b
>                State: <Active Int Ext>
>                Age: 16:57      Metric: 10      Metric2: 0
>                Validation State: unverified
>                Task: BGP_61098_61098.3.3.3.3+50640
>                Announcement bits (3): 2-KRT 3-BGP_RT_Background 4-Resolve tree 2
>                AS path: 8218 2200 2426 I
>                Communities: 8218:102 8218:20000 8218:20110
>                Accepted
>                Localpref: 100
>                Router ID: 3.3.3.3
>                Indirect next hops: 1
>                        Protocol next hop: 3.3.3.3
>                        Indirect next hop: 0x19ec4b2c 1048578 INH Session ID: 0x1b
>                        Indirect path forwarding next hops: 2
>                                Next hop type: Router
>                                Next hop: 2.2.2.2 via xe-2/0/3.100
>                                Session Id: 0x18
>                                Next hop: 2.2.2.0 via xe-2/0/2.100
>                                Session Id: 0x17
>                        3.3.3.3/32 Originating RIB: public.inet.0
>                          Node path count: 1
>                          Forwarding nexthops: 2
>                                Nexthop: 2.2.2.2 via xe-2/0/3.100
> 
> So, I have three questions:
> 
> Is it expected for a route to be flagged "active" while it is still
> queued to KRT?
> 
> Is there a way to delete those invalid routes in a speedier manner
> to let packets use the default route during the convergence time?
> 
> Is there some way to not advertise the default route in OSPF during the
> convergence time? Like a criterion: don't advertise this route when the
> KRT queue has 1000+ elements and until it reaches 0 (to avoid flapping).
> 
> I am running 13.3R8.7.
> 
> Thanks!
> -- 
> Treat end of file conditions in a uniform manner.
>            - The Elements of Programming Style (Kernighan & Plauger)
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
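
The criterion in the third question above (stop advertising the default once the KRT queue holds 1000+ entries, and resume only when it is back at 0) amounts to a hysteresis gate. A minimal Python sketch of just that decision logic follows. It assumes the queue depth is sampled by some external means and fed in as an integer. The class name, method names, and sample values are hypothetical, and nothing here is an existing Junos feature or API.

```python
class KrtHysteresisGate:
    """Hypothetical gate: suppress the default route while the KRT queue drains."""

    def __init__(self, high=1000, low=0):
        # high: queue depth at which advertisement is withdrawn
        # low:  queue depth at which advertisement is restored
        if low >= high:
            raise ValueError("low must be below high")
        self.high = high
        self.low = low
        self.suppressed = False

    def update(self, queue_depth):
        """Feed one queue-depth sample; return True if the default may be advertised."""
        if not self.suppressed and queue_depth >= self.high:
            # Queue is backed up: stop attracting traffic.
            self.suppressed = True
        elif self.suppressed and queue_depth <= self.low:
            # Queue fully drained: safe to advertise again.
            self.suppressed = False
        return not self.suppressed


# Made-up samples of KRT queue depth over time (not real measurements).
gate = KrtHysteresisGate()
samples = [0, 250, 1200, 800, 40, 0, 5]
decisions = [gate.update(depth) for depth in samples]
print(decisions)
```

Because the two thresholds differ, a queue depth hovering between 0 and 1000 does not toggle the advertisement, which is the flap avoidance the question asks for.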


