[j-nsp] Templates for logging from EX series

Wed Jun 23 12:51:00 EDT 2010

On Wed, Jun 23, 2010 at 12:18:32PM +0200, Wouter van den Bergh wrote:
> Hi,
> 
> We are quite new with Juniper, so excuse me if these exist and I
> haven't been able to find them.
> 
> We have rolled out several networks with EX series chassis and
> switches (82xx, 42xx). Currently all logging ends up in
> /var/log/messages which becomes a terrible mess, especially when you
> try to troubleshoot a problem. Also the messages log contains a whole
> lot of junk which generally isn't interesting to view.
> 
> Now I've seen at our EX training that you can create templates for the
> logging to be split out in other files to keep it clean and organised.
> Pretty much what I would like is to build a Cisco-like logfile with
> just useful info, rather than all sorts of Linux kernel messages and
> what not.
> 
> I have tried in our lab to play around with this, but I don't seem to
> understand the concept behind this.
> 
> Does anyone have any templates they want to share, or can point me to
> documentation with lots of examples?

JUNOS logging is either really really good, or really really bad,
depending on what you're trying to do. I prefer to take a multi-stage
approach to monitoring, with a generic "messages" file which is intended
to capture human-interesting events (i.e. everyone working on the router
should be monitoring it in real time), a "debugging" file which records
everything locally for when stuff breaks, and a syslog export to a remote
collector which stores everything in a database.

Stuff to watch out for:

1) The log message format is not always consistent, particularly when
logical routers/systems are involved, which can make it a real pain to
parse.

2) I disagree with the default priorities for a huge number of events.
For example, an IGP DOWN event is a priority 5 / notice, while an IGP UP
is a priority 6 / info. If you actually want to match up your IGP flaps
and know if a circuit has come back up you'll need to monitor all the
info stuff, in which case be prepared for a flood of useless crap
messages.

3) The developers are really sloppy about leaving in debugging messages 
which will flood the everloving crap out of the logs for no good reason. 
Basically you'll have to block them, or slowly lose your sanity. You can 
use the regexp format below to do this.

IMHO the only sensible way to monitor JUNOS logs at any kind of scale is
to run everything through a centralized collector, parse the messages
down to their raw components, and then maintain your own set of rules to
match certain events and reclassify them to the priority levels that
make sense for you. We dump the parsed messages to a database, and then
have a network-wide "tail" which queries from that.
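
To give a rough idea of the parse-and-reclassify step, here is a
minimal Python sketch. It is not our actual collector. The line layout
it expects (timestamp, host, process[pid], then a %FACILITY-SEVERITY-TAG
prefix as produced with "explicit-priority") is an assumption you should
check against your own logs, and the names LINE_RE, OVERRIDES, parse and
reclassify are made up for illustration. The event tag in OVERRIDES is
only an example of bumping an "up" event to the same level as its
"down" counterpart.

```python
import re

# Assumed line layout with "explicit-priority" set (illustrative only):
#   Jun 23 12:51:00.123 router1 rpd[1234]: %DAEMON-6-RPD_OSPF_NBRUP: text
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:.]+)\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s+"
    r"%(?P<facility>[A-Z]+)-(?P<severity>\d)(?:-(?P<tag>\w+))?:\s*"
    r"(?P<text>.*)$"
)

# Hypothetical local rules: event tag -> severity we would rather see.
OVERRIDES = {
    "RPD_OSPF_NBRUP": 5,  # example: treat "up" as notice, like "down"
}


def parse(line):
    """Split one log line into its components, or return None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    event["severity"] = int(event["severity"])
    return event


def reclassify(event, overrides=OVERRIDES):
    """Return a copy of the event with any local severity override applied."""
    out = dict(event)
    out["severity"] = overrides.get(event["tag"], event["severity"])
    return out
```

Lines that do not fit the assumed layout come back as None, so the
collector can store them unparsed instead of dropping them.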

At any rate, here is an example of a syslog config:

syslog {
    user * {
        any emergency;
    }
    host 1.2.3.4 {
        any notice;
        kernel error;
        authorization error;
        explicit-priority;
    }
    file debug {
        any verbose;
    }
    file messages {
        any notice;
        daemon info;
        authorization error;
        kernel error;
        /* Block some of the annoying JUNOS msg floods */
        match "!(.*status 255.*)|(.*Accepted password.*)|(.*MPLS.stats.*)|(.*MPLS.*_BANDWIDTH_CHANGE.*)|(.*dfwd_rpd_communication.*)|(.*xntpd.*)|(.*callback_lock.*)|(.*nput queuing.*)|(.*nitializ.*)|(.*Read acess profile.*)|(.*dynamic-profiles.*)|(.*configuration is empty.*)|(.*_REG_ERR.*first cell drops.*)|(.*MC_AE_OPTIONS.*)|(.*STL library initialized.*)|(.*Synchronized commit processing.*)|(.*reroute.*)|(.*re-route.*)|(.*encaps_ohead.*)|(.*[lL]icense.*)";
        explicit-priority;
    }
    file auth {
        authorization info;
        explicit-priority;
    }
    file pfe {
        pfe info;
        explicit-priority;
    }               
    file daemon {   
        daemon info;
        explicit-priority;
    }               
    file firewall { 
        firewall any;
        explicit-priority;
    }               
    time-format millisecond;
}                   
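
If you want to try out a block list like the one in the "messages" file
before committing it, a few lines of Python will do as a rough check.
This is only a sketch: it assumes the leading "!" negates the whole
alternation (which is what the comment in the config intends), it uses
Python's re module rather than the regex engine on the router, and both
the shortened pattern list and the sample lines are made up for
illustration.

```python
import re

# Shortened subset of the alternation from the "messages" file above.
NOISE = re.compile(
    r"(.*status 255.*)|(.*Accepted password.*)|(.*xntpd.*)|(.*[lL]icense.*)"
)


def keep(line):
    """Return True if the line would survive the negated match."""
    return NOISE.search(line) is None


# Made-up sample lines, not real router output.
samples = [
    "rpd[1234]: OSPF neighbor 10.0.0.1 went down",
    "sshd[4321]: Accepted password for admin from 10.0.0.2",
    "xntpd[987]: kernel time sync enabled",
]

for line in samples:
    print("KEEP" if keep(line) else "DROP", line)
```

Only the first sample line is kept; the other two hit the "Accepted
password" and "xntpd" alternatives.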

You can also block a lot of the junk messages from being logged inside 
of an event policy, for example:

event-options {
    policy IGNORE {
        events [ rpd_mpls_lsp_bandwidth_change rpd_mpls_lsp_change
                 rpd_rsvp_bypass_down rpd_rsvp_bypass_up
                 ui_commit_empty_container
                 ui_load_junos_default_file_event l2cpd_task_reinit
                 rpd_task_reinit chassisd_parse_complete
                 ping_test_completed ping_test_failed
                 ping_probe_failed ui_child_exited ];
        then {
            ignore;
        }
    }
}

I've been begging for the ability to reset event priorities to more
sensible values in the event policies for ages, but so far there hasn't
been any interest.

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)