[Outages-discussion] Mailman web interface broken

Tue Apr 16 13:23:54 EDT 2013

On Tue, Apr 16, 2013 at 12:48:13PM -0400, Jared Mauch wrote:
> 
> On Apr 16, 2013, at 12:43 PM, Jeremy Chadwick <jdc at koitsu.org> wrote:
> 
> > It looks like mailman broke.  Again.  :-(
> > 
> > This is also the cause of all the influx of mails within the past few
> > minutes.  It takes careful review of SMTP headers (specifically
> > reviewing Received: lines, and/or looking at the Date: header (which is
> > sent by the client)) to figure out.
> > 
> > Jared, have you talked to the mailman folks about this recurring
> > problem?
> > 
> > freebsd.org's mailing lists are all Mailman 2.1.14 and have never seen
> > this kind of issue.  Whether or not 2.1.15 has a problem that 2.1.14
> > lacks is unknown to me.
> > 
> > All I could find recently on the mailman-users list is these:
> > 
> > http://mail.python.org/pipermail/mailman-users/2013-April/074920.html
> > http://mail.python.org/pipermail/mailman-users/2013-April/074960.html
> 
> What's happening is that some of the files are getting owned by the wrong user.  This last case it was because of an upgrade from 2.1.14 -> 2.1.15.  I didn't check every list, but for some reason the $mailman/list/outages* stuff ended up owned by root when the others (eg: cisco-nsp, cisco-voip, etc..) were working just fine.

Sadly I'm not familiar with mailman, specifically with which program
(either mailman itself or sendmail, which you seem to be running) is
writing the files to the filesystem.  sendmail's stuff tends to be
painful when it comes to switching uid/gid for certain things -- postfix
is a blessing in comparison.

A crappy workaround: if there's a log that gives some indication of the
problem happening, write a small cronjob that grep -q's that log, and if
$? == 0, chown the files and then run bin/unshunt.  Alternately you
could use find(1) to see if there are any root-owned files in that dir,
and if so, chown + bin/unshunt.
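
For what it's worth, the find(1) variant could be sketched roughly like
this, in Python rather than shell.  This is only a sketch under
assumptions: the function names and parameters are made up for
illustration, nothing here is part of Mailman, and the uid/gid to chown
to would have to be whatever the mailman user is on the system in
question.

```python
import os


def find_owned_by(root_dir, uid=0):
    """Return paths at or under root_dir whose owner is uid (default: root)."""
    hits = []
    # lstat() so that symlinks are examined themselves, not their targets.
    if os.lstat(root_dir).st_uid == uid:
        hits.append(root_dir)
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid == uid:
                hits.append(path)
    return hits


def fix_ownership(root_dir, new_uid, new_gid, bad_uid=0):
    """chown everything owned by bad_uid to new_uid:new_gid.

    Returns the list of paths that were changed, so the caller can
    decide whether anything else (e.g. bin/unshunt) needs to run.
    """
    fixed = find_owned_by(root_dir, bad_uid)
    for path in fixed:
        os.lchown(path, new_uid, new_gid)
    return fixed
```

A cronjob could call fix_ownership() on the affected list directory and
run bin/unshunt only when the returned list is non-empty.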

I've had to write such band-aids on systems I've managed in the past,
particularly httpd (Apache) wedging hard (daemon running, listening on
ports, but not responding to TCP SYN) after receiving SIGUSR1 during
daily log rotation.  The issue there, just for the record, had to do
with PHP and its extensions that the customer was using, as other
systems with the exact same hardware/filesystems/config/software
versions, barring those extensions, worked flawlessly.

I tell that story because I do not condone such band-aids -- they're
crappy workarounds.

> I'm very frustrated with this as I didn't see these types of problems on FreeBSD but w/ Linux I seem to have this happen fairly often.

I assume you were using sendmail on FreeBSD as well, and not some
alternate MTA?

I'd find it very strange if a system couldn't setuid() or seteuid()
"randomly", and even when failing to do so, continued on its merry way
as if it had.
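
To illustrate what I'd expect a well-behaved program to do: below is a
minimal sketch of dropping privileges and then verifying that the switch
really happened, instead of carrying on regardless.  It is not taken
from mailman or sendmail; drop_privileges() is a hypothetical helper
name.  In Python, os.setuid() and os.setgid() raise OSError on failure
(in C the return value has to be checked by hand), and the extra
comparison afterwards is just belt and braces.

```python
import os


def drop_privileges(uid, gid):
    """Switch to gid/uid and verify that the switch took effect."""
    # Group first: once the uid is dropped we may no longer be
    # permitted to change the gid.
    os.setgid(gid)  # raises OSError if the kernel refuses
    os.setuid(uid)  # raises OSError if the kernel refuses
    # Do not trust that it worked; check real and effective ids.
    if os.getuid() != uid or os.geteuid() != uid:
        raise RuntimeError("uid was not switched to %d" % uid)
    if os.getgid() != gid or os.getegid() != gid:
        raise RuntimeError("gid was not switched to %d" % gid)
```

Anything that writes files after such a call, without the check, is
where I'd start looking for root-owned leftovers.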

> The reason all these messages came out of the woodwork is I ran a bin/unshunt to get them out.
> 
> Normally I break it around the first of the month when I manually remove the users that mailman can't auto-remove and I get from 50-1000 bounced emails to the -owner alias based on the lists and monthly reminders.

Ugh.  I had no idea there was that much bounceback.  I feel sorry for
you having to do this kind of maintenance.  :-(

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |