[okfn-br] OKF mailing lists are working again

Everton Zanella Alvarenga everton.alvarenga at okfn.org
Tue Oct 22 16:40:12 UTC 2013


Everyone, some of you may have noticed that several e-mails from OKF lists
arrived all at once. There was a problem on the servers that run the
mailing lists; the explanation can be found below.

That is why you did not receive, for example, the announcement of the
Escola de Dados event at the end of last week (Yaso, the event has already
taken place; Gisele, that is why you did not receive it), and why I had to
send individual e-mails to the Escola de Dados list. The e-mails were
stored in the archive but did not reach our inboxes. We were also affected
on the Escola de Dados and open science lists, and on the international
lists, for those who take part in them.

That's it, the problem has already been solved!

Tom

---------- Forwarded message ----------
From: Nick Stenning <nick.stenning at okfn.org>
Date: 2013/10/22
Subject: [list-admins] Mailing list delivery delays
To: Nick Stenning <nick.stenning at okfn.org>


Dear all,

You're receiving this email because you administer a mailing list hosted
on the Open Knowledge Foundation's list server, AKA "lists.okfn.org" (or
because you work for the Foundation).

Some of you may have noticed that delivery of mail sent to your mailing
list(s) was delayed between shortly after 1300UTC on Wednesday 16th and
some time after 1530UTC on Sunday 20th. The exact time at which delayed
mail will have been delivered will vary due to the size of the backlog
that had to be processed.

I'm writing to apologise for the inconvenience this may have caused to
you and your list members, and to explain what happened and what steps
we'll be taking to ensure it doesn't happen again. It's also important
to make clear that as far as we can determine, NO mail was lost during
this outage.

Root cause
==========

A command run manually on our lists mailserver shortly after
2013-10-16T13:00Z caused both our mail transfer agent (MTA) and our list
queue runner to stop working. Some time later it was noticed that the
MTA had crashed, and it was manually restarted. Unfortunately, in the
process of diagnosing this problem, the issue with the list queue runner
was not noticed: the queue runner continued to run apparently normally,
but it was not processing mail.

This latter problem remained undiagnosed until the following Sunday
afternoon, when it was also manually resolved.

Aggravating factors
===================

Both the MTA failure and the failure of the queue runner should have
been noticed by automated monitoring systems. Unfortunately, these were
not appropriately configured, and so detection of both failures was
entirely manual.

Mitigations
===========

Appropriate automatic monitoring has now been put in place for both the
presence of the MTA, and the length of the outstanding queue of messages
scheduled for delivery to mailing lists (which gives an indication of
whether the queue runner is operating normally).
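
As an illustration only (this is not the Foundation's actual monitoring
setup), the two checks described above could be sketched in Python roughly
as follows. The queue directory, the threshold of 500 messages, the use of
port 25 on localhost, and all function names are assumptions made for this
example.

```python
import os
import socket


def mta_is_listening(host="127.0.0.1", port=25, timeout=5.0):
    """Return True if something accepts TCP connections on the SMTP port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def queue_length(queue_dir):
    """Count regular files waiting in a queue directory (hypothetical path)."""
    return sum(1 for entry in os.scandir(queue_dir) if entry.is_file())


def check(queue_dir, max_queued=500, host="127.0.0.1", port=25):
    """Return a list of problems found; an empty list means healthy."""
    problems = []
    if not mta_is_listening(host, port):
        problems.append(f"MTA not accepting connections on {host}:{port}")
    queued = queue_length(queue_dir)
    if queued > max_queued:
        # A long backlog suggests the queue runner is not processing mail.
        problems.append(f"{queued} messages queued (threshold {max_queued})")
    return problems
```

Run periodically from cron or a monitoring agent, a non-empty result from
check() would be the signal to raise an alert. Watching the queue length
rather than the queue runner process matters here, because a runner can
still be running while not processing mail.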

In addition, we've learnt some things about how not to invoke the queue
runner from the command line :)

Summary
=======

We're satisfied that we have taken all reasonable steps to ensure that
this doesn't happen again. We will be continuing to review the situation
as we migrate and upgrade our mail services, both internal and public.

It is worth noting at this point that while we took the opportunity last
week to migrate our staff email server to a new location, we are
confident that there is NO causal relationship between this migration
and the problems described here. It is merely unfortunate that the two
events were temporally correlated.

------

If anyone has any questions or comments, I'll be happy to answer them.
Please simply reply to this email and I'll get back to you as soon as I can.

Please accept again my apologies for this interruption of service.

Yours,
Nick

--
Nick Stenning
Technical Director, Open Knowledge Foundation
GitHub/Twitter/Skype: nickstenning

-- 
Everton Zanella Alvarenga (also Tom)
OKF Brasil - Rede pelo Conhecimento Livre
http://br.okfn.org