2025-05-15 1530Z: I am aware of no service-affecting issues at this time.
2025-05-13 1030Z: Working services (this list is partly for my benefit): firewall, user accounts, home directories, stored email, dovecot (reading stored email), sendmail (sending and receiving email), fail2ban filtering, ntp (time service), named (DNS service), munin and nrpe (monitoring), most of the websites.
2025-05-12 1600Z: Teaparty has been rebuilt, changing the setup we think most likely to have been the root cause of the issue. Above, I'll list the services that are believed to be running properly. If you need a service that's not yet listed above, please hold your fire; I'll restore everything as I have time. If any service listed above doesn't seem to be working, please let me know.
2025-05-11 1605Z: The root FS has been repaired a sufficiently large number of times that data loss has been profound. I'm attempting to clarify the likeliest source of this corruption, and will then rebuild teaparty from scratch. The most recent good backup was at 0100 on 9/5/25 - basically, it died during the course of making the backup on 10/5 - so that backup will be restored. That means about 24 hours of data loss, for which I apologise, but that now seems to be unavoidable: the live FS has been so badly damaged that a number of the home directories are completely missing. I cautiously anticipate service will start to be restored late on 12/5.
2025-05-11 1050Z: The root FS is re-roaching itself every time I boot into Debian. Current leading hypothesis is that the mobo UEFI implementation is poor, and the OS (which is UEFI booting) is not talking well to the hardware as a result. Second likeliest candidate is the SATA cables, though I note that whenever I boot into rescue mode from the CD - which is a non-UEFI boot - it talks to the hardware just fine, and fixes the FS without complaint. At any rate, I've de-racked the system and am taking it back home to investigate further. We will therefore be down until tomorrow at the earliest.
2025-05-11 0855Z: Teaparty is having a recurrence of the problem where its own root FS goes read-only, and has been doing so since yesterday morning. That pretty much stops everything: mail in, mail out, changes to stored mail, web activity, the lot. Right now I am at the colo in SVG trying to fix the root FS, but the errors being reported are far more extensive than they were last time this happened. It may be that I have to unrack it and take it back to Cambridge for reinstallation. We do have backups, so nothing from before about 0100 10/05/2025 should be lost.
I was on the motorbike yesterday, riding back from Richmond, Yorks., so diagnostics, fixes, and updates to this site were off the cards. Sorry.
2025-04-19 0012Z: teaparty has remounted its own file system read-only and as a result no changes can be made. This includes, but is not limited to, receiving email.
2024-10-12 1745h: It's clear that the system has power, and the network is certified fine by the colo guys. I suspect the kernel hit some edge case and panicked, but I won't know for sure until I go by the colo tomorrow. News as it's made.
2024-10-12 1559h: teaparty.net appears to be completely unreachable. I do not yet know why.
2024-03-22 0830h: Everything is down while I reinstall teaparty, yet again.
2024-03-21 0840h: All user-facing services except the photo gallery and webmail are back, but for performance reasons we're going to have to do this all again; we will be down all of Fri 22/3 while I re-reinstall.
2024-03-13 1300h: All user-facing services except mailman (the mailing list manager), webmail, and the photo gallery are now back and working. The system is running slowly, and a hardware upgrade has been put in progress to remediate that over the next 2-3 weeks. We have definitively established that mailman3 can't be made to run on teaparty without significant upheaval. I'm starting to try to set it up elsewhere, and migrate the lists. Webmail is also an unsolved problem, and may not come back. The photo gallery should eventually return, but it'll be a while.
2024-03-12 1750h: I think I have a handle on the performance issue, and it's now under control, though still not great. Teaparty will be slower than you're used to, but mail in and out should now be working with the exception of the mailing lists.
2024-03-12 1455h: A serious performance issue has been discovered with the new configuration. It is sufficiently bad that teaparty is borderline unusable. I am trying to come up with an optimum plan to proceed from here, but it must be expected that teaparty will be down for some days.
2024-03-11 0645h: Teaparty is expected to be substantially or completely down 11-13/3 for a complete reinstall and OS upgrade.
2023-11-02 0937h: teaparty is currently unreachable from my monitoring server via ipv6 only; ipv4 access remains fine. I also have several other test endpoints that can't reach teaparty via ipv6. A call is open with my colocation provider, and we're trying to get to the bottom of it. Meanwhile, if you experience any problems with ipv6 access, I recommend falling back to ipv4.
2022-02-03 1000h: we expect teaparty to be more down than up until 1100h, while it is moved to a different rack in the colo.
2020-12-12 1130h: we're currently back on-air, but I'm told that only a single fibre strand has been reconnected, and we are therefore still very much at risk of both intermittent service and sudden repeated outage.
2020-12-12 1025h: current best time-to-fix is midday.
2020-12-12 1010h: emergency fibre maintenance works near the colo, which should have affected only one of the dual paths in, have affected both. Nobody is quite sure why, yet. There is no estimated time to fix.
2020-12-12 0940h: the issue continues. The backhaul provider, Vorboss, are confirming the loss of multiple links, so at the moment it's looking like someone put a JCB through a conduit close to the A1(M), or equivalent disaster.
2020-12-12 0848h: the colo confirms a routing issue.
2020-12-12 0840h: there is an issue accessing teaparty.net. I have no confirmation from the colo yet, but the loss of access to other equipment in an adjacent netblock suggests it's a network failure rather than teaparty itself. News as it's made.
2020-10-19 0905h: IPv6 is currently not routing correctly to or from teaparty.net. IPv4 service is unaffected. A service issue has been logged with the hosting company and their response is awaited.
2020-10-13 1045h: teaparty went through a period of poor availability, but things seem to be better now. I'll be holding a watching brief for the rest of the morning.
2020-10-13 0934h: the data centre that hosts teaparty.net is experiencing network instability. See their status page for up-to-date information. At the moment this doesn't appear to be impacting significantly on teaparty's performance.
2020-07-31 1030h: I think we're back in business. I'll be at the colo for another 30-60 minutes, CALL ME ON MY MOBILE if you have issues.
2020-07-31 0745h: the restore seems to have gone fairly well overnight (FAOD: all user data is safe). There are bugs, and I'm working through them, but no show-stoppers as yet. At the moment I'm intending to take teaparty back to the colo sometime this morning, which (in an ideal world) would mean it was up and running by noon. This is not a promise and could change at any minute.
It turns out that we were just early discoverers of a massive thumping bug that's trashing UEFI systems left, right, and centre. I'm not sure if that makes me feel better or not.
2020-07-30 1655h: The issue seems to have been related to bad UEFI implementation in the hardware, which turned a bog-standard kernel upgrade into a brick-the-system episode. I am reinstalling the system without UEFI support. This will take some hours, and even when done, I intend to watch the system like a hawk before taking it back to the colo. If all goes very well: late tomorrow. If all doesn't: could be days. Possibly a week.
2020-07-30 1205h: System data (stored emails, etc.) seem to be fine, but the system won't boot, and resists attempts to make it do so. I am taking it home, because it's easier to work on there. Service is very unlikely to be restored today. Tomorrow is possible but don't in any way shape or form rely on it.
2020-07-30 1055h: teaparty.net is completely down. Tom is onsite at the colo investigating. There is no time-to-fix yet.