Server Admin Log/Archive 7 - Wikitech
June 30
06:28 Tim: profiling set up at
05:40 Tim: set up bart as a home directory server, under the virtual hostname noc.wikimedia.org
03:50 Tim: upgrading APC to 3.0.11dev (plus my early/late warning patch) to fix an incredibly stupid block header overwrite bug in the shared memory allocator. I'm not sure how it was working at all; this certainly explains the majority of segfaults on x86_64 servers.
June 29
05:30 brion: running refreshLinks on enwiktionary after big namespace changes
05:03 Tim: Set up srv81-86 for external storage cluster 6 and 7.
00:30 mark: PowerDNS on browne (moved from zwinger) is unstable, and seems to crash on updates. Be careful with it, I'll look at it soon.
00:01 brion: set up apache redirects for bugzilla.mediawiki.org and it.wikimedia.org; but they need to be added to master DNS
June 28
23:53 brion: set import sources for some it and no wikis
20:40 brion: removed cluster5 from default storage (just cluster4 now), deleted a few gigs of old binlogs from srv76, need to investigate cluster5 slaves to check replication
20:37 brion: srv76 out of disk space. THIS IS A MASTER FOR EXTERNAL STORAGE CLUSTER 5
20:10 mark: Enabled subpages for NS_MAIN on wikimaniateamwiki
18:59 brion: running links refresh on en.wiktionary (on srv31)
18:52 brion: added a bunch of namespaces to en.wiktionary
17:31 brion: restarted srvs 107, 100, 109, 84, 35, 113 for segfaulting
17:10ish brion: restarted dewiki dump; appears to have hung during full-history dump, all dbs cut off for a while.
06:40ish brion: added importsources for fr(wiki|wikisource|wiktionary)
June 27
21:18 brion: db3 was rebooted. up, not sure if ready for use
20:20 brion: db3 is down; can't ping it. took out of rotation
15:14 brion: had to restart lighty on benet, was down for unknown reason
06:20 brion: started new dump run (thread 2: starting dewiki), using dbzip2 and building search. hope it works!
05:something brion: restarted apache on friedrich; think that domas killed it accidentally earlier when doing APC stuff
03:37 brion: installed tidy on bart, was missing, breaking some pages on ssl interface
03:30 Tim: uninstalled APC on srv31, was spamming apache error log
03:14 Tim: Got ganglia working, by installing gmetad from source.
June 26
23:30 jeluf: restarted httpd on goeje; mail web server was down mysteriously
21:40 brion: restarting apaches; reports of db errors from what looks like apc corruption ('img_mame')
17:16 brion: benet rebooted with many gigs of space recovered. yay
17:15 brion: unable to find and kill what's holding the giant file open on benet; rebooting in the hopes it will recover
16:53 brion: restarted httpd on goeje; mail web server was down mysteriously
16:50 brion: benet disk filled up due to accidental decompression of enwiki full history dump; trying to kill process and recover space
16:32 brion: set up /home backup refresh to khaldun as daily cronjob
10:30 domas: modified browne firewall to allow port53 udp packets.
06:20 brion: changed smtp.pmtpa.wmnet to point at goeje instead of zwinger for outgoing mail; gave goeje the .234 address temporarily while dns caches time out
04:50 Tim: got pdns running on browne, assigned ns0 and ns1
03:06 brion: setting up on-site backup of /home for the future
02:25 brion: fixed bart's sysconfig so it will route right on next boot
02:21 brion: switched bart's routing from (dead) zwinger to proper local router. its nfs was hung and nagios was reporting every server as offline.
01:48 brion: very high load on zwinger, some funky ganglia stats. report of slow on yaseo 20 mins ago. investigating -> disk failure. attempting shutdown
June 25
22:50 brion: added otrs-en-l and unblock-en-l lists; mindspillage admin
22:38 brion: restarted apache on srv22; segfaulting.
17:50 domas: i386 apc upgraded to 3.0.11-devel, 64-bit boxes still at 3.0.10-release
17:41 brion: enwiki experimental dump completed this morning (complete runtime 3 days, 4 hours, 26 minutes, over a 2x speedup over traditional running, though data rate dropped significantly towards the end). Starting bz->7z conversion
15:47 Tim: didn't help, they started segfaulting again after about 5 minutes.
15:32 Tim: we were still getting about 20 segfaults per second altogether from random servers, no user reports though. I reverted MediaWiki to r15030, this stopped the segfaults.
14:30 Tim: upgrading APC to 3.0.10 to prevent segfaults which occur in current MW HEAD.
05:00 brion: got libxml2/librsvg fixed up everywhere in mediawiki-installation. Still 2.14, but now renders illustrator output files consistently
04:51 Tim: started apache on srv34 (I had this off for a few minutes to build rpms. Sorry -- Brion)
04:00 brion: more modestly upgrading various boxen to libxml2 2.6.23 to fix the problems rendering files produced by Adobe Illustrator
01:48 brion: spent a couple hours fighting with librsvg 2.15.0 on bart. able to build RPMs for it but still can't get it to read the illustrator-export files, even though they work compiled at home
June 24
23:30 brion: testing rsvg problems on half of servers
21:45 brion: adding check for random-large-number spammer to the captcha on smaller wikis
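The trigger described above might work roughly like this minimal sketch (the regex, threshold, and function names are assumptions for illustration, not the production check): fire the captcha only when an edit adds a long run of digits that wasn't already on the page.

```python
import re

# Hypothetical "random large number" detector: 12+ consecutive digits.
LARGE_NUMBER = re.compile(r"\d{12,}")

def needs_captcha(old_text: str, new_text: str) -> bool:
    """Trigger only on long digit runs *added* by this edit."""
    added = set(LARGE_NUMBER.findall(new_text)) - set(LARGE_NUMBER.findall(old_text))
    return bool(added)
```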
07:17 brion: added mime type for .djvu to amane's lighty
June 23
22:00 brion: enabling Poem extension on wikisource, test
18:35 brion: running experimental dump of frwiki + commons used images, going at [1] (completed around 21:00ish)
18:00 Tim: improved jobs-loop.sh to give more idle time and less DB load
14:22 Tim: changed wikiadmin password on yaseo
04:32 Tim: fixed ntpd on holbach, was erroneously reporting 6 seconds lag
June 22
23:50ish brion: setting up sugarcrm on office.wm.o
23:01 brion: upgrading apache & php on friedrich
22:27 brion: set up daily data backup on friedrich
21:55 brion: adding office.wm.o dns
20:15 brion: had to restart apache on srv94, was segfaulting rapidly
20:05 brion: installed spam whitelist update for wikis
18:27 brion: restarted lighty on benet, was mysteriously down
06:40 brion: installed drsport skin for dawiki
02:20 brion: doing an experimental run of enwiki history dump on srv31 using 7zip prefetch and dbzip2 compression for increased speed, to evaluate how well it runs over time. Using dbzip2d daemons on several db servers
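The idea behind dbzip2-style distributed compression can be sketched as follows (this is an illustration of the technique, not dbzip2's actual code): bzip2 streams are independent, so the input can be split into chunks, compressed in parallel (dbzip2 farms chunks out to remote daemons; here a thread pool stands in), and the streams concatenated. Standard decompressors read the concatenated output as a single file.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

def parallel_bzip2(data: bytes, chunk_size: int = 64 * 1024) -> bytes:
    # Split the input into independent chunks and compress each one
    # concurrently; concatenated bzip2 streams decompress as one file.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(bz2.compress, chunks))
```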
01:50 brion: running fixSlaveDesync on enwiki. A broken page was reported, which has been fixed; it was borked on 'server 3'.
00:06 brion: aborted the doomed enwiki dump. a db hiccup killed the .bz2 history dump, and the .7z was going to fail when it got to the end of that incomplete file. doing some more testing before restarting it with some better error recovery in a few hours
June 21
07:18 Tim: restarted apache on srv104 and srv23, segfaulting
06:39 Tim: changed the cache administrator from wikidown@bomis.com to "nobody".
01:50 brion: dropped the public log entries for w:meta:Oversight
June 20
22:05 brion: reenabled LinkSearch after patching up the query
08:20 domas: disabled LinkSearch extension, caused too much load in case of wikipedia.org lookups.
June 19
23:10 brion: installed LinkSearch extension
03:00 brion: running clamav scan on amane
01:26 brion: running dumps (now including stubs in public set)
00:10 brion: migrating older dump files from benet to amane
June 18
06:45 brion: adding comcom.wm.o
June 17
11:55 brion: rifling through an old commons image dump to search for files reported missing in bugzilla:5828
06:05 Tim: reconfigured all apaches to send their syslog to suda. Added to the setup-apache script, not sure if it should be in setup-general as well.
05:56 Tim: removed dba and proctitle extension statements from /home/wikipedia/conf/php5-x86_64.ini. I assume this was done before to some of the apaches, but not to the master file, thus when I changed the memory limit earlier, it was changed back.
03:20 Tim: restarted apache on srv56, was segfaulting
03:00 Tim: increased PHP memory_limit on x86_64 machines to 100MB
June 16
23:32 brion: calculating size of image directories to plan new image dumps
04:46 Tim: in response to the rising job queue length on enwiki, reinstalled srv42 and added it to mediawiki-installation. Started three run-jobs threads.
03:50 Tim: upgraded ImageMagick to 6.2.8-Q8.nothread on all mediawiki-installation servers in pmtpa
03:40 brion: image archive/undeletion now live sitewide
02:48 brion: testing image archiving/undeletion on meta
02:40 brion: forgot to apply filearchive table patch before installing new software; some admin views showed errors when checking for existence
02:32 brion: clearing space off benet
June 15
00:52 brion: upgraded engine on yf1017
00:08 brion: investigating frequent nagios reports of timeouts on yf1006 and yf1010... lots of Apache processes on yf1006 stuck in __lll_mutex_lock_wait, some under zend_getenv->?->mallopt->ap_month_snames, some from shutdown_destructors->mallopt->output_globals. Restarted apache, may be better.
June 14
08:05 brion: restarted mwsearchd on coronelli and rabanus manually; it didn't restart properly on the hourly cronjob
06:00 Kyle: sq3 has a new OS; if this doesn't solve its stability issues then I'll send it back.
June 13
20:10 brion: restarted mwsearchd on coronelli manually; it didn't restart properly on the hourly cronjob
09:26 brion: added mw.o to mail server's domains, so security at mw.o addy works
09:12 brion: restarted gmond on apaches and gmetad on zwinger, seems happy again
08:59 brion: ganglia stats for pmtpa apaches down since about noon UTC yesterday. investigating
June 12
22:25 brion: fixed lucene restart script (/etc/cron.hourly/search-restart; master now in /home/wikipedia/conf/lucene)
21:57 brion: fixed broken ports for lucene
21:50 brion: fixed broken groups for perlbal
21:19 brion: set up a second perlbal group for search on port 8124 for 1gig machines, sending non-enwiki requests there
15:50 mark: Squid 2.6 on clematis keeps crashing, depooled
15:30 mark: Branched the Squid RPM in SVN and made 2.6 packages using (almost vanilla) Squid 2.6.PRE2. Testing on clematis. Squid 2.6 is not fully backwards compatible, and has quite a few changes, notably for accelerator setups like ours.
05:10 Tim: Renamed mwdaemon to mwsearchd. Installed it as a bundle.
02:29 Tim: Fixed mwdaemon startup scripts and pidfile handling. Using daemonize.
June 11
23:00 Tim: many apaches were running the Q16 version of ImageMagick from FC3-4. Upgraded them all to our own 6.2.6-Q8 RPM, changed the relevant package script.
21:40 brion: upgraded leuksman.com to apache 2.2.2 and php 5.1.4
20:19 brion: restarted apache on friedrich; php segfaulting
03:49 Tim: fixed issue with MWDaemon dying on startup if the HOME environment variable wasn't set
01:15 Tim: changed nagios config to use the HTTP plugin for lucene
01:09 brion: reinstalling mono from rpms on maurus for consistency
00:22 Tim: updated nagios configuration to reflect recent search changes. Started search daemons on almost all search servers, it had crashed on all except coronelli, rabanus and rose.
June 10
23:30 brion: started search daemon on maurus. syncing indexes to the various thingies
22:50 brion: while checking over lucene updates, found that sourceswiki 5-31 current-meta dump was broken without being reported. should examine this. (building its search from 5-23 dump)
21:42 brion: refreshed messages on it*
9:30 domas: started yet another copy of lomaria to db1.
9:30 hashar: refreshed fiwikisource statistics [2]
8:44 hashar: switched zhwikisource logo.
00:29 brion: set up restart cronjob on rose and rabanus; daemons died a while ago due to memory leak
June 9
23:40 brion: tweaking squid config to allow google feedfetcher
19:40 river: gave root access on zedler/hemlock to Wegge and Rob Church
17:00 jeluf: redirected to
09:00 domas: increased apache children concurrency. MaxClients 40, MaxSpare 20
07:28 brion: moved rose and rabanus from apache to search dsh groups, woops
02:06 Tim: set up perlbal on diderot for load balancing of search requests.
00:04 brion: taking rose and rabanus for lucene
June 8
23:50 brion: cleaned up favicons again. docroot favicons no longer symlink to /apache/common/favicon.ico, which tended to get accidentally overwritten
22:59 brion: raised lucene result cache time from 15 minutes to 3 days per tim's suggestion
15:00 domas: experienced yet another APC crash. this time it was... homogenized by apc3.0.8 ;-).
11:35 brion: experimentally raised the thread limit on humboldt's search from 10 to 20 threads to see how it compares with others. leave for a while to generate stats history...
11:24 brion: ganglia search stats (rate, service time) actually work now
maurus is currently rebuilding indexes, with the daemon off
coronelli handles many more than others. why? is it due to higher ram, or bad selection algo?
08:35 brion: adding search server stats to ganglia output
June 7
22:25 brion: preparing to turn humboldt, hypatia, and kluge into supplementary lucene servers (old 1gb apaches)
22:05 brion: rebuilding search indexes on maurus
21:00 domas: redirected profiling to (/usr/local/sbin/collector/) at webster:3811, web interface here
20:07 river: installed the new zedler login server (hemlock.knams.wikimedia.org) with debian
14:00 domas: private static arrays caused most (not all!) of APCs on FC3s (all FC4s were sane) to segfault. Update: non-crashy boxen seem to have fresh APC binary. Update: stable boxes did have 3.0.8, unstable: 3.0.9-dev. WTF!
00:05 brion: set enwiki job run rate to 0.0001 (down from 0.02). When the database sticks, the job queue is a great way to turn safe read-only requests into hideous stuck death for hundreds of simultaneous processes. Temporarily had it at 0 for a couple minutes to clear out a full processlist.
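For context, the rate being tuned here is MediaWiki's $wgJobRunRate: a value below 1 is treated as the probability that any given request runs a job from the queue. A minimal sketch of the change (values from the entry above):

```php
// CommonSettings.php sketch: with $wgJobRunRate below 1, each request
// runs a queued job with this probability, so 0.0001 means roughly one
// job per 10,000 requests instead of one per 50 at the old 0.02.
$wgJobRunRate = 0.0001;
```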
June 6
03:04 brion: jawiki dump stuck in some kind of infinite loop in xml parser. stopped, restarting
June 5
19:32 brion: raised the hacked-in max regex size for spam blacklist from 25000 to 50000 bytes; it was over 26000 legitimately and failing
05:23 Tim: running fixUserRegistration.php on enwiki
05:00 Tim: restarted nagios_wm IRC bot
04:50 Tim: reinstalled applications on srv57, back into rotation, watching for segfaults
04:30 Tim: setting up srv51
June 4
21:37 brion: hoping domas finished "copying lomaria tablespaces to db1, ixia, thistle, adler" as dumps are running from db1. a couple dozen of them failed during the unannounced breakage.
19:28 Kyle: srv67 and srv110 rebooted. (srv67 had a raid crash, I may replace its motherboard if it happens again)
19:23 Kyle: srv51 has a new drive and a new OS. (Needs Config)
17:00 jeluf: created wikibooksDE-l
09:00 domas: copying lomaria tablespaces to db1, ixia, thistle, adler
05:19 brion: fiddling with dump stuff
05:15 brion: removed squid from logrotate config on benet so it stops filling the mail queue with cron errors
June 3
03:50 brion: running dump decompression tests on amane to double-check on some reported data corruption in dump files
June 2
22:45 jeluf: created the following wikis:
00:29 brion: added incubator.wm.o. preparing to move pages over from meta
June 1
23:14 brion: setting up dns for incubator.wm.o (for "test wikipedias")
22:30 brion: zedler munching:
disk filled up due to massive php error reportage from magnus' xml script running for several hours pumping out errors for every byte of a many-megabyte string repeatedly
killed the 154-gigabyte 'error' log file from apache logs dir. replaced it with an extract of the last 50 megabytes or so from it
restarted apache to kill everything with the old file open. might or might not have done it right
renamed magnus's php.cgi to php.cgi.broken to make sure it's not coming back until problem is resolved
May 31
23:25 brion: disabled apache on srv57, it segfaults a lot despite multiple restarts. needs to be checked for memory problems
15:11 mark: zwinger was out of memory and killed some processes, including ganglia. Started ganglia.
05:00ish brion: installed dbzip2d on various places; running on pmtpa database boxes for performance testing
04:00 Kyle: srv119 has a new drive and OS.
03:56 Kyle: sq3 now has a new motherboard! Should be stable for squid serving.
May 30
09:13 brion: stabbed domas on #wikimedia-tech
09:12 domas: greeted brion on #wikimedia-tech
09:09 brion: added bart to mediawiki-installation group. Someone made this a MediaWiki server and didn't add it to the group, causing it to fail to receive various updates such as db.php changes (causing bugzilla:6140 et al)
07:14 brion: ran namespaceDupes on zhwiki. Someone added a "Portal" namespace at some point but didn't check for dupes, stranding 48 portal pages in never-never land
May 29
19:00 mark: Replaced Bind by pdns_recursor on pascal and mayflower: more secure, faster, simpler, better.
12:10 mark: Domas called and disturbed my peaceful breakfast. Brought pascal back up, and all knams squids one by one.
12:00 domas: power down in whole Amsterdam, UPS'ies didn't survive. trying to get into knams and fix something
May 28
21:50 jeluf: fixed spamassassin and OTRS config on bart, apparently mail setup was changed to route OTRS mails to bart, not processing them at goeje any more.
12:20 domas: switched master to samuel
10:02 brion: shoved the old mail queue aside on maurus, should shut up the old resends
09:30ish brion: reverted message transformation changes to try to fix up continued zhwiki breakage. updated cacheepoch, but bad data may still be in caches
09:00 Kyle: srv118 available for service.
08:18 brion: disabling display_errors from CommonSettings.php, we have a log for this
02:55 brion: updated $wgCacheEpoch on zhwiki; reports of cached breakage from a message parser bug
May 27
18:45 brion: tweaked up mail config on bart so it knows it handles only ticket.wikimedia.org, not all mail. Forwarded mail from OTRS to lists now goes properly through the main server.
18:19 brion: moved /u/l/mailman to /u/l/mailman-broken on bart to make sure this non-functional installation isn't accidentally sent to
18:17 brion: mail configuration on bart is borked, causing forwards to wikimedia addresses to fail (it tries to handle them locally)
17:56 brion: synced files on bart to fix bugzilla:6110. EVERYONE CHECK YOUR SSH KEY LIST, bart was changed and you may have conflicts which break ssh'ing to it so its files don't get updated.
09:45 domas: started main db pool master rebuilding.. taking out lomaria and samuel, rebuilding innodb pool on samuel.
May 26
22:06 brion: enabled $wgIgnoreImageErrors so thumbnail error messages won't muck up cached wiki pages
07:00 brion: set up jobs-loop to run runJobs.php through a setuid wrapper that runs it under the apache user. Hopefully this should keep permissions on rendered thumbs mostly correct.
04:41 brion: set new logo for wikisource
02:44 brion: manually restored image [3] from old image archive, for some reason it had vanished from main directories and was filling the error log
01:10 brion: enabled thumbnail.log for thumbnailing failures
May 25
22:30 brion: oversight extension active on test/meta/en, not yet in use by anyone
18:58 brion: working on setting up cawikinews; interwiki updater system is broken by the db split
16:57 mark: Setting cache_mem to a low 400 MB and letting the Linux disk cache handle it had a surprisingly positive effect on ragweed, which has disks: it can now cope with the same load as the other diskless squids. Testing the same setting on sq1 and yf1000
13:55 mark: Changed the Squid configurator script to:
Add connect-timeout=5 to all cache_peer lines, in order to have faster failover between squids
Not add a cache_peer line for the squid itself
Split knams in text and upload clusters
Enable pipeline_prefetch
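The generated configuration would then contain lines of roughly this shape (peer name and ports are placeholders, not the real pool): a 5-second connect timeout on each peer so failover happens quickly, and pipelined request parsing turned on as it was in Squid 2.x.

```
# Sketch of squid.conf output from the configurator (placeholder peer):
cache_peer 10.0.0.1 parent 3128 3130 connect-timeout=5
pipeline_prefetch on
```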
May 24
17:05 mark: Apparently database enwiki disappeared from samuel, ixia, lomaria and thistle, taking them out of rotation for enwp. WTF is this not documented!?
07:40 brion: disabled .xls uploading on commons
May 23
13:30 jeluf: Duplicate IP in network. Removed secure.wikimedia.org IP from goeje.
13:00 mark: Removed two pmtpa squid service IPs from DNS, to better balance the load (2 ips for every squid)
8:17 Kyle: Bart back up, ready for service.
5:35 jeluf: bart is not responding to ping. Moved secure.wikimedia.org IP to goeje. Slow, but up.
May 22
14:00 domas: fixed srv76 storage (moved binlogs to /var/backup/ on srv94), decrashed the cluster, restarted all apaches on the way. also restarted srv6/srv7 squids.
06:06 ævar: switched $wgAllowTitlesInSVG to true globally after asking brion on the channel, see also bug 4388
May 21
20:34 hashar: restarted apache on srv92 && srv94. Both had a httpd.pid dated May 20 21:00. Are they stuck daily? :/
09:07 brion: found inconsistency in page table between ariel (enwiki master) and db4 & db3 (enwiki slaves). May date to #April 15 db3 crash / master switch to ariel. Corrected this entry (update to page_latest and page_len), but we should check for others.
Use maintenance/fixSlaveDesync.php, it was designed to fix exactly this problem. It needs to be updated to take account of the enwiki split, it should only check slaves with non-zero load. But otherwise it's ready to run. -- Tim
May 20
21:34 hashar: updated BotQuery extension from r14130 to r14316
21:10 hashar: sent yurikbot the botquery.log (archive kept in the log directory).
20:57 hashar: recompiled texvc and scaped site for bug #2015
May 19
21:34 Tim: Verio still reports ns1=.208. Gave this IP address to zwinger and reconfigured pdns to listen on it, in addition to .207.
20:16 Tim: installed ganglia on srv103 and 116, brought them into rotation (srv116 probably was already)
19:48 Tim: upgraded nagios to 2.3.1
18:29 brion: reclaimed wiki-offline-reader-l mailing list
08:19 Tim: setting up srv116
07:20 Tim: setting up srv103
May 18
23:18 brion: enwiki dump restarted. will likely die again, as systemic problems have not yet been fixed, but at least it'll get a current dump out.
00:53 brion: migrated WikiSpecies from species.wikipedia.org to species.wikimedia.org. Gave it a temp favicon.
May 17
23:10 brion: setting up dns for species.wikimedia.org per bugzilla:5158
22:19 hashar: namespaces 104 to 111 now have subpages enabled by default (bug 5978)
00:16 Kyle: srv117 is up. I had a name conflict with two srv116's. Sorry :(
May 16
23:58 Kyle: srv116 just needs to be configured for service.
23:52 Kyle: srv103 up.
22:44 brion: srv116 seems to be broken. was it reinstalled?
20:50 domas: deleting 68 gigs of 'too many open files' in mwdaemon's log on coronelli...
11:35 mark: Enabled subpages for article name space on chapcomwiki
May 15
13:25 mark: Found vandale running with the slow select loops / 100% cpu problem, restarted it with a Squid with an alternative malloc lib
10:19 brion: created wikivi-l, wikihu-l
May 14
19:13 ævar: Synced InitialiseSettings.php, see bug 5925.
16:56 mark: Splitting up knams in 'text squids' and 'images/upload squids' clusters using a new version of PyBal. Using LVS ip .156
08:06 brion: made 'Xenu' ua check in squid conf case-sensitive; was hitting 'xenU' for someone using elinks on a Xen linux kernel
06:17 brion: installed librsvg on bart
06:10 brion: installed latex and ploticus on bart
06:03 brion: installed imagemagick on bart, investigating other possible problems
06:00 brion: bart is fucked up, missing imagemagick and other dependencies. who moved secure.wikimedia.org to this without making it properly set up?
May 12
22:50 jeluf,mark: Changed yf1019 into a squid, added it to the LVS load balancing. Broke the site for a few minutes, but fixed it finally.
19:05 brion: added Wikipedia:Copyright_problems to robots.txt blacklist to cut down on whining
06:00 Tim: put a placeholder in /mnt/upload3/wikipedia/commons/thumb/c/ce/Approaching_in_Beijing.gif/120px-Approaching_in_Beijing.gif to stop the repeated thumbnailing attempts
05:03 Tim: finished setting up nz.wikimedia.org
04:37 Tim: moved nl.wikimedia.org and pl.wikimedia.org from special.dblist to wikimedia.dblist
00:03 brion: resynced metawiki search
May 11
23:27 brion: rebuilding metawiki search index, broken.
23:02 brion: restarted frwiki dump again. broken dump left holes which probably came through prefetched; so changed prefetch to refetch on blank entries.
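The prefetch fix described here can be sketched in a few lines (the names and signature are hypothetical, not the dump script's actual code): reuse revision text from the previous dump when available, but treat a blank entry as a cache miss and refetch it from the database, so holes in the broken dump don't propagate into the new one.

```python
def get_revision_text(rev_id, prefetch, fetch_from_db):
    # Prefer text carried over from the previous dump, but a blank or
    # missing prefetched entry falls through to a fresh DB fetch.
    text = prefetch.get(rev_id)
    if text:
        return text
    return fetch_from_db(rev_id)
```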
22:05 hashar: restarted apache on kluge, srv2, srv3, srv4, srv11, srv12, srv13, srv14, srv15, srv16, srv17, srv18, srv21, srv22, srv23, srv25, srv27, srv28, srv30. Same problem as below, although the servers were still serving pages.
21:49 hashar: restarted apache on srv0, srv19, srv29, rose and hypatia. They had some 'convert -size 120' processes stuck on commonswiki images Approaching_in_Beijing.gif, View_of_Earth_from_New_Horizons_-_20150714.gif and New_Horizons_trajectory_animation.gif. Bug in convert with gif?
Tim: database for nz.wikimedia.org created, but apache configuration and DNS still need to be updated
14:40 mark: Reduced ragweed's cache_dirs to 2x 3000 MB, let it purge and repooled it
11:00 mark: Rebooted, and depooled ragweed: it's too slow as a disk squid
10:00 mark: Restarted vandale, it was running Squid at 100% CPU
08:30 Tim: srv89, srv97-120 in service
04:20 Tim: setting up srv97-120
103 and 117 down, in addition to the expected 118 and 119
May 10
22:46 brion: restarted frwiki dump; was broken due to bad php files (from disk full incident)
19:20 mark: Set up geodns for images cluster, testing with NL and DE
17:00 Tim: testing LVS wlc weighted by CPU count for apache cluster.
Seems to work, CPU usage is much more even. Miss service time dropped by about 15-20%.
05:42 Kyle: srv120 is up.
05:02 Kyle: srv89, and srv105-srv117 are up. (The last three have hardware issues)
01:36 brion: had to poke around at crap; InitialiseSettings.php was 0-bytes on benet which confused things
01:10 brion: reinstalling missing PHP extensions on benet. adjusted dump scripts to barf loudly if UTF-8 helper extension is missing
May 9
20:35 brion: switched amaryllis's webserver from apache to lighttpd, handles large files more reliably
19:31 brion: unsuccessfully tried to reduce logging junk on mwdaemon. had to delete logs on maurus and restart, keeps filling disk
19:23 brion: increased tcp window size on amaryllis per [4], speeds up large file transfers yaseo<->knams by a factor of 24
19:07 brion: disabled sendfile on download-yaseo; broke on very large files
11:30 mark: Testing Squid with a patch that allows HITs to be converted into MISSes when disk reads are required and disk I/O load is too high. Testing on ragweed, sq1 and sq2
08:05 mark: Disabled BGP load-sharing / multihop BGP, as this setup has flaws
07:00 Kyle: srv97-srv104 up.
06:52 Tim: re-ran setup-apache on srv52
Tim: created
06:16 brion: under jeluf's direction shut down the link from csw1 to csw4, things work again. some broken failover...
05:17 brion: investigating some sort of network oddity
zwinger cannot reach outside world. reason not known; other servers with same gateway appear to be working fine.
02:30 brion: trying to copy that dump over to zedler
00:54 brion: restarted slave on dryas
00:17 brion: taking dryas out of rotation to make a synchronous dump for zedler
May 8
23:36 river: installed mailgraph on goeje
23:33 brion: tunnel to henbane is ready at that end
22:59 brion: created toolserver replication user on henbane. still need the tunnel added
19:53 mark: I documented the new network setup at BGP
18:00 mark: BGP multipath (connection load sharing) works
14:50 mark: Restarted srv6-8 because of the SYN cookies problem, moved .235 from srv8 to will
05:22 Tim: increased contributions limit on Special:Renameuser to 200,000.
May 7
18:00-20:00 mark: Set up BGP L3 failover between our routers and PM. One uplink was moved to a new PM router. Failover has been tested and works, load sharing does not work yet due to some issue with multihop BGP sessions not coming up...
17:32 Tim: put srv81-96 into rotation
12:00- Tim: setting up srv81-96
07:46 Kyle: srv81-srv96 are up and ready for service with raid1. (except srv89 which has a bad drive)
06:49 brion: reset url host for wikimedianz-l list to mail.wikimedia.org, was inexplicably set to wikimedia.org which of course doesn't work. Additionally, changed mm_cfg.py to default to wikimedia.org/mail.wikimedia.org instead of wikipedia.org/mail.wikipedia.org for url and mail hosts.
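The mm_cfg.py change mentioned above would look roughly like this (a sketch using Mailman 2's standard settings; the exact file contents are an assumption): new lists then default to the wikimedia.org mail domain and the mail.wikimedia.org web host.

```python
# mm_cfg.py sketch (Mailman 2 defaults for newly created lists):
DEFAULT_EMAIL_HOST = 'wikimedia.org'
DEFAULT_URL_HOST = 'mail.wikimedia.org'
# Keep the url<->email host mapping consistent for the new defaults.
add_virtualhost(DEFAULT_URL_HOST, DEFAULT_EMAIL_HOST)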
May 6
20:41 river: configured asw3's switchport on csw4, and set up asw3... new machines should be working now
21:33 mark: Updated DNS for asw3-pmtpa. Shut down the internal interface of the equalizer load balancer, as it was using asw3's IP.
15:47 Tim: mysqld on srv73 was restarted (or maybe it restarted itself) a couple of hours after Domas disabled writes to it. That may have fixed it, re-enabling experimentally.
14:08 Tim: changed wikiadmin password on external storage servers
09:55 Kyle: srv81-srv88 are up and ready for service. (Although I can't ping them, I assume it is because some configuration is required on csw4-pmtpa)
09:25 brion: rsync died, doesn't seem too happy with giant file sets. running tar
07:42 Kyle: asw3-pmtpa is up and available on the SCS. Its uplink (port 48) goes to csw4-pmtpa port 2.
04:46 Tim: made a redirect to
01:00 brion: trying to rsync a private copy of amane's uploads for testing
May 5
17:26 Domas: Disabled writes on external storage cluster 4, srv73 misbehaving; frequent "Connection error: Unknown error (srv73)"
14:55 Tim: running squid compiled with -pg on sq1
06:23 Tim: moved .248 back to will after configuring and doing a reboot test.
03:54 Kyle: will reinstalled with FC4, at 10.0.0.21
02:39 Tim: moved 207.142.131.248 from will to srv9 so that Kyle could reinstall will
00:39 brion: srv52 appears to be hung, can't ssh in
00:13 brion: disabled x-moz log; too big and we're not using it atm. srv59 was segfaulting, restarted it
May 4
21:57 mark: Kyle: Please connect ports gi0/36 of csw1-pmtpa and gi0/47 of csw4-pmtpa to each other!
21:33 mark: Setup HSRP for vlan 1 and vlan 2 between csw1-pmtpa and csw4-pmtpa
20:18 brion: hacked viewvc for utf-8 charset header
18:30 jeluf: srv51 dead, srv66 dead. Replaced srv66's memcached by srv54
3:22 Kyle: anthony needs an updated dns entry, or I can change the ip. It's currently at 207.142.131.233
2:59 Kyle: srv80 up after a removal of audit and a corrected grub.conf
2:52 Kyle: srv51 and srv54 are back up. Same raid controller crash. When this happens they need a cold reboot to get going again. I'll talk to SM in the morning about it.
May 3
21:50 brion: removed some junk users on enwiki
18:30 mark: All squids upgraded, except ragweed
16:00 Tim, mark: Built a new squid RPM 2.5.STABLE13-7wm with a htcp.c malloc() efficiency improvement by Tim, testing it on mint and sq1
14:30 mark: Restarted Squid on srv7 and srv8: were sending SYN cookies again. We're narrowing down this problem.
02:07 brion: added translation and wikimania-program queue aliases for otrs
00:06 brion: load spike on apaches for several minutes following that scap. not totally sure if related or not. did a set of apache-graceful on apaches around the time it cleared up, not sure if it did it.
May 2
23:59 brion: hacked Profiling to avoid spitting out errors when the profiling table is full
20:00 mark: Increased cache_mem on vandale to 6 GB.
19:45 jeluf: changed processing of wiki@wikimedia.org mails. Instead of sending them to OTRS, vacation will process them, sending a short notice explaining how to contact WMF.
16:00 mark: Increased cache_dir settings of ragweed to 2x 5000 MB
15:13 mark: This could be some issue with TCP sockets being "stuck" in a certain state. Alarm me when it happens next, so I can take a closer look!
14:50 mark: restarted core squids after users complained about slowness. Squids were sending SYN cookies, had high load and served only half of the usual req/s. After restart, req/s served doubled immediately.
13:20 jeluf: srv51 died, replaced its memcached with srv60's
05:44 Tim: configured sq9 and sq10 to add .228 on startup
05:40 Tim: fixed filedescriptor limit on sq9
04:43 brion: sync-common on srv56, was missing query.php
04:33 Tim: added .228 to sq9
04:11 Tim: warming cache on webster
04:06 Tim: sq9 back into service, under supervision
03:50 Tim: disabled AFS stuff on sq9 and rebooted
May 1
22:34 brion: added mediawiki-i18n list
22:31 mark: Took sq9 out of service as it was behaving very oddly and distinctly from other Squids. Page fetches of exactly 1 second service time slowing upload.* down, almost 0 CPU usage, connection refused with cachemgr, weird mount options, AFS kernel processes and a standard load average of 1. What's up with this box?
20:06 brion: archiving some old files from benet to amane
~19:00 mark: Restarted Squid on sq9 as it was behaving strangely.
08:13 Tim: sq9 back in service after an accidental outage
07:45 Tim: accidentally pasted wikiadmin password into IRC, changed it
07:24 Tim: took thistle out of rotation for mysqldump copy to webster
06:06 Tim: disabled toggle_offline_mode in cachemgr
April 30
18:52 mark: Changed performance mount options on sq1-10 filesystems
17:55 mark: Made Pascal's Apache listen only on pascal's main IP.
17:54 brion: starting index rebuild for itwiki; index was broken.
16:15 brion: srv8 down
16:10 brion: updated stats for kshwiki
15:56 Tim: brought sq9 into service as an upload squid
15:00 mark: Tried changing the I/O scheduler back on sq1 but it blocked when doing so. Kyle: please reboot sq1. Rebooted by colo.
14:30 mark: Changed the I/O scheduler on sq1 at runtime to the deadline scheduler for all block devices.
12:40 mark: Made journal-less reiserfs partitions /dev/sda5 and /dev/sdb5 on ragweed, restarted squid with 3 GB cache_dirs on each, and 1500 MB of cache_mem.
05:47 brion: suppressing warnings with @ on the socket_sendto call in GlobalFunctions; shitty hack to send stuff to webster (some sort of intermittent DNS error?)
05:45 brion: removing some recentchanges entries from various databases due to privacy-leak bug in Special:Userlogin.php
04:36 brion: started dump run in yaseo; error mails and 7z junk output should be fixed
April 29
23:38 brion: hacked bugzilla to include subversion viewvc links for r\d+
23:29 brion: running special page jobs for all wikis on srv31
23:06 mark: New Squid RPM deployed on srv6-10, and therefore on all hosts.
22:47 mark: New Squid RPM deployed on sq1-10
22:08 mark: New Squid RPM deployed on all yaseo boxes
21:03 mark: New Squid RPM deployed on all knams boxes
21:00 jeluf: started sq3, added to lvs, died the very same minute.
20:59 brion: running update-special-pages-small manually on srv31. tim has a cronjob but it hasn't run since april 9, not sure why. changed the runner scripts to use full path to PHP just in case
20:00 mark: Starting deployment of squid-2.5.STABLE13-6wm, which has been modified to not replace Cache-Control headers from Squid requests to origin servers.
19:45 jeluf: added db3 back to db.php
19:20 jeluf: restarted mail services. Move of database failed.
19:00 jeluf: shut down goeje's MTA for move of OTRS database.
08:45 Kyle: isidore now with FC4 at 10.0.0.18.
07:58 Kyle: anthony up.
07:51 Kyle: sq3 rebooted. Every time it crashes it is a kernel panic on what looks to be a sync.
07:11 Kyle: srv54 up.
06:59 Kyle: srv20 and srv80 back up with new drives.
04:18 Kyle: Console redirection and netboot on will
April 28
21:31 hashar: changed nds_nlwiki sitename (Wikipedie)
20:45 hashar: srv31 not sane, opened bug #5750
20:40 hashar: scaping live trunk@13908. Lots of "cp: cannot open '*/.svn/empty-file' for reading: permission denied" :(
13:10 Tim: finished squid upgrade
13:10 jeronim: moved 207.142.131.246 squid VIP from will to srv10; starting OS reinstall on will
12:20 Tim: starting upgrade of squid on sq1-9 and yaseo
12:08 Tim: brought sq10 into service
12:07 The new Squid RPM 2.5.STABLE13-4wm seems memleak-free! Deployed it in knams and srv6-10 (not will). Set cache_mem to 2048 MB too. I'll be gone later, so if any severe problems occur, you might want to reinstall the old Squid RPM 2.5.STABLE12-somethingwm and revert the cache_mem setting.
11:51 jeronim: changed mailman master password
11:48 Tim: removed 10.0.0.30 from coronelli, is meant to be for bart
10:00 jeluf: changed secure.wikimedia.org from an alias for goeje into a service IP, 207.142.131.219, which now points to bart. Goeje is still serving mail.wikimedia.org
7:30 jeluf: srv54 went down, using srv66 as replacement memcached
03:57 brion: secure.wikimedia.org back up, load should stay lower now hopefully
03:52 brion: goeje doesn't have apc installed, woops :) fixing
03:45 brion: goeje is swamped with high load; possibly from a redirect from a chinese site to secure.wikimedia.org
April 27
22:30 jeluf: Starting to move mail services (mailing lists, MTA, OTRS, etc) to bart
mark: I found a big memleak in Squid which is most likely the one that's been giving us problems. I am testing the bugfix on ragweed and will deploy on all squids if no problems occur.
10:30 jeluf: changed ticket.wm.o into a CNAME to secure.wm.o, restarted powerdns because 'update' was hanging
10:08 jeluf: activated subpages for the project namespace on frwiki
4:38 Kyle: srv66 back up.
4:33 Kyle: sq10 up.
4:21 Kyle: db3 back up. fsck'd.
4:09 Kyle: sq3 back up.
April 26
20:36 hashar: from dberror.log: The table 'profiling' is full (10.0.0.2)
16:26 mark: Built and deployed a new Squid RPM 2.5.STABLE13 on ragweed, including a fix that may be the one for our big memleak problem. Will deploy on all squids if testing is successful
15:53 Tim: restarted squid on yf1004, was swapping heavily
15:27 mark: Set up ragweed as a Squid with FC5.
14:12 jeluf: fixed yf1002's fstab, had corrupted entry for swap partition. Changed / to noatime
09:12 jeluf: removed sq3 from the upload.wm.o pool, was timing out all the time. No SSH login to sq3 possible.
07:46 Tim: Set icp_query_timeout to 10ms. This seems to have fixed about half of the sibling hit problem.
04:20 Tim: Fixed squid.conf warnings. Restarting squid on sq1 for testing.
~03:00 Tim: set up bart for squid sibling hit test
02:18 Tim: readded sq9 to the pmtpa.wmnet zone. sq9 is on 10.0.3.9 but not 207.142.131.227, that's bart. Will shortly be changing 207.142.131.227 from sq9.wikimedia.org to bart.wikimedia.org.
April 25
22:43 brion: shutting down srv20; disk errors caused / to be remounted read-only
02:11 Tim: re-running setup-apache on the servers brion complained about below. They have old versions of some things.
April 24
22:05 brion: created chapcomwiki.blobs manually on srv73
21:59 brion: created chapcomwiki.blobs manually on srv76
21:55 brion: reports of save failures on chapcomwiki, probably missing blobs table
mark: Reinstalled ragweed with FC5
16:10 jeronim: In goeje:/opt/otrs/Kernel/Config.pm, set $Self->{FQDN} = 'secure.wikimedia.org'; as it was incorrectly set to ticket.wikimedia.org (advice from Solensean, thanks).
16:00 Hashar: r13845 should make sockets non-blocking when purging squids (see the diff). Untested unfortunately :(
11:20 Tim: checkStorage.php completed, I'm now using it to fix the wikis corrupted by a bug in compressOld.php. Sample output at , the rest is in /home/wikipedia/logs/checkStorage. Currently running on srv31, I might need to move to benet later to get higher dump filtering speeds
03:37 brion: upgraded leuksman to php 5.1.3RC3 for testing
03:27 brion: added dns for wm06reg
02:43 brion: reenabled Austin's login so he can set up and maintain the wikimania registration
02:15 brion: setting up ruby on rails for wikimania registration app on friedrich
02:05 brion: mounted /home on friedrich
April 23
22:14 brion: enabling debug log for botquery (/h/w/l/botquery.log)
20:33 brion: srv51/55/61/67 segfault regularly. stopping apache on them for the moment. These were out for memory replacement. Allegedly they are fixed, but why the crashes?
20:30 jeluf: db3 is crashed, removed it from the pool
20:15 jeluf: restarted apaches, some were showing APC issues
19:45 brion: quick hack on special:boardvote to send enwiki visits to meta. enwiki no worky due to db changes.
03:54 brion: restarting search boxes
03:45 brion: syncing search database from finished dewiki
April 22
12:00 jeluf: added thistle and db3 again to the pool
10:19 brion: redoing lucene build of dewiki; it failed before, and now there's no index :P
10:05 brion: restarted search servers; synced updated data
09:51 brion: started dump in yaseo
4:45 jeluf: stopped mysql on thistle and db3
4:45 Kyle: A candle of Saint Jude is lit in the colo.
April 21
23:41 brion: activated Makebot extension, for bureaucrats to assign bot status
19:20 brion: loosened smtpd_helo_restrictions as danny keeps complaining about mail bouncing from poorly-configured mail servers
05:50 jeluf: started db2 and db3 again.
04:33 Kyle: srv56 back up after RMA
04:30 jeluf: taking db2 down to copy its DB to db3
01:05 brion: changed scap and sync-common to use rsync instead of cp; cp sometimes overwrote the wrong files when symlinks were replaced with regular files (for example the common favicon.ico)
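(The hazard being fixed here: cp writes *through* an existing destination symlink, so the link target, shared by every host pointing at it, gets clobbered, while rsync replaces the symlink with a regular file instead. A scratch-path demonstration of the cp side; all file names below are made up, not the real /h/w layout.)

```shell
# Reproduce the cp-over-symlink hazard that bit scap with favicon.ico.
mkdir -p /tmp/cpdemo/src /tmp/cpdemo/dst
echo shared  > /tmp/cpdemo/common.ico
echo newfile > /tmp/cpdemo/src/favicon.ico
ln -sf /tmp/cpdemo/common.ico /tmp/cpdemo/dst/favicon.ico

# cp follows the destination symlink and overwrites the shared target:
cp /tmp/cpdemo/src/favicon.ico /tmp/cpdemo/dst/favicon.ico
cat /tmp/cpdemo/common.ico    # now prints "newfile" -- the shared copy is clobbered

# rsync, by contrast, unlinks the destination symlink and creates a
# regular file in its place, leaving the old target untouched.
```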
00:41 brion: restarted replication on henbane, was stopped with 'impossible log position' error. Stuck on .445 position 900689993; restarted on .446 position 0. Briefly accidentally started at .445 position 0; some duplicate key errors, shouldn't cause any harm.
April 20
18:30 jeronim: added info-sv otrs alias
17:40 brion: created wikisource-l mailing list
10:42 Tim: created missing mounts on harris
09:18 brion: started dumps threads 3,4 both on benet
08:37 brion: started dumps threads 1, 2
08:31 brion: removed albert nfs mount from benet; obsolete
08:25 brion: apache 2.2 on leuksman mysteriously hung somehow. had to kill -9 and restart.
06:49 brion: starting the index rebuild on maurus. again. try not to explode this time, data center
06:39 brion: rebooting benet. added gateway on (disabled) eth1 config; also tried setting HWADDR to see if it sees that it has a problem more easily.
05:46 Tim: moved cache epoch forward to 15:15 yesterday, which is just after Mark fixed the NFS mount problems
05:45 brion: leuksman up, can now search for "rpm" on this wiki
05:37 brion: taking leuksman mysql offline to change search parameters and back up
05:21 brion: deleted 250megs of old log files, restarted pascal's apache. bugzilla back online.
05:14 brion: pascal root partition full, broken bugzilla. clearing space; stopped apache while working
04:30 jeluf: changed pppuser@zwinger to login shell '/bin/true', changed sqltunnel on zedler to use '-n -N' instead of 'while [ 1 ]; do sleep 10000; done'
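(For reference, -N asks ssh to forward ports without running any remote command and -n detaches stdin, which together keep the tunnel alive without the old dummy sleep loop. A sketch only; the host, user, and forwarded port below are illustrative, not the real zedler/zwinger setup.)

```shell
# sqltunnel sketch: -N = no remote command, just forward;
# -n = stdin from /dev/null so it can run unattended.
# Endpoint is a made-up example, not the real tunnel target.
ssh -n -N -L 3306:127.0.0.1:3306 pppuser@tunnelhost.example.org
```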
03:35 Tim: changed /etc/fstab everywhere to use 10.0.5.8 for /home. Suda has that IP, but nothing is actually using it yet.
03:04 Tim: brought srv52 into apache rotation
02:41 Tim: fixed harris, srv15, srv51
02:15 brion: updated squid error pages with spanish fix
~01:00 Tim: fixed ganglia
April 19
23:52 Tim: ran "chkconfig ntpd on" on srv71-79, srv8 and srv9. Stepped time by about an hour on srv8. Started ntpd on srv8 and 9.
22:08 hashar: shows an experimental graph of number of jobs on enwiki. It's in my crontab every 5 minutes.
21:16 brion: set wikitech wiki to require login to edit, to make sure we know who's editing this thing. :)
20:51 brion: started job queue runner on srv31
20:01 brion: restarted postfix on pascal. seemed to be trying to send mail to itself instead of on to pmtpa. [yay, bugmails coming to tampa now]
19:35 jeluf: replaced memcached srv66 (down) by srv51 in mc-pmtpa.php
19:21 jeluf: started gmetad on zwinger, added it to rc3.d
18:18 brion: disabled wgAllowExternalImages on nlwiki. Why was it on?
18:03 brion: started ntp on srv71-srv79, were not quite synced and no ntp running
17:14 jeluf: reverted changes done on pascal's httpd.conf, so that bugzilla works again.
17:05 jeluf: moved IP 246 from srv9 to will
16:40 jeluf: added thistle to the mysql pool
16:30 jeluf: started external storage server srv71, didn't come up at boot time (since LDAP was not available yet and mysql user is in LDAP only)
replication on zedler broken:
060419 16:25:43 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
16:24 jeluf: changed pppuser account from /bin/cat to /bin/bash, since the former didn't work. Needs fixing.
16:03 jeluf: mwdaemon crashed on all 3(!) nodes, restarted.
16:03 mark: Broke Benet's routing (even more than it was). Benet is down, colo says it freezes at 'booting the kernel'. Kyle needs to look into it
16:02 mark: Fixed SCS routing
Tim (various times up to 16:00): reset slave on db2, db4, lomaria, db1. Spot check on max(rc_id) on various databases showed no apparent problems, replication continued afterwards. Had some trouble with apache threads connecting to database servers while they were starting up, swamping them with load when they started serving. Thistle is currently out of rotation due to this. All wikis now r/w.
15:58 jeluf: started mwdaemon on maurus, vincent, coronelli.
14:58 mark: mail stuff on goeje up
14:52 Tim, mark: enwiki up r/w, dewiki r/o, other DBs in progress.
14:40 mark: Brought up all squids, chkconfig on
14:30 mark: Brought up LVS on avicenna (pybal) and dalembert (lvsmon).
14:09 mark: fundraising.wm.org up.
13:10 Tim: All core mysql servers now have a /etc/init.d/mysql script, chkconfig on. All external storage servers have a /etc/init.d/mysqld script, due to the different RPM used on some of them. Also chkconfig on.
12:34 Tim and mark: Started named on albert and did chkconfig on
12:14 Tim and mark: Started named and ldap on srv1, did chkconfig on
12:00 mark: Fixed DNS issues on zwinger (ip stolen by goeje) and pascal
11:50 Tim: sent all traffic briefly to an error server on pascal, then when that didn't work, rr.knams. So that everyone can see the pretty ERR_CANNOT_FORWARD message instead of a connection refused.
11:33 Kyle: Heading to colo, left message on brion's cell.
10:00 PMTPA down.
05:56 Tim: turned off transactions for external store connections
05:20 jeluf: added sq3 to the upload.wikimedia.org pool of squids
April 18
21:56 brion: started medium wikis dump job (pmtpa3) on srv31. had to mount /mnt/benet
21:10 jeluf: started pybal.py on avicenna to update LVS weights.
20:50 brion: fixed perms on srv60's common-local; fixed favicon.ico again
10:38 brion: started the small wikis dump job (pmtpa4) on benet. will run others after making sure these went, then clearing space on benet
10:02 brion: started dump job on yaseo. preparing partials on pmtpa
April 17
some time Domas: dumped db4 to db2 for enwiki db job
21:11 brion: fixed grants for new toolserver repl user on ariel
20:43 brion: added wikimedianz-l list
10:17 brion: enabled special:nuke on pdcwiki
05:09 brion: adding querycache_info tables...
April 16
08:03 brion: fixed bad favicon on srv60
April 15
10:05 brion: upgraded mailman to 2.1.8
06:38 Tim: after restarting, db3 completed MySQL recovery successfully. Starting replication from ariel with fingers crossed.
04:17 Tim: db3 crashed. I switched the enwiki master to ariel. Site back in r/w mode at 04:35.
April 14
23:30 Tim: compressOld has finished on enwiki. Switched write destination to cluster4/cluster5.
22:05 brion: tried unsuccessfully to get dab's toolserver login working on the global zone so he can use the ssh tunnel for db replication
April 13
23:15 brion: more net troubles! pmtpa<->knams down. others ok
22:28 brion: yaseo able to reach pmtpa again. all squid centers appear to work
22:20 brion: net working for more people, but still not everywhere. (yaseo out; some in europe still reporting errors)
21:55 brion: net probs
21:07 brion: bw reports the issue as flapping at level3 dampening the routes. should be resolved soon...
21:00 jeluf: enabled captchas for zhwiki upon request, to fight a vandalbot
04:30 Kyle: hydra shutdown. Shipping soon.
April 12
22:03 brion: adding new replication user for the toolserver thingies to use on adler, db3, and potentially other places.
19:43 brion: srv71-srv79 don't appear to have working ntp, are ~30 seconds off. trying to fix again
06:30 Tim, Domas: split off enwiki db cluster to (db3,db4,ariel)
06:00 jeluf: created experimental rr-upload.wikimedia.org geozone
April 11
15:43 Tim: started compressOld.php on enwiki from position 3710964
15:37 Tim: started refreshLinks.php
15:30 Tim: removed dalembert from mediawiki-installation
15:00 Tim: did schema update for langlinks
Tim: db3 now has a copy of enwiki, up-to-date and replicating
11:32 Tim: put lomaria back into service
11:26 Tim: putting webster back into service with smaller data set
08:00 jeronim: added /sbin/iptables -I INPUT 1 -p icmp -j ACCEPT to /etc/rc.local on benet (download.wikimedia.org) so that Path MTU discovery is possible for clients. Don't block all ICMP. See [5] for why.
06:00 jeluf: added wikimania-cfp alias for OTRS
05:15 Tim: stopped slave on lomaria for SQL dump of webster's databases to db3.
April 10
22:59 brion: fixing network setup on coronelli, rebooting it. upgrading mono on search servers
22:34 brion: fixing network setup on maurus, rebooting it
22:12 brion: restarted wikibugs irc bot. should have auto-started on goeje boot, but likely failed due to services being out
22:04 brion: started apache on friedrich for fundraising.wm.o, set up crontab for updating
20:45 jeluf: started MWDaemon on vincent, but still getting the google fallback page when searching
20:00 jeluf: rebooted fuchsia
18:35 jeluf: Started squid on srv8, moved .203 and .205 from will to srv8
18:24 Tim: fixed Database::getLag(), properly this time I hope. Live hack, not sure how to commit it from there exactly.
17:48 Tim: put ixia back into service
16:09 Tim: put webster back into service
11:40 Tim: started two data directory copies, one from db2 to lomaria, and one from webster to ixia.
08:31 Tim: restarted compressOld.php, currently at 3708378. Stopped shortly afterward to reduce catchup times.
07:55 Tim: running fixTimestamps.php on all wikis
07:00 jeluf: thistle back in rotation after dammit repaired it.
05:35 Tim: started gmetad on zwinger
04:40 jeluf: started apache2 on albert
04:26 jeluf: ixia, thistle, lomaria, db1 have broken replication settings, webster has database page corruption. Taking db2 out of rotation to create copies from it.
04:20 jeluf: mounted /home on all DB servers
04:03 brion: ran mass-correction of bad-timestamped entries on enwiki (1529 revision records)
03:05 brion: srv71-srv79 had wrong clock, apparently set to local time instead of UTC.
01:45 brion: irc feeds online. had to rescue udprec from kate's old home dir
01:38 brion: taking thistle and db1 out of rotation; broken replication.
01:32 brion: turning read_only off on adler. seems to be set to go on at every boot.
01:28 brion: things look mostly good; tried to take site read/write but someone has put adler into read-only? examining
01:23 brion: got fs-squids on the right ip. seems to work now.
01:20 brion: had to start lighty on amane
01:18 brion: trying to get fileserver squids+lvs up. (avicenna as lvs master)
01:10 brion: run-icpagent.sh didn't take previously; seems to have helped now
01:04 brion: trying to add 10.0.5.5 on dalembert also. no idea if this is correct. 10.0.5.3 works internally, but squids still don't show anything. there's no explanation for this that is obvious to me.
00:55 brion: added the lvs master ip on dalembert; http'ing to it internally seems to work, but still nothing from outside
00:49 brion: trying to start the LVS monitor thingy on dalembert. no clue if it's working
00:45 brion: turning on apaches
April 9
23:45 brion: srv33, srv36 should now replicate properly.
External storage borkage, 2006-04-09
23:20 brion: looking at srv33, srv36 external storage; jens reports replication seems borked
22:00 brion: added izwinger ip to suda; it wasn't automatic.
21:52 brion: finally got into srv1 and albert. maybe working
21:49 brion: ldap depends on dns; dns is still broken. we can't reach srv1 or albert.
21:32 brion: still trying to get some core machines online (suda booting; albert ?? srv1 ??). kyle should be available in 30 minutes
20:55 brion: bw is onsite and available to poke at machines. there was a power problem; some machines seem to still be booting
20:42 brion: phoned kyle (message)
20:38 brion: network mostly back up, still trying to get in
19:20 brion: PowerMedium offline?
15:20 Tim: shutting down mysql on lomaria for copy to ixia
14:50 Tim: installed mysql on ixia
8:45 jeluf: deleted binlogs 110-129 on srv34. Now 33GB of disk space are left, that's about 3 weeks.
April 8
03:17 Kyle: ixia back up. Ready for mysql.
00:30 brion: added presskontakt otrs alias
April 7
19:40 brion: fixing favicons/logos for chapcom, spcom, internal
19:00 jeluf: rebooted iris, mayflower
04:44 Tim: updated article count on idwiki from 20777 to 21250 to correct for drift due to subst bug. The recount was done by running
select distinct pl_from from pagelinks,page where pl_from=page_id and page_namespace=0 and page_is_redirect=0;
and observing the number of returned rows.
01:40 brion: upgraded svn to 1.3.0 on zwinger; should fix the group-writable problems with 'svn up'
April 6
21:26 mark: Reenabled knams in DNS
00:38 brion: chapcom.wikimedia.org and wikimaniateam.wikimedia.org set up
April 5
23:37 brion: added eventscom-l list
22:41 brion: found someone had borked over the live svn checkout as root. reassigned the .svn files to brion/wikidev and made group writable. hopefully svn will cooperate...
21:20 jeluf: removed binlogs 90 to 109 on srv34. Only 4GB of disk space were left.
19:34 brion: yaseo magically reachable again. the network gods have been appeased!
19:20 brion: yaseo unreachable via pmtpa squids. yaseo squid seems ok.
5:45 jeluf: checked grub.conf on sq1...10 and fixed the default kernel setting. When doing a yum update, please make sure that the server still boots 2.6.11. Only that kernel has the drivers for the SATA controller.
5:35 jeluf: Rebooted srv24 (cluster2). It had a kernel oops and the mysqld was not responding to any events - not even kill -9. Everything looks fine after the reboot.
April 4
19:30 brion: after manually readding the lvs ip to sq1, upload.wm.o seems a bit better. THESE BREAK ON REBOOT. THEY NEED TO BE SET UP PROPERLY.
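(The durable fix for service IPs vanishing on reboot is to persist them in the interface configuration rather than adding them by hand each time. A sketch of a Fedora-style alias file, written to a scratch path for illustration; the device name and real target path are assumptions, and the address is the squid VIP mentioned elsewhere in this log.)

```shell
# Hypothetical persistent alias for an LVS service IP. On a real FC
# squid this would live at /etc/sysconfig/network-scripts/ifcfg-lo:0;
# written to /tmp here so the sketch touches nothing real.
cat > /tmp/ifcfg-lo:0 <<'EOF'
DEVICE=lo:0
IPADDR=207.142.131.246
NETMASK=255.255.255.255
ONBOOT=yes
EOF
cat /tmp/ifcfg-lo:0
```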
19:20 brion: sq* messed up somewhat. sq1 missing net stuff; sq2-4(?) have broken ldap
07:00 brion: got wikibugs back online; running on goeje alongside the mail server
06:39 brion: fixed /mnt/math on goeje (bugzilla:5441); unmounted old upload shares no longer used.
April 3
21:50 jeluf: deleted srv34_log_bin.08*. There were only 2GB disk space left.
21:05 mark: knams seems unreachable; redirected knams traffic to pmtpa.
21:07 mark: Moved back
21:16 mark: and back.
April 2
19:22 brion: restarted enwiki dump. it crapped out yesterday, apparently with a database error of some kind ('could not connect: unknown error')
08:30 brion: SVN-ified the live checkout in /h/w/c/php-1.5
04:05 Kyle: db4 reinstalled FC4.
04:00 Kyle: Bad drive in ixia found, RMA requested
April 1
14:30 Tim: increased the default max factor in compressOld to 5; this should reduce the number of talk pages that are compressed with only two revisions per chunk. It means that if a talk page is 100KB, the compressed chunk can be up to 500KB.
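(To make the arithmetic explicit: the chunk ceiling is just max_factor times the page size. A throwaway sketch; the variable names are mine, not compressOld's.)

```shell
# Chunk size ceiling = max_factor * page size.
# max_factor=5 and a 100 KB talk page give the 500 KB ceiling above.
max_factor=5
page_kb=100
echo "max chunk: $((max_factor * page_kb)) KB"   # prints: max chunk: 500 KB
```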
~14:00 Tim: easing thistle back into rotation after listing it in enwiki's section to prevent lag due to compressOld.php, see comments in db.php
10:50 brion: briefly taking thistle out; it's lagging a lot
08:00 brion: working on installing svn on leuksman server
March 31
19:42 brion: tossed together maintenance/purgeList.php, takes page titles on stdin and runs squid purges on them
19:10 brion: wiped fr.wikiquote.org
18:35 brion: got mailman back up. ran into a mailman encoding bug while rebuilding archives
17:58 brion: shutting down mailman temporarily to edit archives
15:15 Tim: fixed mysql cluster in ganglia
14:35 Tim: running compressOld.php on enwiki
14:20 Tim: shutting down lomaria again for copy to db2
Tim: Set up db1-db4, installed mysql. Shut down lomaria for a while to copy its data directory to db1. db4 needs its mysql upgraded when it comes back up (killed during testing)
02:50 ixia down
March 30
18:40 brion: added info-da otrs alias
~16:30 Tim: ran compressOld.php and moveToExternal.php on gawiki to test the various tweaks I made to both. Seems to have worked.
14:40 Tim: running resolveStubs.php on all pmtpa wikis
14:20 Tim: rebalanced database load
~08:00 Tim: set up srv71-76 for external storage
07:19 brion: tweaking the user-agent protections so 'PHP' check is case-sensitive; some false positive problems with '.php'
05:45 brion: lomaria and thistle were behind on replication: slow newusers log display (WHAT IS THIS WHY DOES IT HAPPEN EVERY COUPLE OF DAYS) and some other maint script. killed threads, lomaria caught up; thistle still running maint rebuilds, took it temporarily out of rotation.
04:13 brion: adding audio/midi for amane (bugzilla:5277)
March 29
20:49 brion: set up cswikisource, mlwikisource, skwikisource. cs and sk imported pages from sourceswiki.
19:15 brion: blocked access to frwikiquote dumps
19:12 brion: locked frwikiquote per agreement
18:16 brion: moved oversize x-moz; was breaking the 32-bit machines due to its >2gb-ness
14:25 Tim: brought srv54 into apache service
09:25 Tim: brought srv71-80 into apache service
06:30 jeluf: activated upload directory creation in CommonSettings.php
March 28
22:00 brion: hacked up refreshImageCount.php to force the ss_image columns to start replicating. the updater is still bad
21:44 brion: ss_image is NULL on slaves; updater script uses variables which don't survive replication
21:30 jeluf: Rebooted iris
19:45 brion: created missing upload dirs for bat_smg, closed_zh_tw, fiu_vro, frp, ksh, lij, map_bms, nds_nl, nrm, pap, pdc, rmy, roa_rup, tet, vls, xal, zh_min_nan, zh_yue
09:06 brion: updated messages on he* wikis
07:24 brion: someone enabled the experimental ajax search on dewiki and dewikibooks. I've turned it off, as I received a complaint and I agree it's a very unexpected and painful UI (takes over the screen with no warning, what the hell)
05:20 Tim: running update.php on all wikis, to add the ss_images field
02:20 Tim: Fixed various ganglia problems.
March 27
18:08 brion: fixed upload dir for pmswiki; fixed permission on wikipedia upload parent dir, should allow add script to add them automatically if it's doing that now. mounted upload3 on zwinger
11:21 Tim: The upgrade had apparently stopped apache on most of the servers over the course of 2 hours, eventually causing extreme site slowness. Ran apache-start on all apaches.
09:24 brion: finishing upgrades; some broke because yum is confused by the libxml manual upgrade. may want to check comparison vs fedora bits.
01:49 brion: upgrading remaining PHP 5.1.1 boxen to 5.1.2; timezone bug showing +0100 instead of +0000 for UTC
01:29 brion: added upload dir for test wiki
March 26
22:52 brion: dropping lomaria temporarily from db.php as it's badly lagged atm -- running special page cache rebuilds or something
22:00 JeLuF: SSH keys added to db1...db4, LDAP configured, NFS configured, NTP configured, timezone changed to UTC.
21:30 JeLuF: Added Piedmontese wikipedia
12:12 Kyle: db3 and db4 are now on vlan2 on csw1-pmtpa. Pingable.
March 25
07:27 brion: fixing up backup processes; weren't properly setting server usage, might work around the adler oddities. (also fixing the display problem)
07:00 jeluf: db3 and db4 (10.0.0.236 and .237) do not ping
07:00 Tim: restarted squid on yf1001 and yf1003, heavy swapping
04:32 Tim: Added namespaces to hewikibooks [6]
02:35 Tim: reduced article size limit to 1MB on request from users in #wikipedia-en-vandalism
March 24
23:00 jeluf: Added new Wikipedias for nds-nl, rmy, lij, bat-smg, map-bms, ksh, pdc, vls, nrm, frp, zh-yue, tet, xal, pap. See meta
16:40 jeluf: Restarted slave on zedler. Why doesn't it restart automatically?
16:10 jeronim: unfirewalled all ICMP on benet to solve someone's problem with downloading from dumps.wm.org. /u/l/b/firewall-init.sh not altered because i don't know if that's the right script nowadays
07:14 Kyle: db1-4 are ready for service at 10.0.0.234-237 with 408GB /a's
5:40 jeluf: rebooting iris
March 23
19:52 brion: another mystery case of 'Error: 1114 The table '' is full' on adler. Various tables (job, text, pagelinks, etc). Plenty of disk space free, dump still running; unclear what's full. Adler's error log shows lots of "060323 19:52:43 InnoDB: Warning: cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?"
19:49 jeluf: KNAMS back, switched back to old DNS map.
19:05 jeluf: www.kennisnet.nl down, too, no SSH. DC outage assumed. Switched PowerDNS to point all of Europe to Florida.
18:50 jeluf: KNAMS squids not responding. Load balancer?
02:33 brion: starting enwiki backup again; last run got hit by a mysterious "Error: 1114 The table '#sql_a6a_0' is full (10.0.0.101)"
March 22
07:30 domas: srv59, srv51 hit by /h/w/src/memcache/install-fc3, continuing...
March 21
20:25 jeluf: srv59 is listed twice in the list of memcached servers. Replaced one of them by srv71.
20:00 jeluf: Users complain about bad performance. No servers seem to be broken, but tugelas are behaving oddly. There are fast ones (0.05s for 100 requests) and slow ones (5s for 100 requests). Slow ones have bi values of 450, fast ones have bi values of 20. bo is 0. mctest at 20:17 UTC:
10.0.2.51:11000 set: 100 incr: 100 get: 100 time: 4.16831994057
10.0.2.55:11000 set: 100 incr: 100 get: 100 time: 0.0873651504517
10.0.2.53:11000 set: 100 incr: 100 get: 100 time: 0.0911560058594
10.0.2.54:11000 set: 100 incr: 100 get: 100 time: 3.38875198364
10.0.2.56:11000 set: 100 incr: 100 get: 100 time: 0.061262845993
10.0.2.70:11000 set: 100 incr: 100 get: 100 time: 3.37843799591
10.0.2.58:11000 set: 100 incr: 100 get: 100 time: 0.126893043518
10.0.2.59:11000 set: 100 incr: 100 get: 100 time: 6.54098010063
10.0.2.59:11000 set: 100 incr: 100 get: 100 time: 6.14648485184
10.0.2.62:11000 set: 100 incr: 100 get: 100 time: 4.1362080574
10.0.2.64:11000 set: 100 incr: 100 get: 100 time: 4.54642486572
10.0.2.65:11000 set: 100 incr: 100 get: 100 time: 0.0734169483185
10.0.2.66:11000 set: 100 incr: 100 get: 100 time: 3.67762804031
10.0.2.68:11000 set: 100 incr: 100 get: 100 time: 0.155061006546
10.0.2.69:11000 set: 100 incr: 100 get: 100 time: 5.22008705139
localhost set: 100 incr: 0 get: 0 time: 0.0392808914185
9:00 jeluf: rebooted hawthorn, mayflower, sage, clematis
7:00 Kyle: Racked 4 new database servers, pending names and ip's.
March 20
19:51 brion: dumps started up again in pmtpa
19:30 jeluf: added symlink to init.d/nfs from rc3.d on benet
19:13 brion: manually banged on benet, got it back online on the external IP. Somehow it's switched from using eth0 to using eth1, and config needs to be adjusted.
18:54 brion: Someone, somewhere, somehow rebooted benet for some reason around midnight UTC two hours ago and there's a network problem, can't be reached from zwinger.
16:44 PM rebooted benet
15:30 jeluf: dumps.wikimedia.org down, connection refused when trying to ssh to the box, HTTP times out.
March 19
17:40 ævar: Synced a new plwikiquote logo
March 18
08:40 jeluf: added srv36 to external storage cluster 3.
March 17
21:55 brion: srv60's memcached/tugela/whatever is VERY slow, 120s response time. can't ssh in. temporarily replacing it with srv59 in the mc cluster
March 16
23:14 brion: added redirects for quickipedia.(org|net) as requested
21:45 jeluf: Set up mysql server on srv36, replicating data from srv34 (cluster3). No old data imported to srv36, yet.
20:00 jeluf: Set up squid on srv8, moved one IP from srv6 to srv8
19:00 jeluf: restarted srv7's squid, using /usr/sbin/squid instead of /usr/local/squid/bin/squid
March 15
20:01 brion: adjusted checkers.php logging to use @ on all error_log() calls, so files that are forgotten on yaseo don't display warnings
19:00 jeluf: moved IP .204 from srv7 to srv9 (now they have 3 IPs each)
14:00 jeluf: restarted srv7's squid
March 14
08:04 brion: fixed bad permissions on some servers which broke sync-dblist script (uses rsync to copy *.dblist out)
07:44 brion: set up zh.wikinews.org
07:00 brion: setting up spcom s3kr1t wiki
March 13
22:55 brion: fixed (hopefully) the fallback for text loading. it was broken, badly, didn't notice before :P
22:45 jeluf: fixed replication of srv33. It has a gap from 15:00-22:45. Added back to pool. If a revision does not exist, the master should be asked anyway.
15:45 midom: srv32 manually resynced with srv34, srv33 still down
14:45 jeluf: srv32 and srv33 have out-of-sync replicas, shut them down. srv34 overloaded, went read-only
14:00 ævar: / on srv34 filled up, cleared out /tmp/mediawiki/, approx 70MB left
March 12
23:00 mark: Moved ns0.wikimedia.org's IP back to zwinger to get DNS back up
00:54 brion: renaming wikimaniawiki to wikimania2005wiki for future-proofing and convenience
March 11
22:10 jeluf: set up NFS, NTP, timezone, ... on ixia, added it to the mysql pool
07:30 jeluf: ixia doesn't start replication:
060311 2:13:04 Failed to open the relay log './lomaria-relay-bin.312' (relay_log_pos 36322078)
060311 2:13:04 Could not find target log during relay log initialization
060311 2:13:04 Failed to initialize the master info structure
The file is there and permissions are correct; no idea what's wrong
06:55 jeluf: restarted mysql on lomaria
05:09 brion: fundraising display partially back online. waiting for dns to clear, and will start regularly updating again....
01:12 brion: got friedrich switched; it's on 207.142.131.232. rebooting to test...
March 10
23:00 brion: taking friedrich out of apache service to replace tingxi
22:20 JeLuF: Taking lomaria down to copy its DB to ixia. Will take some hours.
07:20 Solar: yongle back up, but only on the private interface, not the public one. (It only had one cat5; let me know if you want me to hook another up to csw1.)
March 9
05:22 Solar: Corrected the password on ixia
March 8
23:25 brion: thistle caught up, back in service
23:23 brion: taking thistle out of rotation temporarily; it's behind the master. reports of edits being overwritten without a conflict message may or may not be related
19:35 jeluf: Changed IP of mail.wikimedia.org from .207 to .221. This allows us to move ns0 back to zwinger (needs to be done later, when the change is known on all DNS servers)
08:00 jeluf: khaldun had two default gateways. Removed default gw 10.0.0.4, ping to goeje works, NFS works
08:00 jeluf: Khaldun down, NFS times out. No user complaints yet - is khaldun still in use at all? Update: Zwinger can reach khaldun, but goeje can't. Routing?
March 7
23:53 avar: Made Naconkantari sysop on kowiki due to massive "WP is communism" vandalism which none of the kowiki admins were awake to clean up.
02:48 Tim: Added Ozemail proxies to the trusted XFF list
March 6
11:50 jeluf: Changed config to use spamd instead of spamassassin
11:30 domas: reduced postfix, apache concurrency on goeje
11:30 jeluf, domas: goeje up, rebooted by PM.
09:00 jeluf: goeje down, postfix and apache shutdowns didn't help
08:00 jeluf: goeje overloaded, load avg 260, slow to no response. shut down postfix, shutting down apache
07:45 jeluf: replication of srv33 in sync with master. Restarted srv33 with mysql port 3306 enabled.
March 5
23:10 brion: added external.log for ExternalStoreDB load failures. we think mysterious text load failures might have been from srv33
23:05 jeluf: started srv33 with mysqld port set to 330
22:50 jeluf: killed the wiki by starting the lagged external storage server srv33; shut it down again.
22:40 brion: jens put us back to read/write as the threads finished
22:19 brion: adler broken. nobody bothering to update the admin log
060228 20:08:48 InnoDB: Warning: cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?
Processlist showed several hundred attempts to invalidate one image page (Vynil_record.jpg). Perhaps from an automated job after a template change?
March 4
21:35 brion: fixed problem (whitespace in language file), captchas back on except for sr.wikipedia, which is reasonably well-populated
21:28 brion: disabling captchas on all sr projects; broken on sr for some reason
12:05 brion: yaseo uploads resolved (bad symlink into /mnt/wikipedia/htdocs on yaseo docroot), math also fixed (rewrite condition crashed apache; changed it and now works)
11:43 brion: noticed amaryllis's / partition is very small (10G) and full. nice.
11:40 brion: yaseo uploads borked for some reason. tossed in a symlink on amaryllis so /mnt/upload works there, but not sure why many still don't work on http
March 3
20:00 jeluf: set up new queues info-ch and info-als on OTRS.
05:04 Tim: set up daily cron job on goeje, to backup its root directory to hypatia once per day, at 06:00.
03:13 brion: started another enwiki dump, yaseo dump
03:03 brion: installing setproctitle on srv31; php is whining
March 2
23:09 brion: adding dns entries for wikimania200[56].wikimedia.org, will set up new wiki and redirects shortly
14:00 Tim: started rsync of goeje's root directory to hypatia:/var/backup/ssl-server, for backup and maybe failover capability in the future.
March 1
22:16 brion: turning on wgEmailAuthentication on public wikis. Somehow goeje got blacklisted by spamcop, allegedly for sending to blackhole addresses. There's a small possibility that active spamming was attempted through the wiki.
05:30 Solar: srv55, srv57, srv61, srv67 have new ram, and are up, but out of sync
04:20-04:25 Tim: srv54, a tugela server, was accidentally rebooted. This took the site down for about 5 minutes, probably due to unconfigurable fwrite() timeouts on persistent connections.
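The outage described above — one rebooted cache server taking the whole site down — is the classic failure mode of blocking writes on persistent connections with no configurable timeout: every web process hangs on the dead peer. A minimal sketch of the general mitigation, an explicit per-socket timeout so an unresponsive server degrades to a fast cache miss (function name and protocol framing here are illustrative, not the original client code):

```python
import socket

def fetch_with_timeout(host, port, key, timeout=0.5):
    """Issue a memcached-protocol 'get' without ever blocking forever.
    Any connect, send, or recv that exceeds `timeout` seconds is
    treated as a cache miss instead of hanging the caller."""
    try:
        s = socket.create_connection((host, port), timeout=timeout)
        s.settimeout(timeout)               # bounds send/recv too, not just connect
        s.sendall(b"get %s\r\n" % key.encode())
        data = s.recv(4096)                 # raises socket.timeout if the server is wedged
        s.close()
        return data
    except (OSError, socket.timeout):
        return None                          # unresponsive server -> fast miss
```

With a bound like this, a rebooting cache node costs one sub-second miss per request rather than a five-minute site-wide stall.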
2000s
Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct
with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan
with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sept
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec
2010s
Archive 15: 2010 Jan - 2010 Jun
Archive 16: 2010 Jul - 2010 Oct
Archive 17: 2010 Nov - 2010 Dec
Archive 18: 2011 Jan - 2011 Jun
Archive 19: 2011 Jul - 2011 Dec
Archive 20: 2011 Dec - 2012 Jun
with revision history 2007-02-21 to 2012-03-27
Archive 21: 2012 Jul - 2013 Jan
Archive 22: 2013 Jan - 2013 Jul
Archive 23: 2013 Aug - 2013 Dec
Archive 24: 2014 Jan - 2014 Mar
Archive 25: 2014 April - 2014 September
Archive 26: 2014 October - 2014 December
Archive 27: 2015 January - 2015 July
Archive 28: 2015 August - 2015 December
Archive 29: 2016 January - 2016 May
Archive 30: 2016 June - 2016 August
Archive 31: 2016 September - 2016 December
Archive 32: 2017 January - 2017 July
Archive 33: 2017 August - 2017 December
Archive 34: 2018 January - 2018 April
Archive 35: 2018 May - 2018 August
Archive 36: 2018 September - 2018 December
Archive 37: 2019 January - 2019 April
Archive 38: 2019 May - 2019 August
Archive 39: 2019 September - 2019 December
2020-2024
Archive 40: 2020 January - 2020 April
Archive 41: 2020 May - 2020 July
Archive 42: 2020 August - 2020 November
Archive 43: 2020 December
Archive 44: 2021 January - 2021 April
Archive 45: 2021 May - 2021 July
Archive 46: 2021 August - 2021 October
Archive 47: 2021 November - 2021 December
Archive 48: 2022 January
Archive 49: 2022 February
Archive 50: 2022 March
Archive 51: 2022 April 1-15
Archive 52: 2022 April 16-30
Archive 53: 2022 May
Archive 54: 2022 June
Archive 55: 2022 July
Archive 56: 2022 August
Archive 57: 2022 September
Archive 58: 2022 October
Archive 59: 2022 November 1-15
Archive 60: 2022 November 16-30
Archive 61: 2022 December
Archive 62: 2023 January
Archive 63: 2023 February
Archive 64: 2023 March
Archive 65: 2023 April
Archive 66: 2023 May
Archive 67: 2023 June
Archive 68: 2023 July
Archive 69: 2023 August 1-15
Archive 70: 2023 August 16-31
Archive 71: 2023 September
Archive 72: 2023 October
Archive 73: 2023 November
Archive 74: 2023 December
Archive 75: 2024 January
Archive 76: 2024 February
Archive 77: 2024 March
Archive 78: 2024 April
Archive 79: 2024 May 1-15
Archive 80: 2024 May 16-31
Archive 81: 2024 June 1-15
Archive 82: 2024 June 16-30
Archive 83: 2024 July
Archive 84: 2024 August
Archive 85: 2024 September
Archive 86: 2024 October
Archive 87: 2024 November
Archive 88: 2024 December
2025-present
Archive 89: 2025 January
Archive 90: 2025 February
Archive 91: 2025 March
Archive 92: 2025 April
Archive 93: 2025 May
Archive 94: 2025 June
Archive 95: 2025 July
Archive 96: 2025 August
Archive 97: 2025 September
Archive 98: 2025 October
Archive 99: 2025 November
Archive 100: 2025 December
Archive 101: 2026 January
Archive 102: 2026 February
Archive 103: 2026 March