T299823: Regularly run recountCategories.php on Wikimedia wikis via systemd timer
Closed, Resolved
Public
Assigned To
MarcoAurelio
Authored By
Legoktm
Jan 22 2022, 7:42 AM
2022-01-22 07:42:15 (UTC+0)
Tags
WMF-General-or-Unknown
Wikimedia-maintenance-script-run
(Backlog)
User-notice-archive
(Backlog)
User-MarcoAurelio
(cabinet)
Referenced Files
F34941981: mediawiki_job_recount_categories.txt
Feb 4 2022, 1:33 AM
2022-02-04 01:33:04 (UTC+0)
Subscribers
Aklapper
AntiCompositeNumber
JJMC89
Legoktm
MarcoAurelio
matmarex
Quiddity
View All 11 Subscribers
Description
For various reasons, category counts tend to drift out of sync, leaving categories with inaccurate counts, sometimes even negative ones (which should be impossible). @Taylor has documented past examples of this at T85696#7105227.
One suggestion (which I pushed a patch for) is to recount categories on action=purge (T85696: Allow action=purge to recalculate the number of pages/subcats/files in a category), but that requires human intervention and someone to recognize that the count is wrong. We have a pretty efficient script, recountCategories.php, that should be run regularly to reconcile any remaining differences. We had to run it everywhere for T299244, and the slowest wiki was commonswiki at ~45 minutes.
I propose we run recountCategories.php once a month as a systemd timer.
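To illustrate what "reconciling differences" means here, the following is a minimal sketch that recomputes counts from membership records and reports categories whose cached counters disagree, including impossible negative values. It is not the implementation of recountCategories.php; the names `StoredCounts`, `recount` and `find_drift`, and the choice to treat `pages` as the total number of members of any type, are assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StoredCounts:
    """Cached counters for one category (hypothetical stand-in for a stored row)."""
    pages: int
    subcats: int
    files: int


def recount(memberships):
    """Recompute counters from (category, member_type) pairs.

    member_type is "page", "subcat" or "file". In this sketch `pages`
    counts every member regardless of type (an assumption).
    """
    totals = {}
    for category, member_type in memberships:
        counter = totals.setdefault(category, Counter())
        counter["pages"] += 1
        if member_type == "subcat":
            counter["subcats"] += 1
        elif member_type == "file":
            counter["files"] += 1
    return {
        category: StoredCounts(c["pages"], c["subcats"], c["files"])
        for category, c in totals.items()
    }


def find_drift(stored, memberships):
    """Return {category: (stored, actual)} for categories whose cached counts are wrong."""
    actual = recount(memberships)
    empty = StoredCounts(0, 0, 0)
    drift = {}
    for category in stored.keys() | actual.keys():
        have = stored.get(category, empty)
        want = actual.get(category, empty)
        if have != want:
            drift[category] = (have, want)
    return drift
```

A periodic job would apply the recomputed values for every category reported by `find_drift`; a category with a stored count of -1 and no members would be reset to zero.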
Details
Related Changes in Gerrit:
Subject: mediawiki::maintenance: Run recountCategories.php monthly on all wikis
Repo: operations/puppet
Branch: production
Lines +/-: +8 / -0
Related Objects
Mentions
Mentioned In
T300303: Deadlock in WikiPage::updateCategoryCounts query
T221795: Refactor Category::refreshCounts logic to a job and simplify
T170737: Run recountCategories.php on Wikimedia wikis
T85696: Allow action=purge to recalculate the number of pages/subcats/files in a category
Mentioned Here
P19566 (An Untitled Masterwork)
T85696: Allow action=purge to recalculate the number of pages/subcats/files in a category
T299244: Deleted pages are not being removed from links tables, which also messes up category counts
Event Timeline
Legoktm
created this task.
Jan 22 2022, 7:42 AM
2022-01-22 07:42:15 (UTC+0)
Restricted Application
added a subscriber:
Aklapper
Jan 22 2022, 7:42 AM
2022-01-22 07:42:15 (UTC+0)
Legoktm
mentioned this in
T85696: Allow action=purge to recalculate the number of pages/subcats/files in a category
Jan 22 2022, 7:42 AM
2022-01-22 07:42:33 (UTC+0)
Peachey88
renamed this task from
Run recountCategories.php regularly on Wikimedia wikis
to
Regularly run recountCategories.php regularly on Wikimedia wikis via systemd timer
Jan 22 2022, 7:44 AM
2022-01-22 07:44:31 (UTC+0)
Peachey88
added a project:
Wikimedia-maintenance-script-run
taavi
renamed this task from
Regularly run recountCategories.php regularly on Wikimedia wikis via systemd timer
to
Regularly run recountCategories.php on Wikimedia wikis via systemd timer
Jan 22 2022, 7:51 AM
2022-01-22 07:51:08 (UTC+0)
JJMC89
subscribed.
Jan 22 2022, 8:26 AM
2022-01-22 08:26:16 (UTC+0)
Zabe
subscribed.
Jan 22 2022, 2:00 PM
2022-01-22 14:00:28 (UTC+0)
MarcoAurelio
awarded a token.
Jan 22 2022, 2:05 PM
2022-01-22 14:05:52 (UTC+0)
MarcoAurelio
subscribed.
Edited
Jan 22 2022, 5:32 PM
2022-01-22 17:32:41 (UTC+0)
How do we want to run this? For all wikis? By shards?
Taking as an example what @Legoktm did for T299244, I made a draft patch at
If you think we can go directly with foreachwiki, please let me know. It'll make the file substantially shorter too.
Taylor
added a comment.
Jan 23 2022, 12:04 AM
2022-01-23 00:04:30 (UTC+0)
SUPPORT, all wikis, once per month.
Taylor
mentioned this in
T170737: Run recountCategories.php on Wikimedia wikis
Jan 23 2022, 12:11 AM
2022-01-23 00:11:01 (UTC+0)
Legoktm
added a comment.
Jan 23 2022, 12:22 AM
2022-01-23 00:22:06 (UTC+0)
In T299823#7642475 @MarcoAurelio wrote:
How do we want to run this? For all wikis? By shards?
Taking as an example what @Legoktm did for T299244, I made a draft patch at
If you think we can go directly with foreachwiki, please let me know. It'll make the file substantially shorter too.
I think we can do this with a plain foreachwiki; in T299244 there was some urgency to get the counts fixed, so I split it by shard, but here if it's a bit slower that's fine. Each wiki will still get counts refreshed once a month anyways.
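The trade-off between the two approaches can be sketched as follows: a plain foreachwiki-style run is a single queue processed one wiki at a time, while splitting by shard yields one queue per shard that could be worked through in parallel. This is only an illustration; `plan_sequential`, `plan_by_shard` and the wiki-to-shard mapping passed in are hypothetical and do not reflect how foreachwiki or the Puppet patch is implemented.

```python
from collections import defaultdict


def plan_sequential(wikis):
    """A single queue: every wiki is processed one after another."""
    return [list(wikis)]


def plan_by_shard(wiki_to_shard):
    """One queue per shard, so queues for different shards could run concurrently.

    `wiki_to_shard` maps a wiki name to a shard label (illustrative input).
    Queues are returned in sorted shard order; wikis keep their input order.
    """
    queues = defaultdict(list)
    for wiki, shard in wiki_to_shard.items():
        queues[shard].append(wiki)
    return [queues[shard] for shard in sorted(queues)]
```

With a monthly schedule and no urgency, the single queue is simpler to configure, at the cost of a longer total run time.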
gerritbot
added a comment.
Jan 23 2022, 12:22 AM
2022-01-23 00:22:24 (UTC+0)
Change 756069 had a related patch set uploaded (by Legoktm; author: MarcoAurelio):
[operations/puppet@production] [WIP] p::mediawiki::maintenance: Run recountCategories.php regularly
gerritbot
added a project:
Patch-For-Review
Jan 23 2022, 12:22 AM
2022-01-23 00:22:25 (UTC+0)
MarcoAurelio
added a comment.
Edited
Jan 23 2022, 11:24 AM
2022-01-23 11:24:16 (UTC+0)
In T299823#7642742 @Legoktm wrote:
I think we can do this with a plain foreachwiki; in T299244 there was some urgency to get the counts fixed, so I split it by shard, but here if it's a bit slower that's fine. Each wiki will still get counts refreshed once a month anyways.
Thank you. Patch amended to use foreachwiki.
I see that the script suggests running rMW maintenance/cleanupEmptyCategories.php in --mode remove if the category recounting script is run in the pages mode (we're configuring it to use all, which includes that mode). Shall we configure a periodic job for cleanupEmptyCategories.php too?
Legoktm
added a comment.
Jan 24 2022, 7:00 AM
2022-01-24 07:00:19 (UTC+0)
In T299823#7643093 @MarcoAurelio wrote:
In T299823#7642742 @Legoktm wrote:
I think we can do this with a plain foreachwiki; in T299244 there was some urgency to get the counts fixed, so I split it by shard, but here if it's a bit slower that's fine. Each wiki will still get counts refreshed once a month anyways.
Thank you. Patch amended to use foreachwiki.
LGTM!
I see that the script suggests running rMW maintenance/cleanupEmptyCategories.php in --mode remove if the category recounting script is run in the pages mode (we're configuring it to use all, which includes that mode). Shall we configure a periodic job for cleanupEmptyCategories.php too?
Good point. See
Krinkle
mentioned this in
T221795: Refactor Category::refreshCounts logic to a job and simplify
Jan 25 2022, 6:32 AM
2022-01-25 06:32:28 (UTC+0)
RP88
subscribed.
Jan 25 2022, 7:08 AM
2022-01-25 07:08:28 (UTC+0)
MarcoAurelio
added a subscriber:
taavi
Jan 26 2022, 12:27 PM
2022-01-26 12:27:46 (UTC+0)
@Majavah Could you please run the improved recountCategories script on the Beta Cluster and save the output for review in a Paste?
foreachwiki maintenance/recountCategories --mode all >path/to/logs
@Legoktm and I would be interested in knowing how it does before merging the Puppet patch.
Thanks!
Umherirrender
mentioned this in
T300303: Deadlock in WikiPage::updateCategoryCounts query
Jan 27 2022, 9:11 PM
2022-01-27 21:11:21 (UTC+0)
Stashbot
added a comment.
Jan 28 2022, 9:45 PM
2022-01-28 21:45:06 (UTC+0)
Mentioned in SAL (#wikimedia-releng)
[2022-01-28T21:45:05Z]
T299823#7652496
taavi
added a comment.
Jan 28 2022, 9:46 PM
2022-01-28 21:46:01 (UTC+0)
In T299823#7652496 @MarcoAurelio wrote:
@Majavah Could you please run the improved recountCategories script on the Beta Cluster and save the output for review in a Paste?
P19566
taavi
unsubscribed.
Jan 28 2022, 9:50 PM
2022-01-28 21:50:05 (UTC+0)
gerritbot
added a comment.
Feb 3 2022, 8:35 PM
2022-02-03 20:35:23 (UTC+0)
Change 756069 merged by RLazarus:
[operations/puppet@production] mediawiki::maintenance: Run recountCategories.php monthly on all wikis
Stashbot
added a comment.
Feb 3 2022, 8:43 PM
2022-02-03 20:43:10 (UTC+0)
Mentioned in SAL (#wikimedia-operations)
[2022-02-03T20:43:10Z]
T299823
Maintenance_bot
removed a project:
Patch-For-Review
Feb 3 2022, 9:10 PM
2022-02-03 21:10:17 (UTC+0)
RLazarus
subscribed.
Feb 4 2022, 1:33 AM
2022-02-04 01:33:04 (UTC+0)
As requested on IRC, here's the log from today's run:
mediawiki_job_recount_categories.txt
2 MB
Legoktm
assigned this task to
MarcoAurelio
Feb 4 2022, 6:15 AM
2022-02-04 06:15:52 (UTC+0)
Legoktm
added a project:
User-notice
LGTM! Looking at the mysql-aggregated dashboard for those times, I can't really see any major spikes. Looking at enwiki/s1, you can see the increase in rows read, but there are (presumably) normal traffic spikes that are even bigger.
I added a note in tech news:
Closing because I think we're all set!
Legoktm
closed this task as
Resolved
Feb 4 2022, 6:16 AM
2022-02-04 06:16:01 (UTC+0)
Legoktm
reopened this task as
Open
Feb 4 2022, 8:08 AM
2022-02-04 08:08:47 (UTC+0)
Maybe not,
is a bit weird.
MarcoAurelio
added a comment.
Feb 4 2022, 3:34 PM
2022-02-04 15:34:12 (UTC+0)
In
T299823#7677609
@Legoktm
wrote:
Maybe not,
is a bit weird.
I've checked the example categories mentioned by Liz in that thread: they show as empty on Category:Empty categories awaiting deletion, and they have no articles inside either (so they're indeed empty categories correctly displaying as empty right now). Unless I am not reading the thread right, it looks like there are no issues now?
As for the cause, could it be a caching issue, or DB master/replica lag?
enwiki had quite a lot of rows to update, second only to commonswiki.
AntiCompositeNumber
subscribed.
Feb 4 2022, 8:38 PM
2022-02-04 20:38:40 (UTC+0)
Quiddity
subscribed.
Feb 4 2022, 9:00 PM
2022-02-04 21:00:14 (UTC+0)
Re: Tech News - let me know if any changes are needed, ASAP. I'll be freezing it for translations within ~2 hours.
Quiddity
moved this task from
To Triage
to
In current Tech/News draft
on the
User-notice
board.
Feb 4 2022, 9:00 PM
2022-02-04 21:00:40 (UTC+0)
Legoktm
added a comment.
Feb 4 2022, 9:58 PM
2022-02-04 21:58:10 (UTC+0)
In
T299823#7684603
@Quiddity
wrote:
Re: Tech News - let me know if any changes are needed, ASAP. I'll be freezing it for translations within ~2 hours.
I think what we have now is fine. On IRC MA and I discussed that while it is possible things are temporarily wrong because of this script (which is still undetermined), overall things are less wrong.
MarcoAurelio
added a comment.
Feb 8 2022, 7:02 PM
2022-02-08 19:02:41 (UTC+0)
Shall we set the script to ensure: absent while the investigation into the reported issues is ongoing?
Quiddity
moved this task from
In current Tech/News draft
to
Already announced/Archive
on the
User-notice
board.
Feb 10 2022, 8:48 PM
2022-02-10 20:48:54 (UTC+0)
Legoktm
closed this task as
Resolved
Apr 5 2022, 9:22 PM
2022-04-05 21:22:32 (UTC+0)
I had looked at this back in February or in March and didn't think there was anything actionable left. If people are having issues with category counts being significantly wrong at the beginning of the month (after this script runs), please shout.
Maintenance_bot
edited projects, added
User-notice-archive
; removed
User-notice
Aug 13 2022, 1:09 PM
2022-08-13 13:09:25 (UTC+0)
Restricted Application
added a project:
User-MarcoAurelio
Aug 13 2022, 1:09 PM
2022-08-13 13:09:31 (UTC+0)
MarcoAurelio
moved this task from
unsorted/backlog
to
cabinet
on the
User-MarcoAurelio
board.
Aug 20 2022, 2:38 PM
2022-08-20 14:38:50 (UTC+0)
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct.