T336101 Stray __TOC__ added by Parsoid in a 3-day window when 1.41.0-wmf.7 group2 wikis had been rolled back.
Closed, Resolved
Public
BUG REPORT
Assigned To
ssastry
Authored By
Tacsipacsi
May 6 2023, 12:12 PM
2023-05-06 12:12:23 (UTC+0)
Tags
DiscussionTools (Triaged)
Parsoid (Bugs & Crashers)
Content-Transform-Team-WIP (To Verify)
Editing-team (Tracking) (External)
Essential-Work
Parsoid-Read-Views (Phase 1 - DiscussionTools support) (Backlog)
User-notice-archive (Backlog)
Referenced Files
None
Subscribers
Aklapper
binbot
Izno
KEMONO_PANTSU_KEMONO_PANTSU_KEMONO_PANTSU_KEMONO_PANTSU
matej_suchanek
matmarex
Quiddity
Description
Steps to reproduce
Look at
, an edit done with the reply tool.
Actual result
In addition to the comment, a stray __TARTALOMJEGYZÉK__ (__TOC__) magic word was added.
Expected result
The only change is the new comment.
Details
Related Changes in Gerrit:
Subject | Repo | Branch | Lines +/-
Bump parsoid to 0.18.0-a14 | mediawiki/vendor | master | +55 -55
Script to process RC stream for dirty TOC edits | mediawiki/services/parsoid | master | +364 -103
Related Objects
Mentions
Mentioned In
T339273: Wikitext 2017 editor emptying a page when trying to edit ro:Partidul Comunist Român
Mentioned Here
T330213: 1.41.0-wmf.7 deployment blockers
Event Timeline
Tacsipacsi
created this task.
May 6 2023, 12:12 PM
2023-05-06 12:12:23 (UTC+0)
Restricted Application
added a subscriber:
Aklapper
May 6 2023, 12:12 PM
2023-05-06 12:12:24 (UTC+0)
matmarex
added a project:
Parsoid
May 6 2023, 1:17 PM
2023-05-06 13:17:31 (UTC+0)
Izno
subscribed.
Edited
May 6 2023, 4:19 PM
2023-05-06 16:19:40 (UTC+0)
I expect
this addition
was also unintentional.
matej_suchanek
subscribed.
May 6 2023, 9:20 PM
2023-05-06 21:20:26 (UTC+0)
These started to occur on cswiki when using the visual editor (__OBSAH__, the localized __TOC__):
Seems to be more specific to Parsoid.
ssastry
subscribed.
May 7 2023, 3:51 AM
2023-05-07 03:51:02 (UTC+0)
I think caused by
and VE interaction. I am not sure how that is happening since I cannot reproduce it locally, but will investigate -- I imagine it is some specific kinds of interactions causing this.
ssastry
added a comment.
May 7 2023, 4:00 AM
2023-05-07 04:00:39 (UTC+0)
Oh .. but that patch is part of v0.18.0-a7 in Parsoid which was to go out as part of 1.41.0-wmf.7 ... But, group2 wikis (enwiki, huwiki, cswiki) don't have wmf.7 yet .. So, I am confused!
Did 1.41.0-wmf.7 get rolled out to group2 and then get rolled back?
ssastry
added a comment.
May 7 2023, 4:24 AM
2023-05-07 04:24:41 (UTC+0)
Indeed, it got rolled back in T330213#8828193. So, it looks like it was on group2 wikis for about 3 hours, in which time new Parsoid HTML ended up in RESTBase for some subset of pages, and when those pages were then edited and hit the older version of Parsoid, we dirtied the pages! This is totally my oversight -- we should have recognized this as effectively a minor HTML version change and followed our established process.
At this time, unless this is causing major disruptions, we could wait for the train to be rolled forward to group2 wikis on Monday.
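The failure mode described above is stored HTML from a newer Parsoid being edited and then serialized by an older Parsoid. One hypothetical guard is to compare content versions before reusing stored HTML. The sketch below assumes semver-style version strings; the helper names and the version numbers are invented for illustration, and this is not Parsoid's or RESTBase's actual logic:

```python
def parse_version(version):
    """Split a 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def can_serialize(stored_html_version, serializer_version):
    """True if the serializer is at least as new as the stored HTML."""
    return parse_version(stored_html_version) <= parse_version(serializer_version)


# Hypothetical versions: HTML stored by the newer code, edited via the older one.
stale_risk = not can_serialize("2.8.0", "2.7.0")
```

Such a check only helps if the version is actually bumped whenever the output changes, which is the step the comment says was missed.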
Tacsipacsi
added a comment.
May 7 2023, 8:48 AM
2023-05-07 08:48:07 (UTC+0)
Could affected pages be collected somehow so that they can be fixed manually or using a bot (depending on the amount)? It’s good to know it will fix itself with the train (I agree it’s not a “major disruption”), but the already-bad pages probably won’t be fixed by future DiscussionTools edits.
ssastry
added a comment.
Edited
May 7 2023, 9:07 AM
2023-05-07 09:07:46 (UTC+0)
Yes, I've been pondering that question. The affected pages won't be fixed on their own. One obvious solution is to write a script to process the RC from the affected time window with the visualeditor or discussiontools tags and look for __TOC__ in the diff. But, I am trying to think if there is a simpler solution than that. We'll figure out a strategy this coming week.
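The script idea above could start from the MediaWiki Action API's `list=recentchanges` module. A minimal sketch, assuming the standard `rctag`, `rcstart`, `rcend`, and `rcdir` parameters; the helper name `build_rc_query` is illustrative, not taken from the actual script, and the window used is the one given later in this task:

```python
def build_rc_query(tag, start, end, limit=500):
    """Build Action API parameters listing edits tagged `tag` in [start, end]."""
    return {
        "action": "query",
        "format": "json",
        "list": "recentchanges",
        "rctype": "edit",
        "rctag": tag,
        "rcdir": "newer",  # enumerate oldest first, so rcstart precedes rcend
        "rcstart": start,
        "rcend": end,
        "rcprop": "ids|title|timestamp|tags",
        "rclimit": str(limit),
    }


# One query per editing tool mentioned in the comment.
queries = [
    build_rc_query(tag, "2023-05-04T20:51:13Z", "2023-05-08T14:40:02Z")
    for tag in ("visualeditor", "discussiontools")
]
```

Each returned revision would still need its diff fetched and inspected, which is the slow part.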
Tacsipacsi
added a comment.
May 7 2023, 6:25 PM
2023-05-07 18:25:54 (UTC+0)
Okay, thanks in advance!
matmarex
subscribed.
May 8 2023, 12:23 PM
2023-05-08 12:23:27 (UTC+0)
MSantos
triaged this task as
High
priority.
May 8 2023, 3:07 PM
2023-05-08 15:07:59 (UTC+0)
MSantos
added a project:
Content-Transform-Team-WIP
MSantos
moved this task from
Backlog
to
In Progress
on the
Content-Transform-Team-WIP
board.
Jgiannelos
moved this task from
Needs Triage
to
Bugs & Crashers
on the
Parsoid
board.
May 11 2023, 2:19 PM
2023-05-11 14:19:09 (UTC+0)
MSantos
assigned this task to
ssastry
May 15 2023, 3:17 PM
2023-05-15 15:17:51 (UTC+0)
ssastry
added a comment.
May 16 2023, 11:42 AM
2023-05-16 11:42:27 (UTC+0)
I think we should be able to adapt
this existing script
easily to gather this info.
ssastry
added a comment.
May 16 2023, 1:00 PM
2023-05-16 13:00:46 (UTC+0)
I did a few tweaks and I have it running against enwiki VE edits in the main namespace and it seems to be working, but since this needs to fetch the diff for every tagged edit, it is going to take a while to run through. I might have to run this script on a server somewhere. It also has a few false positives (because there is __TOC__ in the diff both before/after -- I could make it smarter but this is just a quick trial test run).
But, from the run so far, it looks like there is about one edit every 10 mins that has this __TOC__ dirtying, which probably would mean several hundred pages across all wikis (enwiki probably has the highest edit frequency of all wikis). What is the best way to surface this list of dirtied pages? Add it as a paste on this phab task?
I will have to tweak the script to use the localized name of TOC on a wiki, and then run it with both visualeditor and discussion tools edit filters for the timeframe when the rollback was in place. I will then probably let this run on scandium or some labs server.
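One way to avoid the before/after false positives mentioned above is to compare how often the magic word occurs in the old and new wikitext, instead of searching the diff for it. This is a sketch under assumptions, not the script from the patch: the function names are invented, the alias list must be supplied per wiki, and case-insensitive matching is assumed.

```python
import re


def toc_pattern(aliases):
    """Regex matching any alias of the TOC magic word, e.g. __TOC__."""
    names = "|".join(re.escape(alias) for alias in aliases)
    return re.compile(r"__(?:%s)__" % names, re.IGNORECASE)


def is_dirty_toc_edit(old_text, new_text, aliases=("TOC",)):
    """True if the edit increased the number of TOC magic words on the page."""
    pattern = toc_pattern(aliases)
    return len(pattern.findall(new_text)) > len(pattern.findall(old_text))


# Aliases seen in this task: huwiki and cswiki localizations.
HU_ALIASES = ("TOC", "TARTALOMJEGYZÉK")
CS_ALIASES = ("TOC", "OBSAH")
```

A page that already contained the magic word before the edit is no longer flagged. A talk-page comment that quotes the magic word literally would still be counted, so those reports remain false positives.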
matej_suchanek
added a comment.
May 16 2023, 1:55 PM
2023-05-16 13:55:36 (UTC+0)
Theoretically, you could also narrow the list of pages to scan by only evaluating pages that include
__TOC__
now and were last edited recently. Something like:
or
. (But only if the time saved is not exceeded by time spent on implementing this.)
Thibaut120094
subscribed.
May 19 2023, 12:33 PM
2023-05-19 12:33:49 (UTC+0)
gerritbot
added a comment.
May 21 2023, 11:17 AM
2023-05-21 11:17:12 (UTC+0)
Change 920285 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Prototype script to process RC stream for dirty TOC edits
gerritbot
added a project:
Patch-For-Review
May 21 2023, 11:17 AM
2023-05-21 11:17:13 (UTC+0)
MSantos
moved this task from
In Progress
to
Code Review
on the
Content-Transform-Team-WIP
board.
May 22 2023, 3:19 PM
2023-05-22 15:19:23 (UTC+0)
ppelberg
moved this task from
Backlog
to
Triaged
on the
DiscussionTools
board.
Jun 1 2023, 3:28 PM
2023-06-01 15:28:40 (UTC+0)
ppelberg
added a project:
Editing-team (Tracking)
ppelberg
moved this task from
Backlog
to
External
on the
Editing-team (Tracking)
board.
Tacsipacsi
added a comment.
Jun 3 2023, 7:33 AM
2023-06-03 07:33:00 (UTC+0)
Any progress on this? It’s been four weeks since I reported this issue, and as time goes on, the issue becomes more severe and harder to fix:
While it caused no visual changes back then, it may well have caused some in the meantime: if a new first section was added, the TOC may now be before the second section; if the first section was removed, it may now be in the middle of the text; and if there were more than three sections at the time but there are at most three currently, the TOC may appear unnecessarily (the latter is probably most likely on automatically archived talk pages).
With more new edits, the likelihood of edit conflicts grows, making it impossible to revert by bot.
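The section-count argument above follows from how the table of contents is triggered. A simplified sketch of that rule, assuming the documented behavior that an automatic TOC needs more than three headings while `__TOC__` forces one at its position; the function name is invented and `__FORCETOC__` is ignored:

```python
def toc_is_shown(heading_count, has_toc=False, has_notoc=False):
    """Approximate whether a page renders a table of contents."""
    if heading_count == 0:
        return False  # nothing to list
    if has_toc:
        return True  # __TOC__ forces a TOC, overriding __NOTOC__
    if has_notoc:
        return False
    return heading_count > 3  # automatic TOC needs more than three headings


# An archived talk page that dropped from five sections to two:
before_archive = toc_is_shown(5, has_toc=True)
after_archive = toc_is_shown(2, has_toc=True)
without_stray = toc_is_shown(2)
```

With the stray magic word the TOC stays visible after archiving, whereas the clean page would have dropped it.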
ssastry
added a comment.
Jun 3 2023, 9:10 AM
2023-06-03 09:10:18 (UTC+0)
My apologies - I was waiting for reviews of my patch, but I can instead run this without review and tweak it later if necessary. I'll try to get these results early next week.
ssastry
added a comment.
Jun 3 2023, 1:18 PM
2023-06-03 13:18:05 (UTC+0)
Alright, I updated the script one final time to make sure I run it on all non-closed non-private wikis (~750+) and kicked it off on the parsing-qa-02 VM. I expect I'll have results in 24 hours or less.
ssastry
added a comment.
Jun 3 2023, 3:55 PM
2023-06-03 15:55:15 (UTC+0)
The timestamps I picked for May 5 may not be the right ones since I am seeing some edits from before the start timestamp. Anyway, once this run completes, I can rerun the script with a different timestamp range and merge the results.
But, FWIW, this run found about 70 diffs across all 750+ wikis caused by the use of the reply tool (many of which are also false positives because the talk page has reports of the stray TOC additions, which then get picked up by the script as a dirty diff).
The run with visualeditor tag is still going on and I expect that to find a much larger number of dirty diffs.
ssastry
added a comment.
Edited
Jun 3 2023, 4:31 PM
2023-06-03 16:31:57 (UTC+0)
In
T336101#8900362
@ssastry
wrote:
The timestamps I picked for May 5 may not be the right ones since I am seeing some edits from before the start timestamp. Anyway, once this run completes, I can rerun the script with a different timestamp range and merge the results.
This current run is from 2023-05-05T18:25:20Z ... 2023-05-08T14:40:02Z. But that was the wrong start timestamp. Once this run completes, I will rerun the script for the range 2023-05-04T20:51:13Z -- 2023-05-05T18:25:20Z and that will pretty much do it.
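Merging the two runs could look like the following sketch. The two windows are the ones quoted above; the record shape (`wiki`, `revid`) and the function names are assumptions for illustration, not the script's actual output format:

```python
from datetime import datetime, timezone

FIRST_RUN = ("2023-05-05T18:25:20Z", "2023-05-08T14:40:02Z")
RERUN = ("2023-05-04T20:51:13Z", "2023-05-05T18:25:20Z")


def parse_ts(ts):
    """Parse an ISO 8601 UTC timestamp as used by the Action API."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def merge_runs(*runs):
    """Merge per-run result lists, de-duplicating on (wiki, revid)."""
    merged = {}
    for run in runs:
        for hit in run:
            merged[(hit["wiki"], hit["revid"])] = hit
    return sorted(merged.values(), key=lambda hit: (hit["wiki"], hit["revid"]))


# Combined length of the affected window covered by both runs.
total = parse_ts(FIRST_RUN[1]) - parse_ts(RERUN[0])
```

De-duplicating on the revision ID keeps an edit that falls exactly on the shared boundary timestamp from being listed twice.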
Tacsipacsi
added a comment.
Jun 3 2023, 4:45 PM
2023-06-03 16:45:21 (UTC+0)
In
T336101#8900210
@ssastry
wrote:
My apologies - I was waiting for reviews of my patch, but I can instead run this without review and tweak it later if necessary. I'll try to get these results early next week.
Okay, thanks!
ssastry
added a comment.
Edited
Jun 4 2023, 4:14 AM
2023-06-04 04:14:23 (UTC+0)
Looks like based on the results so far (> 100K edits examined), about 0.8% of them have been dirtied across all wikis. enwiki (which has completed) has about 333 dirty diffs and is the highest so far.
All discussiontools edits got fully processed for the affected timeframe. Since RC logs for May 4 cleared before the second script completed for the visualeditor tag, RC entries for about 3 hours on May 4 are missing for a bunch of wikis. I will need to query the db to find relevant edits for everything after frwiki. I will look into it on Monday.
What is the best way to share these results?
ssastry
renamed this task from
Stray __TOC__ added by the reply tool
to
Stray __TOC__ added by Parsoid in a 3-day window when 1.41.0-wmf.7 group2 wikis had been rolled back.
Jun 4 2023, 4:17 AM
2023-06-04 04:17:53 (UTC+0)
ssastry
added a comment.
Jun 4 2023, 1:59 PM
2023-06-04 13:59:51 (UTC+0)
Okay, the script runs are complete. Some stats.
~200K edits were examined and about 1600 dirty diffs were found across all wikis (~100 are false positive reports on talk pages because of __TOC__ matches later on the page). frwiki has 589 dirty diffs and enwiki has 333. All other wikis have fewer than 100. All but 13 wikis have fewer than 10 dirty diffs.
As reported above, on a few wikipedias (alphabetically everything after frwiki), there is missing data for about 3 hours on May 4 because I didn't get the script run completed before the RC logs cleared entries older than 30 days. But, looking at the stats as I have them now (which I am including below), I think a few tens of diffs (across multiple wikis) might have been missed across these wikis for 3 hours over a 3-day period (3 hrs over 3 days = 1/24, and 1/24 of 1500 is about 60, but in reality it is going to be smaller because almost 1000 of those 1500 are from wikis with complete data). So, I am not yet sure it is worth putting in additional effort trying to identify them all precisely. Input welcome.
589 frwiki
333 enwiki
98 ukwiki
85 zhwiki
69 dewiki
66 eswiki
52 ruwiki
35 trwiki
33 plwiki
26 idwiki
21 jawiki
14 fawiki
14 cswiki
9 rowiki
9 arwiki
7 elwiki
6 viwiki
6 euwiki
5 svwiki
5 skwiki
5 nlwiki
4 srwiki
4 ptwiki
4 nowiki
4 hifwiki
4 fiwiki
3 thwiki
3 tewiki
3 huwiki
3 etwiki
2 sowiki
2 quwiki
2 mtwiki
2 ltwiki
2 lbwiki
2 kowiki
2 hywiki
2 bnwiki
2 bgwiki
1 vowiki
1 uzwiki
1 tawiki
1 slwiki
1 simplewiki
1 mkwiki
1 extwiki
1 dagwiki
1 azwiki
So, the last thing remaining is dumping this list of diffs somewhere and figuring out how to go about fixing them. enwiki and frwiki and a few others will definitely benefit from some bot help. But, the rest could probably be tackled by manual edits -- I am happy to take on some of this myself.
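The back-of-the-envelope estimate of missed diffs in this comment can be reproduced as follows. The inputs are the rounded figures from the comment (about 1500 dirty diffs, almost 1000 of them from wikis with complete data, 3 missing hours in a 3-day window); the uniform edit rate is an assumption of the sketch:

```python
total_dirty = 1500     # rough total used in the comment
complete_dirty = 1000  # "almost 1000" come from wikis with complete data
missing_hours = 3
window_hours = 3 * 24  # 3-day window

# Share of the window with no RC data, assuming a uniform edit rate.
fraction_missing = missing_hours / window_hours

# Upper bound if every wiki had been missing those hours.
upper_bound = total_dirty * fraction_missing

# Only wikis with incomplete data can have missed diffs.
adjusted = (total_dirty - complete_dirty) * fraction_missing

print(f"{upper_bound:.1f} {adjusted:.1f}")  # prints "62.5 20.8"
```

The adjusted figure of roughly 20 is in line with the "few tens of diffs" guess.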
Izno
added a comment.
Jun 4 2023, 3:04 PM
2023-06-04 15:04:19 (UTC+0)
Dumping the pages and/or specific diff links into a phab paste and then advertising in
User-notice
should work to get eyes, people can cross post from there for e.g. en.wp. Or paste on a wiki of your choice e.g. mw wiki.
ssastry
added a comment.
Jun 5 2023, 4:07 AM
2023-06-05 04:07:00 (UTC+0)
Good idea. I created
.. which should also make it easy for editors to fix the page. Hopefully editors can strike out the entry there after fixing it.
ssastry
added a project:
User-notice
Jun 5 2023, 5:38 AM
2023-06-05 05:38:53 (UTC+0)
binbot
subscribed.
Jun 6 2023, 7:01 AM
2023-06-06 07:01:54 (UTC+0)
In
T336101#8831670
@ssastry
wrote:
I think caused by
and VE interaction. I am not sure how that is happening since I cannot reproduce it locally, but will investigate -- I imagine it is some specific kinds of interactions causing this.
I met this ticket just now. I am the author of the above edit. The edit was done under Monobook with the oldest working interface you can imagine. :-)
binbot
awarded a token.
Jun 6 2023, 7:02 AM
2023-06-06 07:02:42 (UTC+0)
ssastry
added a comment.
Jun 6 2023, 5:33 PM
2023-06-06 17:33:28 (UTC+0)
Looks like
a huwiki editor didn't like
me fixing the dirty diff on their user page!
binbot
added a comment.
Jun 6 2023, 7:00 PM
2023-06-06 19:00:56 (UTC+0)
Not so surprising. :-) But this is not a talk page, and the answering tool
does not work here. The magic word may have been put there on purpose.
ssastry
added a comment.
Jun 7 2023, 7:15 AM
2023-06-07 07:15:16 (UTC+0)
In
T336101#8907272
@binbot
wrote:
Not so surprising. :-) But this is not a talk page, and the answering tool
does not work here. The magic word may have been put there on purpose.
The dirty diff was caused by the use of visualeditor, not the reply tool. In any case, if the editor is fine with it, I don't have a problem. :)
gerritbot
added a comment.
Jun 7 2023, 5:02 PM
2023-06-07 17:02:02 (UTC+0)
Change 920285
merged
by jenkins-bot:
[mediawiki/services/parsoid@master] Script to process RC stream for dirty TOC edits
Maintenance_bot
removed a project:
Patch-For-Review
Jun 7 2023, 5:10 PM
2023-06-07 17:10:34 (UTC+0)
Quiddity
subscribed.
Jun 7 2023, 6:57 PM
2023-06-07 18:57:54 (UTC+0)
Re: Tech News - What wording would you suggest as the content? My best guess is something like this (improvements/tweaks appreciated!):
For a few hours last month, some pages edited with VisualEditor or DiscussionTools had an unintended
__TOC__
(or its localized form) added during an edit. There is
a listing of affected pages sorted by wiki
, that may still need to be fixed.
KEMONO_PANTSU_KEMONO_PANTSU_KEMONO_PANTSU_KEMONO_PANTSU
subscribed.
Jun 7 2023, 6:58 PM
2023-06-07 18:58:49 (UTC+0)
This comment was removed by
taavi
taavi
subscribed.
Jun 7 2023, 7:08 PM
2023-06-07 19:08:04 (UTC+0)
taavi
unsubscribed.
ssastry
added a comment.
Jun 8 2023, 3:41 AM
2023-06-08 03:41:43 (UTC+0)
In
T336101#8911067
@Quiddity
wrote:
Re: Tech News - What wording would you suggest as the content? My best guess is something like this (improvements/tweaks appreciated!):
For a few hours last month, some pages edited with VisualEditor or DiscussionTools had an unintended
__TOC__
(or its localized form) added during an edit. There is
a listing of affected pages sorted by wiki
, that may still need to be fixed.
Thanks! The only change I would recommend is: to change "For a few hours" to "For 3 days" and also mention that this only impacted group2 wikis, so mostly wikipedias, not other wikis.
Quiddity
moved this task from
To Triage
to
In current Tech/News draft
on the
User-notice
board.
Jun 8 2023, 5:16 PM
2023-06-08 17:16:11 (UTC+0)
gerritbot
added a comment.
Jun 12 2023, 8:40 AM
2023-06-12 08:40:11 (UTC+0)
Change 929160 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/vendor@master] Bump parsoid to 0.18.0-a14
gerritbot
added a project:
Patch-For-Review
Jun 12 2023, 8:40 AM
2023-06-12 08:40:12 (UTC+0)
gerritbot
added a comment.
Jun 12 2023, 10:24 AM
2023-06-12 10:24:42 (UTC+0)
Change 929160
merged
by jenkins-bot:
[mediawiki/vendor@master] Bump parsoid to 0.18.0-a14
Maintenance_bot
removed a project:
Patch-For-Review
Jun 12 2023, 10:29 AM
2023-06-12 10:29:52 (UTC+0)
Trizek-WMF
moved this task from
In current Tech/News draft
to
Already announced/Archive
on the
User-notice
board.
Jun 15 2023, 10:22 AM
2023-06-15 10:22:00 (UTC+0)
Johannnes89
mentioned this in
T339273: Wikitext 2017 editor emptying a page when trying to edit ro:Partidul Comunist Român
Jun 15 2023, 5:31 PM
2023-06-15 17:31:47 (UTC+0)
MSantos
added projects:
Unplanned-Sprint-Work
Parsoid-Read-Views
Jun 19 2023, 11:23 AM
2023-06-19 11:23:33 (UTC+0)
MSantos
edited projects, added
Essential-Work
; removed
Unplanned-Sprint-Work
MSantos
moved this task from
Code Review
to
To Verify
on the
Content-Transform-Team-WIP
board.
Jun 20 2023, 2:07 PM
2023-06-20 14:07:01 (UTC+0)
MSantos
moved this task from
Uncategorized
to
Phase 1 - DiscussionTools support
on the
Parsoid-Read-Views
board.
Jun 22 2023, 2:16 PM
2023-06-22 14:16:10 (UTC+0)
MSantos
edited projects, added
Parsoid-Read-Views (Phase 1 - DiscussionTools support)
; removed
Parsoid-Read-Views
ssastry
closed this task as
Resolved
Jul 17 2023, 9:53 PM
2023-07-17 21:53:41 (UTC+0)
I am going to close this task. The wiki page with affected pages has helped fix the majority of pages. I'll probably go in and fix any remaining pages over this week.
Maintenance_bot
edited projects, added
User-notice-archive
; removed
User-notice
Jul 27 2023, 10:30 PM
2023-07-27 22:30:27 (UTC+0)