⚓ T368720 rendering differences (visualdiff testing)
Closed, Resolved
Public
BUG REPORT
Assigned To: ihurbain
Authored By: ihurbain, Jun 28 2024, 12:11 PM, 2024-06-28 12:11:11 (UTC+0)
Tags
Parsoid (Missing Functionality)
OKR-Work (Backlog)
Content-Transform-Team (Work In Progress) (To Verify)
MW-1.44-notes (1.44.0-wmf.19; 2025-03-04)
Parsoid-Read-Views (Follow-up)
MW-1.45-notes (1.45.0-wmf.4; 2025-06-03)
Referenced Files
F55974952: Screenshot_2024-06-28-19-18-33-15_3aea4af51f236e4932235fdada7d1643.jpg, Jun 28 2024, 5:22 PM, 2024-06-28 17:22:54 (UTC+0)
F55963108: image.png, Jun 28 2024, 12:11 PM, 2024-06-28 12:11:12 (UTC+0)
F55963120: image.png, Jun 28 2024, 12:11 PM, 2024-06-28 12:11:12 (UTC+0)
Subscribers
ABreault-WMF, Aklapper, Arlolra, cscott, ihurbain, Izno
Description
The following links:
render differently in Parsoid and legacy:
Legacy
Parsoid
Details
Related Changes in Gerrit:
Subject | Repo | Branch | Lines +/-
Re-enabled fixed Cite tests | mediawiki/extensions/Cite | master | +13 -12
Bump wikimedia/parsoid to 0.21.0-a23 | mediawiki/vendor | master | +524 -298
Process new line tokens after tags out of order in Remex | mediawiki/services/parsoid | master | +148 -12
Bump wikimedia/parsoid to 0.21.0-a19 | mediawiki/vendor | master | +73 -77
Process white space tokens after tags out of order in Remex | mediawiki/services/parsoid | master | +96 -19
Temporarily disable a couple of parserTests | mediawiki/extensions/Cite | master | +2 -2
Allow to not reconstruct AFE on only-space content | mediawiki/libs/RemexHtml | master | +81 -6
Related Objects
Task Graph
Status
Subtype
Assigned
Task
Open
None
T378089
[EPIC] Wikivoyages rollout followup work
Resolved
BUG REPORT
ihurbain
T368720
rendering differences (visualdiff testing)
Event Timeline
ihurbain created this task.
Jun 28 2024, 12:11 PM
2024-06-28 12:11:11 (UTC+0)
Restricted Application added a subscriber: Aklapper
Jun 28 2024, 12:11 PM
2024-06-28 12:11:12 (UTC+0)
Izno subscribed.
Jun 28 2024, 5:22 PM
2024-06-28 17:22:54 (UTC+0)
This is a misnested tag - the code tag is an inline tag wrapping multiple paragraphs which are blocks (modulo HTML 5 framing). Linter appropriately identifies it as such:
In other words, the page is already broken and Parsoid happens to make it more obvious.
You should review lint errors on a page to see if that may be the source of a rendering diff.
ihurbain added a comment. (Edited)
Jul 1 2024, 1:51 PM
2024-07-01 13:51:21 (UTC+0)
On the other hand, legacy fixes that in the output, and Parsoid does weird stuff:
$ echo -e "1\n\n2\n\n3\n\n4" |php ./bin/parse.php

1


2


3


4


So, yeah, the input is not great, but there's still something iffy going on. If anything I'd be happier with a single around all of that (because that would point in the direction of things just not being normalized), but here clearly SOMETHING happens, possibly incompletely.
It also doesn't feel like OBVIOUSLY wrong wikitext (although it does trip the linter), so it's probably worth having a look, regardless of the specific page.
ihurbain closed this task as Declined
Jul 1 2024, 4:16 PM
2024-07-01 16:16:41 (UTC+0)
Okay, fine.
At the point it enters remex, it looks like

1

2

3


and the behaviour is at least consistent with what happens in other places. It's not pretty, but as mentioned, inline formatting elements are not really supposed to span multiple paragraphs. Modifying either remex or what feeds it would have the potential to break A Lot of things (in particular, articles starting with a formatting tag, as is common).
I don't like it, it makes me grumpy, but it's technically bad input, and is linted as such, so I'm going to close this issue as won't solve.
ihurbain reopened this task as Open
Jul 2 2024, 2:32 PM
2024-07-02 14:32:12 (UTC+0)
ihurbain claimed this task.
ihurbain added a project: Content-Transform-Team-WIP
ihurbain moved this task from Backlog to In Progress on the Content-Transform-Team-WIP board.
SLopes-WMF moved this task from Needs Triage to Missing Functionality on the Parsoid board.
Jul 4 2024, 2:19 PM
2024-07-04 14:19:17 (UTC+0)
ihurbain added a comment.
Jul 8 2024, 4:21 PM
2024-07-08 16:21:01 (UTC+0)
There, I spent a significant amount of time hunting down where the discrepancy comes from.
Given wikitext

what enters Remex in the legacy parser is

a

b


and what enters Remex with Parsoid is

a

b


So far, so good.
The problem is that, if these paragraphs are included in a formatting element, we have reconstructAFE kicking in at some point. With the legacy parser, it happens on the string boundary, that is just before b. With Parsoid, it also happens on the string boundary... that is, on the \n\n between the paragraphs. So the formatting element is re-opened before the new paragraph opening, and this is where the discrepancies come from.
I've explored a tiny bit around "what if we don't run reconstructAFE on character runs that are just [\t\n\f\r]+" (because there was a line above checking for just that), but it was messing up the list handling. Still, there might be SOMETHING to explore around that. (Right now I'm thinking "maybe I just want to avoid it specifically on \n\n", which might be enough because if there are more of them, we get some


tags kicking in).
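To make the mechanism described in this comment concrete, here is a deliberately tiny Python model of the relevant HTML5 tree-construction steps. It is only a sketch and not Remex's code: the function `build`, the `skip_afe` hook, and both token streams are assumptions made for illustration (the exact markup quoted in the comment did not survive), only `p` and a few formatting elements are modelled, and the adoption agency algorithm is left out entirely.

```python
FORMATTING = {"b", "i", "code"}  # small subset of HTML formatting elements


def build(tokens, skip_afe=lambda text: False):
    """Serialize (kind, value) tokens using a toy HTML5-style tree builder."""
    stack, afe, out = [], [], []  # open elements, active formatting elements, output

    def reconstruct():
        # Re-open each active formatting element that is no longer open.
        for name in afe:
            if name not in stack:
                stack.append(name)
                out.append(f"<{name}>")

    def close_through(name):
        # Pop and close elements until `name` itself has been closed.
        while stack:
            top = stack.pop()
            out.append(f"</{top}>")
            if top == name:
                break

    for kind, value in tokens:
        if kind == "text":
            # Character runs normally trigger AFE reconstruction, whitespace included.
            if not skip_afe(value):
                reconstruct()
            out.append(value)
        elif kind == "start":
            if value in FORMATTING:
                reconstruct()
                afe.append(value)
            elif value == "p" and "p" in stack:
                close_through("p")  # a new <p> implicitly closes an open one
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "end":
            if value in stack:
                close_through(value)  # closing </p> also closes an open <b>
            if value in afe:
                afe.remove(value)
    while stack:  # end of input: close whatever is still open
        out.append(f"</{stack.pop()}>")
    return "".join(out)


# Assumed token streams: newline inside the paragraph (legacy-like)
# versus a separate "\n\n" run between the paragraphs (Parsoid-like).
legacy = [("start", "p"), ("start", "b"), ("text", "a\n"), ("end", "p"),
          ("start", "p"), ("text", "b\n"), ("end", "p")]
parsoid = [("start", "p"), ("start", "b"), ("text", "a"), ("end", "p"),
           ("text", "\n\n"), ("start", "p"), ("text", "b"), ("end", "p")]

print(repr(build(legacy)))
print(repr(build(parsoid)))
print(repr(build(parsoid, skip_afe=lambda text: text == "\n\n")))
```

In this model the legacy-like stream re-opens the formatting element inside the second paragraph, while the Parsoid-like stream re-opens it on the whitespace run, so it ends up wrapping the whole second paragraph. Passing a `skip_afe` predicate that matches `\n\n` moves the re-opening back inside the paragraph.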
gerritbot added a comment.
Jul 8 2024, 4:29 PM
2024-07-08 16:29:43 (UTC+0)
Change #1052779 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/libs/RemexHtml@master] Suppress AFE reconstruction on \n\n
gerritbot added a project: Patch-For-Review
Jul 8 2024, 4:29 PM
2024-07-08 16:29:44 (UTC+0)
ihurbain added a comment.
Jul 12 2024, 4:48 PM
2024-07-12 16:48:47 (UTC+0)
I played around a bit with that thing, and I'm arriving at a proposal looking like a variation of \n\n (PS2 on
).
I considered a few other paths:
- Removing the new lines between paragraphs when generating Parsoid HTML. This requires either modifying a large number of tests OR normalizing them by replacing new lines with '' rather than ' ', and this feels iffy. We also lose readability in the output, which may or may not be a concern.
- Removing AFE reconstruction on all whitespace runs. This poses problems with wt2wt around lists: we lose end tags (because they're marked as auto-inserted, and none gets re-added at the end).
- Not really considered yet: removing AFE reconstruction on all whitespace, AND finding a way to fix wt2wt. This would require more investigation if we were to go that route.
- Removing AFE reconstruction only on \n\n. This feels not exceedingly satisfying, because it breaks on patterns like \n\t\n, which I would expect may happen in

block if they got copy-pasted from somewhere else
So right now I'm proposing: "removing all AFE on whitespace runs that contain at least 2 \n". It's not super satisfying, because it feels super hacky. Right now, this only means changing 3 parserTests in parsoid (and I think one in core, I need to check), including 2 that are "unexpected change to known failure output", and one for which the result is actually nicer in my view.
One thing that makes me hesitate is that we have the "Inline HTML vs wiki block nesting" test that currently has
Inline HTML vs wiki block nesting
!! wikitext
Bold paragraph

New wiki paragraph
!! html/php

Bold paragraph

New wiki paragraph


!! html/parsoid

Bold paragraph

New wiki paragraph


!! end
and this would not be consistent with the desired output here, which doesn't have a

tag on the second paragraph. (It is, however, consistent with legacy; and this test is already failing by adding

tags in there, just slightly differently.)
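The two narrowest rules weighed in this comment ("only on \n\n" versus "whitespace runs that contain at least 2 \n") can be compared with a small Python sketch. The function names and the whitespace set are assumptions made for illustration, not Remex code; the set used here is HTML's ASCII whitespace (tab, LF, FF, CR, space).

```python
HTML_WS = set("\t\n\f\r ")  # assumed whitespace set: HTML's ASCII whitespace


def is_ws_run(text):
    """True for a non-empty string made only of whitespace characters."""
    return text != "" and set(text) <= HTML_WS


def skip_on_double_newline(text):
    """Rule: skip AFE reconstruction only on exactly two newlines."""
    return text == "\n\n"


def skip_on_two_or_more_newlines(text):
    """Rule: skip on whitespace runs that contain at least two newlines."""
    return is_ws_run(text) and text.count("\n") >= 2


# Print how each rule classifies a few character runs.
for sample in ["\n", "\n\n", "\n\t\n", " \n", "a\n\nb"]:
    print(repr(sample),
          skip_on_double_newline(sample),
          skip_on_two_or_more_newlines(sample))
```

The two rules only disagree on runs such as `\n\t\n`, which is the copy-paste pattern the comment worries about: the exact-match rule would still reconstruct there, the counting rule would not.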
ihurbain added a comment. (Edited)
Jul 12 2024, 4:59 PM
2024-07-12 16:59:06 (UTC+0)
Mmmh. Easier, probably: exclude all whitespace unless it's a single \n. This way we avoid touching lists & tables in "less-corner-case" cases, and it feels less hacky/easier to justify. (And it seems to be passing parserTests as well.)
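As a rough illustration of this simpler rule, a predicate along the following lines would skip AFE reconstruction on every whitespace-only run except a lone newline. This is a sketch only: the function name and the whitespace set (HTML's ASCII whitespace) are assumptions, and the actual patch may be shaped differently.

```python
def skip_afe_reconstruction(text):
    """Skip on any whitespace-only run, unless it is exactly one newline."""
    whitespace = "\t\n\f\r "  # assumed set: HTML's ASCII whitespace
    is_whitespace_only = text != "" and all(ch in whitespace for ch in text)
    return is_whitespace_only and text != "\n"


# Print how the rule classifies a few character runs.
for sample in ["\n", "\n\n", "\n\t\n", " ", "a"]:
    print(repr(sample), skip_afe_reconstruction(sample))
```

A single `\n` keeps the existing behaviour, which is the case the comment ties to lists and tables, while every other whitespace-only run no longer re-opens formatting elements.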
cscott moved this task from In Progress to Code Review on the Content-Transform-Team-WIP board.
Jul 15 2024, 3:10 PM
2024-07-15 15:10:16 (UTC+0)
MSantos moved this task from Uncategorized to Phase 1 - DiscussionTools support on the Parsoid-Read-Views board.
Jul 25 2024, 1:43 PM
2024-07-25 13:43:15 (UTC+0)
MSantos edited projects, added Parsoid-Read-Views (Phase 1 - DiscussionTools support); removed Parsoid-Read-Views
cscott added subscribers: Arlolra, cscott
Jul 26 2024, 10:08 PM
2024-07-26 22:08:55 (UTC+0)
@Arlolra found another instance of this pattern using

in visual diff testing of hewikivoyage:
MSantos added a project: OKR-Work
Jul 29 2024, 2:46 PM
2024-07-29 14:46:25 (UTC+0)
MSantos added a parent task: T371640: [EPIC] Deploy Parsoid Read Views for all Wikivoyage wikis
Aug 1 2024, 9:25 PM
2024-08-01 21:25:23 (UTC+0)
ihurbain edited parent tasks, added: T378089: [EPIC] Wikivoyages rollout followup work; removed: T371640: [EPIC] Deploy Parsoid Read Views for all Wikivoyage wikis
Oct 24 2024, 2:20 PM
2024-10-24 14:20:25 (UTC+0)
ihurbain moved this task from Code Review to Needs Investigation on the Content-Transform-Team-WIP board.
ihurbain moved this task from Needs Investigation to Code Review on the Content-Transform-Team-WIP board.
Jan 27 2025, 3:37 PM
2025-01-27 15:37:54 (UTC+0)
ABreault-WMF subscribed.
Jan 28 2025, 9:53 PM
2025-01-28 21:53:32 (UTC+0)
In T368720#9961875 @ihurbain wrote:
what enters Remex in the legacy parser is

a

b


and what enters Remex with Parsoid is

a

b


So far, so good.
Can we change the Parsoid output to be closer to the legacy output, with one of the newlines inside the paragraph?

a


b


Does that make any difference to the AFE opening? It might mess with roundtripping, given how the wikitext serializer currently works.
ihurbain added a project: Content-Transform-Team (Work In Progress)
Feb 6 2025, 5:42 PM
2025-02-06 17:42:37 (UTC+0)
ihurbain moved this task from Backlog to In Progress on the Content-Transform-Team (Work In Progress) board.
ihurbain removed a project: Content-Transform-Team-WIP
Feb 7 2025, 2:34 PM
2025-02-07 14:34:37 (UTC+0)
gerritbot added a comment.
Feb 11 2025, 6:27 PM
2025-02-11 18:27:10 (UTC+0)
Change #1118845 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/services/parsoid@master] [DNM] playing around with token order before remex
ihurbain added a comment.
Feb 11 2025, 6:30 PM
2025-02-11 18:30:07 (UTC+0)
@ABreault-WMF the above patch feels like an EXCESSIVELY cheeky way of doing that, buuuuuuuut it might not be *completely* preposterous? I kind of want your opinion before looking into cleaning it up/making it less preposterous :)
gerritbot added a comment.
Feb 24 2025, 9:30 AM
2025-02-24 09:30:20 (UTC+0)
Change #1052779 abandoned by Isabelle Hurbain-Palatin:
[mediawiki/libs/RemexHtml@master] Allow to not reconstruct AFE on only-space content
Reason: I7665fee2b25ceff693427771ed1ef580bf8e2965 looks more promising
gerritbot added a comment.
Feb 25 2025, 11:34 AM
2025-02-25 11:34:38 (UTC+0)
Change #1122548 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/extensions/Cite@master] Temporarily disable a couple of parserTests
gerritbot added a comment.
Feb 25 2025, 6:28 PM
2025-02-25 18:28:30 (UTC+0)
Change #1122642 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/extensions/Cite@master] Re-enabled fixed Cite tests
ihurbain moved this task from In Progress to Code Review on the Content-Transform-Team (Work In Progress) board.
Feb 25 2025, 6:29 PM
2025-02-25 18:29:24 (UTC+0)
gerritbot added a comment.
Feb 27 2025, 11:46 PM
2025-02-27 23:46:25 (UTC+0)
Change #1122548 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Temporarily disable a couple of parserTests
gerritbot added a comment.
Feb 27 2025, 11:50 PM
2025-02-27 23:50:19 (UTC+0)
Change #1118845 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Process white space tokens after

tags out of order in Remex
ReleaseTaggerBot added a project: MW-1.44-notes (1.44.0-wmf.19; 2025-03-04)
Feb 28 2025, 12:00 AM
2025-02-28 00:00:47 (UTC+0)
ihurbain moved this task from Code Review to To Deploy on the Content-Transform-Team (Work In Progress) board.
Feb 28 2025, 8:35 AM
2025-02-28 08:35:38 (UTC+0)
gerritbot added a comment.
Mar 3 2025, 9:19 PM
2025-03-03 21:19:01 (UTC+0)
Change #1124186 had a related patch set uploaded (by Arlolra; author: Arlolra):
[mediawiki/vendor@master] Bump wikimedia/parsoid to V0.21.0-a19
gerritbot added a comment.
Mar 3 2025, 10:30 PM
2025-03-03 22:30:04 (UTC+0)
Change #1124186 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a19
ihurbain moved this task from To Deploy to In Progress on the Content-Transform-Team (Work In Progress) board.
Mar 4 2025, 7:52 AM
2025-03-04 07:52:30 (UTC+0)
gerritbot added a comment.
Mar 7 2025, 6:07 PM
2025-03-07 18:07:37 (UTC+0)
Change #1125493 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/services/parsoid@master] Process white space tokens after

tags out of order in Remex
MSantos edited projects, added Parsoid-Read-Views; removed Parsoid-Read-Views (Phase 1 - DiscussionTools support)
Mar 13 2025, 12:31 PM
2025-03-13 12:31:05 (UTC+0)
MSantos moved this task from Uncategorized to Follow-up on the Parsoid-Read-Views board.
ihurbain moved this task from In Progress to Code Review on the Content-Transform-Team (Work In Progress) board.
Mar 20 2025, 2:37 PM
2025-03-20 14:37:18 (UTC+0)
gerritbot added a comment.
Mar 24 2025, 8:18 PM
2025-03-24 20:18:17 (UTC+0)
Change #1125493 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Process new line tokens after

tags out of order in Remex
ihurbain moved this task from Code Review to To Deploy on the Content-Transform-Team (Work In Progress) board.
Mar 25 2025, 9:09 AM
2025-03-25 09:09:27 (UTC+0)
gerritbot added a comment.
Mar 31 2025, 6:35 PM
2025-03-31 18:35:47 (UTC+0)
Change #1132724 had a related patch set uploaded (by Arlolra; author: Arlolra):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a23
gerritbot added a comment.
Mar 31 2025, 7:10 PM
2025-03-31 19:10:55 (UTC+0)
Change #1132724 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a23
ABreault-WMF moved this task from To Deploy to To Verify on the Content-Transform-Team (Work In Progress) board.
Apr 3 2025, 8:19 PM
2025-04-03 20:19:05 (UTC+0)
ihurbain closed this task as Resolved
Apr 10 2025, 9:03 AM
2025-04-10 09:03:16 (UTC+0)
gerritbot added a comment.
May 27 2025, 2:06 PM
2025-05-27 14:06:38 (UTC+0)
Change #1122642 merged by jenkins-bot:
[mediawiki/extensions/Cite@master] Re-enabled fixed Cite tests
Maintenance_bot removed a project: Patch-For-Review
May 27 2025, 2:33 PM
2025-05-27 14:33:17 (UTC+0)
ReleaseTaggerBot added a project: MW-1.45-notes (1.45.0-wmf.4; 2025-06-03)
May 27 2025, 3:00 PM
2025-05-27 15:00:31 (UTC+0)
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct.