This page used the Structured Discussions extension to give structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.

@SSastry (WMF) I have flicked through the test pages and tried to identify whether there were tests on the Wikisource Page: and Index: namespaces, which go with mw:Extension:ProofreadPage. Browsing the list was too hard and tedious to determine this, and I could see no means to narrow tests to a wiki or a group of sister wikis. As these two namespaces are critical work areas for the WSes, are you able to point to such tests or indicate that there has been some level of testing and success? Thanks. — billinghurst sDrewth 01:30, 4 October 2016 (UTC)

Yes, the UI needs an update to display results per-wiki. But, we've tested against enwikisource, itwikisource, frwikisource, and eswikisource ... about 200 pages each. However, all test pages are from the article namespace. SSastry (WMF) (talk) 03:53, 4 October 2016 (UTC)Reply
You can look at http://mw-expt-tests.wmflabs.org/visualdiff/pngs/enwikisource/ for the list of pngs for enwikisource. Look for pngs with ".diff.png" -- those show the rendering diffs for that title between Tidy and its replacement. Similarly for the other 3 wikis. SSastry (WMF) (talk) 03:58, 4 October 2016 (UTC)Reply
"Page:" and "Index:" namespaces are important content namespaces at Wikisource. I believe that there's also a Wikipedia with a "List:" namespace. Can the tests be expanded (within reason) to run a few other namespaces? Whatamidoing (WMF) (talk) 17:16, 4 October 2016 (UTC)Reply
For a bunch of technical reasons, this is going to be somewhat non-trivial. It involves (a) generating a random list of titles (b) generating a full list of titles (including templates) that need to be exported to create a dump (c) generating a dump (d) importing the dumps into two VMs (e) updating the test database with the new titles from (f) rerunning tests
At this time, we have a documented set of steps, scripts, and process to get this all done. But, it involves a bunch of co-ordinating with different people to run scripts on different servers, etc. It can be done and we'll discuss this.
But, if we are going to be doing this, it is good to get recommendations for any other things that need coverage beyond what we've tried to do already so we don't have to repeat this process again. Can you add a section to the replacing_tidy page with a list of other namespaces / projects / languages / wikis that should be added to the test set?
Another constraint: It takes about 2 days to get each test run completed and to weed through failures and retry failed tests. So, we don't want to do a huge jump beyond the 64K titles we are already testing. But, we can take appropriate sized samples based on the size of the wikis (if we need to include any new ones). SSastry (WMF) (talk) 17:26, 4 October 2016 (UTC)Reply
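The idea of taking appropriately sized samples based on wiki size can be pictured as a proportional allocation of a fixed title budget. The following is only a minimal sketch under stated assumptions: the wiki names, page counts, budget, and the `allocate_samples` helper are all hypothetical and are not the scripts actually used for the test runs.

```python
def allocate_samples(page_counts, budget, minimum=1):
    """Split a title budget across wikis in proportion to their size.

    page_counts: hypothetical mapping of wiki name -> number of pages.
    budget: total number of titles we are willing to add to the test set.
    minimum: floor per wiki, so that small wikis are not rounded down to zero.
    Note: because of the floor, the total can slightly exceed the budget.
    """
    total = sum(page_counts.values())
    if total == 0:
        return {wiki: 0 for wiki in page_counts}
    allocation = {}
    for wiki, count in page_counts.items():
        share = budget * count // total  # proportional share, rounded down
        allocation[wiki] = min(count, max(minimum, share))
    return allocation


# Hypothetical sizes, for illustration only.
sizes = {"wiki_a": 500_000, "wiki_b": 100_000, "wiki_c": 2_000}
print(allocate_samples(sizes, budget=1_000))
```

A random draw of that many titles per wiki (for example with `random.sample`) would then give the title list that feeds the export and import steps described above.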
At this time, neither Tim nor I am aware of reasons why some of these namespaces would need specialized testing with respect to Tidy. So, we think it is best to wait for the ParserMigration deployment this quarter, when we can readily test any page anywhere. We can then evaluate if we need any additional automated testing. SSastry (WMF) (talk) 15:22, 5 October 2016 (UTC)
While we have clean-up processes identified, has there been consideration of how to prevent breaking tags from being added in the future? Similarly, for the required cleanup, it would seem that as usual the biggest of the big wikis will be able to manage and monitor their cleanup, and the smaller wikis with less technical knowledge are going to need assistance and guidance.
It would be worthwhile
  1. having a simple list of each of the "bad" tags
  2. publishing usable regexp searches, and some of the regexp cleanup scripts, so that the wikis can self-serve on the cleanup
  3. designing abuse filter(s) that can capture the new addition of such tags, this can be loaded as a meta global filter to protect small/medium wikis and shareable to the big wikis — billinghurst sDrewth 21:25, 5 October 2016 (UTC)Reply
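As an illustration of the kind of self-serve regexp search suggested in item 2, here is a minimal sketch that scans wikitext for self-closed tags that are not void elements. The tag list `NON_VOID_TAGS` and the helper `find_self_closed` are assumptions made for this example; they are not an official list of "bad" tags and not one of the scripts mentioned in this thread.

```python
import re

# Assumed, illustrative list of non-void HTML tags; extend or trim as needed.
# Void elements such as <br/> or <hr/> are deliberately left out.
NON_VOID_TAGS = (
    "b", "i", "div", "span", "center", "p", "small",
    "s", "u", "li", "td", "tr", "table", "font",
)

# Matches e.g. <div/>, <span class="x" />, <CENTER/>, but not <br/> or <pre/>.
SELF_CLOSED = re.compile(
    r"<\s*(" + "|".join(NON_VOID_TAGS) + r")\b[^<>]*/\s*>",
    re.IGNORECASE,
)


def find_self_closed(wikitext):
    """Return (tag name, offset) pairs for each self-closed non-void tag."""
    return [(m.group(1).lower(), m.start()) for m in SELF_CLOSED.finditer(wikitext)]


sample = 'a <div/> b <br/> c <span class="x" /> d <center>ok</center>'
print(find_self_closed(sample))
```

The same pattern could serve as a starting point for a bot or an AWB-style cleanup rule, but what each match should be replaced with depends on the page, so the sketch only reports locations.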
I've seen a couple of requests for regexp searches, and I believe that easy-to-use scripts would be welcome.
Do you think that you could add some suggestions for insource: searches to this page, especially for anything that might not be easily fixed via bot? Whatamidoing (WMF) (talk) 08:22, 11 October 2016 (UTC)Reply
I added a few search strings here: https://www.mediawiki.org/w/index.php?title=Parsing/Replacing_Tidy&diff=2359556&oldid=2304753
If someone who actually understands this could double-check them (and extend and correct as necessary), I'd really appreciate it. Whatamidoing (WMF) (talk) 20:01, 16 January 2017 (UTC)Reply
I added a long list of varieties copied from the script that I have been using to fix errors on en.WP. Jonesey95 (talk) 00:47, 17 January 2017 (UTC)Reply
There's a danger of over-engineering this.
How does the current system stop people from breaking the page? It does it via humans — if you edit the page and it's then broken, you go back and fix it. If you currently write <di v> and find that it doesn't work, you go remove the space. The communities of each wiki don't seem to feel the need to add custom abuse filters for that kind of problem now — at least, I'm not aware of such.
With a good set of updates to documentation, Tech/News notes, and the like, a long enough cut-over process, and judicious use of bots to find and fix particularly common issues, we might not necessarily need it for this change. Jdforrester (WMF) (talk) 16:15, 6 October 2016 (UTC)
I think steps should go in this order:
  • create WikiMedia-level maintenance categories for the relevant errors,
  • find a way to get the categories to populate quickly (the self-closed tag category is still filling, months after its creation)
  • have bots and editors fix all of the existing problems,
  • watch the error categories to see how new errors are being added,
  • fix any tools and scripts that are programmed to add new broken code,
  • evaluate next options based on how bad the problem is (bot-delivered warnings to editors about errors they have made, continue to have bots follow editors and fix problems, implement edit filters for things that will truly break pages) Jonesey95 (talk) 18:41, 6 October 2016 (UTC)Reply
Maybe it would be better to find a system that didn't rely upon maintenance categories. Whatamidoing (WMF) (talk) 08:18, 11 October 2016 (UTC)Reply
Well, Tidy is a system that doesn't rely on maintenance categories, but it is apparently going away, and it's a workaround, for sure.
CheckWiki is a system that doesn't rely on maintenance categories, but it amounts to the same thing: the database is scanned for errors, and then lists of pages with errors are compiled.
Big red Javascript error messages don't rely on error categories, but they do not provide a way for gnomes to locate articles with errors so that they can be fixed, and they are likely to confuse regular editors.
It may be that maintenance categories, like democracy, are the worst form of error correction except for all those other forms that have been tried. (with apologies to Sir Winston Churchill) Jonesey95 (talk) 15:55, 11 October 2016 (UTC)Reply
The CheckWiki lists are available to gnomes. A ready-made list of regexp searches, or AWB-like scripts, would also be available to gnomes. So that's two options. Whatamidoing (WMF) (talk) 17:40, 14 October 2016 (UTC)Reply
On the Parsoid end, we are beginning to finish up the first prototype of a linting tool (finally, after the first half was done couple years back). But, this tool will allow errors like these to be tagged and added to the database with precise source locations which existing tools could leverage. For details, see https://phabricator.wikimedia.org/T48705 ... but, this might hopefully be an alternative to maintenance categories in the longer term. SSastry (WMF) (talk) 23:35, 19 October 2016 (UTC)Reply
Reviving this old thread...
Apparently, the Italian Wikipedia has AbuseFilter 423, which totally prevents publishing if any tags that are not supported by Remex are used.
I'm concerned about this.
Doing this with an AbuseFilter feels wrong in general. AbuseFilters are local to every wiki. The Tidy migration project, on the other hand, uses the same technology for all the wikis. Hence, the blocking or the warnings about deprecated tags should be uniform as well.
If anybody really wants AbuseFilters, I'd recommend not defining them as blocking, but as warnings and tags. At least for now that the dust hasn't completely settled on the transition. Such edits are usually done by well-meaning people, who don't necessarily understand error messages about self closing tags. People usually don't insert such tags intentionally and maliciously, but because they use some template, or because they are copying from some place. It's better to track the problematic edits as they happen and explore the reasons for them and not to block people from editing with cryptic error messages. Amir E. Aharoni {{🌎🌍🌏}} 17:01, 21 January 2018 (UTC)Reply
Do I understand correctly, 0 hits in 9 511 edits? Elitre (WMF) (talk) 17:49, 22 January 2018 (UTC)Reply
Yeah, it looks a bit strange to me as well, but it did happen at least once, otherwise I wouldn't even know about it. See Talk:Content translation/2018/01#h-bug_sulla_traduzione-2018-01-21T13:15:00.000Z. Amir E. Aharoni {{🌎🌍🌏}} 18:14, 22 January 2018 (UTC)Reply
It now says 2 out of 3 095 . I am not entirely sure if the figures are right, but it seems to me like it's triggered so rarely that the fact that the community chose to block the action rather than just warn, in this case, isn't the end of the world, until a proper solution is developed by Subbu's team? I mean, we can still discuss with them if you think that's necessary. Elitre (WMF) (talk) 18:29, 22 January 2018 (UTC)Reply
Yeah, I'd prefer a non-blocking filter. It makes it easier to actually examine and fix the issues. If it happens rarely, it's a good reason not to block. Amir E. Aharoni {{🌎🌍🌏}} 22:04, 22 January 2018 (UTC)Reply
I brought this up on the village pump subpage where the fixes are being discussed. HTH Elitre (WMF) (talk) 23:51, 22 January 2018 (UTC)Reply
Since I'm the author of the filter, I gave some kind of explanation on it.wiki, that I'm gonna repeat here.
First, let's talk about numbers: the filter has been up since late September, and it has about 1000 hits right now. Of course that's not much, but we can't let it go, since hits are not that rare.
I completely agree about users hitting it in good faith, though this doesn't mean much. In most cases, people add those tags when translating from another wiki where such tags are still used. We don't have any obsolete tags left on it.wiki, meaning they come from user intention or from outside. So, a really workable solution would be to clean out those wikis as well, but that's a huge amount of work: it would be 16e+6 tags on en.wiki alone.
About users not understanding the error message: that's definitely true. However, it's also hard to write an error message that is both short and clear, even for those who don't know HTML at all. Since the filter became active, I have tried asking for feedback from people who hit it more than 5 times in a row, though I received almost no feedback, which made it even more difficult to improve. Moreover, we should also consider the fact that, while using CX, people don't get the right warning; instead, they're given raw code, as pasted by Codas in the other topic, which is difficult for almost everyone to understand without good technical skills.
Finally, that filter was actually a warning-only one for about two months. The reason I made it block the action is that almost no one fixed their edits before saving, which wasn't good either.
So, no problem in making it warning-only, especially if looking forward to a built-in solution, though there isn't a true way to solve this problem right now. Daimona Eaytoy (talk) 19:12, 23 January 2018 (UTC)Reply
I wasn't aware of it until I started this discussion. If it becomes warning-only, I'll be happy to help investigate the issues, and get them properly reported or fixed. It's not a problem for me that it is in Italian. I have experience with doing something very similar around problematic markup in the Hebrew Wikipedia.
If you make it warning-only and reach out to me, I promise nice surprises. Amir E. Aharoni {{🌎🌍🌏}} 23:42, 23 January 2018 (UTC)Reply
No problem in making it warning-only, I'll do it right now. Though, I'm not that confident that we could get to a proper solution without fixing the other main wikis. Anyway, let's see how it goes, there might be good results anyway. Daimona Eaytoy (talk) 11:20, 24 January 2018 (UTC)Reply
(User:Elitre (WMF), User:Whatamidoing (WMF), User:SSastry (WMF) - attracting your attention just in case you haven't seen it...) Amir E. Aharoni {{🌎🌍🌏}} 17:52, 21 January 2018 (UTC)Reply
There are two possible strategies here, one of which is applicable to the immediate task of Tidy replacement, and the other is more general and applicable to linting in general.
As far as Tidy replacement is concerned, once Remex replaces Tidy, the rendering breakage offers immediate clear feedback about broken markup - I don't think abuse filters or other such tooling is required. So, it doesn't make sense to deploy something that is only applicable for the interim period between now and when Remex is deployed.
However, more generally, it does make sense to close the linting loop to add a pre-save linting ability that lints the page and displays any issues on the page -- however, this needs some thought and design work to figure out what best makes sense. Do we display this to all users? Or to some subset of users (however that is determined) more likely to be able to deal with the notices / warnings? Do we lint only the edited portions of the page or the whole page? But, linting only edited portions of the page is difficult to get right because wikitext ... But, displaying lint errors on the entire page can also be overwhelming. But, yes, we definitely need to close the loop and add a pre-save linting feature. We are aware of this need, and it has been requested a few times now. We haven't been able to get around to it in the middle of everything else we are already working on. But, it will happen.
Not sure if this addresses your question / concerns. Let me know. SSastry (WMF) (talk) 20:24, 21 January 2018 (UTC)Reply
That's OK, that's the line of thought I was pointing at.
Ideally, I'd love to see a live comment while editing, similar to what an editor sees in Visual Editor when writing wiki syntax such as [[.
If not, it can be a warning similar to what is shown when a user is trying to save without writing an edit summary.
Finding a list of pages that have lint errors is already possible using Special:LintErrors. Amir E. Aharoni {{🌎🌍🌏}} 17:35, 22 January 2018 (UTC)Reply

What happens on pages with <pre/> and similar typos in tags that are not exactly HTML but resemble it? They do not show up in the self-closed HTML tag error category, but they should be tracked somehow, and they should be tested as part of this migration. There are examples on en.WP. Jonesey95 (talk) 23:50, 9 October 2016 (UTC)Reply

Non-HTML self-closing tags shouldn't be affected by this change. SSastry (WMF) (talk) 23:25, 19 October 2016 (UTC)Reply

Could you enable on the wikitextexp.wmflabs.org :

Regarding wikidata, since we are comparing output between identical configurations, the error itself shouldn't matter unless we expect Tidy to render the extension output differently. As for the others, I'll take a look before we do another round of testing in the coming week or so. SSastry (WMF) (talk) 23:23, 19 October 2016 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

@Elitre (WMF): Could you please mark this page for translation, so it will be possible to translate it via the translation tool? Guycn2 (talk) 17:33, 14 November 2016 (UTC)Reply

Thanks for your thought, but these pages are nowhere near to being finalized yet, so it'd be a waste of time on everybody's side for the time being. Elitre (WMF) (talk) 17:37, 14 November 2016 (UTC)Reply
I don't even see a couple of paragraphs on the basic concept that could be worth translating right now. In fact, the specific examples are more likely to be stable than the overall explanations. Whatamidoing (WMF) (talk) 18:22, 14 November 2016 (UTC)Reply
Ok... Guycn2 (talk) 17:26, 18 November 2016 (UTC)Reply

I was wondering if we should start listing what existing tools can do to help fixing the wikitext that needs to be fixed.

I have developed a few things in WPCleaner to help detecting and fixing some errors (like self closing tags) : should I list them in the page ? NicoV (talk) 12:52, 3 February 2017 (UTC)Reply

Yes, that will be very helpful. Thanks! SSastry (WMF) (talk) 12:55, 3 February 2017 (UTC)Reply
Ok, I have written a first short description. Is it ok ? Feel free to expand and modify... NicoV (talk) 13:05, 3 February 2017 (UTC)Reply
Thanks! Good for now. We might reorganize / rearrange to highlight these fixup options as we get further along. SSastry (WMF) (talk) 13:10, 3 February 2017 (UTC)Reply
I had meant for all information relevant to the end user to go to Parsing/Replacing Tidy/FAQ. However, we can fix later :) Elitre (WMF) (talk) 14:13, 3 February 2017 (UTC)Reply
Feel free to reorganize it in a way that seems most useful. :-) SSastry (WMF) (talk) 15:27, 3 February 2017 (UTC)Reply
Yes, same for me : if it's better in the FAQ, feel free to move the information there. I will add more later NicoV (talk) 15:43, 3 February 2017 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.


I know I keep harping on this, but now that Tech News has announced that Tidy will be going away in 2017, the "Pages using invalid self-closed HTML tags" categories really do need to be fully populated on all wikis. The category was added to MediaWiki in July 2016, and it has still not fully populated. For example, on te.wikisource.org, there is only one page, "మూస:Transform-rotate", in the error category at this writing, but that page doesn't even have a problem – the problem is in a page that is transcluded on that page. That transcluded page is not yet in the error category (after seven months). This is T132467.

How can we get an accurate list of all pages that need to be fixed?

See also T106685 (insource searches don't work right), which is another bug that makes it more difficult to migrate away from Tidy. Let me know how I can help. Jonesey95 (talk) 02:47, 7 February 2017 (UTC)Reply

You can do the same as I did - take a month (for enwiki it will be a decade), and run nulledit bot on all pages. IKhitron (talk) 13:51, 7 February 2017 (UTC)Reply
Get a null-edit bot approved and running on 868 different wikis? That's outside of my scope.
If WMF wants to retire Tidy, WMF needs to null-edit all pages on all wikis or otherwise fix the conditions that prevent the categories from being fully populated. If that does not happen, it seems to me that Tidy's retirement will not be able to occur. Jonesey95 (talk) 17:17, 7 February 2017 (UTC)Reply
Yap. And 902 wikis. IKhitron (talk) 17:18, 7 February 2017 (UTC)Reply
@Jonesey95, we discussed this recently and @Legoktm updated T132467#3004685. We'll track this. But, note that we have backward compatibility fixes in the parser for self-closed tags, so, it is not catastrophic to not be able to fix all those self-closing tags before Tidy is removed. We'll eventually remove the compatibility fix once we are sure it is tackled. Meanwhile, we are also looking at how to ensure that pages are refreshed in a fixed time frame (as discussed in that phab task link above). SSastry (WMF) (talk) 10:07, 9 February 2017 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.


The new dashboard report is a useful summary, but it would also be helpful to have a periodically refreshed report that lists all templates with errors in every installation, all on one page. That would make it easier for us to fix the templates first, which would in turn give us a better sense of how many individual pages need to be fixed at each MW installation. Jonesey95 (talk) 06:48, 7 February 2017 (UTC)Reply

Do you think just having a separate count of every page in the template namespace that is in the category would work? Sometimes it depends on how the parameters are being used that can generate the errors, but that's a bit tricky to determine automatically. Legoktm (talk) 05:58, 12 February 2017 (UTC)Reply
A count would not be helpful, but a list would be. Jonesey95 (talk) 16:07, 12 February 2017 (UTC)Reply
OK, I added a template listing report. It's running into some unicode error right now that I'll fix tonight, in the meantime stuff like https://tools.wmflabs.org/wikitext-deprecation/self-closing/testwiki works. Legoktm (talk) 21:46, 15 February 2017 (UTC)Reply
Unicode errors now fixed https://tools.wmflabs.org/wikitext-deprecation/self-closing/fawiki and others work now. Legoktm (talk) 04:37, 16 February 2017 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.


The dashboard report shows zero pages in en.WP, but there are 23 pages with errors. They are in a subcategory. The report may need to traverse subcategories. Jonesey95 (talk) 00:35, 11 February 2017 (UTC)Reply

I intentionally didn't traverse subcategories, because if "Category:Foo" has an improperly closed tag in it, then it will appear as a subcategory to the main one, but the articles inside that subcategory are totally fine. Legoktm (talk) 05:54, 12 February 2017 (UTC)
Can you parse the system message with category names, retrieving only relevant subcategories, Legoktm? IKhitron (talk) 08:50, 12 February 2017 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Of the 16,000+ pages in the self-closed HTML tag category at ms.wikipedia.org, nearly all of them are in the "Perbincangan pengguna" namespace, and they all have the same problem. They need <center>{{Panduan interaktif}}<center/> changed to <center>{{Panduan interaktif}}</center>. Note the position of the slash in the final "center" tag. Jonesey95 (talk) 02:39, 11 February 2017 (UTC)Reply
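The fix described above (moving the slash in the final tag) is mechanical enough to sketch as a bot-style replacement. This is a minimal sketch, assuming the broken pattern is always an opening tag followed by a self-closed copy of the same tag with no properly closed or nested tag of that name in between; the function name `fix_misplaced_slash` is hypothetical.

```python
import re


def fix_misplaced_slash(wikitext, tag="center"):
    """Rewrite <tag>...<tag/> as <tag>...</tag>.

    Only touches spans where no other <tag> or </tag> appears between the
    opening tag and the self-closed one, so correctly closed markup is
    left unchanged.
    """
    name = re.escape(tag)
    pattern = re.compile(
        rf"(<{name}>)((?:(?!</?{name}\b).)*?)<{name}\s*/>",
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub(lambda m: m.group(1) + m.group(2) + f"</{tag}>", wikitext)


broken = "<center>{{Panduan interaktif}}<center/>"
print(fix_misplaced_slash(broken))
```

Since the affected pages are user talk pages, a real run would still need the usual local bot approval; the sketch only shows the text transformation.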

All of them as plain text??? They don't know about templates? IKhitron (talk) 22:03, 11 February 2017 (UTC)Reply
It was probably substed, which is normal for content like this in case the template is later deleted or modified. Jonesey95 (talk) 00:15, 12 February 2017 (UTC)Reply
That's the User talk: namespace. Whatamidoing (WMF) (talk) 22:49, 15 February 2017 (UTC)Reply


Sorry if this means you get irrelevant pings, and thanks for your understanding!

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.


Per Subbu: "Because of some performance-related fixes, Linter is now reporting estimated error counts instead of actual counts. Therefore, wikis might notice an artificial increase in error counts. Once we figure out a better solution to the performance problem, this will be fixed. [...] this only seems to affect wikis that have linter categories with large error counts." Keep an eye on https://phabricator.wikimedia.org/T184280: we appreciate your understanding. Elitre (WMF) (talk) 18:15, 11 February 2018 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.


I just found this page about replacing tidy. My immediate response was “…Seriously?!”

Once, it was true that tidy had been unmaintained during the earlier spread of HTML5 on the web.

That hasn’t been true for a long time! Tidy has been updated with HTML5 semantics for years. Since late 2015, in fact.

Don’t believe me? See for yourself!

Tidy is once again a living, actively maintained software project. Let’s use it!

If, at any point, it doesn’t meet Wikipedia’s needs, open a GitHub issue. The community maintaining it is alive and well. Zearin (talk) 14:27, 8 March 2018 (UTC)

We are / have been aware of html5 tidy. See https://phabricator.wikimedia.org/T89331#1681258 and https://lists.wikimedia.org/pipermail/wikitech-l/2015-August/082845.html
At this point, we have a viable solution that does what we need. This is based on the HTML5 spec and we then have added some html4-tidy compatibility passes to reduce breakage of pages. Eventually, we'll get rid of these compatibility passes by fixing pages instead.
In any case, the real problem with replacing HTML4-Tidy is fixing all pages that will break with a migration to HTML5 semantics. That process is ongoing now and we'll effect a complete switch from Tidy to Remex by mid 2018. SSastry (WMF) (talk) 15:33, 8 March 2018 (UTC)
If you are aware of html5 tidy, that awareness should not be confined to discussion on phabricator’s page. This article’s opening statements are outdated by over 3 years, and the rest of the article continues as if that outdated state of the software were still true today.
The appearance from the outside is that either Wikimedia users are grossly out of touch and unaware of widespread use of HTML5 tidy, or that they don’t care about misleading the reader as long as it is in pursuit of their custom software solution.
At the very least, acknowledge the fact that tidy is **not** outdated software. And ideally, explain why your custom software solution, whose premise for adoption was that tidy was outdated, remains the best choice for the Wikimedia ecosystem. Zearin (talk) 15:42, 8 March 2018 (UTC)Reply
I can add a reference to HTML5 tidy to the page.
But, here are some examples to show how Tidy (even html5) differs from how an HTML5 parser (like a browser) interprets the HTML. This is based on the Ubuntu-packaged tidy that I installed via apt. tidy -v says: HTML Tidy for HTML5 for Linux version 5.2.0.
Given <p>a<span>b</p><p>c</span>d</p>, tidy-html5 generates <p>a<span>b</span></p>\n<p><span>c</span>d</p> whereas a HTML5 tree builder generates <p>a<span>b</span></p><p>cd</p>.
Given <ul><li>  a  </li><li>  b </li></ul>, it generates <ul>\n<li>a</li>\n<li>b</li>\n</ul>.
So, like tidy-html4, it does 3 things: (a) it fixes up span tags incorrectly compared to what the html5 treebuilder is supposed to (b) it adds line breaks by pretty-printing html (c) it trims whitespace in tags. In HTML5, whitespace can be significant and can be styled by CSS and you cannot arbitrarily manipulate it without impacting rendering. In fact, precisely because of this tidy-html4 behavior, we are now having to introduce a tidy compatibility mode in wikitext.
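To make the span difference in point (a) concrete, the two serialized outputs quoted above can be compared by checking which text ends up inside a span. This is a minimal sketch using Python's standard `html.parser`, which is only a tokenizer and not an HTML5 tree builder; it does not reproduce either tool, it just inspects the two output strings as given in the comment. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Record each non-blank text chunk and whether it sits inside a <span>."""

    def __init__(self):
        super().__init__()
        self.span_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.span_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth:
            self.span_depth -= 1

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text such as pretty-printing newlines
            self.chunks.append((data, self.span_depth > 0))


def text_in_span(html):
    collector = SpanTextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


# The two outputs quoted in the comment above.
tidy_output = "<p>a<span>b</span></p>\n<p><span>c</span>d</p>"
html5_output = "<p>a<span>b</span></p><p>cd</p>"

print(text_in_span(tidy_output))   # "c" is reported as inside a span
print(text_in_span(html5_output))  # "cd" is reported as outside any span
```

Any CSS attached to the span would therefore style "c" in the Tidy output but not in the HTML5 tree builder output, which is the kind of rendering difference the migration has to account for.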
We want code / library that is compliant with the html5 tree builder spec. This gives us options as our platform evolves. We can replace one html5 library with another. We can switch development languages, port from node.js to PHP or PHP to Rust or to C or Java and know that the input source will continue to parse identically. With a custom html parser like tidy, we are less able to do that. In fact, we had the choice of 3 or 4 different html5 parser libraries when we were looking for one.
In any case, there are other very strong reasons why we are going with a HTML5 parser library (while homegrown, it is spec-compliant and can be replaced with something else down the line, if we so choose). As we evolve wikitext, we want to be able to build a DOM from the raw html and manipulate it. Tidy doesn't give us that right now.
But, thanks for flagging that there is a gap in our documentation. We'll update the page suitably. SSastry (WMF) (talk) 16:28, 8 March 2018 (UTC)Reply
Also, a colleague pointed out that we do reference tidy-html5 in our FAQ and discuss why we are using RemexHtml instead. See https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/FAQ#Why_are_you_replacing_it?_And_with_what? SSastry (WMF) (talk) 16:44, 8 March 2018 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.


Hello. Today we saw a new problem that looks like something that comes from Tidy replacement, and has no tracking category. See [1]. The edit fixed a state in which about half of the wikicode editing links had disappeared, one for each section, starting just below the fixed section and stopping about 100 sections below. When opening the problematic section for editing, it includes all these sections together. Thank you. IKhitron (talk) 11:20, 25 March 2018 (UTC)

Can you give me a link to the problematic section? Without knowing hebrew, I am having trouble figuring out where to look.
Note that, there will always be some Tidy issues that won't be covered by tracking categories. We are including tracking categories only for commonly seen patterns. Once Tidy is removed (as has been on hewiki), the problem will be obvious and you will continue to fix pages like you normally do. SSastry (WMF) (talk) 13:05, 25 March 2018 (UTC)Reply
  1. I fixed the previous link, sorry.
  2. It's section 48.
  3. This is the problem, this time it is not obvious at all.
Thank you. IKhitron (talk) 13:15, 25 March 2018 (UTC)Reply
@IKhitron, are you talking about the problem described in the section "היעלמות האופציה "עריכת קוד מקור" בחלק ניכר מהפרקים בדף זה"? The one that Reuven M. says that he fixed? Amir E. Aharoni {{🌎🌍🌏}} 04:44, 26 March 2018 (UTC)Reply
Indeed. IKhitron (talk) 08:07, 26 March 2018 (UTC)Reply
OK, so you're saying that in this revision some "Edit section" links weren't displayed, and in this revision they were displayed? Amir E. Aharoni {{🌎🌍🌏}} 08:15, 26 March 2018 (UTC)Reply
Yap. IKhitron (talk) 08:19, 26 March 2018 (UTC)Reply
OK. @SSastry (WMF), does this help? Amir E. Aharoni {{🌎🌍🌏}} 08:28, 26 March 2018 (UTC)Reply
Thanks, that helps, and it is section 48 according to Ikhitron. But, I don't think this is Tidy related. It could also be a bug because of https://gerrit.wikimedia.org/r/#/c/415770/. Anyway, I'll take a look today. If not, it will have to wait till next week. SSastry (WMF) (talk) 14:21, 26 March 2018 (UTC)Reply
Thank you. IKhitron (talk) 16:33, 26 March 2018 (UTC)Reply
This is not related to Tidy replacement or that whitespace patch either. It is a general parser issue -- I haven't investigated what is causing those edit links to disappear from that point on. So, I am going to close this discussion here. SSastry (WMF) (talk) 16:50, 26 March 2018 (UTC)Reply
Thank you very much. Should I open a phabricator ticket instead? IKhitron (talk) 16:51, 26 March 2018 (UTC)Reply
Sure. Thanks! SSastry (WMF) (talk) 16:52, 26 March 2018 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

I am developing a bot written in PHP that can fix the multiple-unclosed-formatting-tags bug (in my opinion, this is just one feature of the bot). Most of the code is now complete. I noticed that you plan to replace Tidy before June; I want to finish it as soon as possible, but my studies have kept me from developing it. Contributions are welcome. The code is hosted on GitHub. 星耀晨曦 (talk) 05:07, 27 April 2018 (UTC)

A lot of templates do not test for the presence or absence of a leading colon in page names, but simply always prepend a ":" to make sure they get a link, and do not render a file as an image (or audio/video player), categorize a page, or add interwikis. Extra leading colons are harmless, but now the lint checker complains about these, and I don't know why they need to be fixed, given that page names can never start with a colon.

Fixing that in templates is sometimes very complex as it forces testing the values to check if they start by a colon or not, and generate the colon conditionally (and this test increase the expansion nodes count, so it will break several pages, as well it will increase the expansion depth by 1)...

Can't we fix that simply in the new parser instead of asking people to fix pages and templates ? -- Verdy p (talk) 01:21, 22 June 2018 (UTC)Reply

[[::Test]] gives [[::Test]] with either parser. JJMC89 (talk) 01:23, 22 June 2018 (UTC)Reply
These were harmless, and this is a radical change... All of these were equivalent, all of them generating a wikilink to the same page:
  • [[::Test]], [[:Test]], [[Test]]
    (gives now: [[::Test]], Test, Test)
  • [[:::fr:Test]], [[::fr:Test]], [[:fr:Test]]
    (gives now: [[:::fr:Test]], [[::fr:Test]], fr:Test)
    (but not [[fr:Test]] which generates an interwiki metadata and not a link to a resolved wiki)
  • [[::Category:Test]], [[:Category:Test]]
    (gives now: [[::Category:Test]], Category:Test)
    (but not [[Category:Test]] which sets a categorization metadata)
In images, we can use |link= with a value which can be either a wikilink, or a URL (starting by "http:" or "https:" or "//"), the colon may also be used to force a pagename wikilink instead of a URL starting by "http". Testing parameter values to know when to generate or not a colon is complex or will require adding some helper templates to know when to generate a colon depending on specific rules.
I do not see the interest of displaying verbatim "[]" pairs enclosing multiple leading colons except for [[:]] because there's no trailing pagename after the leading colons and it's impossible to generate a link from that.
If an article has to refer to a title starting with ":", we need to change the pagename: do you want to allow pagenames starting with colons, or just a page name with the title ":" (that pagename would be quite difficult to use as the target of wikilinks or URLs) ?
Or do you plan to use "::" for adding new syntactic features in MediaWiki (disambiguating pagenames more easily from other interwiki, namespace or special prefixes)?
----
Note that when we use multiple leading colons with a visual editor (and in this talk thread using "Flow"), the whole link with brackets now becomes surrounded by "nowiki" tags (added silently). This does not happen when using the wikitext editor. I think this silent (and unexpected) addition of "nowiki" is in fact a nuisance (a pollution in fact). It unnecessarily obscures the code (and also caused edit bugs in this message when adding the "(gives now: ...)" lines above, where all subsequent tags or wiki markup were corrupted). Verdy p (talk) 01:26, 22 June 2018 (UTC)Reply
I believe [[:::fr:Test]] would always have been rendered as plaintext; it's only 1 or 2 leading colons that would have worked. However, 2 colons would have resulted in a leading colon being part of the link text, which differs from the single colon escaping. Arlolra (talk) 21:56, 25 June 2018 (UTC)Reply
There's absolutely NO possibility for a link to start with a colon, as it is invalid in every page name (on all wikis, not just those from Wikimedia).
And there's absolutely no point at all in changing that to plain text with visible brackets surrounding the text with the multiple leading colons, and no point in enforcing it (badly) by silently inserting nowiki tags, which also obscures the wikitext and makes it even less editable.
Why do you want to see "[[::" and "]]" as plain-text ? If one really wants to see that, the "nowiki" tags can be added manually to escape them **only** where this plain-text is expected (extremely rare case in fact, compared to the very common cases where extra leading colons may be inserted by templates using optional parameters which may be empty for an optional namespace indicator or interwiki prefix).
Leading colons are used explicitly to force the interpretation as a wikilink and not as a rendered image or categorization metadata, or to make distinction between template names in the template namespace (no leading colon) or a transcluded page from another specified namespace or the root namespace.
This new requirement removes that useful distinction and just breaks many pages and complicates a lot the development of templates by forcing them to inspect the values of substituted parameters (we now need to add various tricky "#if" tests, the expansion time or nesting level is increased, the number of expanded nodes increases by forcing parameters to be expanded multiple times... in summary this adds additional load on the server or makes some pages now impossible to render correctly due to resource exhaustion caused by these extra tests).
So in summary I do not like this new requirement that just breaks things and makes things just more complicated (and does not even help other parsers to disambiguate things). For me, any wikitext sequence matched by this regexp (except those found in "nowiki" sections, or in HTML comments normally stripped in an earlier first stage of the parser, before handling "includeonly", "noinclude" and "onlyinclude" sections in the second earlier stage):
\[\[[ \t]*:*[ \t]*([^:\|\[\]][^\|\]]*)[ \t]*(\|[^\]]+)?\]\]
is a wikilink (or interwiki link) whose target is the page indicated in the first regexp-grouping parentheses in blue (to render as an HTML link with inline text content) if there's 1 or more leading colons (after stripping ignorable whitespace, just indicated as [ \t]* in this simplified regexp, as there are other ignorable whitespace characters), and the content of the second regexp-grouping parentheses in green is the inline content to render in that surrounding HTML link, independently of the number of colons (indicated in red).
Note that the content matched by the blue group above may include transclusions of templates (or expansion of magic keywords) surrounded by {{...}}; their expansion could return leading colons to discard silently as well if they are in excess and there's already at least one colon before them...
Only when there's no leading colon at all (in red, or in the expansion of wiki templates in the blue group), the target may be interpreted as inline file rendering (image thumbnails, or audio/video player objects), or as a categorization (when the content of the first regexp-parentheses pair starts by the special namespaces names for files or categories); and otherwise it will also generate an HTML link with the inline content of the second group displayed. Verdy p (talk) 21:43, 27 June 2018 (UTC)Reply
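As a rough illustration of the regexp quoted above, here is a minimal Python sketch. The function name parse_link and the extra capturing group around the leading colons are assumptions added for illustration only; the sketch ignores "nowiki" sections, HTML comments and template expansion, and it is not how any MediaWiki parser is implemented.

```python
import re

# The regexp from the comment above, with one extra group around the
# leading colons (an addition for this sketch) so they can be counted.
LINK_RE = re.compile(
    r"\[\[[ \t]*(:*)[ \t]*([^:\|\[\]][^\|\]]*)[ \t]*(\|[^\]]+)?\]\]"
)


def parse_link(wikitext):
    """Return (colon_count, target, label) for a single [[...]] link,
    or None when the text does not match the regexp."""
    match = LINK_RE.fullmatch(wikitext)
    if match is None:
        return None
    colons, target, label = match.groups()
    # The label group includes its leading pipe; drop it for readability.
    if label is not None:
        label = label[1:]
    return len(colons), target.strip(), label


if __name__ == "__main__":
    for sample in ["[[::Test]]", "[[:fr:Test|French]]",
                   "[[Category:Test]]", "[[:]]"]:
        print(sample, "->", parse_link(sample))
```

Under the proposal in the comment, any result with a colon count of 1 or more would be treated as a plain link to the target, and only a count of 0 would allow file rendering or categorization.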
To be clear, before any changes were made,
[[:{3,} ... ]]
already rendered as plaintext. It was only,
[[:{1,2} ... ]]
that gave the desired wikilink escaping.
The change was made because the 2 was likely the arbitrary result of some refactoring and not an explicit goal. There was no comment in the parser saying why it existed. And the functionality of being able to escape a wikilink did not depend on it.
When it came time to write another wikitext parser, it was a surprising find.
The point of the linting pass was to try and determine the extent to which it was relied upon.
As we saw in the ambassadors thread, it did result in some template authors having to use cumbersome workarounds rather than specifying that page titles passed to the templates shouldn't be manually colon escaped to begin with.
At the time I said I'd be willing to revert the change if it proved too bothersome but that was a year ago and there hasn't been much noise about it.
The proposal you're making here,
[[:+ ... ]]
is obviously more lenient and seems fine, but let's not pretend like that was ever the case. Arlolra (talk) 23:38, 23 July 2018 (UTC)Reply
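The three behaviours discussed in this exchange can be summarised in a small Python sketch. The rule names "legacy", "current" and "proposed" and the function is_escaped_link are hypothetical labels chosen for this illustration; the "current" limit of one colon is taken from the results reported earlier in this thread, not from the parser source.

```python
# Maximum number of leading colons that still yields an escaped wikilink
# under each rule; None means no upper limit.
MAX_COLONS = {
    "legacy": 2,      # [[:{1,2} ... ]] worked, [[:{3,} ... ]] was plaintext
    "current": 1,     # only a single leading colon escapes the link
    "proposed": None  # [[:+ ... ]], any number of leading colons
}


def is_escaped_link(colon_count, rule):
    """Return True if a link with this many leading colons would be
    treated as an escaped wikilink under the given rule."""
    if colon_count < 1:
        return False  # no leading colon means no escaping at all
    limit = MAX_COLONS[rule]
    return limit is None or colon_count <= limit


if __name__ == "__main__":
    for count in range(4):
        row = {rule: is_escaped_link(count, rule) for rule in MAX_COLONS}
        print(count, row)
```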

Hi,

At dawiki we found that some pages have changed appearance because Tidy apparently used to exchange div and span HTML elements so that div wasn't placed inside span. "<span><div>Text here</div></span>" was changed to "<div><span>Text here</span></div>". Would it be possible for Special:LintErrors to find all such cases? Dipsacus fullonum (talk) 13:17, 6 July 2018 (UTC)Reply

For God's sake, Tidy!! :-(
Yes, we can find them. But, it will take a few days to get it deployed. Do you have a sense of how many pages are affected? Is it from a template? If it is coming from a template, perhaps you can fix those right away and see what happens? SSastry (WMF) (talk) 13:55, 6 July 2018 (UTC)Reply
It does find it in Special:LintErrors/html5-misnesting, although it seems that it ignores certain cases like the one you noted. Perhaps it was ignored because it didn't affect how the final page looked. Other cases like the one below will be detected properly.
<span>bb<div>Text here</div></span>
197.218.81.173 (talk) 13:59, 6 July 2018 (UTC)Reply
Yes, Parsoid doesn't modify <span><div>foo</div></span> because paragraph tags aren't added around it. So, it doesn't trigger a html5-misnesting error.
But, <span>x<div>foo</div></span> has p-tags added because of the text node in the span tag. That is then broken up by the HTML5 parser and triggers a html5-misnesting error. SSastry (WMF) (talk) 14:04, 6 July 2018 (UTC)Reply
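To make the distinction between the two cases concrete, here is a rough Python heuristic built on the standard library's html.parser. It is only a sketch: the class name SpanDivFinder and the function div_in_span_findings are invented for this example, it only looks at text placed directly inside the span before the div, and it does not reproduce what Parsoid or the linter actually does.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not stay on the open-tag stack.
VOID_TAGS = {"br", "hr", "img", "wbr", "input", "meta", "link"}


class SpanDivFinder(HTMLParser):
    """Record each <div> opened inside a <span>, noting whether the
    span already contained direct, non-whitespace text."""

    def __init__(self):
        super().__init__()
        self.stack = []     # entries: [tag_name, saw_direct_text]
        self.findings = []  # one bool per <div> found inside a <span>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "div":
            # Look for the nearest enclosing span, innermost first.
            for open_tag, saw_text in reversed(self.stack):
                if open_tag == "span":
                    self.findings.append(saw_text)
                    break
        self.stack.append([tag, False])

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip() and self.stack:
            self.stack[-1][1] = True


def div_in_span_findings(html):
    finder = SpanDivFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    print(div_in_span_findings("<span><div>foo</div></span>"))
    print(div_in_span_findings("<span>x<div>foo</div></span>"))
```

A True entry corresponds to the second case above (text node in the span before the div); a False entry corresponds to the first case, which was not being reported.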
It definitely is important to add such a linter category, given that this affects rendering:
https://www.mediawiki.org/w/index.php?title=Project:Sandbox&oldid=2822626&action=parsermigration-edit
Chances are that such constructs are heavily used by templates. 197.218.81.173 (talk) 14:19, 6 July 2018 (UTC)Reply
https://gerrit.wikimedia.org/r/#/c/mediawiki/services/parsoid/+/444242/ -- I am going to wait for dawiki response to my question above and also to see if there are other complaints before we deploy this. SSastry (WMF) (talk) 15:10, 6 July 2018 (UTC)Reply
[2] IKhitron (talk) 15:28, 6 July 2018 (UTC)Reply
Hi, all known occurrences at dawiki were from one template which contained '<span style="style 1">{{{Text}}}</span>'.
That template was then used in other templates like '{{foo|Text=<div style="style 2">bar</div>}}'.
So when the span and div tags were exchanged by Tidy, it changed the order of the style attributes, and thus the appearance. I guess it may have affected around 30,000 pages at dawiki, mostly user talk pages.
The template with the span tag is already changed. But there may be unknown cases and similar cases at other wikis. Dipsacus fullonum (talk) 21:51, 6 July 2018 (UTC)Reply
This seems to occur in several cases outside divs or spans. It might be best to develop a general solution, but whitelist such reports in tags only when someone reports any occurrences. That way it becomes a simpler case of a configuration change rather than coding.
See :
If one adds something like style="background-color:green" to the parent tag, it does show rendering differences. It might be good to go over https://phabricator.wikimedia.org/tag/tidy/ and close all tasks that became obsolete, or decline them, as Tidy is disabled.
The labels of categories in the linter could probably be changed too, eventually the concept of tidy will become irrelevant. Maybe it should instead focus on invalid html vs wrong / undesireable parser output. 197.218.91.75 (talk) 12:38, 7 July 2018 (UTC)Reply
This linter category is now live, but it probably has a number of false positives, since rendering is affected only in a small number of cases. We'll take a look at that. It shows up in the Miscellaneous-Tidy-Replacement-Issues category. SSastry (WMF) (talk) 14:01, 7 August 2018 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Just FYI, for those that are interested and probably watch this page. Jdforrester (WMF) (talk) 20:36, 10 December 2018 (UTC)Reply

A pity. IKhitron (talk) 22:34, 10 December 2018 (UTC)Reply