⚓ T346849 Introduce a machine-readable format for storing source reliability consensus
Open, Needs Triage
Public
Assigned To
None
Authored By
ppelberg
Sep 19 2023, 11:01 PM
2023-09-19 23:01:28 (UTC+0)
Tags
VisualEditor
(Triaged)
EditCheck
(Backlog)
Editing-team
(This Fiscal Year)
Referenced Files
None
Subscribers
Aklapper
cmadeo
DLynch
Enterprisey
Eragon_Shadeslayer
Esanders
Esther.Osayande
View All 27 Subscribers
Description
Per T345118, offering people feedback about how likely experienced volunteers are to consider the source they are attempting to add to be reliable depends on the existence of a machine-readable "list" wikis can use to store/encode the current reliability consensus of a given source.
This task involves the work of proposing a format for said list that is:
Sufficiently explicit to be machine readable, and
Expressive enough to hold the nuanced and non-binary nature of WP:RSP and other pages like it [i]
References
en:Wikipedia:Citation Watchlist/Lists/RSP
Wikipedia:Citation Watchlist/Lists/Wikimedia
via @Harej and @Ocaasi
Loose
It'll be important that this list include a field for volunteers to be able to specify what feedback message people see when attempting to cite a given source.
i.
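As a rough illustration of the two requirements above plus the feedback-message field, here is a minimal sketch of what one record in such a list could look like. Nothing here is a format proposed in this task: the `SourceEntry` class, its field names, and the `example.org` data are all hypothetical, and JSON is only one possible serialization.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SourceEntry:
    """One hypothetical record in a machine-readable reliability list."""
    domain: str                 # illustrative key; real lists may key on URL patterns
    status: str                 # label from whatever vocabulary the wiki agrees on
    summary: str = ""           # human-written nuance a single label cannot carry
    feedback_message: str = ""  # text volunteers want shown when this source is cited
    discussions: list = field(default_factory=list)  # titles/links of consensus discussions


def dump_entries(entries):
    """Serialize entries to JSON so tools read the same list humans maintain."""
    return json.dumps([asdict(e) for e in entries], indent=2, sort_keys=True)


def load_entries(text):
    """Parse JSON produced by dump_entries back into SourceEntry objects."""
    return [SourceEntry(**item) for item in json.loads(text)]


# Made-up example data, not taken from any real list.
entries = [
    SourceEntry(
        domain="example.org",
        status="no consensus",
        summary="Editors did not agree; usable with care for routine facts.",
        feedback_message="Editors have not reached consensus on this source.",
    )
]
round_tripped = load_entries(dump_entries(entries))
```

Keeping a free-text `summary` next to the `status` label is one way to stay machine readable without discarding the nuance that the prose entries carry.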
Related Objects
Task Graph
Mentions
Status | Assigned | Task
Open | None | T265163 Create a system to encode best practices into editing experiences
Open | None | T276857 Surface Reference survival signal within VE
Open | None | T346849 Introduce a machine-readable format for storing source reliability consensus
Mentioned In
T348060: Expand the Reference Reliability check to include a wider range of sources
T347531: Define user experience for prompting people to review and replace unreliable sources
T347775: Implement edit check configuration setting to specify categories it'll apply within
Mentioned Here
T349261: Introduce an API to enable editing interfaces to offer feedback about reference reliability
T347531: Define user experience for prompting people to review and replace unreliable sources
T337431: Rework MediaWiki:SpamBlacklist
T347435: Warn when adding a URL that matches blocked external domains in the 2010 editor
T347775: Implement edit check configuration setting to specify categories it'll apply within
T276857: Surface Reference survival signal within VE
T345118: [SPIKE] Investigate what capabilities are need to support a reference reliability check
Event Timeline
ppelberg created this task.
Sep 19 2023, 11:01 PM
2023-09-19 23:01:28 (UTC+0)
matmarex unsubscribed.
Sep 19 2023, 11:24 PM
2023-09-19 23:24:18 (UTC+0)
Sdkb added a comment.
Sep 20 2023, 11:14 PM
2023-09-20 23:14:20 (UTC+0)
This is an interesting, potentially tricky task. The creation of RSP in the first place was controversial, although it's grown more accepted over time. The refrain of opponents is "it's always contextual, and trying to codify it is flattening". Personally, I support a more standardized system, which is what you're going for here, but just be aware that it's a consensus you'll likely have to fight for, and the less expressive the system you propose is, the harder that fight will be.
Some qualities that I'd ideally like to see the system be able to distinguish between are news vs. opinions (e.g. we want to present different advice to someone citing a NYT op-ed than someone citing a NYT news article; the URL would presumably be the key to discerning that) and subject area (e.g. let's say we decide that Al Jazeera is generally reliable except for anything related to the Israeli-Palestinian conflict; we'd want to be able to present a specialized notice to anyone attempting to use it on an article in a subcategory of Category:Israeli–Palestinian conflict; and even then it wouldn't capture a citation related to the conflict within an article that isn't).
And even if we come up with a perfect system, it'll still inevitably involve some amount of rejiggering RSP to fit everything into that system. And given how contentious debates around sources can be, that's not going to be easy.
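The two distinctions described in this comment (news vs. opinion keyed on the URL, and subject area keyed on the article's categories) could be expressed as scoped rules. The sketch below is only an assumption about how that might look: `ScopedRule`, `pick_notice`, the `/opinion/` path prefix, and the category name are all hypothetical, and the caller is assumed to have already resolved the article's categories, including parent categories.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class ScopedRule:
    """A hypothetical notice that applies to a domain, optionally narrowed by scope."""
    domain: str
    notice: str
    path_prefix: Optional[str] = None  # e.g. "/opinion/"; None means any path
    category: Optional[str] = None     # None means any article


def pick_notice(rules, url, article_categories):
    """Return the notice of the most specific matching rule, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    best, best_score = None, -1
    for rule in rules:
        # Match the domain itself or any subdomain of it.
        if host != rule.domain and not host.endswith("." + rule.domain):
            continue
        if rule.path_prefix is not None and not parsed.path.startswith(rule.path_prefix):
            continue
        if rule.category is not None and rule.category not in article_categories:
            continue
        # More scoping conditions means a more specific rule.
        score = (rule.path_prefix is not None) + (rule.category is not None)
        if score > best_score:
            best, best_score = rule, score
    return best.notice if best else None


# Made-up rules for a made-up domain.
rules = [
    ScopedRule("example.org", "general notice"),
    ScopedRule("example.org", "opinion notice", path_prefix="/opinion/"),
    ScopedRule("example.org", "topic notice", category="Category:Example topic"),
]
```

As the comment itself points out, a category-based scope like this would still miss a citation about the topic that appears in an article outside the category.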
VPuffetMichel moved this task from To Triage to Triaged on the VisualEditor board.
Sep 21 2023, 3:17 PM
2023-09-21 15:17:23 (UTC+0)
FNavas-foundation added a comment.
Sep 21 2023, 3:36 PM
2023-09-21 15:36:34 (UTC+0)
Just want to add that CredBot may be a way, or hold some options —
ppelberg mentioned this in T347775: Implement edit check configuration setting to specify categories it'll apply within
Sep 29 2023, 9:29 PM
2023-09-29 21:29:59 (UTC+0)
Snaevar unsubscribed.
Oct 1 2023, 11:28 PM
2023-10-01 23:28:12 (UTC+0)
ppelberg added a comment.
Edited
Oct 2 2023, 9:34 PM
2023-10-02 21:34:20 (UTC+0)
In T346849#9187978, @FNavas-foundation wrote:
Just want to add that CredBot may be a way, or hold some options —
Oh, great spot, @FNavas-foundation; I've added CredBot to T276857 as well as to mw:Edit check so that we can revisit it.
ppelberg added a subscriber: MusikAnimal.
Edited
Oct 2 2023, 9:38 PM
2023-10-02 21:38:44 (UTC+0)
In T346849#9185418, @Sdkb wrote:
This is an interesting, potentially tricky task. The creation of RSP in the first place was controversial, although it's grown more accepted over time. The refrain of opponents is "it's always contextual, and trying to codify it is flattening"...
@Sdkb: this context is extremely helpful – thank you for making the time to share it with us.
Two things below:
Responses to the specific points you raised (as well as a couple of follow-up questions)
How the Editing Team is thinking about moving forward from here towards a future where people receive actionable feedback about the sources they're attempting to add.
Responses
The refrain of opponents is "it's always contextual, and trying to codify it is flattening".
The simplicity of the above is clarifying.
Personally, I support a more standardized system, which is what you're going for here…
This is helpful to know.
Question:
Can you please say a bit more about where/why the support you have for such a system comes from?
Asked in another way: what positive impact(s) do you think a more standardized system has potential to deliver?
Some qualities that I'd ideally like to see the system be able to distinguish between are news vs. opinions
Noted.
Question:
How – if at all – do you currently make these distinctions? E.g. do you need to visit the source and make that call based on what you see?
...subject area (e.g. let's say we decide that Al Jazeera is generally reliable except for anything related to the Israeli-Palestinian conflict; we'd want to be able to present a specialized notice to anyone attempting to use it on an article in a subcategory of Category:Israeli–Palestinian conflict;
Great spot, I've created T347775 for this use case.
Editing Team Proposed Plans
Now, to how the Editing Team is thinking of moving forward…
For the reasons you named above, [i] it seems most tractable to move forward with an initial approach that depends on a facet of reliability policy that is:
Already in a machine-readable format
Widely agreed upon
With this in mind, we're thinking of starting by offering people feedback when the URL they are attempting to use as a reference matches the SpamBlacklist, as this wish from the 2023 Wishlist Survey describes. This approach would benefit from the work @Ladsgroup has done in T337431 and @MusikAnimal is doing in T347435.
Longer-term, if we collectively come to find this initial implementation impactful, perhaps it will help prompt a wider conversation about how the "reliability check" might be expanded upon to include a wider range of sources/domains.
i. Specifically, the complexity/challenge of trying to codify a policy that is evolving and non-binary
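To make the proposed first step concrete, here is a minimal sketch of checking a URL against a blocklist before the edit is saved. It assumes the list is plain text with one regular-expression fragment per line and `#` starting a comment; it does not reproduce the SpamBlacklist extension's actual parsing or matching rules, and `parse_blocklist`, `first_match`, and the sample entries are hypothetical.

```python
import re


def parse_blocklist(text):
    """Compile one case-insensitive pattern per non-empty, non-comment line."""
    patterns = []
    for line in text.splitlines():
        # Assumption: '#' starts a comment, so a pattern cannot contain '#'.
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def first_match(patterns, url):
    """Return the source text of the first pattern matching the URL, or None."""
    # Match against everything after the scheme (host, path, query).
    target = re.sub(r"^https?://", "", url, flags=re.IGNORECASE)
    for pattern in patterns:
        if pattern.search(target):
            return pattern.pattern
    return None


# Made-up entries for illustration only.
SAMPLE = r"""
# illustrative entries only
\bspam-example\.test\b
cheap-pills\.example
"""

patterns = parse_blocklist(SAMPLE)
hit = first_match(patterns, "https://www.spam-example.test/page")
miss = first_match(patterns, "https://example.org/")
```

Returning the matching pattern rather than a bare boolean would let an editing interface tell the person which rule their URL tripped.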
ppelberg moved this task from Untriaged to This Fiscal Year on the Editing-team board.
Oct 2 2023, 9:39 PM
2023-10-02 21:39:00 (UTC+0)
ppelberg mentioned this in T347531: Define user experience for prompting people to review and replace unreliable sources
Oct 3 2023, 7:30 PM
2023-10-03 19:30:57 (UTC+0)
ppelberg mentioned this in T348060: Expand the Reference Reliability check to include a wider range of sources
Oct 3 2023, 7:38 PM
2023-10-03 19:38:23 (UTC+0)
Sdkb added a comment.
Edited
Oct 4 2023, 7:11 AM
2023-10-04 07:11:29 (UTC+0)
Can you please say a bit more about where/why the support you have for such a system comes from? Asked in another way: what positive impact(s) do you think a more standardized system has potential to deliver?
Fundamentally, I think organization into standardized frameworks is helpful because it reduces complexity. This facilitates machine-reading (including Edit Check, but also any other tools that might want to use the RSP list) as well as makes things simpler/easier for humans browsing the list. For instance, browsing through RSP, I see a bunch of small wording variations that might be considered quirks or point to potential ambiguities: Why does the CS Monitor entry say it is "generally reliable for news" whereas other entries like The Age just say "generally reliable", full stop? And why does the NYT's entry include a special note that opinion pieces should be governed by WP:RSOPINION whereas the Financial Times' entry does not, even though it also publishes opinion? I think a standardized system would help iron out these wrinkles and force us to confront ambiguities that may be causing confusion. Ultimately, it'd lead to the summary section of RSP becoming more regulated/optimized, which would make it easier to parse. It'd also be clarifying for debates on source reliability. In the past, those debates had a certain reinvent-the-wheel aspect, which could make consensus less clear given that different editors might interpret something like "generally reliable" differently. More recently, those debates have moved toward a more standardized four-option menu, as e.g. here. Having that common vocabulary makes it easier to say e.g. "this source is about as reliable as that other one, so let's treat it similarly". A more complete standardized framework would effectively expand that common vocabulary.
Question: How – if at all – do you currently make these distinctions? E.g. do you need to visit the source and make that call based on what you see?
The easiest cases are those for publications that clearly distinguish between news and opinions (more often newspapers). For these, it's just a matter of visiting the source, seeing whether the URL/section label is "Opinion" or a news department, and going by that. It gets trickier with publications with blurrier lines (more often magazines). For those, it's often necessary to read the article to figure out what type it is. And in some cases, for "analysis"-type edge cases, editors may disagree about whether a statement in a source is opinion or fact.
Your initial approach sounds good to me. Overall, it'll be easiest to work from the bottom of the reliability scale up (blacklisted sources being at the very bottom), since at that extreme there are very few exceptions where it would become okay to use a source. (If you want to know what those exceptions are, then search for whitelisted uses of currently blacklisted sources.) Greenlit "generally reliable" sources would be the second-easiest category to machine-read. It's the yellow "marginally reliable" sources that will be the greatest challenge, since they have a bunch of caveats and nuances. We'll eventually want to be able to parse them, but for a minimum viable product they seem better to avoid. Browsing through the entries there will give you a sense of what the caveats and nuances tend to be. The specific language used is important ("no consensus" means editors were unable to agree, whereas "marginally reliable" means editors agreed that the source is marginal) but it tends to fall into patterns.
ppelberg renamed this task from "Propose a format for storing how reliable a project considers a source to be" to "Introduce a machine-readable format for storing how reliable a project considers sources to be".
Nov 9 2023, 1:06 AM
2023-11-09 01:06:08 (UTC+0)
ppelberg renamed this task from "Introduce a machine-readable format for storing how reliable a project considers sources to be" to "Introduce a machine-readable format for storing source reliability consensus".
ppelberg updated the task description.
Nov 9 2023, 1:17 AM
2023-11-09 01:17:14 (UTC+0)
Izno subscribed.
Nov 9 2023, 1:38 AM
2023-11-09 01:38:40 (UTC+0)
ppelberg added a subscriber: DLynch.
Nov 13 2023, 6:37 PM
2023-11-13 18:37:12 (UTC+0)
In T347531#9318220, @DLynch wrote:
Re: 1.A, this message is probably going to be very similar to the existing spamprotectiontext message, but we can't just use that one because (a) its default wording expects to be talking about a whole attempted-edit and so is a bit too vague ("probably caused by a link to a forbidden external site"), and (b) some wikis (looking at you, enwiki) have customized it to be huge and so it would cause display issues squeezing it into the citoid dialog.
Re: 2, we could say there are six grades of referenced-website, which mostly relate to the categories in the legend in the perennial sources page. In rough order of "goodness":
Generally reliable.
Unknown; nobody has made a rule against it, but it could be unreliable, so we shouldn't say it's "good". This is going to cover a lot of citations to information on minor websites.
Situational ("no consensus"); sources for which you need to read the warning. (To pick an early example from the list: "Arab News is reliable unless the article is about the Saudi Arabian government".)
Generally unreliable; you probably shouldn't use this, but it's not actually forbidden.
Deprecated; you can only use this for self-descriptions, i.e. referencing the fact of the content existing on the site
Blocked; you literally cannot add this to the wiki.
The last one is the only one we should actually block people from adding, because there are occasional valid reasons to use all of the others.
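The six grades and the "only block the last one" rule could be encoded as an ordered enumeration. This is a sketch under assumptions: the `Grade` names paraphrase the list above, `blocks_edit` follows the rule stated in the comment, and the threshold in `shows_notice` is purely hypothetical, since the comment does not say which grades should trigger a message.

```python
from enum import IntEnum


class Grade(IntEnum):
    """Six grades in rough order of "goodness" (1 is best), as listed above."""
    GENERALLY_RELIABLE = 1
    UNKNOWN = 2
    SITUATIONAL = 3
    GENERALLY_UNRELIABLE = 4
    DEPRECATED = 5
    BLOCKED = 6


def blocks_edit(grade):
    """Only blocked sources are refused; every other grade has valid uses."""
    return grade is Grade.BLOCKED


def shows_notice(grade):
    """Hypothetical threshold: show feedback for situational sources and worse."""
    return grade >= Grade.SITUATIONAL


# Which grades would stop an edit, and which would only prompt a notice.
blocking = [g for g in Grade if blocks_edit(g)]
notice_only = [g for g in Grade if shows_notice(g) and not blocks_edit(g)]
```

Using an ordered type keeps comparisons such as "this grade or worse" simple, while the blocking decision stays a separate, explicit check.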
Pcoombe subscribed.
Nov 15 2023, 5:58 PM
2023-11-15 17:58:18 (UTC+0)
Frostly subscribed.
Dec 11 2023, 12:47 AM
2023-12-11 00:47:37 (UTC+0)
ppelberg added a comment.
May 6 2024, 5:46 PM
2024-05-06 17:46:00 (UTC+0)
In T349261#9775284, @ppelberg wrote:
Related: via @Samwalton9-WMF
ppelberg updated the task description.
May 6 2024, 5:47 PM
2024-05-06 17:47:37 (UTC+0)
ppelberg added subscribers: Harej, Ocaasi.
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct.