Commons Impact Metrics now available via data dumps and API – Diff
Commons Impact Metrics is a new data product offering
monthly data dumps
and a new
Wikimedia Analytics API
for Wikimedia Commons
categories
of images relating to cultural heritage. These categories include content from libraries, museums, and archives but also visual documentation of natural, built, and living heritage. We’re agnostic about who facilitates the upload, so relevant categories can be added by institutions, Wikimedia affiliates,
Wiki Loves…
campaigns, and individual contributors. Using this data, Commons contributors and their partners can count monthly edits in a category; identify their most active contributors and most viewed files; and understand which Wikimedia projects, languages, and articles are using their images. We will share some specific examples using real data in a follow-up Diff post.
Partners need to demonstrate the impact of their contributions
Museums, libraries, and archives have contributed millions of high-quality images to Commons. An
analysis of digitized paintings on Wikipedia
found that
67% of the articles they illustrate are not directly related to art.
These images are used to illustrate articles about history, religion, geography, and even broader concepts like
play
and
love
. Paintings, drawings, and sculpture from cultural institutions are important visual records of people and historical events that preceded the widespread use of photography.
By contributing their digitized collections to the Wikimedia projects, institutions can make their knowledge visible and relevant to new audiences in more than 300 languages. In 2018,
The Met reported
that their image of Katsushika Hokusai’s
The Great Wave
painting attracted 10x more views on English Wikipedia than on their own website, and 20x more views on Wikipedia articles in all languages than on their own website. In 2021,
Wellcome Collection announced
that their images on Wikipedia have been viewed more than 1.5 billion times, illustrating articles about Sagittarius, an East India Company opium processing factory, the Al-Aqsa Mosque, bipolar disorder, Neville Chamberlain, and more.
Wellcome Collection images used to illustrate Wikipedia articles, taken from
Images from Wellcome Collection pass 1.5 billion views on Wikipedia
, by Alice White, CC BY 4.0
It’s not only large museums or international brands that achieve this visibility on Wikipedia. Each month, a few hundred image files contributed by The Museum of Veterinary Anatomy in São Paulo (
category
) attract millions of page views.
Many of the institutions sharing images do so for reasons that go beyond a simple measure of reach. For example, the Smithsonian is focused on amplifying the accomplishments of American women by adding their biographies to Wikipedia, and some of its most viewed images on Wikipedia are of women of color, such as Sojourner Truth, A Chippeway Widow, and Josephine Baker. Similarly, Wikimedia UK and the private Khalili Collections are
working together
to improve Wikipedia’s coverage of topics from Islamic pilgrimage to Japanese fashions. This followed
research
that found “a systemic cultural bias against non-Western visual art and artists across all Wikipedia platforms and in various languages.”
We’re providing more reliable data
To understand the impact of their contributions to Wikimedia projects, cultural institutions are interested in the utilization of their content (what projects, languages, and articles their images are used in) and views (how many times their images were seen per project, language, and article). Vital tools to access this data were developed by the community but it has been difficult for volunteer developers to consistently maintain them, leading to outages that damage the credibility of the Wikimedia movement. Because of the way these tools generate analytics, any outages also lead to data loss and both under- and overcounting of views.
To address these reliability issues, we developed a centralized, pre-computed dataset to increase trust in the numbers used to demonstrate the impact of image contributions.
More reliable
because the dataset, data dumps, and API are developed and maintained by the
Data Products team
and openly available for use and integration.
Less complex
because we are delivering
pre-computed data
instead of raw data that has to be processed in order to extract metrics from it.
More stable
because the calculation includes an
algorithm
with a maximum depth of seven sub-categories that covers most known use cases without inadvertently causing the system to fail by traversing the entirety of the category graph.
Operates at scale
. While existing tools can fail silently when handling larger quantities of content, we’re handling categories with up to 1 million files. For comparison, GLAMorgan supports a maximum of 30,000 files in the category graph.
Well documented
API
and
service
that aims to standardize our definitions and methods, so it is easier to compare and learn.
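The depth limit described under "More stable" can be illustrated with a breadth-first traversal that stops descending once it reaches a fixed depth. This is only a minimal sketch, not the Data Products team's implementation: the function name, the dictionary-based graph, the toy categories, and the choice to count the root category as depth 0 are all assumptions made for illustration.

```python
from collections import deque

MAX_DEPTH = 7  # the maximum sub-category depth stated in the post


def collect_subcategories(graph, root, max_depth=MAX_DEPTH):
    """Return {category: depth} for categories within max_depth of root.

    `graph` is a hypothetical mapping from a category name to the names
    of its direct sub-categories.
    """
    depths = {root: 0}  # assumption: the root category counts as depth 0
    queue = deque([root])
    while queue:
        category = queue.popleft()
        depth = depths[category]
        if depth == max_depth:
            continue  # do not descend past the depth limit
        for child in graph.get(category, ()):
            if child not in depths:  # guards against cycles and repeat visits
                depths[child] = depth + 1
                queue.append(child)
    return depths


# Hypothetical toy graph that contains a cycle (Portraits -> Museum).
toy_graph = {
    "Museum": ["Paintings", "Sculpture"],
    "Paintings": ["Portraits"],
    "Portraits": ["Museum"],
}
reachable = collect_subcategories(toy_graph, "Museum", max_depth=1)
```

Tracking visited categories keeps the walk from looping on cycles, and the depth cap bounds how much of the category graph a single starting category can pull in.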
The data product was informed by community discussions and feedback
The development of this new data product was informed by community documentation and feedback, including a Wikimedia-l email thread (
The problems with Wikimedia metrics
, February 2023); a
GLAM Manifesto
(February 2023); a meeting report by Wikimedia Sweden (
Wikimedia metrics tools
, February 2023); and the
GLAM Metrics Needs page
(February-August 2023). We also reviewed past research, including a study from 2013,
Report on requirements for usage and reuse statistics for GLAM content.
We carried out design research interviews with 16 participants representing 9 affiliates and institutions; released a
prototype for user testing
; and facilitated a workshop at the GLAM Wiki 2023 conference (
Understanding the Impact of Image Contributions to Commons
, November 2023). In May 2024, a beta version of Commons Impact Metrics data was made available at
Wikimedia Dumps
and
Marcel Ruiz Forns
led an exploration session at the Wikimedia Hackathon, “
Commons Impact Metrics BETA Data Dumps available today!”
Marcel Ruiz Forns and Krishna Chaitanya Velaga at the Wikimedia Hackathon 2024, by Robert Sim, CC BY-SA 4.0
We had to make some trade-offs to deliver useful analytics this year
There is an allow list of categories
Early
investigations
showed that we couldn’t work with the Commons repository in its entirety. There are more than 100 million media files associated with more than 16 million categories. The category graph can quickly connect to nearly every file on Commons, causing computational and system failures. We therefore use an allow list of the categories that are in scope. We have backfilled more than 6 months of data for more than
1,200 categories
that are used by the GLAM Wiki Dashboard, the Cassandra Dashboard, and BaGLAMa 2. (As on-demand metrics tools, GLAMorgan and GLAMorous don’t have predefined lists of categories.) Having this backfilled data will help category owners understand how our new definitions and methods impact their numbers. New categories can be
requested
and will be added each month.
The data has a monthly granularity
To keep the dumps lighter and more manageable for volunteers, affiliates, and partners, we aggregate the data at a monthly granularity and publish it on a monthly release schedule. New data will be available between the 2nd and 5th of each month.
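To make the idea of monthly granularity concrete, here is a small sketch that rolls daily view counts up into one total per category per month. The row layout (category, day, views), the function name, and the sample numbers are all hypothetical; the actual dump schema is described on the Data Model page.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(daily_rows):
    """Sum (category, day, views) rows into {(category, 'YYYY-MM'): views}."""
    totals = defaultdict(int)
    for category, day, views in daily_rows:
        totals[(category, day.strftime("%Y-%m"))] += views
    return dict(totals)


# Hypothetical sample rows; not real Commons data.
daily_rows = [
    ("Example category", date(2024, 5, 1), 120),
    ("Example category", date(2024, 5, 31), 80),
    ("Example category", date(2024, 6, 1), 50),
]
monthly = aggregate_monthly(daily_rows)
```

One row per category per month is far smaller than one row per day, which is the trade-off the monthly schedule makes in exchange for finer time resolution.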
We prioritized pageviews over mediarequests
The other big decision we needed to make was whether to prioritize
pageviews
(used by BaGLAMa 2 and GLAMorgan) or
mediarequests
(used by GLAM Wiki Dashboard). We selected
pageviews
because we thought it was important for partners and contributors to see which articles their images are illustrating. This gives them insight into how their collections can meet the everyday information needs of internet users, and where they are closing visual knowledge gaps on Wikimedia projects. This is more actionable information than total view counts by projects and languages only.
However, one downside of using
pageviews
is that we had to exclude main page views until we have a better way of accounting for when media is actually present in the page. While main page views can significantly increase the overall count for a category or file, they represent less intentional traffic than article views. For more about the pros and cons of these different approaches, see the
Data Model page on Wikitech.
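The pageview-based approach can be sketched as follows: for a given file, look up the pages that use it, drop the main page, and rank the remaining pages by views. Everything here is an illustrative assumption, including the function name, the two lookup dictionaries, the idea of identifying the main page by the title "Main_Page", and the sample numbers.

```python
EXCLUDED_PAGES = {"Main_Page"}  # assumption: the main page is identified by title


def views_by_article(file_usage, page_views, file_name):
    """Return [(page, views)] for pages using file_name, most viewed first."""
    rows = [
        (page, page_views.get(page, 0))
        for page in file_usage.get(file_name, ())
        if page not in EXCLUDED_PAGES
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample data; not real usage or pageview figures.
file_usage = {"Example_painting.jpg": ["Article_A", "Main_Page", "Article_B"]}
page_views = {"Article_A": 900, "Main_Page": 50000, "Article_B": 1200}

ranked = views_by_article(file_usage, page_views, "Example_painting.jpg")
total_views = sum(views for _, views in ranked)
```

In this toy data, the main page would have dominated the file's total, which shows why excluding it changes overall counts while keeping the per-article breakdown meaningful.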
In the future, we may offer
mediarequest
counts alongside
pageview
counts if there is a demonstrated need. We have captured this potential work in a
Phabricator ticket.
This product will be supported by two teams at the Foundation
The dumps and API service are managed by the
Data Products Team
. Please watch the
Data Model page
and join the
analytics-l
mailing list for any planned updates that could have an impact on integrations and tools. Issues and requests should be logged in Phabricator using the
Commons-Impact-Metrics
and
Data Products
tags. You can track incoming requests and progress on the
Phabricator workboard
. If you have questions or feedback, please use the
Talk page for the project on Wikitech.
If you want to add a new Wikimedia Commons category to the allow list (or rename or remove existing categories), you can make a request to the
Culture and Heritage team
by
using this form
. You can see all open requests on
this Phabricator workboard.
How to start using Commons Impact Metrics
Access the data via
Dumps
(tips on
how to work with dumps locally
) and
API
Find out more by reading the
documentation
and reviewing the
Data Model
Add your own cultural heritage categories by making a
Phabricator request
We’ll soon publish a second post with examples of the questions that can be answered using Commons Impact Metrics.