Exploring page history
The MediaWiki REST API lets you build apps and scripts that interact with any MediaWiki-based wiki. In this tutorial, we'll use the REST API page history endpoints to explore the history of articles on English Wikipedia.
A wiki page's history is divided into a sequence of revisions. A revision can be any change to the content of a page: an anonymous editor correcting a typo, an admin reverting vandalism, a bot fixing a citation link. The history API lets you browse through a page's history in segments of 20 revisions.
To see the most recent revisions to a page, we'll make a request to the page history endpoint, including the page title in the path. The page history endpoint returns the latest revision segment, starting with the most recent revision. Each revision object includes the revision id, which you can use to get more information about the revision or to compare revisions. Revision IDs do not always increase chronologically, so you may see an older revision with a higher revision ID than a newer revision.
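Because revision IDs are not a reliable ordering key, it can be safer to order revisions by timestamp. The following is a minimal sketch, assuming revision objects that carry id and timestamp fields in the format the page history endpoint returns. The helper names and the sample data are made up for illustration.

```python
from datetime import datetime, timezone

def parse_timestamp(value):
    """Parse a timestamp such as '2020-05-07T08:22:21Z' into an aware datetime."""
    parsed = datetime.strptime(value, '%Y-%m-%dT%H:%M:%SZ')
    return parsed.replace(tzinfo=timezone.utc)

def sort_by_timestamp(revisions, newest_first=True):
    """Return revisions ordered by timestamp instead of by revision ID."""
    return sorted(
        revisions,
        key=lambda revision: parse_timestamp(revision['timestamp']),
        reverse=newest_first,
    )

# Hypothetical sample in which the older revision has the higher ID.
sample = [
    {'id': 200, 'timestamp': '2019-01-01T00:00:00Z'},
    {'id': 100, 'timestamp': '2020-01-01T00:00:00Z'},
]
ordered = sort_by_timestamp(sample)
print([revision['id'] for revision in ordered])
```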
In this example, we'll explore the history of the Agapanthus article on English Wikipedia. The delta, the change in page size in bytes relative to the previous revision, gives us a sense of the size of the change.
In [1]:
import requests
import json

url = 'https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history'
headers = {'User-Agent': 'MediaWiki REST API docs examples/0.1 (https://meta.wikimedia.org/wiki/User:APaskulin_(WMF))'}

# Request the latest segment of the page history and parse the JSON body
response = requests.get(url, headers=headers)
response = json.loads(response.text)

print('timestamp id delta')
for revision in response['revisions']:
    print(revision['timestamp'], revision['id'], revision['delta'])
timestamp id delta
2020-05-07T08:22:21Z 955342435 4
2020-04-18T16:55:06Z 951740615 37
2020-03-11T14:59:54Z 945053408 6
2020-03-11T14:58:11Z 945053115 -13
2020-03-11T14:56:56Z 945052882 0
2020-03-11T14:54:46Z 945052528 5014
2020-03-11T14:52:32Z 945052235 -15
2020-02-24T14:29:11Z 942410937 -98
2020-02-24T07:23:23Z 942370303 98
2020-01-25T20:03:40Z 937558444 14
2020-01-25T20:02:56Z 937558348 -7
2020-01-25T02:44:12Z 937448611 7
2020-01-05T05:16:23Z 934187529 -9
2019-09-18T12:37:37Z 916345891 43
2019-09-11T02:17:14Z 915073907 -2
2019-08-09T09:15:08Z 910045778 -8
2019-07-15T15:05:47Z 906391967 -33
2019-07-15T02:23:42Z 906315318 33
2019-07-12T02:10:35Z 905878677 31
2019-05-08T08:15:32Z 896094138 71
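The delta column makes it easy to spot the largest edits in a segment. Here is a rough sketch, assuming revision objects shaped like those printed above. The helper name largest_changes is hypothetical, and treating a missing delta as 0 is a defensive assumption rather than something shown in this tutorial.

```python
def largest_changes(revisions, count=3):
    """Return the `count` revisions with the largest absolute delta."""
    return sorted(
        revisions,
        key=lambda revision: abs(revision.get('delta') or 0),
        reverse=True,
    )[:count]

# A few revisions copied from the output above.
sample = [
    {'timestamp': '2020-05-07T08:22:21Z', 'id': 955342435, 'delta': 4},
    {'timestamp': '2020-03-11T14:54:46Z', 'id': 945052528, 'delta': 5014},
    {'timestamp': '2020-02-24T14:29:11Z', 'id': 942410937, 'delta': -98},
    {'timestamp': '2020-01-25T20:03:40Z', 'id': 937558444, 'delta': 14},
]

top = largest_changes(sample, count=2)
for revision in top:
    print(revision['timestamp'], revision['id'], revision['delta'])
```

Sorting on the absolute value means large removals rank alongside large additions.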
The page history endpoint also provides older, newer, and latest properties to make it easy to scroll through revisions.
In [2]:
print('\nGet next 20:', response['older'])
Get next 20: https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history?older_than=896094138
To look further back in the history of the page, we can use the older property to fetch the next 20 older revisions. The response is non-inclusive; it doesn't include the revision specified in the URL.
In [3]:
# Follow the 'older' link from the previous response
url = response['older']
response = requests.get(url, headers=headers)
response = json.loads(response.text)

print('\ntimestamp id delta')
for revision in response['revisions']:
    print(revision['timestamp'], revision['id'], revision['delta'])
timestamp id delta
2019-02-20T05:05:22Z 884205793 9
2019-01-29T05:16:31Z 880741028 -17
2018-10-23T12:11:36Z 865354227 18
2018-10-04T08:59:21Z 862426460 176
2018-07-24T07:44:07Z 851733941 523
2018-06-23T11:57:05Z 847170467 10
2018-05-15T15:15:19Z 841389263 12
2018-02-17T03:22:09Z 826086995 14
2018-02-15T08:25:11Z 825767252 23
2018-02-15T07:40:47Z 825763872 -27
2018-01-13T22:05:53Z 820249697 1
2017-07-13T23:35:03Z 790471460 65
2017-05-24T21:47:07Z 782092200 24
2017-03-10T22:48:55Z 769666928 4
2017-02-11T09:29:37Z 764856323 0
2017-02-07T08:29:51Z 764138832 69
2017-02-07T08:21:34Z 764138197 256
2017-01-26T05:40:42Z 762025429 -4
2017-01-02T14:18:08Z 757923981 18
2017-01-02T13:57:15Z 757921856 20
We can continue exploring using the URLs provided by the older, newer, and latest properties.
From here, we can use the newer URL to scroll forward in the page's history, or return to the most recent revisions by requesting the page history without an older_than or newer_than parameter. The page history endpoint returns at most 20 revisions per request, so you cannot use both the older_than and newer_than parameters in the same request.
If the API does not return an older property, there are no older revisions left to return. Following the same pattern, the absence of a newer URL indicates that there are no newer revisions available.
In [4]:
print('\nGet next 20:', response['older'])
print('Get previous 20:', response['newer'])
print('Get latest:', response['latest'])
Get next 20: https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history?older_than=757921856
Get previous 20: https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history?newer_than=884205793
Get latest: https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history
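Following the older links until none is returned can be wrapped in a small generator. This is only a sketch: it uses the standard library's urllib instead of requests to stay self-contained, and the names fetch_json, iter_revisions, and max_segments, as well as the placeholder User-Agent, are hypothetical. The demonstration runs against made-up data so that no network request is needed.

```python
import json
from urllib.request import Request, urlopen

# Placeholder value; replace it with a User-Agent that identifies your script.
USER_AGENT = 'history-example/0.1 (placeholder contact)'

def fetch_json(url):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    request = Request(url, headers={'User-Agent': USER_AGENT})
    with urlopen(request) as reply:
        return json.loads(reply.read().decode('utf-8'))

def iter_revisions(start_url, fetch=fetch_json, max_segments=5):
    """Yield revisions newest first, following 'older' links between segments."""
    url = start_url
    for _ in range(max_segments):
        segment = fetch(url)
        yield from segment['revisions']
        url = segment.get('older')
        if not url:  # no 'older' property: no older revisions remain
            break

# Offline demonstration with a fake two-segment history (made-up data).
fake_segments = {
    'segment-1': {'revisions': [{'id': 3}, {'id': 2}], 'older': 'segment-2'},
    'segment-2': {'revisions': [{'id': 1}]},
}
ids = [r['id'] for r in iter_revisions('segment-1', fetch=fake_segments.__getitem__)]
print(ids)
```

The max_segments cap keeps a long history from triggering an unbounded number of requests; pass the real history URL and the default fetch function to run it against the API.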
Revision details
In exploring the history of the Agapanthus page, we can see an addition of 256 bytes on 2017-02-07 with the revision ID 764138197. To understand why an edit was made and by whom, we can use the revision ID to make a request to the revision endpoint.
In [5]:
url = 'https://en.wikipedia.org/w/rest.php/v1/revision/764138197/bare'
response = requests.get(url, headers=headers)
response = json.loads(response.text)

# Print who made the edit, whether it was marked minor, and the edit summary
print('Revision ID:', response['id'], '\nMinor edit?', response['minor'], '\nEditor:', response['user']['name'], '\nSummary:', response['comment'])
Revision ID: 764138197
Minor edit? True
Editor: Rjwilmsi
Summary: Journal cites, added 1 PMID, templated 2 journal cites using [[Project:AWB|AWB]] (12142)
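When reviewing many revisions, it can help to condense these fields into one line per revision. The sketch below assumes a revision object with the id, minor, user, and comment fields printed above. The helper name describe_revision and the fallback strings for a missing user or summary are hypothetical.

```python
def describe_revision(revision):
    """Format a revision's ID, editor, and summary as a single line."""
    user = revision.get('user') or {}
    kind = 'minor edit' if revision.get('minor') else 'edit'
    return '{} ({}) by {}: {}'.format(
        revision['id'],
        kind,
        user.get('name') or '(unknown editor)',
        revision.get('comment') or '(no summary)',
    )

# Values copied from the output above.
sample = {
    'id': 764138197,
    'minor': True,
    'user': {'name': 'Rjwilmsi'},
    'comment': 'Journal cites, added 1 PMID, templated 2 journal cites '
               'using [[Project:AWB|AWB]] (12142)',
}
summary_line = describe_revision(sample)
print(summary_line)
```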
Filtering page history
Some revisions are tagged to indicate the type of edit. The page history endpoint supports filtering by edit type, allowing you to request, for example, only edits made by bots or only edits made by anonymous users. See the endpoint reference for the complete list of supported filters.
To see the most recent edits made by bots, call the page history endpoint with the filter parameter set to bot.
In [6]:
url = "https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history?filter=bot"
response = requests.get(url, headers=headers)
response = json.loads(response.text)

print('\ntimestamp id delta')
for revision in response['revisions']:
    print(revision['timestamp'], revision['id'], revision['delta'])

print('\nGet next 20:', response['older'])
print('Get latest:', response['latest'])
timestamp id delta
2020-04-18T16:55:06Z 951740615 37
2019-09-18T12:37:37Z 916345891 43
2019-09-11T02:17:14Z 915073907 -2
2019-05-08T08:15:32Z 896094138 71
2019-02-20T05:05:22Z 884205793 9
2018-10-04T08:59:21Z 862426460 176
2018-01-13T22:05:53Z 820249697 1
2017-05-24T21:47:07Z 782092200 24
2017-01-02T14:18:08Z 757923981 18
2016-10-05T10:06:20Z 742716578 43
2014-09-03T10:19:11Z 623989770 -16
2013-04-27T13:15:33Z 552414853 -2
2013-04-10T05:50:03Z 549632707 0
2013-02-19T16:12:27Z 539058803 -408
2012-12-11T02:40:00Z 527460621 -48
2012-07-08T06:43:09Z 501207527 18
2011-06-19T05:02:00Z 435041747 1449
2011-06-04T08:45:46Z 432487673 26
2010-06-08T17:32:46Z 366828952 -2
2010-01-06T09:48:03Z 336173014 -18
Get next 20: https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history?filter=bot&older_than=336173014
Get latest: https://en.wikipedia.org/w/rest.php/v1/page/Agapanthus/history?filter=bot
You can pair the filter parameter with the older_than or newer_than parameter to explore page history by revision type. Due to differences in caching infrastructure, the number of revisions returned by the page history filters may not match the statistics returned by the page history counts endpoint.
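Combining these parameters by hand is error-prone, so a small URL builder can help. The sketch below is one possible approach: the function name history_url and its keyword arguments are hypothetical, and it enforces only the rule stated earlier that older_than and newer_than cannot be used together. It does not validate filter values against the endpoint reference.

```python
from urllib.parse import quote, urlencode

BASE = 'https://en.wikipedia.org/w/rest.php/v1'

def history_url(title, edit_filter=None, older_than=None, newer_than=None):
    """Build a page history URL with an optional filter and one cursor."""
    if older_than is not None and newer_than is not None:
        raise ValueError('use older_than or newer_than, not both')
    params = {}
    if edit_filter is not None:
        params['filter'] = edit_filter
    if older_than is not None:
        params['older_than'] = older_than
    if newer_than is not None:
        params['newer_than'] = newer_than
    # Percent-encode the title so characters such as '/' stay inside one path segment.
    url = '{}/page/{}/history'.format(BASE, quote(title, safe=''))
    if params:
        url += '?' + urlencode(params)
    return url

print(history_url('Agapanthus', edit_filter='bot', older_than=336173014))
```

For the Agapanthus example, this produces the same URL as the "Get next 20" link in the filtered output above.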
Comparing revisions
When exploring a page's history, you can use the delta to understand the impact of an edit in relation to the previous edit. You can also compare any two revisions line by line using the compare revisions endpoint. The compare revisions endpoint takes two revision IDs as path parameters; you can use the page history endpoint to select revision IDs.
To compare two revisions, supply the revision IDs in the endpoint path. The API returns the differences between the two revisions as a structured JSON object designed to help construct a visual representation of the diff.
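One way to turn such a diff into a simple text view is sketched below. It assumes each entry in the response's diff list carries type, lineNumber, and text fields, with type 0 marking a line present in both revisions and type 1 marking an added line. The mapping of type 2 to deleted lines, the helper name render_diff, and the truncation width are assumptions to check against the API reference. The sample entries are abbreviated from the comparison of revisions 847170467 and 851733941.

```python
# Marker per diff entry type; type 2 is an assumption, see the note above.
MARKERS = {0: ' ', 1: '+', 2: '-'}

def render_diff(diff, width=60):
    """Render diff entries as text lines, truncating long wikitext."""
    lines = []
    for entry in diff:
        marker = MARKERS.get(entry['type'], '?')  # '?' for types not handled here
        text = entry['text']
        if len(text) > width:
            text = text[:width] + '...'
        lines.append('{} {:>4} {}'.format(marker, entry['lineNumber'], text))
    return '\n'.join(lines)

sample_diff = [
    {'type': 1, 'lineNumber': 97, 'text': ''},
    {'type': 1, 'lineNumber': 98, 'text': 'In 2016, a new species of [[Cecidomyiidae|gall midge]] was described.'},
    {'type': 0, 'lineNumber': 99, 'text': ''},
    {'type': 0, 'lineNumber': 100, 'text': '==Allergenic potential=='},
]
rendered = render_diff(sample_diff)
print(rendered)
```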
In [7]:
url = 'https://en.wikipedia.org/w/rest.php/v1/revision/847170467/compare/851733941'
response = requests.get(url, headers=headers)
response = json.loads(response.text)

# Pretty-print the structured diff
print(json.dumps(response, indent=4))
{
    "from": {
        "id": 847170467,
        "slot_role": "main",
        "sections": [
            {
                "level": 2,
                "heading": "==Description==",
                "offset": 3006
            },
            {
                "level": 2,
                "heading": "==Taxonomy==",
                "offset": 4324
            },
            {
                "level": 3,
                "heading": "===Family placement===",
                "offset": 4643
            },
            {
                "level": 3,
                "heading": "===Species===",
                "offset": 9513
            },
            {
                "level": 2,
                "heading": "==Cultivation==",
                "offset": 12089
            },
            {
                "level": 2,
                "heading": "==Allergenic potential==",
                "offset": 15634
            },
            {
                "level": 2,
                "heading": "==References==",
                "offset": 15985
            },
            {
                "level": 2,
                "heading": "==External links==",
                "offset": 16018
            }
        ]
    },
    "to": {
        "id": 851733941,
        "slot_role": "main",
        "sections": [
            {
                "level": 2,
                "heading": "==Description==",
                "offset": 3006
            },
            {
                "level": 2,
                "heading": "==Taxonomy==",
                "offset": 4324
            },
            {
                "level": 3,
                "heading": "===Family placement===",
                "offset": 4643
            },
            {
                "level": 3,
                "heading": "===Species===",
                "offset": 9513
            },
            {
                "level": 2,
                "heading": "==Cultivation==",
                "offset": 12089
            },
            {
                "level": 2,
                "heading": "==Allergenic potential==",
                "offset": 16157
            },
            {
                "level": 2,
                "heading": "==References==",
                "offset": 16508
            },
            {
                "level": 2,
                "heading": "==External links==",
                "offset": 16541
            }
        ]
    },
    "diff": [
        {
            "type": 0,
            "lineNumber": 95,
            "text": "",
            "offset": {
                "from": 14692,
                "to": 14692
            }
        },
        {
            "type": 0,
            "lineNumber": 96,
            "text": "Though ''Neuranethes spodopterodes'' is invasive in the regions where it has emerged as a pest, it is not an exotic invader, but a [[Species translocation|translocated species]], having been imported inadvertently from its natural range in more northerly regions of the country. In its original range the moth is not of horticultural importance, being controlled by natural enemies that as yet have neither been identified nor imported along with the host plants. In contrast the ''Agapanthus'' borer is of considerable concern in the South West, and its voracity is so impressive that the species shows promise as a possible control for invasive ''Agapanthus praecox'' in countries like New Zealand.M.D. Picker and M. Kr\u00c3\u00bcger. Spread and Impacts of the Agapanthus Borer (Neuranethes spodopterodes (Hampson, 1908), comb. nov.), a Translocated Native Moth Species (Lepidoptera: Noctuidae). African Entomology 2013 21 (1), 172-176",
            "offset": {
                "from": 14693,
                "to": 14693
            }
        },
        {
            "type": 1,
            "lineNumber": 97,
            "text": "",
            "offset": {
                "from": null,
                "to": 15633
            }
        },
        {
            "type": 1,
            "lineNumber": 98,
            "text": "In 2016, a new species of [[Cecidomyiidae|gall midge]], ''[[Enigmadiplosis agapanthi]]'', was described damaging Agapanthus in the [[United Kingdom]].{{cite journal| last1 =Harris | first1 =KM | last2 =Salisbury |first2=A| last3=Jones |first3=H | title =''Enigmadiplosis agapanthi'', a new genus and species of gall midge (Diptera, Cecidomyiidae) damaging Agapanthus flowers in England. | journal =Cecidology | volume =31 | issue = | pages =17-20 | publisher =British Gall Society | location = | date =2016 }} ",
            "offset": {
                "from": null,
                "to": 15634
            }
        },
        {
            "type": 0,
            "lineNumber": 99,
            "text": "",
            "offset": {
                "from": 15633,
                "to": 16156
            }
        },
        {
            "type": 0,
            "lineNumber": 100,
            "text": "==Allergenic potential==",
            "offset": {
                "from": 15634,
                "to": 16157
            }
        }
    ]
}
You should now be able to use the REST API history endpoints to explore page history on Wikipedia. You can also use these endpoints with any Wikimedia project.
To fork, edit, and re-run this Jupyter Notebook, download the source and upload it to PAWS using your Wikimedia account.
For more information about these endpoints, see the API reference. To share your feedback on this tutorial, post a comment to the REST API discussion page.
This tutorial is licensed under the Creative Commons Attribution-ShareAlike License.