Adds/changes dumps

This is the Wikimedia adds/changes dump service. Please read the copyrights information. See Meta:Data dumps for documentation on the provided data formats.

Here's the big fat disclaimer.

This service is experimental. At any time it may not be working, for a day, a week or a month. It is not intended to replace the full XML dumps. We don't expect users to be able to construct full dumps of a given date from the incrementals and an older dump. We don't guarantee that the data included in these dumps is complete, or correct, or won't break your Xbox. In short: don't blame us (but do get on the email list and send mail: see xmldatadumps-l).

The data provided in these files is partial data. To be precise:

  • Revisions included in these dumps are not up to the minute. We write out those that were created up to 12 hours ago; this gives local editing communities time to delete revisions with sensitive information, vulgarities and other vandalism, etc.
  • New pages entered for the first time during the time interval are included.
  • Revisions of undeleted pages will be included only if new revision IDs need to be assigned to the restored revisions. For most revisions this will not be the case.
  • Information about moves and deletes is not included.
  • Imported revisions will be included if they were imported during the time interval, since they will have new revision IDs.
  • As with all dumps, hidden revisions or more generally revisions not readable by the general public are not provided.
  • When a wiki is closed, it no longer shows up in this list.

What is in these files:

The stubs file consists of the metadata for revision texts of each page, where the revision texts were added within the time interval. These look just like the history stubs files you would find on our XML data dumps page, having the exact same format but only new revisions since the last adds/changes dump. This means you get metadata for articles, user pages, discussion pages, etc. If you want articles only, you will need to write a filter to grab just those entries.

The revs file consists of the metadata plus the wikitext for each new revision since the last adds/changes dump. This is in the same format as the pages-meta-history files you would find on our XML data dumps page. This means you get articles, user pages, discussion pages, etc. If you want articles only, you will need to write a filter to grab just those entries.
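
For the "articles only" filter mentioned above, here is a minimal sketch in Python. It assumes the usual MediaWiki export layout, in which each <page> element carries an <ns> child and the value 0 marks the main (article) namespace. The function name iter_article_pages and the SAMPLE data are made up for illustration; they are not part of the dump service.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the assumed export layout (not real dump data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
<page><title>Foo</title><ns>0</ns><id>1</id>
  <revision><id>10</id><contributor><username>A</username><id>5</id></contributor></revision>
  <revision><id>11</id></revision>
</page>
<page><title>Talk:Foo</title><ns>1</ns><id>2</id>
  <revision><id>12</id></revision>
</page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_article_pages(stream):
    """Yield (title, [revision ids]) for each page whose <ns> is 0."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        ns = title = None
        rev_ids = []
        for child in elem:
            name = local_name(child.tag)
            if name == "ns":
                ns = (child.text or "").strip()
            elif name == "title":
                title = child.text
            elif name == "revision":
                # The revision's own id is a direct child of <revision>;
                # the contributor id is nested deeper and is skipped here.
                for sub in child:
                    if local_name(sub.tag) == "id":
                        rev_ids.append(int(sub.text))
                        break
        if ns == "0":
            yield title, rev_ids
        elem.clear()  # drop the finished page's children to limit memory use


if __name__ == "__main__":
    for title, rev_ids in iter_article_pages(io.BytesIO(SAMPLE)):
        print(title, rev_ids)
```

If the downloaded file is compressed, pass a stream from gzip.open or bz2.open (whichever matches the file) instead of the in-memory sample.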

The md5sums.txt file contains the md5 hash of the stubs file and the revs file, so that downloaders can verify the integrity of the files after download.
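
A rough sketch of the verification step in Python follows. It assumes md5sums.txt uses the common md5sum layout, one line per file with the hex digest followed by the file name; check the actual file before relying on that. The function names are illustrative only.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Map file name -> digest. Assumes lines of the form '<digest>  <name>'."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            # md5sum marks binary mode with a leading '*' on the file name.
            sums[parts[1].strip().lstrip("*")] = parts[0].lower()
    return sums


def verify(path, name, sums):
    """Return True only if 'name' is listed and its digest matches the file."""
    expected = sums.get(name)
    return expected is not None and md5_of_file(path) == expected
```

A download would then be checked with something like verify(local_path, listed_name, parse_md5sums(md5sums_text)), where the three arguments come from your own download code.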

The file maxrevid.txt contains the largest revision ID on the project as of 12 hours before the start of the run or, for dates that were run after the fact, as of 12 hours before midnight.

The file status.txt, if it exists, will contain the value "done" in cases where the run is complete and was successful.
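
A script that consumes these runs might check status.txt and maxrevid.txt before touching the larger files. The sketch below assumes the run's files have already been downloaded into one local directory and that maxrevid.txt holds a single integer; read_run_state is a hypothetical helper name.

```python
from pathlib import Path


def read_run_state(run_dir):
    """Return (complete, max_rev_id) for a locally downloaded run directory.

    complete is True only if status.txt exists and contains "done".
    max_rev_id is None if maxrevid.txt is absent.
    """
    run_dir = Path(run_dir)
    status_file = run_dir / "status.txt"
    maxrevid_file = run_dir / "maxrevid.txt"

    complete = status_file.is_file() and status_file.read_text().strip() == "done"
    max_rev_id = None
    if maxrevid_file.is_file():
        # Assumption: the file contains just the revision ID as text.
        max_rev_id = int(maxrevid_file.read_text().strip())
    return complete, max_rev_id
```

Treating a missing status.txt as "not complete" follows the description above, which says the file only exists in some cases.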

Adds/changes dump listing (links to latest complete run)


Return to our other datasets, the XML data dumps, or the main index.