MediaWiki at WMF
MediaWiki is the collaborative editing software that runs Wikipedia. This page documents its deployment at the Wikimedia Foundation.

For more about the history, see "MediaWiki" on wikipedia.org. For how to use, install, or contribute, see mediawiki.org.
Infrastructure
A Wikipedia web request is processed in a series of steps outlined here (as of August 2022).

DNS resolves hostnames like en.wikipedia.org, which ultimately point to an address like text-lb.*.wikimedia.org. The IP addresses behind these are service IPs handled by LVS, which acts as a direct-routing load balancer in front of our caching proxies.

» See also DNS, Global traffic routing, and LVS.
The Wikimedia Foundation owns its content-delivery network. The public load balancers and caching proxies are located in all data centres, including those whose sole role is to be an edge cache (also known as a "PoP").

» See also Data centers and PoPs.
The caching servers are implemented as a reverse proxy consisting of three layers: TLS termination, frontend caching, and backend caching. Each cache server hosts all three of these layers.

» See also Caching overview.

TLS termination: TLS and HTTP/2 handling, done by HAProxy.

Frontend caching: An in-memory HTTP cache (uses Varnish, called the "Varnish frontend", or varnish-fe). The LVS load balancers route each request to a random cache proxy server to maximise the amount of parallel traffic we can handle. Each frontend cache server likely holds the same set of responses in its cache; the logical capacity of the frontend cache tier is therefore equal to one server's RAM.

Backend caching: The backend HTTP caches are routed to by the frontend caches in case of a cache miss. Contrary to the frontends, these are routed by a consistent hash, and they persist their cache on disk (instead of in memory). The backend caches scale horizontally and have a logical capacity equal to the total of all servers. Because of consistent hashing, the same backend cache is always consulted for the same URL, and we use request coalescing to avoid multiple concurrent requests for the same URL hitting the same backend server at once. In case of a surge in traffic to a particular page, the frontends should each get a copy and distribute from there. For the backend cache, we use Apache Traffic Server (ats-be).
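The consistent-hash routing from frontend to backend caches can be sketched as follows. This is a minimal illustration only, not WMF's actual implementation: the server names and virtual-node count are made up, and production uses Apache Traffic Server's own parent-selection logic.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: each backend gets many points on
    a ring, and a URL maps to the first point at or after its hash."""

    def __init__(self, servers, vnodes=100):
        self._ring = []  # list of (hash, server) pairs
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # md5 used only for illustration; any uniform hash works.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def backend_for(self, url):
        idx = bisect.bisect(self._keys, self._hash(url)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cp3050", "cp3052", "cp3054"])
# The same URL always maps to the same backend server:
assert ring.backend_for("/wiki/Example") == ring.backend_for("/wiki/Example")
```

Because only the points belonging to a removed server move, most URL-to-backend assignments survive a server leaving the pool, which is the property that makes this scheme attractive for disk-backed caches.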
After the cache proxies we arrive at the application servers (that is, if the request was not fulfilled by a cache). The application servers are load-balanced via LVS. Connections between backend caches and app servers are encrypted with TLS, which is terminated locally on the app server by a local Envoy instance, which in turn hands the request off to the local Apache. (Prior to mid-2020, Nginx was used for TLS termination.) Apache is in charge of handling redirects and rewrite rules, and of determining the document root. It then uses php-fpm to invoke the MediaWiki software on the app servers. The application servers and all other backend services (such as Memcached and MariaDB) are located in "core services" data centers, currently Eqiad and Codfw.

» See also Application servers for more about how Apache, PHP 7 and php-fpm are configured.
[Diagram: Wikipedia request flow]
[Diagram: MediaWiki components and appserver clusters]
[Diagram: Wikimedia Foundation infrastructure around MediaWiki more broadly]
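The tiered fall-through described above can be sketched in a few lines. This is a toy model: the dict-based caches, the AppServer class, and handle_request are illustrative stand-ins, not real WMF interfaces, and the real tiers additionally handle TTLs, purges, and request coalescing.

```python
class AppServer:
    """Stand-in for the MediaWiki application servers."""

    def __init__(self):
        self.renders = 0  # count how often MediaWiki is actually invoked

    def render(self, url):
        self.renders += 1
        return f"<html>{url}</html>"

def handle_request(url, frontend_cache, backend_cache, app_server):
    """Try the frontend cache, fall through to the backend cache, and
    only on a double miss invoke the app servers; fill caches on the
    way back out."""
    response = frontend_cache.get(url)
    if response is None:
        response = backend_cache.get(url)
        if response is None:
            response = app_server.render(url)
            backend_cache[url] = response
        frontend_cache[url] = response
    return response

app = AppServer()
fe, be = {}, {}
handle_request("/wiki/Foo", fe, be, app)
handle_request("/wiki/Foo", fe, be, app)  # second hit served from cache
assert app.renders == 1
```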
App servers

See Application servers for more about how Apache and php-fpm are configured.

The application servers are divided into the following clusters:
mw-web
  Purpose: Default catch-all for HTTP requests to wiki domains, if sampled into MediaWiki On Kubernetes and not served by another group.
  Hostnames: mw-web-ro.discovery.wmnet, mw-web.discovery.wmnet
  Servergroup: kube-mw-web

mw-api-ext
  Purpose: Public HTTP to wiki domains, with paths /w/api.php or /w/rest.php.
  Hostnames: mw-api-ext-ro.discovery.wmnet, mw-api-ext.discovery.wmnet
  Servergroup: kube-mw-api-ext

mw-api-int
  Purpose: Internal HTTP to wiki domains, with paths /w/api.php or /w/rest.php.
  Hostnames: mw-api-int-ro.discovery.wmnet, mw-api-int.discovery.wmnet
  Servergroup: kube-mw-api-int

mw-cron
  Purpose: Internal php-cli processes that run a MediaWiki maintenance script as scheduled mw-cron jobs.
  Servergroup: kube-mw-cron

mw-debug
  Purpose: Public HTTP to wiki domains, with X-Wikimedia-Debug and k8s-mwdebug selected as backend.
  Hostnames: mwdebug.discovery.wmnet
  Servergroup: kube-mw-debug

mw-jobrunner
  Purpose: Internal HTTP from ChangeProp-JobQueue to wiki domains, with path /rpc/RunSingleJob.php.
  Hostnames: mw-jobrunner.discovery.wmnet
  Servergroup: kube-mw-jobrunner

mw-misc
  Purpose: Public HTTP requests to noc.wikimedia.org.
  Hostnames: mw-misc.discovery.wmnet
  Servergroup: FIXME

mw-parsoid
  Purpose: Internal HTTP from RESTBase to wiki domains, with path /w/rest.php, to call Parsoid.
  Hostnames: mw-parsoid.discovery.wmnet
  Servergroup: parsoid

mw-script
  Purpose: Internal php-cli processes that run a MediaWiki maintenance script ad hoc from a deployment server via mwscript.
  Servergroup: kube-mw-script

mw-videoscaler
  Purpose: Internal HTTP from ChangeProp-JobQueue to wiki domains, with path /rpc/RunSingleJob.php, to run videoscaling jobs.
  Hostnames: videoscaler.discovery.wmnet
  Servergroup: kube-mw-videoscaler

Debug servers (bare metal)
  Purpose: Public HTTP to wiki domains, with X-Wikimedia-Debug. This is used by WikimediaDebug.
  Hostnames: mwdebug####.{eqiad,codfw}.wmnet
  Servergroup: appserver

Snapshot hosts (bare metal)
  Purpose: Internal php-cli processes that run mwscript. These perform scheduled work to produce XML dumps.
  Hostnames: snapshot####.{eqiad,codfw}.wmnet
  Servergroup: other
For web requests served by MediaWiki On Kubernetes, the $_SERVER['SERVERGROUP'] environment variable is set explicitly for each Helm chart in the operations/deployment-charts.git repo. See also MediaWiki On Kubernetes/How it works.

Logstash messages from MediaWiki carry a servergroup label that is set to $_SERVER['SERVERGROUP'].

Prometheus metrics (e.g. Grafana dashboards and Icinga alerts) carry a cluster field set to the "Hiera cluster" value.
Former groups

For web requests to Apache on bare-metal appservers, the $_SERVER['SERVERGROUP'] environment variable was automatically set based on the "Hiera cluster" value.

Main app servers (bare metal)
  Purpose: Default catch-all for HTTP requests to wiki domains for anything not served by another group. This includes /w/index.php and /w/load.php. Notably excluded are X-Wikimedia-Debug, /w/api.php, and /w/rest.php.
  Hostnames: appservers-ro.discovery.wmnet, appservers-rw.discovery.wmnet
  Servergroup: appserver
  Decommissioned: Replaced by mw-web.

API app servers (bare metal)
  Purpose: Public HTTP to wiki domains, with paths /w/api.php or /w/rest.php.
  Hostnames: api-ro.discovery.wmnet, api-rw.discovery.wmnet
  Servergroup: api_appserver
  Decommissioned: Replaced by mw-api-ext and mw-api-int.

Videoscalers (bare metal)
  Purpose: Internal HTTP from ChangeProp-JobQueue to wiki domains, with path /rpc/RunSingleJob.php, to run videoscaling jobs.
  Hostnames: videoscaler.discovery.wmnet
  Servergroup: jobrunner
  Decommissioned: Replaced by mw-videoscaler.

Maintenance server (bare metal)
  Purpose: Internal php-cli processes that run a MediaWiki maintenance script, either from cron (systemd timers) or ad hoc by MediaWiki developers shelling in to a maintenance server.
  Hostnames: mwmaint####.{eqiad,codfw}.wmnet
  Servergroup: other
  Decommissioned: Replaced by mw-cron and mw-script.
MediaWiki configuration

For web requests not served by the cache, the request eventually arrives on an app server where Apache invokes PHP via php-fpm.

Document root

Example request:

The document root for a wiki domain like "en.wikipedia.org" is /srv/mediawiki/docroot/wikipedia.org (source).

The /srv/mediawiki directory on app servers comes from the operations/mediawiki-config.git repository, which is cloned on the Deployment server and then rsync'ed to the app servers by Scap.

The docroot/wikipedia.org directory is mostly empty, except for w/, which is symlinked to a wiki-agnostic directory that looks like a MediaWiki install (in that it has files like "index.php", "api.php", and "load.php") but actually contains small stubs that invoke "Multiversion".
Multiversion

Multiversion is a WMF-specific script (maintained in the operations/mediawiki-config repo) that inspects the hostname of the web request (e.g. "en.wikipedia.org") and finds the appropriate MediaWiki installation for that hostname. The weekly Deployment train creates a fresh branch from the latest master of MediaWiki (including any extensions we deploy) and clones it to the deployment server in a directory named like /srv/mediawiki/php-<version>.

For example, if the English Wikipedia is running MediaWiki version 1.30.0-wmf.5, then "en.wikipedia.org/w/index.php" will effectively be mapped to /srv/mediawiki/php-1.30.0-wmf.5/index.php. For more about the "wikiversions" selector, see Heterogeneous deployment.
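Conceptually, the Multiversion lookup can be sketched as below. The WIKIVERSIONS dict and resolve_entry_point function are hypothetical stand-ins for illustration; the real hostname-to-version mapping lives in wikiversions.json in operations/mediawiki-config, and the real dispatch is done by Multiversion's PHP code.

```python
# Hypothetical version map, mirroring the wikiversions idea: each wiki
# domain is pinned to one deployed MediaWiki branch.
WIKIVERSIONS = {
    "en.wikipedia.org": "1.30.0-wmf.5",
    "de.wikipedia.org": "1.30.0-wmf.4",
}

def resolve_entry_point(hostname, entry_point="index.php"):
    """Map a request hostname to the versioned MediaWiki file:
    hostname -> deployed branch -> file under /srv/mediawiki."""
    version = WIKIVERSIONS[hostname]
    return f"/srv/mediawiki/php-{version}/{entry_point}"

assert resolve_entry_point("en.wikipedia.org") == \
    "/srv/mediawiki/php-1.30.0-wmf.5/index.php"
```

Because the stubs in the document root defer to this lookup per request, two wikis on the same app server can run different MediaWiki branches simultaneously.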
The train also creates a stub LocalSettings.php file in this php-... directory. This stub LocalSettings.php file does nothing other than include wmf-config/CommonSettings.php (also in the operations/mediawiki-config repo).

The CommonSettings.php file is responsible for configuring MediaWiki. This includes database configuration (which DB server to connect to, etc.), loading MW extensions and configuring them, and general site settings (name of the wiki, its logo, etc.).

After CommonSettings.php is done, MediaWiki handles the rest of the request and responds accordingly.
Singleversion

Multiversion has been the way WMF has deployed MediaWiki since 2011. It has worked well and was a fitting solution for the kind of infrastructure available back then. Since hitting the goal of serving 100% of global traffic from MediaWiki On Kubernetes (mw-on-k8s) in 2024, engineers have been thinking about how to make deployments, testing, and evaluations easier. In the container world, with various deployment roadblocks and hindrances removed, Multiversion has emerged as the next contributing factor to slower deployments, mostly because MediaWiki container images end up being multiple gigabytes in size. We have already picked the lower-hanging fruit, like utilizing caching and pregeneration mechanisms for our container images, but we felt we could gain a lot from single-version containers, which would be about 33% of the size of the Multiversion ones.

This effort is named Singleversion to be explicit that it is different from Multiversion. Rationale, documentation and code examples can be found at Single Version MediaWiki.
MediaWiki internals

To read more about how MediaWiki works in general, see:

Manual:Code on mediawiki.org, about entry points and the directory structure of MediaWiki.
Manual:Index.php on mediawiki.org, for what a typical MediaWiki entry point does.
Static files

There are, broadly speaking, two kinds of static assets served by Apache on MediaWiki application servers: /w/**/* and /static/**/*.

Application resources

Route: /w/**/*?1234567 or /w/**/*
Varnish: Strip cookies, fixed hostname.
Apache: Rewrite to /w/static.php (source).
Caching: public, 1 year, hostname-agnostic (Varnish object is shared across wiki domains).
Stats: Grafana: MediaWiki Static
Versioned resources are the most common way we serve static files, and are generally how new code should reference assets. These URLs are produced by MediaWiki's ResourceLoader or OutputPage component, and work by mapping the URL to a file on disk, hashing it, and appending that hash as a query string.

This offers the strongest performance (client-side immutable, and a server-side wiki-agnostic shared cache), whilst also operating under relatively tight requirements (we must know the exact version at the point where the file is requested). This is especially difficult in the face of ParserCache and our CDN, given that you would generally want to present users with a consistent experience from page to page, where an icon or other visual aspect does not alternate based on when the page was last modified. The way we generally make this work is by linking to assets through one level of indirection, e.g. through a stylesheet or JavaScript manifest (see also: mw:ResourceLoader/Architecture#Caching).
On the backend, requests for versioned resources are rewritten to /w/static.php, which implements several important behaviours:

If given a version hash, match the request with the right version of the file by checking the two currently active MW branches in production.

If given a version hash, and the requested version is not found, disable caching (reduce to 1 minute for clients and CDN). This avoids non-recovering cache poisoning around deployments, which would otherwise be possible given that we do not atomically group end-users, CDN servers, and backend servers. More background about this eventual consistency can be found in the source, and in T47877.

Without a version hash, serve the current version as found in the latest MW branch, regardless of hostname. In this case, it is expected that it is not important for changes to propagate immediately. They will generally propagate slowly over a 24-hour period, with any individual client always having a consistent experience between pages, until a specific point where the resource is renewed and all pages have the new version.
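These behaviours can be sketched as follows. This is a hedged illustration in Python rather than the actual PHP of static.php: the BRANCHES map, LATEST, short_hash, and serve_static names are all invented for this sketch, and the real hash scheme and exact cache-control values may differ.

```python
import hashlib

# Hypothetical stand-in for the two active production branches and the
# static files they ship.
BRANCHES = {
    "1.40.0-wmf.9": {"skins/logo.svg": b"<svg>old</svg>"},
    "1.40.0-wmf.10": {"skins/logo.svg": b"<svg>new</svg>"},
}
LATEST = "1.40.0-wmf.10"

def short_hash(content):
    # Stand-in for however the real version hash is derived.
    return hashlib.sha1(content).hexdigest()[:8]

def serve_static(path, version_hash=None):
    """Sketch of the /w/static.php decision logic described above."""
    if version_hash is not None:
        # Check both active production branches for a matching version.
        for files in BRANCHES.values():
            content = files.get(path)
            if content is not None and short_hash(content) == version_hash:
                return content, "public, max-age=31536000"  # cache 1 year
        # Unknown hash: serve the latest copy, but with a short cache
        # lifetime to avoid non-recovering cache poisoning around deploys.
        return BRANCHES[LATEST].get(path), "public, max-age=60"
    # No hash: current version from the latest branch (propagates ~24h).
    return BRANCHES[LATEST].get(path), "public, max-age=86400"
```

The key design point is that an unrecognised hash degrades cache lifetime instead of failing: the eventually-consistent mix of clients, CDN nodes, and app servers around a deployment can then recover within a minute.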
Below are examples of use cases where we cannot reasonably specify a version hash and thus request the current version without a hash parameter:

Gadgets and user scripts that augment core functionality and reuse some of our assets. For example, Wikipedia's Vector.css override references an SVG icon from MediaWiki. It is not versioned, as the editor would otherwise have to keep it in sync with our deployments.

Debug mode from ResourceLoader, where we intentionally serve internal JS and CSS files directly, without minification, at their "current" version. Cache performance is not a concern in debug mode.

A tail of random things in core and extensions that reference static files that are not part of any UI code, such as Special:Version linking the COPYING license file.

ULS web fonts (T135806). Up to 2021, files like this were sometimes served from "/static/current/**", which was deprecated in favour of simply "/w/**" in T302465.
WMF resources

Route: /static/**/*
Varnish: Strip cookies, fixed hostname.
Apache: Serve file directly without involving PHP code. Files are in operations/mediawiki-config.git:/static/.
Configuration: puppet: apache/modules/expires.conf
Caching: public, 1 year, hostname-agnostic (Varnish object is shared across wikis).

These are custom assets, generally pointed to from settings in wmf-config. The most prominent examples are our project logos and favicons. We want to serve these from a stable URL that we can expose through APIs and to external organizations, and that can be saved in databases, ParserCache, the CDN, etc. These URLs present a consistent experience to any given user, regardless of when the page they are on was last edited or purged. Changes to "static" resources should be rare, as browsers are allowed to use their copy offline, without revalidation, for up to a year. This means that purging from the CDN does not mean users can be expected to get the latest copy.

The /static directory is external to MediaWiki and only used if and when explicitly configured so in wmf-config.
Timeouts

Request timeouts

Generally speaking, the app servers allow up to 60 seconds for most web requests (e.g. page views, HTTP GET), and for write actions we allow up to 200 seconds (e.g. edits, HTTP POST).

» See HTTP timeouts#App server for a detailed breakdown of the various timeouts on app servers.
Backend timeouts

MySQL/MariaDB

The following timeouts are enforced via MySQL's event_scheduler ("core events" on both master and replica hosts), measured in wall-clock time:

web requests user, running queries on read-only replicas: 60s
web requests user, idle connections on read-only replicas: 60s
web requests user on read-only replicas, on connection overload: 10s
web requests user, idle connections on the read-write master: 300s

This was added as a measure to prevent pileups from a single event, as well as to overcome the (considered non-ideal) behavior of queries from terminated connections continuing to run even though no socket remains open to report results to. It is implemented on MySQL's event scheduler for legacy reasons; using max_execution_time or an equivalent would probably be preferable.
Involved codebases

This section is currently a draft. Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.

The "MediaWiki ecosystem": MediaWiki core itself, 8 skins and 189 extensions, and production site configuration (and a static check-out of our composer dependencies). This set of code totals (as of October 2022) approximately 2M lines of Wikimedia-maintained PHP (alongside another 1.2M of JavaScript), plus third-party libraries like Symfony, jQuery, Guzzle, and Vue.

The "puppet" Wikimedia server orchestration and configuration. This set of code totals (as of November 2022) approximately 200K lines of Puppet/ERB (plus 55K each of YAML and Python, 35K of Ruby, and 10K of Bash and Perl).
Footnotes

Calculated using cloc v1.94 in a fresh check-out of production 1.40.0-wmf.10 with:

cloc --not-match-d 'vendor|node_modules|lib' --fullpath --skip-uniqueness
cloc --skip-uniqueness vendor/wikimedia vendor/oojs vendor/wmde vendor/diff
cloc --skip-uniqueness resources/lib/oo* resources/lib/codex* resources/lib/CLDRPluralRuleParser resources/lib/wvui

Calculated using cloc v1.94 in a check-out of the production branch on 2022-11-21 with:

cloc
See also

Infographics: User:Quiddity/How does it all work, related notes and infographics.