This page can be improved by breaking up its content into other docs. See
phab:T232404
. Contributions welcome!
Tools and Toolforge users have access to two sets of databases:
Wiki Replicas
with read-only access to production MediaWiki data
user databases
(aka ToolsDB) to store data generated by the tools themselves.
On the wiki replicas, private user data has been redacted (some rows are elided and/or some columns are made NULL depending on the table). For most practical purposes this is identical to the production databases and sharded into clusters in much the same way.
Connecting to the database replicas
Log in to
login.toolforge.org
over SSH and run the
sql
command to connect with a database replica.
To query a replica of the MediaWiki core database, specify its database name. For example,
enwiki
for the English Wikipedia database:
For Wikidata specify
wikidatawiki
, for Commons specify
commonswiki
, and so on. If you're unsure which database name to specify, check the
Wiki Info tool
To query a replica of a MediaWiki extension database, use the
--extension
option:
sql
--extension
termstore
wikidatawiki
To connect with
ToolsDB
, where you can create and write your own custom tables, specify:
The
sql
command is a shortcut for invoking the
mariadb
client, and automatically specifies the right credentials file, database hostname, and adds the
_p
suffix to database names like
enwiki_p
You can connect to the database replicas (and/or the cluster where a database replica is hosted) by specifying your access credentials and the alias of the cluster and replicated database.
For example, to connect to the English Wikipedia replica, specify the alias of the hosting cluster (enwiki.analytics.db.svc.wikimedia.cloud) and the alias of the database replica (enwiki_p):
mariadb
--defaults-file
$HOME
/replica.my.cnf
-h
enwiki.analytics.db.svc.wikimedia.cloud
enwiki_p
To connect to the Wikidata cluster:
mariadb
--defaults-file
$HOME
/replica.my.cnf
-h
wikidatawiki.analytics.db.svc.wikimedia.cloud
To connect to Commons cluster:
mariadb
--defaults-file
$HOME
/replica.my.cnf
-h
commonswiki.analytics.db.svc.wikimedia.cloud
To connect to ToolsDB:
mariadb
--defaults-file
$HOME
/replica.my.cnf
-h
tools.db.svc.wikimedia.cloud
Databases
Replica databases
User databases
Toolforge tools have two options for creating databases:
ToolsDB
is a shared MariaDB database server suitable for relatively small amounts of data
Trove databases
support more use cases than ToolsDB, but take more resources (and so require a specific quota request to use)
Query Limits
One can use
max_statement_time
(unit is seconds, it allows decimals):
SET
max_statement_time
300
And all subsequent queries on the same connection will be killed if they run for longer than the given time.
For example:
mariadb
[(
none
)]
SET
max_statement_time
10
Query
OK
rows
affected
00
sec
mariadb
[(
none
)]
SELECT
sleep
20
);
-----------+
sleep
20
-----------+
-----------+
row
in
set
10
00
sec
It works on
Quarry
, too!
You can also set limits with a single SQL query. For example:
SET
STATEMENT
max_statement_time
300
FOR
SELECT
COUNT
rev_id
FROM
revision_userindex
INNER
JOIN
actor
ON
rev_actor
actor_id
WHERE
actor_name
'Jimbo Wales'
Code samples for common languages
Copied with edits from
mw:Toolserver:Database access#Program access
(not all tested, use with caution!)
In most programming languages, it will be sufficient to tell MariaDB to use the database credentials found in
$HOME/.my.cnf
assuming that you have created a symlink from
$HOME/.my.cnf
to
$HOME/replica.my.cnf
Below are various examples in a few common programming languages.
Bash
--
/dev/null
date
echo
/* Bash/SQL compatible test structure
* Run time: ?
*/
SELECT 1
;-- '
mariadb
-ch
tools.db.svc.wikimedia.cloud
enwiki_p
~/query_results-enwiki
date
#include
#include
...
char
host
"tools.db.svc.wikimedia.cloud"
MYSQL
conn
mysql_init
NULL
);
mysql_options
conn
MYSQL_READ_DEFAULT_GROUP
"client"
);
if
mysql_real_connect
conn
host
NULL
NULL
NULL
NULL
==
NULL
printf
"Error %u: %s
\n
mysql_errno
conn
),
mysql_error
conn
));
...
Perl
use
User::pwent
use
DBI
my
$database
"enwiki_p"
my
$host
"tools.db.svc.wikimedia.cloud"
my
$dbh
DBI
->
connect
"DBI:mysql:database=$database;host=$host;"
"mysql_read_default_file="
getpwuid
$<
->
dir
"/replica.my.cnf"
undef
undef
or
die
"Error: $DBI::err, $DBI::errstr"
Python
Without installing the toolforge library, this will work:
import
configparser
import
pathlib
import
pymysql
import
pymysql.cursors
replica
pathlib
Path
()
joinpath
"replica.my.cnf"
config
configparser
ConfigParser
()
config
read_string
replica
read_text
())
connection
pymysql
connections
Connection
host
"commonswiki.analytics.db.svc.wikimedia.cloud"
database
"commonswiki_p"
user
config
get
"client"
"user"
),
password
config
get
"client"
"password"
),
cursorclass
pymysql
cursors
DictCursor
with
connection
cursor
()
as
cur
cur
execute
query
# Or something....
connection
close
()
Using
User:Legoktm/toolforge library
, however, is probably the easiest way. This wrapper library supports both Python 3 and legacy Python 2 applications and provides convenience functions for connecting to the Wiki Replica databases.
import
toolforge
conn
toolforge
connect
'enwiki'
# You can also use "enwiki_p"
# conn is a pymysql.connection object.
with
conn
cursor
()
as
cur
cur
execute
query
# Or something....
We used to recommend
oursql
as well, but
as of 2019-02-20
it seems to be abandoned or at least not actively maintained and failing to compile against MariaDB client libraries.
Python: Django
If you are using Django, first install mysqlclient (inside your tool's virtual environment, accessed via a
webservice shell
):
export MYSQLCLIENT_CFLAGS="-I/usr/include/mariadb/"
export MYSQLCLIENT_LDFLAGS="-L/usr/lib/x86_64-linux-gnu/ -lmariadb"
pip install mysqlclient
Then insert the database in the settings.py file as following, with s12345 as your user name:
import
configparser
import
os
os
environ
get
'HOME'
#get environment variable $HOME
replica_path
'/replica.my.cnf'
if
os
path
exists
replica_path
):
#check that the file is found
config
configparser
ConfigParser
()
config
read
replica_path
else
'replica.my.cnf file not found'
DATABASES
'default'
'ENGINE'
'django.db.backends.mysql'
'NAME'
's12345__mydbname'
'USER'
config
'client'
][
'user'
],
#for instance "s12345"
'PASSWORD'
config
'client'
][
'password'
],
'HOST'
'tools.db.svc.wikimedia.cloud'
'PORT'
''
$ts_pw
posix_getpwuid
posix_getuid
());
$ts_mycnf
parse_ini_file
$ts_pw
'dir'
"/replica.my.cnf"
);
$db
new
PDO
"mysql:host=enwiki.analytics.db.svc.wikimedia.cloud;dbname=enwiki_p"
$ts_mycnf
'user'
],
$ts_mycnf
'password'
]);
unset
$ts_mycnf
$ts_pw
);
$q
$db
->
prepare
'select * from page where page_id = :id'
);
$q
->
execute
array
':id'
=>
843020
));
print_r
$q
->
fetchAll
());
?>
$ts_pw
posix_getpwuid
posix_getuid
());
$ts_mycnf
parse_ini_file
$ts_pw
'dir'
"/replica.my.cnf"
);
$mysqli
new
mysqli
'enwiki.analytics.db.svc.wikimedia.cloud'
$ts_mycnf
'user'
],
$ts_mycnf
'password'
],
'enwiki_p'
);
unset
$ts_mycnf
$ts_pw
);
$stmt
$mysqli
->
prepare
'select * from page where page_id = ?'
);
$id
843020
$stmt
->
bind_param
'i'
$id
);
$stmt
->
execute
();
$result
$stmt
->
get_result
();
print_r
$result
->
fetch_all
MYSQLI_BOTH
));
?>
Java
Class
forName
"com.mysql.jdbc.Driver"
).
newInstance
();
Properties
mycnf
new
Properties
();
mycnf
load
new
FileInputStream
System
getProperty
"user.home"
"/replica.my.cnf"
));
String
password
mycnf
getProperty
"password"
);
password
password
substring
((
password
startsWith
"\""
))
password
length
()
((
password
startsWith
"\""
))
));
mycnf
put
"password"
password
);
mycnf
put
"useOldUTF8Behavior"
"true"
);
mycnf
put
"useUnicode"
"true"
);
mycnf
put
"characterEncoding"
"UTF-8"
);
mycnf
put
"connectionCollation"
"utf8_general_ci"
);
String
url
"jdbc:mysql://tools.db.svc.wikimedia.cloud:3306/enwiki_p"
Connection
conn
DriverManager
getConnection
url
mycnf
);
Node.js
mysql2
client provides a promise-based interface.
const
mysql
require
'mysql2/promise'
);
async
function
sample
()
const
connection
await
mysql
createConnection
({
host
'tools.db.svc.wikimedia.cloud'
port
3306
database
's12345__mydbname'
user
's12345'
password
''
});
const
rows
fields
await
connection
execute
'SELECT * FROM table WHERE name = ? AND age > ?'
'Morty'
14
]);
for
let
row
in
rows
console
log
row
);
Credentials
Database credentials are generated on account creation and placed in a file called
replica.my.cnf
in the home directory of both a Tool and a Tools user account. This file cannot be modified or removed by users. (If the file does not exist,
ask for help
. Do not try to create it yourself: it will not work.)
Symlinking the access file can be practical:
ln
-s
$HOME
/replica.my.cnf
$HOME
/.my.cnf
In addition, for tools using a
custom-built image
, or otherwise running without
shared storage
access, there are two sets of credentials provided through environment variables:
TOOL_TOOLSDB_USER
and
TOOL_TOOLSDB_PASSWORD
are the user and password to connect to ToolsDB.
TOOL_REPLICA_USER
and
TOOL_REPLICA_PASSWORD
are the user and password to connect to the Wiki Replicas.
Even if the credentials are currently the same for the replicas and ToolsDB, we strongly recommend using the variables for each specific service as they might change in the future.
See also
Communication and support
Support and administration of the WMCS resources is provided by the
Wikimedia Foundation Cloud Services team
and
Wikimedia movement volunteers
. Please reach out with questions and join the conversation:
Discuss and receive general support
Stay aware of critical changes and plans
Use a subproject of the
#Cloud-Services
Phabricator
project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
US