SRE/Production access - Wikitech
For instructions on accessing public Cloud VPS and Toolforge instances, see Help:Accessing Cloud VPS instances.
Production (sometimes called prod) is the network of servers that run the real, live Wikimedia websites. Access to production is necessary for deploying updates and other site reliability engineering work, as well as for accessing sensitive data. This page explains how to request and set up this access.
Remember: production access is extremely sensitive. With production access, it's possible to break our websites or steal private data about users' activities. If you have access, act carefully and take the server access responsibilities seriously. Immediately contact the SRE team if you have any doubts about security or if something goes wrong.
List of users and groups
Production shell users, their keys, and their permissions are managed in modules/admin/data/data.yaml in the operations/puppet.git repository. Here is a quick summary of the major groups:
ops - Root access to all 2,400 Wikimedia servers.
deployment - Ability to do backport deployments, run maintenance scripts, and read/edit MariaDB production databases.
restricted - Ability to run maintenance scripts and read/edit MariaDB production databases.
analytics-privatedata-users - Read-only access to private production databases. This group contains several different levels detailed at Data Platform/Data access#Access Levels. The highest levels provide read-only access to many databases including MariaDB, Presto, Spark, and Hive.
... and many other smaller groups detailed at modules/admin/data/data.yaml.
Some groups (those marked with require_fido: true) require that the user's SSH keys are FIDO/U2F authenticator-backed SSH keys.
Be careful not to confuse these groups with LDAP groups, which are something else. In general, production access groups are for SSHing into servers. LDAP groups are for logging into protected systems in a web browser.
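To see what the file says about a particular account or marker, a plain text search is often enough. The sketch below assumes a local checkout of operations/puppet.git and that you run it from the repository root. The function name search_admin_data is made up for this example, and it deliberately does not parse the YAML, so it makes no assumption about the file's layout.

```shell
# Hypothetical helper: search data.yaml for a fixed string and print the
# matching lines with their line numbers. This is a text search, not a
# YAML parser, so inspect the surrounding lines to see which group matched.
search_admin_data() {
    pattern="$1"
    # Default path is relative to the root of an operations/puppet.git checkout.
    file="${2:-modules/admin/data/data.yaml}"
    grep -n -F -- "$pattern" "$file"
}

# Example calls (uncomment to run from the repository root):
# search_admin_data 'require_fido: true'
# search_admin_data 'YOURUSERNAME'
```

The function returns a non-zero status when nothing matches, which makes it easy to use in an if statement.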
Eligibility
To minimize risk to the sites, only a small number of people outside of the SRE team hold any production access, and that access is limited to specific systems and processes. Access is managed through groups rather than people; a person (technically, an SSH-authenticated account) belongs to one or more groups, and each group has its own list of access privileges. All access privileges require a clear, ongoing need for the access. If you have a one-time need for data, request the data from the Data Engineering team instead.
There are three distinct processes for changing production access:
Change the privileges of an access group
An existing access group, usually with existing members, can be granted additional privileges, to allow members of the group to perform additional work.
Create a ticket in Phabricator.
Use the tag SRE-Access-Requests.
Include the name of an existing group.
Include the requested change in access, in as much specific detail (host names, etc.) as possible.
Include the reason the change is requested, including the impact if the change is rejected.
Add WMF/WMDE Staff to an access group
For WMF and WMDE staff, membership in an access group is at the discretion of their manager, who should request access on behalf of the person as detailed below.
Add a volunteer to an access group
Volunteer access is granted at the discretion of the SRE team.
You must have a non-disclosure agreement with the Wikimedia Foundation. Follow the volunteer NDA process.
You must have support from a relevant Wikimedia Foundation employee: this should be the employee you will be collaborating with.
Complete the access request process as detailed below.
Access Request Process
If you've satisfied the eligibility requirements above, follow these steps to submit a request.
Accounts
To follow these instructions, you'll need the following accounts:
Phabricator account. If you don't have one, see the instructions for creating an account on mediawiki.org.
Wikimedia developer account. If you don't have one, follow the link.
Signing the agreement
Next, read and sign the Acknowledgement of Wikimedia Server Access Responsibilities. Make sure you actually read it; this is a legal agreement and by signing it, you are committing to follow the security practices it describes.
Generating your SSH key
Since production access uses the Secure Shell protocol (SSH), you'll have to generate a new SSH keypair. Do not reuse an existing key; this presents an unacceptable security risk.
GitHub has a good help page (note that you can switch between Mac, Windows, and Linux documentation right under the title).
We recommend that you use an ED25519 key. Do not use DSA keys as they are insecure and rejected by our SSH servers.
To generate an ED25519 key (recommended), run the following command in your terminal:
    ssh-keygen -t ed25519
If you can't use an ED25519 key, then you can use a 4096-bit RSA key instead. To generate one, run the following command in your terminal:
    ssh-keygen -t rsa -b 4096 -o
The newer -o option saves private keys in a slightly more secure format (OpenSSH rather than PEM).
Reminder: the key you use for production access must be different from the key you use for Cloud VPS, so do not paste it into the IDM SSH key management field.
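One way to keep the production key separate from your other keys is to give it its own file. The sketch below is only an illustration: the helper name new_prod_key and the key comment are invented here, and the default path ~/.ssh/prod.key is simply the file name used in the sample SSH configuration on this page. It assumes the target directory already exists, refuses to overwrite an existing file, and lets ssh-keygen prompt you for a passphrase.

```shell
# Hypothetical helper: create a dedicated ED25519 keypair for production.
# Usage: new_prod_key [key_file [extra ssh-keygen options...]]
new_prod_key() {
    key_file="${1:-$HOME/.ssh/prod.key}"
    # Drop the first argument (if any); the rest is passed to ssh-keygen.
    if [ "$#" -gt 0 ]; then shift; fi

    # Do not clobber an existing key.
    if [ -e "$key_file" ]; then
        echo "Refusing to overwrite existing file: $key_file" >&2
        return 1
    fi

    # -t selects the key type, -f the output file, -C a free-form comment.
    ssh-keygen -t ed25519 -f "$key_file" -C "production access" "$@" || return 1

    # Show the fingerprint, then the public key to paste into the request.
    ssh-keygen -l -f "$key_file.pub"
    cat "$key_file.pub"
}

# Example call (uncomment to run; you will be asked for a passphrase):
# new_prod_key "$HOME/.ssh/prod.key"
```

Only the contents of the .pub file belong in the access request; the private key file never leaves your machine.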
Filing the request
Create a ticket requesting access
In the title, replace "USER" and "RESOURCE" with your name and the resource you need access to. (For new user requests, make a separate ticket for each user.)
Add the following information to the description:
USER's full name
USER's wikitech username
Your developer access username (that is, the one you use for Cloud VPS SSH or log into with. This can be different from your wiki username). We will use this as your production shell username.
The public key from your new SSH keypair. (See, for example, the Gerrit instructions on how to copy your public SSH key.)
Requested group membership. A complete list of groups that USER should be added to. These groups change frequently, so consult the most recent available list where possible.
Analytics data access guidelines (Analytics enabled Kerberos in December 2019; please check the new sections in the docs if you haven't done so yet)
Complete and current list
A detailed reason for your request. In particular, describe which specific servers you need access to and why. We err on the side of giving fewer permissions rather than more, so the more detailed your request, the more likely you are to get all the permissions you need.
Get approvals from the following people as comments to the Phabricator task. The comments should be made directly through the web interface, not via email.
The relevant Wikimedia Foundation/Wikimedia Deutschland employee, as explained above.
The project lead where your access will be granted. (NOTE: project lead approval is not required for analytics-privatedata-users access for WMF or WMDE staff.)
Wait for SRE approval, if needed:
An SRE may ask you to validate your public SSH key "out of band", meaning via direct communication outside of Phabricator.
If you're requesting the same level of access as the rest of your team already has (e.g. because you've joined the team and you're requesting to be added to the group) then no further approval is necessary; your manager's on-ticket approval (or for non-staff, your WMDE manager's or WMF sponsor's on-ticket approval) is sufficient.
Otherwise, if you request any new level of sudo privileges for a group (or for yourself individually, outside of your group membership), then your request must have a security review at a biweekly SRE meeting. Sudo access is granted on an extremely limited basis, and will typically apply to the smallest permissions possible (user/process restricted over all). Expect this process to take at least two business weeks.
When your request is approved, you will be asked to provide your full legal name, preferred email address for contact, and physical address to the Wikimedia Foundation Legal team (or your employee contact may forward this information on your behalf). This information will be used to customize a non-disclosure agreement, which you will be asked to read, comprehend, and electronically sign through the Foundation's contract management system. The agreement will be similar to the Volunteer NDA.
The Wikimedia Foundation employee who will be supervising your work will coordinate final sign-off by Executive level staff of the Wikimedia Foundation once all other criteria have been met, before your access is granted.
Shell access and access to private data are different things. Access to data is granted to volunteers only if they have a formal collaboration with the research team.
If five business days pass without visible progress, please comment on the ticket to request an update, or directly contact the SRE on Clinic Duty that week.
Clinic Duty SRE: handling the request
This paragraph is for the SRE on Clinic Duty: see SRE/Clinic_Duty/Access_requests#Production_shell_access.
Setting up your access
Setting up your SSH config
The standard configuration for people without root access is to establish the SSH connection to a bastion and proxy it from there to the target host inside the cluster. To do this, add the following to your SSH config file (usually located at $HOME/.ssh/config), but change YOURUSERNAME to your shell username on the Wikimedia servers:
# Turn CanonicalizeHostname on, needed for wikimedia.org and wmnet below.
CanonicalizeHostname yes

# Defaults for all Wikimedia Foundation hosts.
Host *.wikimedia.org,*.wmnet
    ForwardAgent no
    IdentitiesOnly yes
    KbdInteractiveAuthentication no
    PasswordAuthentication no
    User YOURUSERNAME

# Configure the initial connection to the bastion host, with the one
# HostName closest to you.
Host bast
    HostName bast1004.wikimedia.org
    IdentityFile ~/.ssh/prod.key
    # In theory this User line shouldn't be necessary due to the Host block above,
    # but in practice it seems to be. In any case, it doesn't hurt.
    User YOURUSERNAME

# Proxy all connections to internal servers through the bastion host.
Host *.wmnet *.wikimedia.org !gerrit.wikimedia.org !bast*.wikimedia.org !gitlab.wikimedia.org
    ProxyJump bast
    IdentityFile ~/.ssh/prod.key
    User YOURUSERNAME

# Configure direct connection to the bastion hosts.
Host bast*.wikimedia.org
    IdentityFile ~/.ssh/prod.key
    User YOURUSERNAME

Host gerrit.wikimedia.org
    Port 29418
    IdentityFile ~/.ssh/cloud.key

Host gitlab.wikimedia.org
    IdentityFile ~/.ssh/cloud.key
Replace ~/.ssh/cloud.key with your local SSH key for Git/Gerrit.
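Before connecting, you can ask OpenSSH which settings it would actually apply to a given host. The following sketch wraps ssh -G, which prints the evaluated configuration without logging in to the host. The function name show_effective_config, the selection of keys to display, and the host name in the example are this example's own.

```shell
# Hypothetical helper: print the options ssh would apply to a host.
# Usage: show_effective_config HOST [CONFIG_FILE]
show_effective_config() {
    host="$1"
    config="${2:-$HOME/.ssh/config}"
    # -F reads only the given file (the system-wide config is skipped);
    # -G prints the evaluated options, one lowercase "key value" pair per line.
    ssh -F "$config" -G "$host" |
        grep -E '^(user|hostname|identityfile|identitiesonly|forwardagent|proxyjump|proxycommand) '
}

# Example call with a placeholder host name (uncomment to run):
# show_effective_config somehost.eqiad.wmnet
```

If the user or identityfile lines are not what you expect, an earlier Host block in the file is probably matching first, since ssh uses the first value it obtains for each option.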
Picking your closest bastion server
In the example above you may replace bast1004.wikimedia.org with the bastion that is physically closest to you:
bast1004.wikimedia.org in the eqiad data center in Virginia, United States
bast2003.wikimedia.org in the codfw data center in Texas, United States
bast3007.wikimedia.org in the esams data center in Amsterdam, The Netherlands
bast4006.wikimedia.org in the ulsfo data center in San Francisco, United States
bast5004.wikimedia.org in the eqsin data center in Singapore
bast6003.wikimedia.org in the drmrs data center in Marseille, France
bast7002.wikimedia.org in the magru data center in São Paulo, Brazil
[Map of bastion hosts: codfw, eqiad, esams, ulsfo, eqsin, drmrs, magru]
Advanced: operations config
If you will be setting up new servers or doing other administration work, you can use the advanced configuration below instead. Otherwise, skip this section. If you're not sure, you almost certainly don't need this!
Advanced $HOME/.ssh/config for production root users
## Production & External Zones
Host bast1004.wikimedia.org bast2003.wikimedia.org bast3007.wikimedia.org bast4005.wikimedia.org bast5004.wikimedia.org bast6003.wikimedia.org bast7001.wikimedia.org restricted.bastion.wmcloud.org
    StrictHostKeyChecking yes
    ProxyCommand none
    ControlMaster auto
    IdentitiesOnly yes

# See https://wikitech.wikimedia.org/wiki/Managing_multiple_SSH_agents#Using_multiple_agents_via_systemd for setting up multiple agents using systemd
Host *.wikimedia.org !gerrit.wikimedia.org !git-ssh.wikimedia.org !gitlab.wikimedia.org
    User your_username_here
    StrictHostKeyChecking yes
    IdentitiesOnly yes
    IdentityAgent /run/user/%i/ssh-prod.socket
    IdentityFile ~/.ssh/your_production_ssh_key
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-prod
    ProxyCommand ssh -a -W %h:%p bast1004.wikimedia.org

## Internal Zones
Host *.mgmt.eqiad.wmnet *.mgmt.codfw.wmnet *.mgmt.ulsfo.wmnet *.mgmt.esams.wmnet *.mgmt.eqsin.wmnet *.mgmt.drmrs.wmnet *.mgmt.magru.wmnet
    User root
    KbdInteractiveAuthentication yes
    StrictHostKeyChecking no

# See https://wikitech.wikimedia.org/wiki/Managing_multiple_SSH_agents#Using_multiple_agents_via_systemd for setting up multiple agents using systemd
Host *.wmnet
    User your_username_here
    StrictHostKeyChecking yes
    IdentitiesOnly yes
    IdentityAgent /run/user/%i/ssh-prod.socket
    IdentityFile ~/.ssh/your_production_ssh_key
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-prod

Host *.eqiad.wmnet
    ProxyCommand ssh -a -W %h:%p bast1004.wikimedia.org

Host *.codfw.wmnet
    ProxyCommand ssh -a -W %h:%p bast2003.wikimedia.org

Host *.esams.wmnet
    ProxyCommand ssh -a -W %h:%p bast3006.wikimedia.org

Host *.ulsfo.wmnet
    ProxyCommand ssh -a -W %h:%p bast4004.wikimedia.org

Host *.eqsin.wmnet
    ProxyCommand ssh -a -W %h:%p bast5003.wikimedia.org

Host *.drmrs.wmnet
    ProxyCommand ssh -a -W %h:%p bast6002.wikimedia.org

Host *.magru.wmnet
    ProxyCommand ssh -a -W %h:%p bast7001.wikimedia.org

## Networking Equipment
Host *-eqiad.wikimedia.org *-eqord.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast1004.wikimedia.org

Host *-codfw.wikimedia.org *-eqdfw.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast2003.wikimedia.org

Host *-esams.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast3006.wikimedia.org

Host *-ulsfo.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast4004.wikimedia.org

Host *-eqsin.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast5003.wikimedia.org

Host *-drmrs.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast6002.wikimedia.org

Host *-magru.wikimedia.org
    ProxyCommand ssh -a -W %h:%p bast7001.wikimedia.org

## Gerrit and Cloud VPS
# See https://wikitech.wikimedia.org/wiki/Managing_multiple_SSH_agents#Using_multiple_agents_via_systemd for setting up multiple agents using systemd
Host gerrit.wikimedia.org
    User your_username_here
    StrictHostKeyChecking yes
    ProxyCommand none
    IdentitiesOnly yes
    IdentityAgent /run/user/%i/ssh-cloud.socket
    IdentityFile ~/.ssh/your_development_ssh_key
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-cloud
    Port 29418

Host gitlab.wikimedia.org
    User your_username_here
    StrictHostKeyChecking yes
    ProxyCommand none
    IdentitiesOnly yes
    IdentityAgent /run/user/%i/ssh-cloud.socket
    IdentityFile ~/.ssh/your_development_ssh_key
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-cloud

# See https://wikitech.wikimedia.org/wiki/Managing_multiple_SSH_agents#Using_multiple_agents_via_systemd for setting up multiple agents using systemd
Host *.eqiad1.wikimedia.cloud *.wmcloud.org
    User your_username_here
    IdentityFile ~/.ssh/your_development_ssh_key
    IdentityAgent /run/user/%i/ssh-cloud.socket
    StrictHostKeyChecking no
    UserKnownHostsFile ~/.ssh/known_hosts.d/wmf-cloud
    ProxyCommand ssh -a -W %h:%p restricted.bastion.wmcloud.org
Known host files
To ensure the validity of the hosts you connect to, enable the StrictHostKeyChecking yes option and create a local list of known hosts. A utility script is available to generate that list and keep it up to date. Read the instructions in the script's header for help on usage. If you need any additional help, contact the script's authors.
Before you can use the script, you'll need to bootstrap this setup with at least one bastion host. Disable strict host key checking, ssh to a bastion, and make sure the fingerprint matches what's listed at Help:SSH Fingerprints.
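As a different way to look at a bastion's host key before trusting it, you can fetch the key without logging in and print its fingerprint for manual comparison with Help:SSH Fingerprints. This is a sketch using the standard ssh-keyscan and ssh-keygen tools rather than the procedure described above, and both function names are hypothetical. The scan by itself proves nothing; the comparison against the published fingerprint is the actual check.

```shell
# Read known_hosts-style lines on stdin and print one fingerprint per key.
fingerprints_from_stdin() {
    ssh-keygen -l -f -
}

# Fetch the host keys a server presents (no login) and fingerprint them.
host_fingerprints() {
    ssh-keyscan "$1" 2>/dev/null | fingerprints_from_stdin
}

# Example call (needs network access; uncomment to run):
# host_fingerprints bast1004.wikimedia.org
```

Each output line contains the key length, the fingerprint, the host name, and the key type.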
Security
See also: Help:SSH Fingerprints
Do not use SSH agent forwarding (the -A command line option). Agent forwarding does not make it possible to steal your private key itself, but it does make it possible for someone to hijack your SSH agent and thus your identity, so we do not do it. The -a option (with a lower case "a") disables agent forwarding, and is thus included in the sample configurations above.
Do not use your production cluster SSH key for any other service, including Gerrit or Cloud VPS.
Other tips
Fundraising infrastructure config
Greg Grossmeier's SSH config
Managing multiple SSH agents
(experimental) Bash script to detect the correct bastion and auto-fix SSH config
ssh single letter domain shortcut (allows you to ssh hostname.e rather than hostname.eqiad.wmnet)
Consider using sshecret, a wrapper around ssh that ensures you are only using a single key; read the explanation there. Written by Tyler Cipriani.
Debugging
If your production access has been approved but you aren't able to log in, you can ask for help in the Phabricator ticket for your access request. If you got access a long time ago and it's a new problem, you can file a new ticket and tag it with #sre.
Wherever you ask for help, make sure you include your SSH configuration (but not your key itself!) and the output you get when you run your ssh command with the -v option (verbose mode).
If you are prompted for a password when attempting to SSH into production, it generally means that your client is misconfigured; most often you are presenting the wrong public key to the server. ssh -v can help you debug this. When debugging, in order to keep things clear, it's best to attempt to connect directly to a bastion host, e.g.
ssh -v bast1004.wikimedia.org
If you have not logged in for a while, make sure you are not trying to connect to servers that were decommissioned in the meantime. See Category:Servers for the list of servers.
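Verbose output is long, and for the wrong-key problem the lines that matter are the ones about identity files and offered keys. The filter below is a rough sketch: the function name is made up, and the patterns are based on the debug messages that OpenSSH clients typically print, which may differ between versions.

```shell
# Hypothetical helper: keep only the key-related lines of `ssh -v` output.
# Reads the verbose log on stdin and writes the matching lines to stdout.
ssh_key_debug_lines() {
    grep -E 'identity file |Offering .*public key|Server accepts key|Authentications that can continue'
}

# Example call (uncomment to run; ssh writes debug output to stderr,
# hence the 2>&1 redirection):
# ssh -v bast1004.wikimedia.org true 2>&1 | ssh_key_debug_lines
```

If the key being offered is not your production key, check the IdentityFile and IdentitiesOnly settings in the Host block that matches the bastion.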
See also
Help:Accessing Cloud VPS instances for instructions on accessing Cloud VPS and Toolforge instances
Help:SSH Fingerprints for fingerprints of ssh bastion servers
Proxy access to cluster for direct web access to production servers behind the firewall
Yubikey-SSH and Yubikey4 and gpg-agent for instructions on using a YubiKey device to manage your ssh key
Managing multiple SSH agents for help configuring separate ssh-agent instances for different security realms
Fundraising/tech/ssh config for help configuring ssh for access to hosts in the frack environment
Notes
The form automatically adds the ticket to the SRE-Access-Requests project so the SRE team will see your request.
You can also put your public key on your wiki user page, in a Phabricator paste, or in a Gerrit patchset you upload, but you can't include it in an email reply to the task.
This protects against email spoofing.