⚓ T308738 Ensure that all tables have all the necessary indexes
Page Menu
Phabricator
Create Task
Maniphest
T308738
Ensure that all tables have all the necessary indexes
Closed, Resolved
Public
3 Estimated Story Points
Actions
Edit Task
Edit Related Tasks...
Create Subtask
Edit Parent Tasks
Edit Subtasks
Merge Duplicates In
Close As Duplicate
Edit Related Objects...
Edit Commits
Edit Mocks
Mute Notifications
Protect as security issue
Assigned To
Daimona
Authored By
Daimona
May 19 2022, 10:29 AM
2022-05-19 10:29:03 (UTC+0)
Tags
CampaignEvents
(Backlog)
Campaign-Registration
(Darkship)
Data-Persistence (work done)
Connection-Team (Connection-Current-Sprint)
(Done 🏁)
MW-1.39-notes (1.39.0-wmf.28; 2022-09-05)
Referenced Files
None
Subscribers
Aklapper
cmelo
Daimona
Ladsgroup
ldelench_wmf
vaughnwalters
Description
AC
Tables should have all the indexes recommended by Amir:
T308738#8103421
Those indexes should also be created (manually) on beta
NOTE:
Since we're not doing this in a backwards-compatible way, you will have to create the index manually on your local once the patch is merged
Details
Related Changes in Gerrit:
Subject
Repo
Branch
Lines +/-
Add indexes to the schema
mediawiki/extensions/CampaignEvents
master
+62
-2
Customize query in gerrit
Related Objects
Mentions
Mentioned In
T350947: Performance review of Extension:WikimediaCampaignEvents
T318595: Create database schema for the CampaignEvents extension on testwiki, test2wiki, and officewiki
T318593: Review database schema of the CampaignEvents extension
T302858: Performance review of CampaignEvents extension
T316943: Request access to deployment-prep
T316765: investigation: preliminary performance assessment
rUCAM14a487e04cd0: Add indexes to the schema
Mentioned Here
T311126: Implement strategy to account for DST variations
T312870: Add timestamps to ce_organizers table [V1]
T308694: Sorting in TablePager doesn't work for aggregates
T312229: Make the search on Special:MyEvents and Special:AllEvents case-insensitive
Event Timeline
Daimona
created this task.
May 19 2022, 10:29 AM
2022-05-19 10:29:03 (UTC+0)
Restricted Application
added a subscriber:
Aklapper
View Herald Transcript
May 19 2022, 10:29 AM
2022-05-19 10:29:03 (UTC+0)
ldelench_wmf
subscribed.
May 26 2022, 4:30 PM
2022-05-26 16:30:32 (UTC+0)
Comment Actions
Discussed: this should come after we have created all our special pages (probably a simple task, but we don't have enough information). Essentially, all V0 features should be in place.
ldelench_wmf
moved this task from
Backlog
to
To be Estimated/Discussed
on the
Connection-Team
board.
Jun 30 2022, 3:48 PM
2022-06-30 15:48:27 (UTC+0)
ldelench_wmf
moved this task from
Backlog
to
V0
on the
Campaign-Registration
board.
ldelench_wmf
triaged this task as
High
priority.
Jun 30 2022, 4:31 PM
2022-06-30 16:31:36 (UTC+0)
ldelench_wmf
set the point value for this task to
ldelench_wmf
edited projects, added
Connection-Team (Campaign-Tools-Sprint-16)
; removed
Connection-Team
cmelo
subscribed.
Jul 15 2022, 5:45 PM
2022-07-15 17:45:21 (UTC+0)
Comment Actions
I think the indexes we already have are fine, I just would add more 3:
event_meeting_country_and_meeting_type
Would be used when querying events by country and meeting type (Can be useful for the event calendar in the future).
event_meeting_country
Would be used when querying events by country (Can be useful for the event calendar in the future).
event_status
Used when querying events by status.
cmelo
changed the task status from
Open
to
In Progress
Jul 15 2022, 5:45 PM
2022-07-15 17:45:55 (UTC+0)
cmelo
claimed this task.
cmelo
moved this task from
Ready 🎬
to
In Progress 💻
on the
Connection-Team (Campaign-Tools-Sprint-16)
board.
cmelo
moved this task from
In Progress 💻
to
Review/Feedback 💬
on the
Connection-Team (Campaign-Tools-Sprint-16)
board.
Jul 15 2022, 5:56 PM
2022-07-15 17:56:22 (UTC+0)
Daimona
added a comment.
Jul 16 2022, 12:16 AM
2022-07-16 00:16:03 (UTC+0)
Comment Actions
In
T308738#8081199
@cmelo
wrote:
I think the indexes we already have are fine, I just would add more 3:
event_meeting_country_and_meeting_type
Would be used when querying events by country and meeting type (Can be useful for the event calendar in the future).
event_meeting_country
Would be used when querying events by country (Can be useful for the event calendar in the future).
The thing about the country is that we're likely to change the schema anyway for V1, so I'm not sure if we need to account for a future use case now. Also, the second index would be redundant because the country is already the leftmost column of the first index.
event_status
Used when querying events by status.
Slightly unsure about this one, the column can only have 2 values right now (and even in future versions its cardinality is going to remain very low), so I'm not sure if an index would help much.
I'm creating a prod-like set of tables with thousands of events locally and will run some queries on it to see what indexes would make a difference to the optimizer. I'll work on it in the weekend and I expect it'll take some time to come back with the results.
Ladsgroup
subscribed.
Jul 16 2022, 3:00 PM
2022-07-16 15:00:25 (UTC+0)
Daimona
added a comment.
Jul 18 2022, 4:37 PM
2022-07-18 16:37:13 (UTC+0)
Comment Actions
I played with the dataset a little bit, and here are my conclusions. This is not an extensive analysis, but I felt like it wasn't worth investing more time into it. I also think we shouldn't consider this a blocker for the beta deployment for the following reasons:
We don't expect huge datasets on beta, especially not for the first days after the deployment.
Schema updates will have to be run manually anyway because we don't use the local wiki database, so we can still ignore backwards compatibility-related concerns (e.g., writing actual schema change files). This is the reason why I wanted to do this before the deployment, before realizing that it's actually a non-concern.
However, we should still address this as soon as possible.
@cmelo
, I asked
@Ladsgroup
if he could give us his recommendations on the schema, and he kindly agreed to meet with us in the upcoming two weeks. Amir, below you can find information about the schema, the queries, and the questions I have. This comment is long, I know it and I apologize for it. I tried to divide it in multiple sections, so that you can skip whatever you believe is not relevant to you. Thanks in advance for your help and for reading this wall of text.
Current schema
See the
db_patches directory
. See
autogenerated schema for MySQL
. The purpose of the tables is fairly self-explanatory:
campaign_events
contains information about an event (name, event page, meeting links etc.). Note that currently we can only have an event per page, and this won't be changed anytime soon if at all.
ce_participants
contains a row for each (event,user) pair.
cep_event_id
references
campaign_events.event_id
(PK)
ce_organizers
contains a row for each (event,user,role) 3-uple. For now there's only a single role, so you can ignore that part, but in the future the same user could have more than one role in organizing a given event.
ceo_event_id
references
campaign_events.event_id
(PK)
We currently don't have many indexes, except for primary keys and some uniqueness constraints.
Test dataset
As I wrote in
T308738#8081669
, I wanted to create a large dataset locally with production-like data so that I could play with it and see which changes were worth implementing. If you want the code that I used to generate the data, please ask for it, but note that I'll have to rewrite it first because I made some mistakes that had to be later accounted for in the code. Here are the relevant specifications given the simple use cases we have so far:
The campaign_events table has a total of 102000 event rows
The ce_organizers table has one row for each event (so 102000 rows), and I tried a few different distributions:
Only two different ceo_user_id, divided as 60/40
An average of 6 events by organizer, generated with
ceo_user_id = mod(ceo_id, 20000)
For the ce_participants table, I added an average of ~20 participants for each event, using an N(20, 100) distribution, rounded and capped at 0.
Schema access in the code
Each table is managed by (almost) a single class:
campaign_events
EventStore.php
ce_participants
ParticipantsStore.php
ce_organizers
OrganizersStore.php
I said "almost" because we also have
EventsPager.php
which queries all tables directly. The (current) purpose of EventsPager is to list the events organized by a given user. Also, EventStore needs to query
ce_participants
and
ce_organizers
as well.
Current queries
This is a list of queries for the current version of the extension. We will definitely add more in the future, but those should not be optimized now.
Get an event by its unique ID. This is a simple PK lookup and can be ignored.
Get an event by page:
SELECT
FROM
campaign_events
WHERE
event_page_namespace
AND
event_page_title
AND
event_page_wiki
Get all the events organized by a given user:
SELECT
FROM
campaign_events
JOIN
ce_organizers
ON
event_id
ceo_event_id
WHERE
ceo_user_id
Get all events that a user is actively participating in:
SELECT
FROM
campaign_events
JOIN
ce_participants
ON
event_id
cep_event_id
WHERE
cep_user_id
AND
cep_unregistered_at
IS
NULL
Add a participant to an event. This can either insert a new row for a new participant (no need to optimize), or update
cep_unregistered_at
or
cep_registered_at
for an existing but soft-deleted record. This is done via IDatabase::upsert, so the actual query will vary depending on the DBMS. For MySQL, this is:
INSERT
INTO
ce_participants
cep_event_id
cep_user_id
cep_registered_at
cep_unregistered_at
VALUES
NULL
ON
DUPLICATE
KEY
UPDATE
cep_unregistered_at
NULL
cep_registered_at
Remove one or more participants from an event:
-- Single participant
UPDATE
ce_participants
SET
cep_unregistered_at
WHERE
cep_event_id
AND
cep_unregistered_at
IS
NULL
AND
cep_user_id
-- Multiple participants
UPDATE
ce_participants
SET
cep_unregistered_at
WHERE
cep_event_id
AND
cep_unregistered_at
IS
NULL
AND
cep_user_id
IN
x_1
...,
x_n
Get the participants of an event; supports basic pagination, parts in square brackets are optional:
SELECT
cep_id
cep_user_id
cep_registered_at
FROM
ce_participants
WHERE
cep_event_id
AND
cep_unregistered_at
IS
NULL
AND
cep_id
ORDER
BY
cep_id
LIMIT
Check whether a user participates in an event
SELECT
FROM
ce_participants
WHERE
cep_event_id
AND
cep_user_id
AND
cep_unregistered_at
IS
NULL
LIMIT
Count participants of an event
SELECT
COUNT
FROM
ce_participants
WHERE
cep_event_id
AND
cep_unregistered_at
IS
NULL
LIMIT
Get the organizers of an event; normally used with LIMIT:
SELECT
ceo_user_id
ceo_role_id
FROM
ce_organizers
WHERE
ceo_event_id
LIMIT
Check whether a user organizes an event
SELECT
FROM
ce_organizers
WHERE
ceo_event_id
AND
ceo_user_id
LIMIT
Count organizers of an event
SELECT
COUNT
FROM
ce_organizers
WHERE
ceo_event_id
LIMIT
Adding a new organizer is a simple insert and can be ignored.
Then we have EventsPager, used on Special:MyEvents to display a list of event that you organize. The events are sortable by start date, name, and number of participants. You can also filter them by status. The basic query is:
SELECT
event_id
event_name
event_page_namespace
event_page_title
event_page_prefixedtext
event_page_wiki
event_status
event_start
event_meeting_type
num_participants
FROM
SELECT
event_id
event_name
event_page_namespace
event_page_title
event_page_prefixedtext
event_page_wiki
event_status
event_start
event_meeting_type
COUNT
cep_id
AS
`num_participants`
FROM
`campaign_events`
LEFT
JOIN
`ce_participants`
ON
((
event_id
cep_event_id
))
LEFT
JOIN
`ce_organizers`
ON
((
event_id
ceo_event_id
))
WHERE
event_deleted_at
IS
NULL
AND
cep_unregistered_at
IS
NULL
AND
ceo_user_id
GROUP
BY
cep_event_id
event_id
event_name
event_page_namespace
event_page_title
event_page_prefixedtext
event_page_wiki
event_status
event_start
event_meeting_type
`tmp`
ORDER
BY
LIMIT
We use a temporary table because otherwise we could not sort by the number of participants, due to limitations in the core pager logic and for postgres support (see
T308694
). Also, note that the query above is valid under ONLY_FULL_GROUP_BY in MySQL < 5.7.5, where it still didn't detect functional dependency, which explains why all fields are appearing in the GROUP BY even if they're redundant.
The list can be filtered by name and status; in both cases we change the WHERE condition of the inner query;
Filtering by event name is done with
event_name LIKE '%abc%'
. I know, this sucks and cannot use any indexes. But it will be improved as part of
T312229
Filtering by event status is a simple
event_status = X
. Currently, the possible values are 1 and 2.
The possible orderings are:
By start date, default:
ORDER BY event_start,event_name,event_id
By name:
ORDER BY event_name,event_start,event_id
By number of participants:
ORDER BY num_participants,event_start,event_id
Potential performance issues
Below I'm listing the potential issues I noticed with my questions.
@Ladsgroup
I'd appreciate if you could answer those.
The first issue I noticed is when selecting events organized by a given user (either as a simple list, or the paginated version for Special:MyEvents). The current index is
ceo_event_id, ceo_user_id, ceo_role_id
, so it cannot be used for the JOIN (because
ceo_user_id
is needed first). As such, the whole campaign_events table is scanned, and then filtered later. This could be resolved by indexing the
ceo_user_id
column to result in a better query plan where the join condition uses the index and cuts down the number of rows. However, I think we will still need both indexes, because
ceo_event_id
can be used without
ceo_user_id
when retrieving the count/list of organizers of a specific event.
Getting all the events that a user participates in shows similar issues and would benefit from having an index on the user ID. Again though, we also need
cep_event_id
to be indexed, as it's used e.g when counting participants.
The "get the participants of an event" one seems good, but it filesorts because the PK index is not used (it uses the one on event+user instead). I'm not sure if it's worth adding the ID to a composite index, since for now only low LIMITs can be used.
Now for the pager one. First of all, this query would also benefit from the index on
ceo_user_id
proposed above. Then, the inner query has
Using temporary; Using filesort
for ce_organizers. I'm sure this is caused by the GROUP BY but I'm not sure how to fix that. The outer query (on the derived table) also filesorts, but I'm not sure if we can avoid that. Overall, this query is noticeably slow if an organizer has many events (which is maybe not very realistic, but still worth noting). I don't think adding indexes to
campaign_events
for the where condition would make any difference, because ce_organizer is scanned first and only the corresponding rows in campaign_events are returned. Also, I was wondering at
T308738#8081669
if an index on
event_status
would help at all, since the cardinality is just 2. As for indexes on the ORDER BY columns: since it's the derived table which is ordered, I'm not sure if they would help. I think MySQL should be able to use indexes from inner tables in some situations, but this does not seem to be the case here.
As a final doubt/question: the current schema forces us to use joins + group by + temp tables in the pager query for the number of participants. I was wondering if it would make sense to denormalize the value, adding a
campaign_events.event_num_participants
field that is updated whenever the ce_participants table is updated. This would greatly simplify the query, but I'm not sure if it's really worth doing.
Ladsgroup
added a comment.
Jul 19 2022, 11:11 AM
2022-07-19 11:11:59 (UTC+0)
Comment Actions
Adding indexes requires not just the schema and queries but also data and sometimes optimizer bugs and so on. At some point it basically turns into an art than actual engineering/architecture work but I doubt we would get there.
We also have the space/speed tradeoff. We are rather okay in space in x1 so we can err on side of more indexes.
Here is rule of thumb:
This should be the order of importance for index (and the ones that must get on the leftmost part of the index):
columns used to join
columns used for ordering. This is important in our infra as I've seen mariadb basically throwing away everything and sticking to the index that's using order by columns
columns used for conditions
columns that have higher selectivity take precedence. e.g. Age takes precedence over gender.
High Performance MySQL says you should swap the columns order in some special cases. When almost exclusively in all select queries you have condition on that column. E.g. gender can take precedence if you always condition it (vs. you might condition on age in some queries)
columns that are used in select fields.
It's called "covering index" and speeds up the select but it's more of a nice to have (unless the pressure on read is quite high which is not the case here)
With these notes, let me take a look at your schema and queries. I saw you already have index on ceo_event_id which is great.
Ladsgroup
added a comment.
Edited
Jul 19 2022, 11:33 AM
2022-07-19 11:33:12 (UTC+0)
Comment Actions
ce_organizers doesn't need more indexes for performance. it might be a good idea to add a unique index on ceo_event_id, ceo_user_id to avoid one user having multiple rows. I leave that to you.
ce_participants needs index on cep_event_id, cep_unregistered_at for the count measurement.
I'm generally not a big fan of upsert. It can easily start wasting auto increment values in updates. I suggest doing a query first and if it exists, just going with update. If not, then upsert (to handle rare race conditions).
I'm looking at campaign_events and its queries now.
Ladsgroup
added a project:
Data-Persistence (work done)
Jul 19 2022, 11:33 AM
2022-07-19 11:33:26 (UTC+0)
Ladsgroup
added a comment.
Jul 19 2022, 12:13 PM
2022-07-19 12:13:33 (UTC+0)
Comment Actions
On joining to campaign_events table, you definitely need to add these indexes (I know it's opposite of the join column but join order is important, these will be the first tables in the join order so they should have the join on where condition):
ceo_user_id, ceo_event_id on ce_organizers
cep_user_id, cep_unregistered_at, cep_event_id on ce_participants
Add index on event_id, event_deleted_at on campaign_events and I think you'd be good to go.
ldelench_wmf
edited projects, added
Connection-Team (Campaign-Tools-Sprint-17)
; removed
Connection-Team (Campaign-Tools-Sprint-16)
Jul 20 2022, 1:28 AM
2022-07-20 01:28:56 (UTC+0)
ldelench_wmf
moved this task from
Ready 🎬
to
Review/Feedback 💬
on the
Connection-Team (Campaign-Tools-Sprint-17)
board.
Jul 20 2022, 1:33 AM
2022-07-20 01:33:59 (UTC+0)
Daimona
added a comment.
Jul 26 2022, 12:18 AM
2022-07-26 00:18:22 (UTC+0)
Comment Actions
Amir, thanks for your detailed replies. Sorry if I didn't get back to you sooner, but with the release happening last week I had so many things on my mind.
In
T308738#8087224
@Ladsgroup
wrote:
ce_organizers doesn't need more indexes for performance. it might be a good idea to add a unique index on ceo_event_id, ceo_user_id to avoid one user having multiple rows. I leave that to you.
Right now we already have a unique index on
ceo_event_id, ceo_user_id, ceo_role_id
(in the future, each user could have multiple roles for the same event).
I'm generally not a big fan of upsert. It can easily start wasting auto increment values in updates. I suggest doing a query first and if it exists, just going with update. If not, then upsert (to handle rare race conditions).
Yup, we're already doing that, see
here
. Now that I think about it, this is also something I wanted to ask you: right now we're doing that in an atomic section + a FOR UPDATE lock. I think the lock should be enough and the atomic section wouldn't be necessary though, am I right?
Summing up, here are the final indexes to have (including those that already exist) if I understood your comments:
-- On campaign_events
PRIMARY
KEY
event_id
),
UNIQUE
INDEX
event_page
event_page_wiki
event_page_namespace
event_page_title
),
INDEX
event_id_deleted
event_id
event_deleted_at
-- On ce_organizers
PRIMARY
KEY
ceo_id
),
UNIQUE
INDEX
ceo_event_user_role
ceo_event_id
ceo_user_id
ceo_role_id
),
-- Used when retrieving organizers for a single event
INDEX
ceo_user_event
ceo_user_id
ceo_event_id
-- Used for JOINs
-- On ce_participants
PRIMARY
KEY
cep_id
),
UNIQUE
INDEX
cep_event_participant
cep_event_id
cep_user_id
),
-- Uniqueness constraint
INDEX
cep_event_unregistered
cep_event_id
cep_unregistered_at
),
-- For COUNT
INDEX
cep_user_unregistered_event
cep_user_id
cep_unregistered_at
cep_event_id
-- For JOIN
Is this correct? I'm also wondering if we could avoid one of those indexes on
ce_participants
Thank you again!
Ladsgroup
added a comment.
Jul 29 2022, 5:24 PM
2022-07-29 17:24:09 (UTC+0)
Comment Actions
Sorry it took me a bit to respond, I was out sick (and half-time for a while due to sickness)
In
T308738#8103421
@Daimona
wrote:
Yup, we're already doing that, see
here
. Now that I think about it, this is also something I wanted to ask you: right now we're doing that in an atomic section + a FOR UPDATE lock. I think the lock should be enough and the atomic section wouldn't be necessary though, am I right?
I doubt we would ever hit race conditions like that. Still you can do a read with a replica for getting the id (to avoid insert) and then do a read with FOR UPDATE if needed. The thing is that specially if you select ranges or if the id doesn't exist, it might lock way more rows (e.g. gap lock) than you intend to. So I'm not sure you'd FOR UPDATE first, try replica (which would catch most cases I assume), then master with FOR UPDATE
Summing up, here are the final indexes to have (including those that already exist) if I understood your comments:
-- On campaign_events
PRIMARY
KEY
event_id
),
UNIQUE
INDEX
event_page
event_page_wiki
event_page_namespace
event_page_title
),
INDEX
event_id_deleted
event_id
event_deleted_at
-- On ce_organizers
PRIMARY
KEY
ceo_id
),
UNIQUE
INDEX
ceo_event_user_role
ceo_event_id
ceo_user_id
ceo_role_id
),
-- Used when retrieving organizers for a single event
INDEX
ceo_user_event
ceo_user_id
ceo_event_id
-- Used for JOINs
-- On ce_participants
PRIMARY
KEY
cep_id
),
UNIQUE
INDEX
cep_event_participant
cep_event_id
cep_user_id
),
-- Uniqueness constraint
INDEX
cep_event_unregistered
cep_event_id
cep_unregistered_at
),
-- For COUNT
INDEX
cep_user_unregistered_event
cep_user_id
cep_unregistered_at
cep_event_id
-- For JOIN
Is this correct? I'm also wondering if we could avoid one of those indexes on
ce_participants
Yes, it looks good. If you are worried about storage (which I'm not much given that it's going to be on x1), you can avoid adding
cep_event_unregistered
and for counts, it would scan all rows of an event which should be just slightly more than the rows it would scan with the index.
HTH
ldelench_wmf
edited projects, added
Connection-Team (Connection-Current-Sprint)
; removed
Connection-Team (Campaign-Tools-Sprint-17)
Aug 14 2022, 9:30 PM
2022-08-14 21:30:07 (UTC+0)
ldelench_wmf
moved this task from
Upcoming / refining 💡
to
Code Review 💬
on the
Connection-Team (Connection-Current-Sprint)
board.
Aug 14 2022, 9:37 PM
2022-08-14 21:37:11 (UTC+0)
Daimona
moved this task from
Code Review 💬
to
Upcoming / refining 💡
on the
Connection-Team (Connection-Current-Sprint)
board.
Aug 15 2022, 11:52 AM
2022-08-15 11:52:28 (UTC+0)
Comment Actions
Thanks Amir, this definitely helps!
@cmelo
Would you like to implement the suggestions above, or should I?
vyuen
moved this task from
V0
to
Darkship
on the
Campaign-Registration
board.
Aug 24 2022, 11:27 AM
2022-08-24 11:27:52 (UTC+0)
vyuen
reassigned this task from
cmelo
to
Daimona
Aug 25 2022, 1:15 PM
2022-08-25 13:15:00 (UTC+0)
vyuen
changed the task status from
In Progress
to
Open
Aug 25 2022, 5:04 PM
2022-08-25 17:04:57 (UTC+0)
Daimona
updated the task description.
(Show Details)
Aug 26 2022, 2:07 PM
2022-08-26 14:07:26 (UTC+0)
gerritbot
added a comment.
Aug 26 2022, 2:30 PM
2022-08-26 14:30:33 (UTC+0)
Comment Actions
Change 826877 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):
[mediawiki/extensions/CampaignEvents@master] Add indexes to the schema
gerritbot
added a project:
Patch-For-Review
Aug 26 2022, 2:30 PM
2022-08-26 14:30:34 (UTC+0)
Daimona
moved this task from
Upcoming / refining 💡
to
Code Review 💬
on the
Connection-Team (Connection-Current-Sprint)
board.
Aug 26 2022, 2:30 PM
2022-08-26 14:30:45 (UTC+0)
Daimona
added a comment.
Aug 26 2022, 3:01 PM
2022-08-26 15:01:22 (UTC+0)
Comment Actions
I've implemented the suggestion above, and can confirm that the queries are really fast on the test dataset.
vyuen
changed the task status from
Open
to
In Progress
Aug 30 2022, 12:13 PM
2022-08-30 12:13:09 (UTC+0)
gerritbot
added a comment.
Aug 31 2022, 3:54 AM
2022-08-31 03:54:03 (UTC+0)
Comment Actions
Change 826877
merged
by jenkins-bot:
[mediawiki/extensions/CampaignEvents@master] Add indexes to the schema
Daimona
mentioned this in
rUCAM14a487e04cd0: Add indexes to the schema
Aug 31 2022, 3:54 AM
2022-08-31 03:54:38 (UTC+0)
ReleaseTaggerBot
added a project:
MW-1.39-notes (1.39.0-wmf.28; 2022-09-05)
Aug 31 2022, 4:00 AM
2022-08-31 04:00:36 (UTC+0)
Maintenance_bot
removed a project:
Patch-For-Review
Aug 31 2022, 4:30 AM
2022-08-31 04:30:18 (UTC+0)
Daimona
added a comment.
Aug 31 2022, 4:13 PM
2022-08-31 16:13:06 (UTC+0)
Comment Actions
Commands that I will run on beta:
ALTER
TABLE
campaign_events
ADD
INDEX
event_id_deleted
event_id
event_deleted_at
);
ALTER
TABLE
ce_participants
ADD
INDEX
cep_event_unregistered
cep_event_id
cep_unregistered_at
);
ALTER
TABLE
ce_participants
ADD
INDEX
cep_user_unregistered_event
cep_user_id
cep_unregistered_at
cep_event_id
);
ALTER
TABLE
ce_organizers
ADD
INDEX
ceo_user_event
ceo_user_id
ceo_event_id
);
Stashbot
added a comment.
Aug 31 2022, 4:39 PM
2022-08-31 16:39:07 (UTC+0)
Comment Actions
Mentioned in SAL (#wikimedia-releng)
[2022-08-31T16:39:06Z] Applying schema change to the wikishared DB on beta for the CampaignEvents extension #
T308738
Daimona
moved this task from
Code Review 💬
to
QA 🐛
on the
Connection-Team (Connection-Current-Sprint)
board.
Aug 31 2022, 4:42 PM
2022-08-31 16:42:05 (UTC+0)
Daimona
mentioned this in
T316765: investigation: preliminary performance assessment
Aug 31 2022, 5:23 PM
2022-08-31 17:23:28 (UTC+0)
vaughnwalters
mentioned this in
T316943: Request access to deployment-prep
Sep 2 2022, 3:45 PM
2022-09-02 15:45:06 (UTC+0)
vyuen
mentioned this in
T302858: Performance review of CampaignEvents extension
Sep 6 2022, 1:03 PM
2022-09-06 13:03:27 (UTC+0)
Daimona
added a comment.
Sep 8 2022, 4:10 PM
2022-09-08 16:10:24 (UTC+0)
Comment Actions
See notes at
T312870#8222116
vaughnwalters
subscribed.
Edited
Sep 9 2022, 6:29 PM
2022-09-09 18:29:23 (UTC+0)
Comment Actions
Index recommendations:
-- On campaign_events
PRIMARY
KEY
event_id
),
UNIQUE
INDEX
event_page
event_page_wiki
event_page_namespace
event_page_title
),
INDEX
event_id_deleted
event_id
event_deleted_at
SHOW INDEX FROM campaign_events;
MariaDB [wikishared]> SHOW INDEX FROM campaign_events;
+-----------------+------------+-------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| campaign_events | 0 | PRIMARY | 1 | event_id | A | 114 | NULL | NULL | | BTREE | | |
| campaign_events | 0 | event_page | 1 | event_page_wiki | A | 57 | NULL | NULL | | BTREE | | |
| campaign_events | 0 | event_page | 2 | event_page_namespace | A | 57 | NULL | NULL | | BTREE | | |
| campaign_events | 0 | event_page | 3 | event_page_title | A | 114 | NULL | NULL | | BTREE | | |
| campaign_events | 1 | event_id_deleted | 1 | event_id | A | 114 | NULL | NULL | | BTREE | | |
| campaign_events | 1 | event_id_deleted | 2 | event_deleted_at | A | 114 | NULL | NULL | YES | BTREE | | |
| campaign_events | 1 | event_timezone_id | 1 | event_timezone | A | 2 | NULL | NULL | | BTREE | | |
| campaign_events | 1 | event_timezone_id | 2 | event_id | A | 114 | NULL | NULL | | BTREE | | |
+-----------------+------------+-------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Index recommendations:
-- On ce_organizers
PRIMARY
KEY
ceo_id
),
UNIQUE
INDEX
ceo_event_user_role
ceo_event_id
ceo_user_id
ceo_role_id
),
-- Used when retrieving organizers for a single event
INDEX
ceo_user_event
ceo_user_id
ceo_event_id
-- Used for JOINs
SHOW INDEX FROM ce_organizers;
MariaDB [wikishared]> SHOW INDEX from ce_organizers;
+---------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ce_organizers | 0 | PRIMARY | 1 | ceo_id | A | 133 | NULL | NULL | | BTREE | | |
| ce_organizers | 0 | ceo_event_user_role | 1 | ceo_event_id | A | 133 | NULL | NULL | | BTREE | | |
| ce_organizers | 0 | ceo_event_user_role | 2 | ceo_user_id | A | 133 | NULL | NULL | | BTREE | | |
| ce_organizers | 0 | ceo_event_user_role | 3 | ceo_role_id | A | 133 | NULL | NULL | | BTREE | | |
| ce_organizers | 1 | ceo_user_event | 1 | ceo_user_id | A | 133 | NULL | NULL | | BTREE | | |
| ce_organizers | 1 | ceo_user_event | 2 | ceo_event_id | A | 133 | NULL | NULL | | BTREE | | |
+---------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Index recommendations:
-- On ce_participants
PRIMARY
KEY
cep_id
),
UNIQUE
INDEX
cep_event_participant
cep_event_id
cep_user_id
),
-- Uniqueness constraint
INDEX
cep_event_unregistered
cep_event_id
cep_unregistered_at
),
-- For COUNT
INDEX
cep_user_unregistered_event
cep_user_id
cep_unregistered_at
cep_event_id
-- For JOIN
SHOW INDEX FROM ce_participants;
MariaDB [wikishared]> SHOW INDEX FROM ce_participants;
+-----------------+------------+-----------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+-----------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ce_participants | 0 | PRIMARY | 1 | cep_id | A | 276 | NULL | NULL | | BTREE | | |
| ce_participants | 0 | cep_event_participant | 1 | cep_event_id | A | 276 | NULL | NULL | | BTREE | | |
| ce_participants | 0 | cep_event_participant | 2 | cep_user_id | A | 276 | NULL | NULL | | BTREE | | |
| ce_participants | 1 | cep_event_unregistered | 1 | cep_event_id | A | 276 | NULL | NULL | | BTREE | | |
| ce_participants | 1 | cep_event_unregistered | 2 | cep_unregistered_at | A | 276 | NULL | NULL | YES | BTREE | | |
| ce_participants | 1 | cep_user_unregistered_event | 1 | cep_user_id | A | 276 | NULL | NULL | | BTREE | | |
| ce_participants | 1 | cep_user_unregistered_event | 2 | cep_unregistered_at | A | 276 | NULL | NULL | YES | BTREE | | |
| ce_participants | 1 | cep_user_unregistered_event | 3 | cep_event_id | A | 276 | NULL | NULL | | BTREE | | |
+-----------------+------------+-----------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
AC passes as index recommendations match what are currently in the tables on betacluster. One note though,
campaign_events
table also has the
event_timezone_id
index with columns
event_timezone
and
event_id
which are not mentioned in the index recommendations AC.
@Daimona
is this acceptable and correct? Wanted to check on this before marking this ticket as done.
Daimona
added a comment.
Sep 9 2022, 6:45 PM
2022-09-09 18:45:02 (UTC+0)
Comment Actions
In
T308738#8225467
@vaughnwalters
wrote:
AC passes as index recommendations match what are currently in the tables on betacluster. One note though,
campaign_events
table also has the
event_timezone_id
index with columns
event_timezone
and
event_id
which are not mentioned in the index recommendations AC.
@Daimona
is this acceptable and correct? Wanted to check on this before marking this ticket as done.
Yes, I just added that index for
T311126
vaughnwalters
moved this task from
QA 🐛
to
Done 🏁
on the
Connection-Team (Connection-Current-Sprint)
board.
Sep 9 2022, 6:47 PM
2022-09-09 18:47:06 (UTC+0)
Comment Actions
In
T308738#8225504
@Daimona
wrote:
In
T308738#8225467
@vaughnwalters
wrote:
AC passes as index recommendations match what are currently in the tables on betacluster. One note though,
campaign_events
table also has the
event_timezone_id
index with columns
event_timezone
and
event_id
which are not mentioned in the index recommendations AC.
@Daimona
is this acceptable and correct? Wanted to check on this before marking this ticket as done.
Yes, I just added that index for
T311126
Okay great, marking as done then.
vaughnwalters
closed this task as
Resolved
Sep 9 2022, 6:47 PM
2022-09-09 18:47:30 (UTC+0)
Daimona
mentioned this in
T318593: Review database schema of the CampaignEvents extension
Sep 26 2022, 4:27 PM
2022-09-26 16:27:40 (UTC+0)
Daimona
mentioned this in
T318595: Create database schema for the CampaignEvents extension on testwiki, test2wiki, and officewiki
Sep 26 2022, 4:37 PM
2022-09-26 16:37:19 (UTC+0)
Daimona
mentioned this in
T350947: Performance review of Extension:WikimediaCampaignEvents
Mar 28 2024, 1:13 AM
2024-03-28 01:13:59 (UTC+0)
Log In to Comment
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct.
Wikimedia Foundation
Code of Conduct
Disclaimer
CC-BY-SA
GPL
Credits