⚓ T63111 Convert primary key integers and references thereto from int to bigint (unsigned)
Page Menu
Phabricator
Create Task
Maniphest
T63111
Convert primary key integers and references thereto from int to bigint (unsigned)
Open, Low
Public
Feature
Actions
Edit Task
Edit Related Tasks...
Create Subtask
Edit Parent Tasks
Edit Subtasks
Merge Duplicates In
Close As Duplicate
Edit Related Objects...
Edit Commits
Edit Mocks
Mute Notifications
Protect as security issue
Assigned To
None
Authored By
leucosticte
Feb 9 2014, 3:49 PM
2014-02-09 15:49:00 (UTC+0)
Tags
DBA
(Backlog)
Schema-change
(Change)
MediaWiki-General
(Backlog)
MW-1.42-notes (1.42.0-wmf.15; 2024-01-23)
MW-1.43-notes (1.43.0-wmf.4; 2024-05-07)
Referenced Files
None
Subscribers
Bawolff
Bugreporter
DannyS712
jcrespo
Jdforrester-WMF
Ladsgroup
Lucas_Werkmeister_WMDE
View All 14 Subscribers
Description
Some wikis (e.g. Inclupedia) may wish to import Wikipedia's logging, page, recentchange, revision, and user data, using the same primary key values for that imported data as Wikipedia, while also having their own data in those fields, starting at, e.g., log_id, page_id, rc_id, rev_id, and user_id 1 quadrillion for future-proofing purposes. This will require a change of these primary keys to bigint unsigned, since int unsigned maxes out at about 2.7 billion.
These keys are going to hit their maxes eventually anyway on enwiki, although it might take a few decades more. Going to bigint unsigned will use up more storage, though.
Another issue with this is extensive core use of intval() on these primary keys. One solution is to create a wfBigintval() global function that removes any decimal as intval does and removes any non-numeric characters. However, then it will return a string rather than an int; hopefully no one does any mathematical operations on these, or it will be necessary to come up with alternative ways of doing those.
I can change old_id to bigint unsigned as well, if you think that would be helpful. Should we just migrate all primary keys to bigint unsigned? See also bug 60962 (
T62962
), which this will at least partly fix. See also [[mw:Manual:Primary key storage in other fields]].
Version
: 1.23.0
Severity
: enhancement
Details
Reference
bz61111
Related Changes in Gerrit:
Subject
Repo
Branch
Lines +/-
Make rc_id a bigint
mediawiki/core
master
+777
-7
Make cuc_id a bigint
mediawiki/extensions/CheckUser
master
+300
-3
[DEMO] schema: Change all id columns and references to bigint
mediawiki/core
master
+283
-280
Track wikibase repo table auto increment usage above 25%
analytics/wmde/scripts
master
+81
-0
Customize query in gerrit
Related Objects
Search...
Task Graph
Mentions
Status
Subtype
Assigned
Task
Open
Feature
None
T63111
Convert primary key integers and references thereto from int to bigint (unsigned)
Resolved
Ladsgroup
T62962
The primary key of recentchanges (rc_id) table should be unsigned
Resolved
Umherirrender
T77005
Database field user_newtalk.user_id is not the same data type as user.user_id
Open
None
T291332
Alert when auto-increment fields on any MW-related databases reach a threshold
Declined
None
T305901
Reset geo_tags auto_increment value to 1 in dewiki
Resolved
Marostegui
T355609
Make cuc_id a bigint
Resolved
Marostegui
T355739
Switchover s6 master (db2114 -> db2129)
Declined
Marostegui
T364067
Switchover s6 master (db1173 -> db1231)
Resolved
Marostegui
T364299
Make rc_id a bigint
Resolved
Marostegui
T364523
Switchover s6 master (db1173 -> db1231)
Resolved
Marostegui
T364541
Switchover s8 master (db1209 -> db1192)
Resolved
Marostegui
T364703
Switchover s5 master (db2213 -> db2123)
Resolved
Marostegui
T365339
Switchover s8 master (db2161 -> db2165)
Resolved
Marostegui
T365783
Switchover s6 master (db2129 -> db2214)
Resolved
Marostegui
T366552
Switchover s1 master (db2203 -> db2212)
Resolved
Marostegui
T366687
Switchover s2 master (db1222 -> db1162)
Resolved
Marostegui
T367140
Switchover s3 master (db1223 -> db1157)
Resolved
Marostegui
T367145
Switchover s5 master (db2123 -> db2213)
Resolved
Marostegui
T367146
Switchover s5 master (db1230 -> db1183)
Resolved
Marostegui
T367378
Switchover s4 master (db1160 -> db1238)
Resolved
Marostegui
T367857
Switchover s7 master (db1236 -> db1181)
Resolved
BUG REPORT
abi_
T364681
Issues with translatable pages on Wikidata due to revision id overflow
Resolved
Marostegui
T365352
Stop referencing rev_id as signed int in revtag table to counter revision id overflow in wikidatawiki
Open
None
T365355
Maintenance script to regenerate revtag for list of pages
Open
None
T365445
Investigate potential signed int references to rev_id
Resolved
Marostegui
T365465
translate_reviews unsigned and revision big int
Resolved
Marostegui
T367856
Cleanup revision table schema
Resolved
Marostegui
T369020
Switchover s6 master (db1231 -> db1173)
Resolved
Marostegui
T369021
Switchover s6 master (db2129 -> db2214)
Resolved
Marostegui
T369130
Switchover s2 master (db2207 -> db2204)
Resolved
Marostegui
T369339
Switchover s2 master (db1162 -> db1222)
Resolved
Marostegui
T369478
Switchover s5 master (db2213 -> db2123)
Resolved
Marostegui
T369616
Switchover s5 master (db1183 -> db1230)
Resolved
Marostegui
T369691
Switchover s3 master (db2127 -> db2205)
Resolved
Volans
T369882
Switchover s7 master (db2121 -> db2218)
Resolved
Marostegui
T370019
Switchover s3 master (db1157 -> db1223)
Resolved
Marostegui
T370121
Switchover s7 master (db1181 -> db1236)
Resolved
Marostegui
T371205
Switchover s4 master (db2179 -> db2140)
Resolved
Marostegui
T371345
Switchover s1 master (db2203 -> db2212)
Resolved
Marostegui
T372524
Switchover s1 master (db1184 -> db1163)
Resolved
Marostegui
T381993
Switchover s8 master (db1193 -> db1209)
Mentioned In
T381877: Older wikis with GlobalBlocking installed have an incorrect type for gb_id
T297633:
T188730: [C-DIS][SW] Enable statement usage tracking on Commons and Co
T291332: Alert when auto-increment fields on any MW-related databases reach a threshold
T263644: Make wb_changes_dispatch.chd_seen unsigned
T62962: The primary key of recentchanges (rc_id) table should be unsigned
Mentioned Here
T408290: Ratio to Max Value of Auto Increment Dashboard is no longer receiving data
T263644: Make wb_changes_dispatch.chd_seen unsigned
T291829: Drop wb_changes_dispatch table
T192866: Some DjVu files have too much metadata to fit in their database column
T275268: Address "image" table capacity problems by storing pdf/djvu text outside file metadata
T183490: MCR schema migration stage 4: Migrate External Store URLs (wmf production)
T301039: Provide a dump of PDF/DjVU metadata
T291332: Alert when auto-increment fields on any MW-related databases reach a threshold
P8198 auto_increment monitoring 2019-03-14
T89737: Make several mediawiki table fields unsigned ints on wmf databases
T62962: The primary key of recentchanges (rc_id) table should be unsigned
Event Timeline
There are a very large number of changes, so older changes are hidden.
Show Older Changes
brooke
added a comment.
Feb 9 2014, 6:14 PM
2014-02-09 18:14:43 (UTC+0)
Comment Actions
Note that PHP's 'int' type is dependent on the underlying architecture's C-language 'int' type.
This means that 'int's and intval() are 32-bit on 32-bit architectures, but are 64-bit on most 64-bit architectures. (Windows may be an exception to this as Windows is weird. If you're using a Windows server, you may wish to consider a nice Linux virtual machine.)
leucosticte
added a comment.
Feb 9 2014, 6:53 PM
2014-02-09 18:53:41 (UTC+0)
Comment Actions
(In reply to comment #3)
Note that PHP's 'int' type is dependent on the underlying architecture's
C-language 'int' type.
This means that 'int's and intval() are 32-bit on 32-bit architectures, but
are
64-bit on most 64-bit architectures. (Windows may be an exception to this as
Windows is weird. If you're using a Windows server, you may wish to consider
nice Linux virtual machine.)
Good point. I'm running 32-bit Ubuntu Linux; it looks like I'll need to install the 64-bit flavor. That seems like a better solution than making such extensive changes to the core, which might introduce some undesired effects.
Bawolff
added a comment.
Feb 9 2014, 11:44 PM
2014-02-09 23:44:08 (UTC+0)
Comment Actions
The biggest two of those primary keys are (The other's are at least an order of magnitude smaller):
MariaDB [enwiki_p]> select max(rc_id) from recentchanges;
+------------+
max(rc_id)
+------------+
636232414
+------------+
1 row in set (0.04 sec)
MariaDB [enwiki_p]> select max(rev_id) from revision;
+-------------+
max(rev_id)
+-------------+
594743381
+-------------+
So in 13 years, we've managed to get up to an rc_id of 636,232,414 or about 30% of possible values (Since rc_id appears to be unsigned for some reason). It doesn't seem like this is something we should really be worrying about. It also doesn't seem like fixing this will really become any harder in the future than it is now. Thus this seems like a solution in search of a problem. (Although perhaps making the default schema have these fields be unsigned might be a good idea just for cleanliness sake)
leucosticte
added a comment.
Feb 10 2014, 12:17 AM
2014-02-10 00:17:38 (UTC+0)
Comment Actions
Yeah, if there weren't a particular use case I had in mind, I would totally say, Leave it as is for now. There are a few different options for Inclupedia, that I can think of. (1) discard the enwiki rev_ids of imported revisions. (2) keep those rev_ids in the mirrorbot table, in case they're needed later. (3) do what I'm doing now, which is import them with the enwiki rev_ids, and use higher rev_ids for Inclupedia-only revisions. (4) add a few fields to the revision table, or add new tables, similar to what you see at [[mw:Extension:MirrorTools#mirror-logging.sql]].
Option 3 seemed the simplest, especially since options 1 and 2 would still have required another field to be added indicating that the revisions were imported (it's pertinent information when people are browsing through). If I go with option 3, then an auto-increment starting point has to be selected, e.g. 3 billion. It might take decades, but there will eventually be a collision. These are wikis whose life spans are projected to be possibly decades or centuries for all we know. For peace of mind, I'd prefer to not set up the equivalent of a Y2K issue.
The next decision is, (a) change the schema for everyone, or (b) change it for just Inclupedia. I don't really mind changing it just for Inclupedia. Part of the point of filing the bug was to figure out which direction people wanted to go in. We can WONTFIX and then I'll just add SQL files to MirrorTools for the necessary changes. This is especially true in light of the fact that intval's behavior can be changed by going to a 64-bit system.
Since I was going to make a schema change anyway, I figured I'd ask, Hey, want to just change it in the core?
leucosticte
added a comment.
Feb 10 2014, 3:15 AM
2014-02-10 03:15:03 (UTC+0)
Comment Actions
See also [[m:Solution in search of a problem]]
leucosticte
added a comment.
Feb 11 2014, 9:06 PM
2014-02-11 21:06:07 (UTC+0)
Comment Actions
Thanks for the feedback; I'm WONTFIXing this. That saves me some work, as it's easier to make a schema change by MediaWiki extension than by MediaWiki core.
LucasWerkmeister
subscribed.
Feb 10 2019, 5:07 PM
2019-02-10 17:07:52 (UTC+0)
Comment Actions
For what it’s worth, on enwiki the
rc_id
has now crossed the halfway point (assuming it stays signed):
MariaDB [enwiki_p]> SELECT MAX(rc_id) FROM recentchanges; SELECT MAX(rev_id) FROM revision;
+------------+
| MAX(rc_id) |
+------------+
| 1128813734 | # 52.6% of 2³¹ - 1
+------------+
1 row in set (0.00 sec)
+-------------+
| MAX(rev_id) |
+-------------+
| 882667340 | # 20.6% of 2³² - 1
+-------------+
1 row in set (0.00 sec)
And after just 6½ years, the Wikidata
rev_id
is about to catch up to the enwiki one:
MariaDB [wikidatawiki_p]> SELECT MAX(rc_id) FROM recentchanges; SELECT MAX(rev_id) FROM revision;
+------------+
| MAX(rc_id) |
+------------+
| 891589879 | # 41.5% of 2³¹ - 1
+------------+
1 row in set (0.00 sec)
+-------------+
| MAX(rev_id) |
+-------------+
| 855257322 | # 20.0% of 2³² - 1
+-------------+
1 row in set (0.01 sec)
(The
rc_id
is less close, but will presumably also overtake enwiki before either of them come close to the size limit.) This still isn’t a problem
now
, but I reckon on Wikidata both of these IDs will exceed
signed int
range in less than 10 years. (Cracking the
unsigned int
of the
revision
table will still take a while longer.)
Krinkle
updated the task description.
(Show Details)
Feb 10 2019, 6:15 PM
2019-02-10 18:15:49 (UTC+0)
Krinkle
removed a subscriber:
wikibugs-l-list
Krinkle
renamed this task from
Change log_id, page_id, rc_id, rev_id, and user_id to bigint unsigned
to
Convert primary key integers and references thereto from int to bigint (unsigned)
Feb 10 2019, 6:31 PM
2019-02-10 18:31:13 (UTC+0)
Krinkle
reopened this task as
Open
Feb 10 2019, 6:42 PM
2019-02-10 18:42:11 (UTC+0)
Krinkle
added a project:
DBA
Krinkle
moved this task from
Untriaged
to
Schema changes
on the
MediaWiki-libs-Rdbms
board.
Krinkle
subscribed.
Comment Actions
Re-opening so as to let the DBAs triage this. One question I wasn't able to answer quickly is: What units and signed-ness do all our current integer fields use in core's default schema, and in WMF production?
In core, most integer fields that point to primary keys are
int unsigned
, e.g.
rc_oldid
rc_logid
page_latest
actor_user
, etc. The only signed fields are or fields that don't relate to a primary key and are meant to support negative values so that seems fine.
What I'm less unsure about is the primary keys themselves. Some them have
AUTO INCREMENT
and also specify
int unsigned
(such as
ipc_rev_id
), but we we also have primary key integers that don't specify whether they're signed or not. What is the default there? Are MySQL/MariaDB smart enough to assume unsigned there? Or, given they do allow manual insertion, perhaps they have to support negatives unless explicitly marked as unsigned?
Looking at the DESCRIBE output in production, it seems to not specify signed-ness for the ones that didn't specify it in the schema, so at least it's not entirely obvious what it expands to.
(enwiki)> DESCRIBE recentchanges
| Field | Type | Null | Key | Default | Extra |
| rc_id | int(8) | NO | PRI | NULL | auto_increment |
(enwiki)> DESCRIBE ip_changes;
| Field | Type | Null | Key | Default | Extra |
| ipc_rev_id | int(10) unsigned | NO | PRI | 0 | |
I also see wildly varying values for the size of the integer fields. I assume these are not hardcoded at that size but some kind of dynamic/automatic "growing feature" of MariaDB? That is, the schema doesn't limit the integer size and presumably MariaDB takes the shortest size that will fit the values as an optimisation for resource usage, and auto grows as needed?
Krinkle
mentioned this in
T62962: The primary key of recentchanges (rc_id) table should be unsigned
Feb 10 2019, 6:52 PM
2019-02-10 18:52:35 (UTC+0)
Krinkle
added a subtask:
T62962: The primary key of recentchanges (rc_id) table should be unsigned
Krinkle
reopened subtask
T62962: The primary key of recentchanges (rc_id) table should be unsigned
as
Open
Marostegui
moved this task from
Triage
to
Backlog
on the
DBA
board.
Feb 11 2019, 6:35 AM
2019-02-11 06:35:37 (UTC+0)
Marostegui
added subscribers:
Reedy
Marostegui
Comment Actions
In
T63111#4941569
@Krinkle
wrote:
Re-opening so as to let the DBAs triage this. One question I wasn't able to answer quickly is: What units and signed-ness do all our current integer fields use in core's default schema, and in WMF production?
We converted a bunch to
unsigned
not long ago at
T89737: Make several mediawiki table fields unsigned ints on wmf databases
(CC
@Reedy
In core, most integer fields that point to primary keys are
int unsigned
, e.g.
rc_oldid
rc_logid
page_latest
actor_user
, etc. The only signed fields are or fields that don't relate to a primary key and are meant to support negative values so that seems fine.
What I'm less unsure about is the primary keys themselves. Some them have
AUTO INCREMENT
and also specify
int unsigned
(such as
ipc_rev_id
), but we we also have primary key integers that don't specify whether they're signed or not. What is the default there? Are MySQL/MariaDB smart enough to assume unsigned there? Or, given they do allow manual insertion, perhaps they have to support negatives unless explicitly marked as unsigned?
If not specified otherwise, it is signed by default as far as I remember. (Also:
Looking at the DESCRIBE output in production, it seems to not specify signed-ness for the ones that didn't specify it in the schema, so at least it's not entirely obvious what it expands to.
(enwiki)> DESCRIBE recentchanges
| Field | Type | Null | Key | Default | Extra |
| rc_id | int(8) | NO | PRI | NULL | auto_increment |
(enwiki)> DESCRIBE ip_changes;
| Field | Type | Null | Key | Default | Extra |
| ipc_rev_id | int(10) unsigned | NO | PRI | 0 | |
I also see wildly varying values for the size of the integer fields. I assume these are not hardcoded at that size but some kind of dynamic/automatic "growing feature" of MariaDB? That is, the schema doesn't limit the integer size and presumably MariaDB takes the shortest size that will fit the values as an optimisation for resource usage, and auto grows as needed?
The value is just a visual thing - which is basically how many characters to display when using the client but storage related it doesn't mean anything. The column will not restrict any value until it reaches the maximum value for an int column (32 bit). Normally it is good not to specify any value just to avoid those misunderstandings.
Krinkle
edited projects, added
Schema-change
MediaWiki-General
; removed
MediaWiki-libs-Rdbms
Jul 18 2019, 8:34 PM
2019-07-18 20:34:15 (UTC+0)
jcrespo
subscribed.
Sep 11 2019, 10:50 AM
2019-09-11 10:50:44 (UTC+0)
Comment Actions
P8198 auto_increment monitoring 2019-03-14
enwiki:
root@db1114[sys]> select * FROM schema_auto_increment_columns LIMIT 4 \G
*************************** 1. row ***************************
table_schema: enwiki
table_name: recentchanges
column_name: rc_id
data_type: int
column_type: int(8)
10
is_signed: 1
11
is_unsigned: 0
12
max_value: 2147483647
13
auto_increment: 1137690279
14
auto_increment_ratio: 0.5298
15
*************************** 2. row ***************************
16
table_schema: enwiki
17
table_name: cu_changes
18
column_name: cuc_id
19
data_type: int
20
column_type: int(11)
21
is_signed: 1
22
is_unsigned: 0
23
max_value: 2147483647
24
auto_increment: 854445063
25
auto_increment_ratio: 0.3979
26
*************************** 3. row ***************************
27
table_schema: enwiki
28
table_name: text
29
column_name: old_id
30
data_type: int
31
column_type: int(8) unsigned
32
is_signed: 0
33
is_unsigned: 1
34
max_value: 4294967295
35
auto_increment: 898750098
36
auto_increment_ratio: 0.2093
37
*************************** 4. row ***************************
38
table_schema: enwiki
39
table_name: revision
40
column_name: rev_id
41
data_type: int
42
column_type: int(8) unsigned
43
is_signed: 0
44
is_unsigned: 1
45
max_value: 4294967295
46
auto_increment: 887709732
47
auto_increment_ratio: 0.2067
48
4 rows in set, 36 warnings (0.04 sec)
49
50
wikidatawiki:
51
52
root@db1071.eqiad.wmnet[sys]> select * FROM schema_auto_increment_columns LIMIT 4 \G
53
*************************** 1. row ***************************
54
table_schema: wikidatawiki
55
table_name: recentchanges
56
column_name: rc_id
57
data_type: int
58
column_type: int(11)
59
is_signed: 1
60
is_unsigned: 0
61
max_value: 2147483647
62
auto_increment: 919219099
63
auto_increment_ratio: 0.4280
64
*************************** 2. row ***************************
65
table_schema: wikidatawiki
66
table_name: cu_changes
67
column_name: cuc_id
68
data_type: int
69
column_type: int(11)
70
is_signed: 1
71
is_unsigned: 0
72
max_value: 2147483647
73
auto_increment: 899023427
74
auto_increment_ratio: 0.4186
75
*************************** 3. row ***************************
76
table_schema: wikidatawiki
77
table_name: text
78
column_name: old_id
79
data_type: int
80
column_type: int(10) unsigned
81
is_signed: 0
82
is_unsigned: 1
83
max_value: 4294967295
84
auto_increment: 889703558
85
auto_increment_ratio: 0.2072
86
*************************** 4. row ***************************
87
table_schema: wikidatawiki
88
table_name: revision
89
column_name: rev_id
90
data_type: int
91
column_type: int(10) unsigned
92
is_signed: 0
93
is_unsigned: 1
94
max_value: 4294967295
95
auto_increment: 882728203
96
auto_increment_ratio: 0.2055
97
4 rows in set, 52 warnings (0.37 sec)
was run in march. We probably should have monitoring on this on all hosts. Recentchanges are the first to find the issues. Converting them to unsigned
T62962
would double the limit.
Marostegui
added a comment.
Jan 7 2020, 5:48 PM
2020-01-07 17:48:25 (UTC+0)
Comment Actions
Update of these numbers:
enwiki
root@db1089.eqiad.wmnet[sys]> select * FROM schema_auto_increment_columns LIMIT 4 \G
*************************** 1. row ***************************
table_schema: enwiki
table_name: recentchanges
column_name: rc_id
data_type: int
column_type: int(8)
is_signed: 1
is_unsigned: 0
max_value: 2147483647
auto_increment: 1220497142
auto_increment_ratio: 0.5683
*************************** 2. row ***************************
table_schema: enwiki
table_name: cu_changes
column_name: cuc_id
data_type: int
column_type: int(11)
is_signed: 1
is_unsigned: 0
max_value: 2147483647
auto_increment: 908617190
auto_increment_ratio: 0.4231
*************************** 3. row ***************************
table_schema: enwiki
table_name: text
column_name: old_id
data_type: int
column_type: int(8) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 946292759
auto_increment_ratio: 0.2203
*************************** 4. row ***************************
table_schema: enwiki
table_name: revision
column_name: rev_id
data_type: int
column_type: int(8) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 934644421
auto_increment_ratio: 0.2176
4 rows in set, 32 warnings (0.09 sec)
wikidatawiki
*************************** 1. row ***************************
table_schema: wikidatawiki
table_name: recentchanges
column_name: rc_id
data_type: int
column_type: int(11)
is_signed: 1
is_unsigned: 0
max_value: 2147483647
auto_increment: 1131254488
auto_increment_ratio: 0.5268
*************************** 2. row ***************************
table_schema: wikidatawiki
table_name: cu_changes
column_name: cuc_id
data_type: int
column_type: int(11)
is_signed: 1
is_unsigned: 0
max_value: 2147483647
auto_increment: 1114807877
auto_increment_ratio: 0.5191
*************************** 3. row ***************************
table_schema: wikidatawiki
table_name: text
column_name: old_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1104265812
auto_increment_ratio: 0.2571
*************************** 4. row ***************************
table_schema: wikidatawiki
table_name: revision
column_name: rev_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1093332597
auto_increment_ratio: 0.2546
4 rows in set, 52 warnings (0.07 sec)
Reedy
moved this task from
Unsorted
to
Change
on the
Schema-change
board.
Feb 24 2020, 6:14 AM
2020-02-24 06:14:26 (UTC+0)
gerritbot
added a comment.
Mar 5 2020, 10:15 AM
2020-03-05 10:15:43 (UTC+0)
Comment Actions
Change 577202 had a related patch set uploaded (by Addshore; owner: Addshore):
[analytics/wmde/scripts@master] Track wikibase repo table auto increment usage above 25%
gerritbot
added a project:
Patch-For-Review
Mar 5 2020, 10:15 AM
2020-03-05 10:15:43 (UTC+0)
Lucas_Werkmeister_WMDE
subscribed.
Edited
Sep 21 2020, 2:42 PM
2020-09-21 14:42:07 (UTC+0)
Comment Actions
Another table to keep an eye on is Wikidata’s
wb_changes
. Its
change_id
is thankfully unsigned, but it’s in the same ballpark as
old_id
and
rev_id
, currently at 1274061080 (23.7% of 2^32 - 1).
Edit: actually, it turns out we have a signed reference to
change_id
in a different table…
Edit 2: That reference is gone now (we made it unsigned in
T263644
and later dropped the whole table in
T291829
). We should still make the
change_id
a BIGINT, but it’s not super urgent.
Lucas_Werkmeister_WMDE
mentioned this in
T263644: Make wb_changes_dispatch.chd_seen unsigned
Sep 23 2020, 2:17 PM
2020-09-23 14:17:45 (UTC+0)
DannyS712
subscribed.
Sep 23 2020, 2:37 PM
2020-09-23 14:37:48 (UTC+0)
Ladsgroup
closed subtask
T62962: The primary key of recentchanges (rc_id) table should be unsigned
as
Resolved
Mar 13 2021, 9:02 PM
2021-03-13 21:02:20 (UTC+0)
LSobanski
raised the priority of this task from
Lowest
to
Low
Apr 26 2021, 10:09 AM
2021-04-26 10:09:33 (UTC+0)
gerritbot
added a comment.
Sep 15 2021, 8:39 AM
2021-09-15 08:39:02 (UTC+0)
Comment Actions
Change 577202
abandoned
by Addshore:
[analytics/wmde/scripts@master] Track wikibase repo table auto increment usage above 25%
Reason:
We do this as a nice % of ids available elsewhere now, thanks Amir!
Maintenance_bot
removed a project:
Patch-For-Review
Sep 15 2021, 9:11 AM
2021-09-15 09:11:52 (UTC+0)
Krinkle
mentioned this in
T291332: Alert when auto-increment fields on any MW-related databases reach a threshold
Sep 18 2021, 6:54 PM
2021-09-18 18:54:44 (UTC+0)
Lucas_Werkmeister_WMDE
mentioned this in
T188730: [C-DIS][SW] Enable statement usage tracking on Commons and Co
Oct 25 2021, 11:51 AM
2021-10-25 11:51:31 (UTC+0)
Marostegui
added a comment.
Nov 5 2021, 8:13 AM
2021-11-05 08:13:58 (UTC+0)
Comment Actions
Update:
enwiki
root@db2071.codfw.wmnet[sys]> select * FROM schema_auto_increment_columns LIMIT 4 \G
*************************** 1. row ***************************
table_schema: enwiki
table_name: recentchanges
column_name: rc_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1440028796
auto_increment_ratio: 0.3353
*************************** 2. row ***************************
table_schema: enwiki
table_name: cu_changes
column_name: cuc_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1081683349
auto_increment_ratio: 0.2518
*************************** 3. row ***************************
table_schema: enwiki
table_name: text
column_name: old_id
data_type: int
column_type: int(8) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1066746222
auto_increment_ratio: 0.2484
*************************** 4. row ***************************
table_schema: enwiki
table_name: revision
column_name: rev_id
data_type: int
column_type: int(8) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1053665251
auto_increment_ratio: 0.2453
4 rows in set, 32 warnings (0.056 sec)
commonswiki
root@db2073.codfw.wmnet[sys]> nopager;select * FROM schema_auto_increment_columns LIMIT 4 \G
PAGER set to stdout
*************************** 1. row ***************************
table_schema: commonswiki
table_name: recentchanges
column_name: rc_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1806985765
auto_increment_ratio: 0.4207
*************************** 2. row ***************************
table_schema: commonswiki
table_name: externallinks
column_name: el_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1167723248
auto_increment_ratio: 0.2719
*************************** 3. row ***************************
table_schema: commonswiki
table_name: geo_tags
column_name: gt_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 681462505
auto_increment_ratio: 0.1587
*************************** 4. row ***************************
table_schema: commonswiki
table_name: cu_changes
column_name: cuc_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 620880460
auto_increment_ratio: 0.1446
4 rows in set, 72 warnings (0.086 sec)
wikidatawiki
root@db2080.codfw.wmnet[sys]> nopager;select * FROM schema_auto_increment_columns LIMIT 4 \G
PAGER set to stdout
*************************** 1. row ***************************
table_schema: wikidatawiki
table_name: recentchanges
column_name: rc_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1570382379
auto_increment_ratio: 0.3656
*************************** 2. row ***************************
table_schema: wikidatawiki
table_name: cu_changes
column_name: cuc_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1558035694
auto_increment_ratio: 0.3628
*************************** 3. row ***************************
table_schema: wikidatawiki
table_name: text
column_name: old_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1542817108
auto_increment_ratio: 0.3592
*************************** 4. row ***************************
table_schema: wikidatawiki
table_name: revision
column_name: rev_id
data_type: int
column_type: int(10) unsigned
is_signed: 0
is_unsigned: 1
max_value: 4294967295
auto_increment: 1522481089
auto_increment_ratio: 0.3545
4 rows in set, 48 warnings (0.064 sec)
We should work on
T291332: Alert when auto-increment fields on any MW-related databases reach a threshold
to avoid having to worry about this
Marostegui
added a subtask:
T291332: Alert when auto-increment fields on any MW-related databases reach a threshold
Nov 5 2021, 8:14 AM
2021-11-05 08:14:12 (UTC+0)
gerritbot
added a comment.
Dec 1 2021, 7:55 PM
2021-12-01 19:55:17 (UTC+0)
Comment Actions
Change 742998 had a related patch set uploaded (by Umherirrender; author: Umherirrender):
[mediawiki/core@master] [DEMO] schema: Change all id columns and references to bigint
gerritbot
added a project:
Patch-For-Review
Dec 1 2021, 7:55 PM
2021-12-01 19:55:19 (UTC+0)
Umherirrender
subscribed.
Dec 1 2021, 8:00 PM
2021-12-01 20:00:34 (UTC+0)
Comment Actions
In
T63111#7541928
@gerritbot
wrote:
Change 742998 had a related patch set uploaded (by Umherirrender; author: Umherirrender):
[mediawiki/core@master] [DEMO] schema: Change all id columns and references to bigint
Created a patch set to just demo how much work that would be to do it for all primary keys and the references in core.
Maybe this should/could be limited to tables holding user generated content like page/revision/user etc.
But bigint would also require double space than an integer, maybe not worse together with compression.
Umherirrender
removed a project:
Patch-For-Review
Dec 1 2021, 8:01 PM
2021-12-01 20:01:03 (UTC+0)
Ladsgroup
subscribed.
Dec 2 2021, 1:43 PM
2021-12-02 13:43:52 (UTC+0)
Comment Actions
So I think doing a cost-benefit analysis is important here. Doing schema changes for primary keys on a live database is really hard (and most of the time require master switchover) but the benefit is small atm. With the current pace, the only wiki that might run out of max value in five years is wikidata. The rest have a comfortable future. Same goes for rest of tables. I don't think we ever reach 3B pages so I recommend not doing it for any other table than rc and revision but it's still low priority because we have much bigger issues in our databases. In five years most large wikis will be unusable if we don't fix links tables.
Wellverywell
subscribed.
Dec 16 2021, 6:44 PM
2021-12-16 18:44:50 (UTC+0)
Umherirrender
unsubscribed.
Jan 2 2022, 9:02 PM
2022-01-02 21:02:56 (UTC+0)
gerritbot
added a comment.
Jan 2 2022, 9:03 PM
2022-01-02 21:03:28 (UTC+0)
Comment Actions
Change 742998
abandoned
by Umherirrender:
[mediawiki/core@master] [DEMO] schema: Change all id columns and references to bigint
Reason:
Ladsgroup
mentioned this in
T297633:
Jan 24 2022, 8:02 PM
2022-01-24 20:02:40 (UTC+0)
Aklapper
changed the subtype of this task from "Task" to "Feature Request".
Feb 4 2022, 11:13 AM
2022-02-04 11:13:20 (UTC+0)
Aklapper
removed a subscriber:
leucosticte
Krinkle
unsubscribed.
Mar 30 2022, 9:34 PM
2022-03-30 21:34:48 (UTC+0)
Zabe
subscribed.
Apr 6 2022, 10:39 AM
2022-04-06 10:39:03 (UTC+0)
Zabe
added a comment.
Apr 6 2022, 3:23 PM
2022-04-06 15:23:43 (UTC+0)
Comment Actions
In
T63111#7543602
@Ladsgroup
wrote:
so I recommend not doing it for any other table than rc and revision
I would add cu_changes to that list.
Ladsgroup
added a comment.
Apr 12 2022, 1:40 AM
2022-04-12 01:40:39 (UTC+0)
Comment Actions
Yes according to
T291332#7527840
. And text as well (but that table should be dropped, long story)
Zabe
added a comment.
Edited
Apr 12 2022, 9:42 AM
2022-04-12 09:42:46 (UTC+0)
Comment Actions
In
T63111#7543602
@Ladsgroup
wrote:
So I think doing a cost-benefit analysis is important here. Doing schema changes for primary keys on a live database is really hard (and most of the time require master switchover) but the benefit is small atm. With the current pace, the only wiki that might run out of max value in five years is wikidata. The rest have a comfortable future. Same goes for rest of tables. I don't think we ever reach 3B pages so I recommend not doing it for any other table than rc and revision but it's still low priority because we have much bigger issues in our databases. In five years most large wikis will be unusable if we don't fix links tables.
Would it be easier for DBA to for now only apply the change to codfw and wait applying it to eqiad until a datacenter switchover happens (which is likely going to happen at some point in the next few years)? This would eliminate all depooling of eqiad replicas and the need for master switchovers. We easily have the time to go this way. EDIT: This no longer works due to multi-dc stuff.
Bugreporter
subscribed.
Apr 12 2022, 11:38 AM
2022-04-12 11:38:54 (UTC+0)
Comment Actions
In
T63111#7847021
@Ladsgroup
wrote:
Yes according to
T291332#7527840
. And text as well (but that table should be dropped, long story)
Note even if
T183490: MCR schema migration stage 4: Migrate External Store URLs (wmf production)
is resolved the text table will still be populated by other features: see for example
T301039: Provide a dump of PDF/DjVU metadata
Ladsgroup
added a comment.
Apr 12 2022, 1:15 PM
2022-04-12 13:15:24 (UTC+0)
Comment Actions
I'm one of two people who did the redesign of pdf/djvu metadata storage and I'm not following your comment. Currently the storage is a link to ES path inside img_metadata field and has nothing to do with text table. And we won't add back those to core databases as it is extremely big (~600GB uncompressed)
Bugreporter
added a comment.
Apr 12 2022, 4:50 PM
2022-04-12 16:50:25 (UTC+0)
Comment Actions
Oh, what jcrespo said ("Metadata in the image table just points to the text table, which points to ES content servers.") is wrong. But further check may be needed for other potential extensions directly writing to the table.
jcrespo
added a comment.
Apr 12 2022, 4:59 PM
2022-04-12 16:59:48 (UTC+0)
Comment Actions
Not sure where I said that- That quote is found on
T301039
, which was created by Mitar. But regardless, I don't see how that is relevant to this ticket.
Bugreporter
added a comment.
Edited
Apr 12 2022, 5:03 PM
2022-04-12 17:03:48 (UTC+0)
Comment Actions
For clarification, the purpose of my note:
"(text) table should be dropped" - which may still be used elsewhere (in other extensions) and migration is needed if we need to replace it with another storage.
such extensions (if exists) will make future insertions to text table (which will consume auto-increment) even if we no longer add new text rows on ordinal edits (i.e. content table will directly refer to ES addresses).
(Off-topic: for the sentence itself, see
T301039#7686427
Mitar
subscribed.
May 1 2022, 4:27 PM
2022-05-01 16:27:33 (UTC+0)
Comment Actions
I think I misunderstood in
T301039
from documentation that those pointers are pointing to the text table.
jcrespo
added a comment.
Edited
May 1 2022, 8:07 PM
2022-05-01 20:07:48 (UTC+0)
Comment Actions
In
T63111#7848858
@Bugreporter
wrote:
(Off-topic: for the sentence itself, see
T301039#7686427
I think you are mistaking how to intepret the "diff"- the part in red means "I deleted that part" (compare "old" vs "new tabs) " 0:-). I literally corrected the original Mitar's phrasing to be: "With
T275268
and
T192866
metadata for PDF and Djvu files is stored as blobs in External Storage and no longer in the image table which is what ladsgroup and I are saying :-). It was originally written by Mitar (not a bit deal, and again, not really relevant here- as that change changed the "width" of the image table, not the height). Commenting it just in case it helps understanding Phabricator better :-).
gerritbot
added a comment.
Jan 4 2023, 8:19 PM
2023-01-04 20:19:45 (UTC+0)
Comment Actions
Change 875333 had a related patch set uploaded (by Zabe; author: Zabe):
[mediawiki/core@master] Make rc_id a bigint
gerritbot
added a project:
Patch-For-Review
Jan 4 2023, 8:19 PM
2023-01-04 20:19:46 (UTC+0)
gerritbot
added a comment.
Dec 27 2023, 12:50 PM
2023-12-27 12:50:06 (UTC+0)
Comment Actions
Change 986190 had a related patch set uploaded (by Zabe; author: Zabe):
[mediawiki/extensions/CheckUser@master] Make cuc_id a bigint
gerritbot
added a comment.
Jan 22 2024, 3:12 PM
2024-01-22 15:12:26 (UTC+0)
Comment Actions
Change 986190
merged
by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Make cuc_id a bigint
ReleaseTaggerBot
added a project:
MW-1.42-notes (1.42.0-wmf.15; 2024-01-23)
Jan 22 2024, 4:00 PM
2024-01-22 16:00:43 (UTC+0)
Zabe
added a subtask:
T355609: Make cuc_id a bigint
Jan 22 2024, 10:21 PM
2024-01-22 22:21:04 (UTC+0)
Jdforrester-WMF
subscribed.
Jan 29 2024, 3:32 PM
2024-01-29 15:32:20 (UTC+0)
gerritbot
added a comment.
May 4 2024, 8:16 PM
2024-05-04 20:16:58 (UTC+0)
Comment Actions
Change #875333
merged
by jenkins-bot:
[mediawiki/core@master] Make rc_id a bigint
Maintenance_bot
removed a project:
Patch-For-Review
May 4 2024, 8:31 PM
2024-05-04 20:31:28 (UTC+0)
ReleaseTaggerBot
added a project:
MW-1.43-notes (1.43.0-wmf.4; 2024-05-07)
May 4 2024, 9:00 PM
2024-05-04 21:00:43 (UTC+0)
Zabe
added a subtask:
T364299: Make rc_id a bigint
May 20 2024, 10:50 AM
2024-05-20 10:50:34 (UTC+0)
Reedy
added a subtask:
T364681: Issues with translatable pages on Wikidata due to revision id overflow
May 20 2024, 11:21 AM
2024-05-20 11:21:50 (UTC+0)
Nikerabbit
closed subtask
T364681: Issues with translatable pages on Wikidata due to revision id overflow
as
Resolved
May 23 2024, 3:50 PM
2024-05-23 15:50:06 (UTC+0)
Marostegui
closed subtask
T355609: Make cuc_id a bigint
as
Resolved
Jun 4 2024, 7:27 AM
2024-06-04 07:27:28 (UTC+0)
Bugreporter
added a subtask:
T367856: Cleanup revision table schema
Jun 18 2024, 10:13 AM
2024-06-18 10:13:10 (UTC+0)
Marostegui
closed subtask
T364299: Make rc_id a bigint
as
Resolved
Jul 4 2024, 11:09 AM
2024-07-04 11:09:26 (UTC+0)
Dreamy_Jazz
mentioned this in
T381877: Older wikis with GlobalBlocking installed have an incorrect type for gb_id
Dec 10 2024, 1:43 PM
2024-12-10 13:43:09 (UTC+0)
Marostegui
closed subtask
T367856: Cleanup revision table schema
as
Resolved
Jan 7 2025, 5:25 AM
2025-01-07 05:25:02 (UTC+0)
Zabe
added a comment.
Jan 13 2026, 1:56 AM
2026-01-13 01:56:41 (UTC+0)
Comment Actions
From what I can tell we did convert the most relevant fields to
BIGINT
, i.e.
rc_id
cuc_id
and
rev_id
. The text table is close to empty nowadays due to
T183490
The core field which now has the highest ratio to max value seems to be
ct_id
with ~26.2%.
The wikidata tracking dashboard seems to be broken (see
T408290
).
Bugreporter
added a comment.
Edited
Jan 13 2026, 2:30 AM
2026-01-13 02:30:19 (UTC+0)
Comment Actions
The text table is close to empty nowadays
Though this will eventually be a concern for 3rd users importing Wikipedia database dump. As it is mostly empty (and no core column refers old_id as number instead of as string), it should be easy to do a schema change in WMF production.
we did convert the most relevant fields to BIGINT
So what to do in immediate future is to convert any columns referencing such IDs.
Log In to Comment
Content licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 unless otherwise noted; code licensed under GNU General Public License (GPL) 2.0 or later and other open source licenses. By using this site, you agree to the Terms of Use, Privacy Policy, and Code of Conduct.
Wikimedia Foundation
Code of Conduct
Disclaimer
CC-BY-SA
GPL
Credits