User:SchlurcherBot - Meta-Wiki
From Meta, a Wikimedia project coordination wiki

SchlurcherBot

Function overview: Convert links from http:// to https://

Rationale:

Programming language: C#

Source code available: Main C# script: commons:User:SchlurcherBot/LinkChecker

Namespaces: The bot edits only in namespaces 0 (Main) and 6 (File).

Function details: The link checking algorithm is as follows:

  1. The bot extracts all http-links from the parsed html code of a page.
    • It searches for all href attributes and extracts the links.
    • It does not search the wikitext, and thus does not rely on any Regex.
    • This is also to avoid any problems with templates that modify links (like archiving templates).
    • Links that are subsets of other links are filtered out to minimize search and replace errors.
  2. The bot checks if the identified http-links also occur in the wikitext, otherwise they are skipped.
  3. The bot checks whether both the http-link and the corresponding https-link are accessible.
    • This step also uses a blacklist of domains that were previously identified as not accessible.
  4. If both links redirect to the same page, the http-link is replaced by the https-link. The original link path is kept; the link is not changed to the redirect target.
  5. If both links are accessible and return a success code (2xx), the bot checks whether their content is identical.
    1. If the content is identical and the link points directly to the host, the http-link is replaced by the https-link.
    2. If the content is identical but the link does not point directly to the host, the content is also compared with that of the host link. The http-link is replaced by the https-link only if the two differ.
      • This step is added as some hosts return the same content for all their pages (like most domain sellers, some news sites or pages in ongoing maintenance).
    3. If the content is not identical, the bot checks whether it is at least 99.9% identical (calculated via the Levenshtein distance).
      • This step is added as most homepages use dynamic IDs for certain elements, such as ad containers meant to circumvent ad blockers.
    4. If the content is at least 99.9% identical, the same host check as before will be performed.
    5. If any of the checked links fails (for example with status code 404), the link is left unchanged.
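
The steps above can be sketched roughly as follows. This is a Python illustration, not the bot's C# source. The names `HrefCollector`, `candidate_links`, `Response`, `similar` and `should_replace` are hypothetical, the fetched responses are passed in as plain data so that no network access is needed, and normalising the Levenshtein distance by the length of the longer text is an assumption about how "99.9% identical" is measured.

```python
from html.parser import HTMLParser
from typing import NamedTuple, Optional


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]


def candidate_links(html, wikitext):
    """Steps 1-2: http links found in the HTML that also occur in the wikitext."""
    collector = HrefCollector()
    collector.feed(html)
    http_links = {link for link in collector.links if link.startswith("http://")}
    # Drop links contained in a longer link, so that a search and replace
    # on the short link cannot accidentally modify the longer one.
    kept = [a for a in http_links if not any(a != b and a in b for b in http_links)]
    return sorted(link for link in kept if link in wikitext)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similar(a, b, threshold=0.999):
    """True if the texts are identical or at least `threshold` alike (assumed metric)."""
    if a == b:
        return True
    return 1 - levenshtein(a, b) / max(len(a), len(b)) >= threshold


class Response(NamedTuple):
    """Hypothetical summary of one fetched link."""
    status: int
    body: str = ""
    redirect_target: Optional[str] = None


def should_replace(http, https, is_host_link, host_body=None):
    """Steps 4-5: decide whether the http link may be rewritten to https."""
    # Step 4: both links redirect to the same page.
    if http.redirect_target and http.redirect_target == https.redirect_target:
        return True
    # Step 5.5: any failure (for example 404) leaves the link untouched.
    if not (200 <= http.status < 300 and 200 <= https.status < 300):
        return False
    # Steps 5.1-5.4: identical or nearly identical content is required.
    if not similar(http.body, https.body):
        return False
    if is_host_link:
        return True
    # Host check: skip hosts that serve the same content for every path.
    return host_body is not None and https.body != host_body
```

For example, `should_replace(Response(200, "parked"), Response(200, "parked"), False, "parked")` is rejected, because the page is indistinguishable from the host's front page.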

Source for pages: The bot works through the list of pages identified from the external links SQL dump. The list was shuffled so that consecutive edits are not clustered in a specific area.
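
Randomising the work list could look roughly like this. It is a Python sketch only; the function name `shuffled_pages` and the optional `seed` parameter are hypothetical and not taken from the bot's source.

```python
import random


def shuffled_pages(pages, seed=None):
    """Return a shuffled copy of the page list, leaving the input untouched."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    result = list(pages)
    rng.shuffle(result)
    return result
```

Working from a shuffled copy keeps the original dump order available, for instance to resume or audit a run.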

Further comments: The bot respects the API etiquette: it sends a user-agent header and respects the maxlag parameter.
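
As an illustration of these two points, a request to the MediaWiki Action API might be prepared as below. This is a Python sketch, not the bot's C# code: the helper names `build_api_request` and `is_lagged`, the user-agent string and the `maxlag` value of 5 seconds are assumptions chosen for the example.

```python
import urllib.parse
import urllib.request


def build_api_request(api_url, params, user_agent, maxlag=5):
    """Prepare (but do not send) an API request with maxlag and a User-Agent."""
    query = dict(params, format="json", maxlag=maxlag)
    url = api_url + "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def is_lagged(reply):
    """True if the decoded JSON reply is a maxlag error, so the caller should wait and retry."""
    return reply.get("error", {}).get("code") == "maxlag"
```

A caller would send the request, decode the JSON reply, and pause before retrying whenever `is_lagged` returns true.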

Status: (CentralAuth)

Approved as global bot (per this request) and thus flagged as bot on all projects that did not opt out (per this list).

Project | Request | Pages | Edit description used | Status
commonswiki | Approved | 31'145'089 | Fix http to https | Running…
dewiki | Approved | 1'888'381 | Bot: http → https | Running…
enwiki | Approved | 8'570'327 | Bot: http → https | Running…
eswiki | Approved | 2'191'542 | Bot: http → https | Running…
frwiki | Approved | 2'970'187 | Bot: http → https | Running…
itwiki | Approved | 2'359'233 | Bot: http → https | Running…
jawiki | Allows global bots | 994'375 | Bot: http → https | Working Waiting
plwiki | Approved | 1'527'763 | Bot: http → https | Working Waiting
ptwiki | Approved | 1'214'889 | Bot: http → https | Working Waiting
ruwiki | Allows global bots | 1'797'992 | Bot: http → https | Working Waiting
zhwiki | Allows global bots | 1'105'051 | Bot: http → https | Working Waiting
dewikinews | Approved | 17'280 | Bot: http → https | Done
dewikiquote | Approved | 5'673 | Bot: http → https | Working Waiting
dewikisource | Approved | 97'284 | Bot: http → https | Done
dewikiversity | Approved | 9'301 | Bot: http → https | Done
dewikivoyage | Approved | 19'094 | Bot: http → https | Done
dewiktionary | Approved | 145'334 | Bot: http → https | Done
altwiki | Approved (Village Pump) | 864 | Bot: http → https | Working Waiting
arywiki | [1] | 8'953 | Bot: http → https |
bnwiki | [2] | 172'842 | বট: http থেকে https-তে পরিবর্তন করছে |
bnwikibooks | Approved (Village Pump) | 745 | বট: http থেকে https-তে পরিবর্তন করছে | Working Waiting
bswiki | [3] | 79'281 | Bot: http → https |
cswikibooks | [4] | 601 | Bot: http → https |
cswikisource | [5] | 46'231 | Bot: http → https |
cswikiversity | [6] | 1'946 | Bot: http → https |
dvwiki | [7] | 1'088 | Bot: http → https |
enwikisource | [8] | 117'079 | Bot: http → https |
enwiktionary | [9] | 443'242 | Bot: http → https |
eswikibooks | [10] | 3'417 | Bot: http → https |
eswikinews | [11] | 14'339 | Bot: http → https |
eswikisource | Approved (Village Pump) | 5'826 | Bot: http → https | Done
frwikibooks | [12] | 7'632 | Bot: http → https |
frwikinews | [13] | 19'721 | Bot: http → https |
frwikisource | Approved | 42'309 | Bot: http → https | Working Waiting
frwikiversity | [14] | 4'126 | Bot: http → https |
frwikivoyage | [15] | 8'536 | Bot: http → https |
frwiktionary | [16] | 532'493 | Bot: http → https |
fywiki | Approved (Village Pump) | 30'516 | Bot: http → https | Working Waiting
glwiki | [17] | 213'696 | Bot: http → https |
hewikibooks | Pending (Village Pump) | 1'660 | Bot: http → https | On hold
hewikisource | Pending | 98'820 | Bot: http → https | On hold
hewikivoyage | Allows global bots | 2'038 | Bot: http → https |
hewiktionary | Approved (Temporary) (Village Pump) | 6'559 | Bot: http → https | Working Waiting
hiwiktionary | Approved (Village Pump) | 4'970 | Bot: http → https | Working Waiting
hrwikibooks | Approved (Village Pump) | 428 | Bot: http → https | Working Waiting
hrwikiquote | Pending (Village Pump) | 1'254 | Bot: http → https | Working Waiting
huwikibooks | Pending (Village Pump) | 18'488 | Bot: http → https | On hold
huwikisource | Pending (Village Pump) | 7'222 | Bot: http → https | On hold
idwiki | [18] | 673'383 | Bot: http → https |
iswiki | Approved | 30'026 | Bot: http → https | Working Waiting
iswikisource | Approved (Village Pump) | 38 | Bot: http → https | Working Waiting
iswiktionary | Pending (Village Pump) | 17'145 | Bot: http → https | On hold
itwikinews | [19] | 12'880 | Bot: http → https |
itwikivoyage | [20] | 8'563 | Bot: http → https |
itwiktionary | [21] | 80'610 | Bot: http → https |
jawikibooks | Approved | 1'873 | Bot: http → https | Done
jawiktionary | Pending (Village Pump) | 8'834 | Bot: http → https | On hold
kshwiki | Approved (Village Pump) | 1'364 | Bot: http → https | Working Waiting
lawikisource | Approved (Village Pump) | 9'453 | Bot: http → https | Working Waiting
liwikisource | Approved (Village Pump) | 1'080 | Bot: http → https | Working Waiting
liwiktionary | [22] | 86 | Bot: http → https |
mnwwiki | [23] | 1'010 | Bot: http → https |
mrwiki | [24] | 59'852 | Bot: http → https |
mrwikisource | Approved (Village Pump) | 1'372 | Bot: http → https | Working Waiting
mtwiki | [25] | 4'626 | Bot: http → https |
ndswiki | [26] | 24'342 | Bot: http → https |
nlwikivoyage | [27] | 2'385 | Bot: http → https |
nnwiki | Approved (Village Pump) | 144'877 | Bot: http → https | Working Waiting
outreachwiki | [28] | 6'136 | Bot: http → https |
plwikiquote | Approved (Temporarily) | 10'443 | Bot: http → https | Done
plwiktionary | Approved (Temporarily) | 92'252 | Bot: http → https | Running…
ptwikibooks | Pending (Village Pump) | 4'917 | Bot: http → https | On hold
rowiki | [29] | 608'015 | Bot: http → https |
rowiktionary | Approved (Temporarily) (Village Pump) | 82'646 | Bot: http → https | Done
ruwikinews | Approved | 833'044 | Bot: http → https | Working Waiting
ruwikisource | [30] | 213'855 | Bot: http → https |
ruwiktionary | [31] | 19'274 | Bot: http → https |
slwiki | [32] | 135'056 | Bot: http → https |
slwikisource | [33] | 16'902 | Bot: http → https |
sourceswiki | Approved (Village Pump) | 26'478 | Bot: http → https | Working Waiting
specieswiki | Approved (Village Pump) | 640'405 | Bot: http → https | Working Waiting
srwiki | [34] | 659'443 | Bot: http → https |
srwikibooks | Pending (Village Pump) | 935 | Bot: http → https | On hold
svwikisource | [35] | 1'791 | Bot: http → https |
svwikiversity | Approved (Village Pump) | 374 | Bot: http → https | Working Waiting
svwikivoyage | [36] | 1'327 | Bot: http → https |
ukwiki | [37] | 1'280'019 | Bot: http → https |
urwiki | [38] | 148'606 | Bot: http → https |
vecwikisource | Approved (Village Pump) | 4'875 | Bot: http → https | Working Waiting
viwiki | [39] | 1'350'516 | Bot: http → https |
wuuwiki | [40] | 6'016 | Bot: http → https |
yuewiktionary | Pending (Village Pump) | 401 | Bot: http → https | On hold