Wikipedia:Database download
From Wikipedia, the free encyclopedia.
This page needs to be translated into Tamil. You can contribute to its development by editing and translating it.
All text in Wikipedia is released under the GNU Free Documentation License; for further details, see the Wikipedia:Copyrights page.
To obtain the software that runs the wiki, see the Wikipedia:MediaWiki page. Another page has just the database schema or layout.
Why not just use a dynamic database download?
Suppose you are building a piece of software that at certain points displays information that came from Wikipedia. If you want your program to display the information in a different way than the live version does, you'll probably need the wikicode used to enter it, rather than the finished HTML.
Also, if you want to get all of the data, you'll probably want to transfer it in the most efficient way possible. The wikipedia.org servers need to do quite a bit of work to convert wikicode into HTML. That's time-consuming both for you and for the wikipedia.org servers, so simply spidering all pages is not the way to go.
To access any article in XML, one at a time, link to it via the export interface (after logging in). Read more about this at Special:Export.
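For example, a single page can be fetched as XML with a plain HTTP request to Special:Export. A minimal sketch (the host en.wikipedia.org and the page title "Sandbox" are only illustrative; substitute your own wiki and page):

```shell
# Build the Special:Export URL for one page and fetch it.
TITLE="Sandbox"
URL="https://en.wikipedia.org/wiki/Special:Export/${TITLE}"
echo "$URL"
# To actually download the XML, uncomment:
# curl -s "$URL" -o "${TITLE}.xml"
```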
Weekly database dumps
SQL database dumps on download.wikimedia.org have historically been updated approximately twice weekly. However, currently (as of June 1, 2005), the most recent database dump dates from 20050516. The status of the download server is discussed in Wikipedia talk:Database download. These dumps can be read into a MySQL relational database for leisurely analysis, testing of the Wikipedia software, and, with appropriate preprocessing, perhaps offline reading. There is also a fuller archive of database dumps, containing tables other than cur and old.
The database schema is explained in schema.doc. The cur tables contain the current revisions of all pages; the old tables contain the prior edit history. Approximate file sizes are given for the compressed dumps; uncompressed, they'll be significantly larger. The files for the larger wikis are currently split into files of about 2GB called xaa, xab, etc. See here for information on sticking them back together.
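Reassembly is a plain concatenation of the pieces in order. A self-contained sketch using a tiny stand-in file (the names pages.sql and rejoined.sql.bz2 are hypothetical; split's default output names xaa, xab, … match the chunk names used for the dumps):

```shell
# Create a tiny stand-in for a dump, compress and split it into pieces
# named xaa, xab, ... (split's defaults), then rejoin and decompress.
printf 'INSERT INTO cur VALUES (1);\n' > pages.sql
bzip2 -kf pages.sql                 # produces pages.sql.bz2, keeps the original
split -b 16 pages.sql.bz2           # pieces: xaa, xab, ...
cat xa? > rejoined.sql.bz2          # the shell glob sorts, so pieces stay in order
bzip2 -dc rejoined.sql.bz2          # prints the original contents
```

For the real multi-gigabyte dumps, the same `cat xaa xab ... > dump.sql.bz2` step applies; only the piece sizes differ.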
Windows users may not have a bzip2 decompressor on hand; a command-line Windows version of bzip2 is available for free under a BSD license. An LGPL'd GUI file archiver, 7-Zip, which can also open bz2-compressed files, is available for free. Mac OS X ships with the command-line bzip2 tool and StuffIt Expander, a graphical decompressor.
Currently (as of 2004-09-20) a compressed database dump of all wikis is 26805MB (842MB for just the current revisions). If you thought that's 26.2 gigabytes, you're absolutely correct. On a standard 56 kbit/s dial-up modem connection, it will take you only 44.3 days to download! The English version alone (as of 2004-09-20) is about 11.7GB compressed and about 40GB uncompressed.
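The 44.3-day figure is straightforward arithmetic, which can be checked with a one-liner (assuming 1 MB = 10^6 bytes and a sustained 56 kbit/s link, both simplifications):

```shell
# 26805 MB at 56 kbit/s, expressed in days.
awk 'BEGIN {
    bits = 26805 * 1000000 * 8   # dump size in bits
    secs = bits / 56000          # seconds at 56 kbit/s
    printf "%.1f days\n", secs / 86400
}'
```

This prints 44.3 days, matching the figure above.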
Images and uploaded files
Unlike the article text, many images are not released under the GFDL or into the public domain. These images are owned by external parties who may not have consented to their use in Wikipedia. Wikipedia uses such images under the doctrine of fair use under United States law. Use of such images outside the context of Wikipedia or similar works may be illegal. Also, many images legally require a credit or other attached copyright information, and this copyright information is contained within the text dumps available from download.wikimedia.org. Some images may be restricted to non-commercial use, or may even be licensed exclusively to Wikipedia. Hence, download these images at your own risk.
As of 2004-07-19 the image archive is unavailable for unknown reasons (the links point to non-existent files), but this link is working: download.wikimedia.org/archives/en/. The image files are named 20040702_wikipedia_en_upload.tar.aa and 20040702_wikipedia_en_upload.tar.ab. Before that, only the files uploaded to the English Wikipedia were available to download. These might be re-instated, and others may follow later. The file archives, like the text archives, are split into 1.9 GB chunks.
Static HTML tree dumps for mirroring or CD distribution
Terodump is an alpha-quality Wikipedia-to-static-HTML dumper, made from Wikipedia code. A static HTML dump (beta quality) is available as wikipedia-terodump-0.1.tar.bz. This dump is made from a database that is some months old. - User:Tero
Wiki2static is an experimental program set up by User:Alfio to generate HTML dumps, inclusive of images, a search function and an alphabetical index. At the linked site, experimental dumps and the script itself can be downloaded. As an example it was used to generate these copies of:
English WikiPedia 24 April 04
Simple WikiPedia 1 May 04
(old database format) and
English WikiPedia 24 July 04
Simple WikiPedia 24 July 04
WikiPedia Francais 27 Juillet 2004
(new format).
BozMo uses a version to generate periodic static copies at fixed reference. If you'd like to help set up an automatic dump-to-static function, please drop us a note on the developers' mailing list.
See also Wikipedia:TomeRaider database.
Possible problems during local import
See Wikipedia:Database dump import problems.
Please do not use a web crawler
Please do not use a web crawler to download large numbers of articles. Aggressive crawling of the server can cause a dramatic slow-down of Wikipedia. Our robots.txt restricts bots to one page per second and blocks many ill-behaved bots.
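The one-page-per-second limit mentioned above is expressed with the Crawl-delay directive. An illustrative robots.txt fragment (not a copy of Wikipedia's actual file):

```text
User-agent: *
Crawl-delay: 1
```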
Sample blocked crawler email
IP address nnn.nnn.nnn.nnn was retrieving up to 50 pages per second from wikipedia.org addresses. Robots.txt has a rate limit of one per second, set using the Crawl-delay setting. Please respect that setting. If you must exceed it a little, do so only during the least busy times shown in our site load graphs. It's worth noting that crawling the whole site at one hit per second will take several weeks. The originating IP is now blocked or will be shortly. Please contact us if you want it unblocked. Please don't try to circumvent it - we'll just block your whole IP range.
If you want information on how to get our content more efficiently, we offer a variety of methods, including weekly database dumps which you can load into MySQL and crawl locally at any rate you find convenient. Tools are also available which will do that for you as often as you like once you have the infrastructure in place. More details are available on this page. Instead of an email reply, you may prefer to visit #mediawiki at irc.freenode.net to discuss your options with our team.
Importing sections of a dump
தொகு
The following Perl script is a filter for extracting the Help pages (namespace 12) from the SQL dump:
#!/usr/bin/perl -w
# Filter: reads an SQL dump on stdin, writes INSERT statements for the
# Help namespace (cur_namespace = 12) on stdout.
my $j = 0;
while (<>) {
    s/^INSERT INTO cur VALUES //gi;
    s/\n// if (($j++ % 2) == 0);
    # Split the multi-row INSERT into one row per line.
    s/(\'\d+\',\'\d+\'\)),(\(\d+,\d+,)/$1\;\n$2/gs;
    foreach (split /\n/) {
        next unless (/^\(\d+,12,\'/);   # keep only namespace 12 (Help) rows
        s/^\(\d+,\d+,/INSERT INTO cur \(cur_namespace,cur_title,cur_text,cur_comment,cur_user,cur_user_text,cur_timestamp,cur_restrictions,cur_counter,cur_is_redirect,cur_minor_edit,cur_is_new,cur_random,cur_touched,inverse_timestamp\) VALUES \(12,/;
        s/\n\s+//g;
        s/$/\n/;
        print;
    }
}
NOTE: With the current meta.special dump (as of 2005-05-16), the order of the fields in the cur table has changed: inverse_timestamp now comes BEFORE cur_touched. This may cause Windows users no end of grief, because all of a sudden MediaWiki starts sprouting PHP errors about dates that are negative, or that occur before 1 January 1970, being passed to the gmdate and gmmktime functions in GlobalFunctions.php. The reason is that the two fields are swapped, so there is rubbish data in both. Perhaps the Unix versions of these functions are smarter, or do not cause PHP to emit a warning into the HTML output, or else people have php.ini configured not to display these warnings.
In other words, check that the field order in the script aligns with those in the dump. Better still, we should look at changing the script to retain whatever field order the dump uses 8-)
You can run the script and get a resulting help.sql file with this command:
bzip2 -dc