Wikisource talk:ProofreadPage - Wikisource
Features requests
Q : Do we want another pagequality level for pages that are incomplete
incomplete = text not completely transcribed
I think it would be useful. --
Zyephyrus
14:42, 10 May 2009 (UTC)
Incomplete or incompletely proofed? Or either/both? In any case, I'd use it. --
Spangineer
23:22, 12 May 2009 (UTC)
It might mean both, either one or the other: it would mean that the page requires an action from the user whatever the action. --
Zyephyrus
06:03, 13 May 2009 (UTC)
"Incomplete" means that the text is only partially transcribed. It seems to me that something such as "incompletely proofed" would do more harm than good, because it is useless for other users to know that you have incompletely proofed a page if they do not know which part it is.
ThomasV
08:33, 13 May 2009 (UTC)
Ok. In the long run, a part of the page might be colored as "not proofread" and another part as "proofread", so the page would be incomplete too; but I understand that it is not a feature for now. Sorry if I have distorted what you meant. --
Zyephyrus
09:16, 13 May 2009 (UTC)
I want this feature and think it would provide a great deal of guidance for new transcribers as to where they can quickly begin improving a page.
Aharonium
talk
21:20, 16 June 2014 (UTC)
"Without text"
I've been playing around with bulk validation of blank pages these last few days. The idea is that you don't need to read a blank page to validate it; in most cases it suffices to glance at a small thumb of the page image to determine that, yes, it really is blank. So I wrote a script to make me a gallery of images of unvalidated allegedly blank pages.
One thing I have learned from this is that there are a lot of allegedly blank pages that are in fact not blank. The most common cause is pages with a picture but no text: since there is no text to extract from the djvu, the djvutext.py bots mark those pages as blank. It is also not uncommon for our bots to mark text pages as blank, because the OCR failed for that page. And I found one case where every page of an entire book is tagged as blank, because a bot tried to upload OCR text from a djvu that had no text layer.
My conclusion is that allegedly blank pages do need to go through a validation process just like every other page. It worries me a little that the new proofreading system shunts blank pages off into a separate class, making it impossible to distinguish between allegedly blank pages and definitely blank pages. Could we think a little more about this change before we go live with it?
Hesperian
01:31, 28 May 2009 (UTC)
Robots are not supposed to modify the status of a page.
The 'without text' status will have to be set manually, and the corresponding button will be accessible from any other state.
If robots become a problem, I might block robot edits that attempt to modify page status.
ThomasV
23:05, 29 May 2009 (UTC)
I think you are saying that "Without text" will be like "Validated as blank". That is okay.
Hesperian
03:55, 3 June 2009 (UTC)
Sorting out pages that I can validate per project
I have been working on the enWS project of the month, doing lots of the first proof and some of the validation. From the proofread status of en:s:Index:Omnibuses and Cabs.djvu, I am finding it hard to determine which pages I can validate and which I proofread myself. It would be nice to have a means of highlighting those that I can go and validate. At the moment, the only way I can tell is to enter the Page: namespace and see whether the option to validate exists.
To expand that thought a little: it would be nice to have a means to monitor work on an Index: namespace project. To be able to see what is happening to subpages of a project, and to see at a glance how many pages are validated for a work, how many need to be done, etc. This would give us a reasonable means to have a completion schedule. I have some good underlying thoughts on what would be useful, but am unsure of what is technically feasible, and especially easily and quickly feasible. Thx.
--
Billinghurst
00:39, 18 June 2009 (UTC)
(PS. ThomasV. Your extension is bloody marvellous!)
these are interesting ideas.
I agree, it would be very useful to visualize which pages were proofed by oneself. This is not possible currently, because the identity of the proofreader is not stored in the database, but only in the text of the pages. In the future I plan to have this stored in the database, and what you describe will be possible. This modification, however, will require a schema change (a modification of the database), and it is not likely to happen soon. In the meantime, I can only suggest to proceed in reading order, in order to remember which pages you proofed :-) [ThomasV, 05:37, 18 June 2009 (UTC)]
Boo, you are no fun whatsoever! Order smorder :-P [Billinghurst 11:58, 18 June 2009 (UTC)]
concerning your second suggestion, Zephyrus added a 'rc' link to some indexes, where you can visualize the modifications of the pages. However, it requires a complicated template, that has to be built manually. I was also planning to create a 'special' page, where index pages are listed with detailed page counts. This too will require a schema change, though.
ThomasV
05:37, 18 June 2009 (UTC)
Even a page where there was a count of the total number of pages in a work, and a summary of their status
Work's name
Total pages
Count of Validated
Count of Proofread
Count of Problematic
Count of Blank
then a list of works (all or some). It wouldn't need to be a dynamic list, it may be something that is updated daily by a cron if there are changes to a total. [Billinghurst 11:58, 18 June 2009 (UTC)]
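The per-work summary proposed above could be sketched roughly as follows. This is only an illustration in Python: the status names mirror the columns listed in the comment, while the function name `summarise` and the sample data are hypothetical and say nothing about how the extension stores page status.

```python
from collections import Counter

# Status columns taken from the proposal above; any other status
# (for example "Not proofread") only contributes to the total.
STATUSES = ("Validated", "Proofread", "Problematic", "Blank")


def summarise(work, page_statuses):
    """Build one summary row for a work from a list of per-page statuses."""
    counts = Counter(page_statuses)
    row = {"Work": work, "Total pages": len(page_statuses)}
    for status in STATUSES:
        row[status] = counts.get(status, 0)
    return row


# Hypothetical sample: six pages of one work.
sample = ["Validated", "Proofread", "Proofread", "Blank",
          "Problematic", "Not proofread"]
row = summarise("Omnibuses and Cabs", sample)
print(row)
```

A daily cron job, as suggested, would only need to rebuild such rows for works whose pages changed.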
The rc link is in all the last PotM :
Look here
on the right, higher than the Contents box. Does this answer your needs? --
Zyephyrus
09:37, 18 June 2009 (UTC)
Better than nothing, and I think that I scrambled through them. --
Billinghurst
11:58, 18 June 2009 (UTC)
Billinghurst, do you mean a thing like
this one
made by Kipmaster on fr.ws? --
Zyephyrus
12:27, 19 June 2009 (UTC)
Yes, Zeph. Something along those lines. However, I was thinking automated rather than manual.--
Billinghurst
15:40, 19 June 2009 (UTC)
assorted wishes
From bawolff:
Off the top of my head: handle namespaces sanely; it should work out of the box (no hidden steps like importing userland js or setting up namespaces); some of the page status stuff should be in either the page_prop table or somewhere else in the db so it's queryable; most of the js needs a thorough review for security.
(That's from my vague memory of looking at the extension once about a year ago, so things could have changed)
Preview in Third Column
To aid with "Show Preview" before saving the page, it would be helpful to editors to see the formatted, unsaved text as a third column near the scanned text. Currently, I see the preview above the ProofreadPage gadget before saving. There are many times when I may miss a format error that is only clear after the save and the formatted version appears next to the scanned image.
Given the number of wide-screen monitors in use, this should be tenable and a nice way to use all of those extra pixels "in the gutter". Add a gadget option to allow seeing the edit pane, scan pane and a new preview pane side-by-side-by-side. The content need only be updated after pressing "show preview", not in real time. Is there a way to get the same result with CSS and div tags without needing to change the code of ProofreadPage (if the above is difficult to do inside the code)? -
DutchTreat
talk
12:08, 1 August 2014 (UTC)
Sorting wheat and chaff
Is there a ready means to find out which pages have been proofread and validated but not transcluded into the main namespace? Here I am thinking of a means of checking, by work, that all pages have been transcluded, or at least a sanity check that pages not transcluded are not accidental omissions.
I see this being looked at in two ways:
From a work's Index page, where we are working upon the work and want to check the transclusion to the main namespace. Especially relevant for those pulling a work together.
From the perspective of random pages that have been proofed or validated and should be transcluded. Works are often checked casually from Recent Changes, and one can never be certain of the status.
I am wondering whether this could be a report via Special:IndexPages or a subset/drill-down from that page. There are a number of ways that I can think of interrogating things when generally having the janitorial hat on one's head.
--
Billinghurst
07:39, 19 October 2009 (UTC)
there's currently no way to do this on the wiki; you could get this information from the toolserver, though
ThomasV
09:03, 19 October 2009 (UTC)
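Once the page statuses and the list of transcluded pages have been obtained by some external means, the sanity check asked for above reduces to a set difference. A minimal Python sketch, in which the function name `untranscluded`, the status strings and the sample data are all hypothetical:

```python
def untranscluded(page_status, transcluded):
    """Return proofread or validated pages that are not transcluded, in page order.

    page_status maps a page number to its status string;
    transcluded is the set of page numbers already used in the main namespace.
    """
    done = {"proofread", "validated"}
    return sorted(
        page for page, status in page_status.items()
        if status in done and page not in transcluded
    )


# Hypothetical data for a five-page work.
status = {1: "validated", 2: "proofread", 3: "not proofread",
          4: "validated", 5: "without text"}
missing = untranscluded(status, transcluded={1, 2})
print(missing)  # [4]
```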
Bug reports
HTML comments ruin Index: page layout
In normal wiki code, you can insert HTML comments within a table. On Index: pages, however, such comments ruin the table layout. I guess the problem is being caused by the "!" in "\nText...
. Thank you. --
Crower
13:20, 20 March 2011 (UTC)
Yes, we have also noticed a similar problem at french wikisource. Using any no-effect string, like the
nop
template from en.ws, or , or a html comment, is a temporary solution. The problem comes from the code mentioned by Crower, it'll have to be fixed.
Zaran
20:01, 26 September 2011 (UTC)
When I reported this on
bug #26028
(after
this discussion
) it was closed as a duplicate of
bug #12130
. You can
vote for it
if you also find this to be annoying.
Helder
11:03, 27 September 2011 (UTC)
Transclusion of images
I have noticed that there is an issue with transclusions that contain images. It seems to me that images are often included in books "where they look nice", for example in the middle of a page, with an equal amount of text above and below, irrespective of where paragraphs begin and end. So long as the text is presented on printed pages, that is acceptable. However, when we transclude text, so that what is presented as multiple pages in a book becomes a single text on screen, a problem occurs: where a picture is inserted (in code), it creates a break and what seems to be a new paragraph, sometimes mid-sentence.
Take for example
en:s:CORSETS: An Analysis
. This article has several illustrations. If we look at the picture that is on page 8, you will see that the insertion of the picture has created a line break, so it looks like one paragraph ends with the words "the body against" (with no full stop) and the next paragraph begins "the downward pull" (without capitalisation). These are, however, both part of the same sentence. Is there a way to avoid this problem in a text that has illustrations? Where there is a new paragraph on the same page as the illustration, the code for that picture could be moved to between the paragraphs. However, on some pages, the image is so large, or the paragraphs so long, that there are no new paragraphs starting on the same page as the image. What to do in these cases?
V85
07:46, 5 September 2011 (UTC)
[8] talks about this trouble, see comment 6. The problem comes from images being included in a <div> wrapper, so the code is <p>... <div>image</div> ...</p>. This is not valid HTML: a div can't be inside a p, so the parser closes the paragraph, adds the image and reopens the paragraph after the div. No known workaround except the one you pointed out: moving the image outside a paragraph. —
Phe
16:40, 17 September 2011 (UTC)
We ran into the same problem on it.source. So far the best solution we found is to wrap the entire paragraph in a <div> (in our case, setting its style to make it look like a regular <p>). In this way, the MediaWiki software does not automatically wrap the paragraph in a <p> as usual, and the text is not interrupted by the image's presence. See my edits at pages
. See also
this page
for an example.
Candalua
19:58, 17 September 2011 (UTC)
Nice, thanks, I'll try to propagate it to fr:. By the way, on fr: we are looking for a trick to solve the problem that a table cell can't span a page boundary; the only solution we have is to use includeonly/noinclude tags to put the whole text in the same cell when transcluding pages... —
Phe
18:15, 18 September 2011 (UTC)
MediaWiki:Dictionary.js
Moved to
MediaWiki talk:Dictionary.js
. —
Phe
19:42, 25 September 2011 (UTC)
MediaWiki:Modernisation.js
Talk moved to
MediaWiki talk:Modernisation.js#MediaWiki:Modernisation.js
. —
Phe
19:07, 25 September 2011 (UTC)
Add an attribute to the pages tag to transclude only odd or even pages
Marc suggested at French Wikisource to have a solution to transclude only even or odd pages when using the <pages> tag. This is necessary with multilingual books, where you have a page in language A facing a page in language B and only want to transclude language A. I suggest adding an attribute to the tag (could be named
only
), where you could specify which pages to transclude. For example :

would transclude pages 12, 14 and 16 of the index
file.djvu
. This shouldn't be too hard to code.
Zaran
20:42, 26 September 2011 (UTC)
I support this idea, it can be a big issue when working on bilingual works. Unfortunately, nobody is developing this extension at the moment.--
Doug.
talk
contribs
01:15, 11 December 2011 (UTC)
I support it too, but perhaps we need something more general, along the lines of step=: step=2 would allow skipping odd or even pages. An exclude= would also help: exclude="12-15,21,33" would exclude pages 12 to 15, page 21 and page 33. Besides that, an include= with the same syntax as exclude= would make it possible to avoid from= and to= in some cases. —
Phe
01:29, 11 December 2011 (UTC)
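To make the proposed semantics concrete, here is a rough Python sketch of how from/to, step= and exclude= could combine. It only illustrates the suggestion above; the function names `parse_ranges` and `select_pages` are hypothetical, and nothing here reflects how the extension actually parses the tag.

```python
def parse_ranges(spec):
    """Parse a list such as "12-15,21,33" into a set of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            pages.update(range(int(low), int(high) + 1))
        else:
            pages.add(int(part))
    return pages


def select_pages(start, end, step=1, exclude=""):
    """List pages from start to end inclusive, every step-th page, minus exclusions."""
    skipped = parse_ranges(exclude)
    return [page for page in range(start, end + 1, step) if page not in skipped]


print(select_pages(12, 16, step=2))             # [12, 14, 16]
print(select_pages(10, 22, exclude="12-15,21"))  # [10, 11, 16, 17, 18, 19, 20, 22]
```

With step=2 the parity of the first page decides whether even or odd pages are kept, which covers the bilingual case without a separate only= attribute.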
Automatic OCR layer extraction, pdf and russian letters
Automatic OCR layer extraction from PDF works incorrectly with Russian letters. For example, in
ru:s:Индекс:Цыбиков Г.Ц. том 2 О Центральном Тибете, Монголии и Бурятии.pdf
it extracts only Latin letters and numbers, but Russian letters are transformed into odd symbols like "��". Is it possible to solve this problem? The original PDF file is at
commons:File:Цыбиков Г.Ц. том 2 О Центральном Тибете, Монголии и Бурятии.pdf
--
Вантус
14:43, 29 October 2011 (UTC)
Hi, I was wondering if you had an example of a wiki that has auto extraction of pdf text. In particular, I'm trying to search within pdfs so that for any OCR readable pdf that if I search for words that appear in the pdf that the search results return that pdf. Please let me know if you had any luck. Thanks in advance,
Tosection ignored
Hello! Can anybody solve this problem, please?
The
page 89
has 2 sections, but in the transcluded page the "tosection" is ignored and appears the whole page (also the section of the next chapter). Any idea? Meanwhile, I'm using {{
Page
}}. Thanks! -
Aleator
15:16, 26 October 2012 (UTC)
Oh... the name of the section was not OK. :S Solved! -
Aleator
15:27, 26 October 2012 (UTC)
No Image
I installed the extension on my wiki but when I create a Page:xyz.jpg there is no image at the right side. Via
Scan
I can see the image. What's wrong? --
87.146.17.135
13:10, 4 November 2012 (UTC)
Importing djvu image_metadata fails to create bytea
Setup:
Postgres 9.3
Mediawiki 1.21.2
ProofreadPage 1.21
Ubuntu 13.04
Problem:
When I refresh a "Page" namespace, I am presented with a SQL error (I turned all debug on). It is trying to assign XML image metadata to a Postgres 'bytea' column and failing as the string is not in the proper format.
The DJVU file was created by 'pdf2djvu' (0.7.12-2ubuntu6). Sample PDF and DJVU files can be given if needed.
Error Message:
UPDATE "image" SET img_size = '4890564',img_width = '2758',img_height = '4142',img_bits = '0',img_media_type = 'BITMAP',img_major_mime = 'image',img_minor_mime = 'vnd.djvu',img_metadata = '

',img_sha1 = 'i7pmpn6xx8rl8fit5dg9djbhcq196oo' WHERE img_name = 'My_Test_DJVU.djvu'
Stack Trace:
/usr/local/mediawiki-1.21.2/includes/db/DatabasePostgres.php(482): DatabaseBase->reportQueryError('ERROR: invalid...', '22P02', 'UPDATE "image"...', 'LocalFile::upgr...', false)
/usr/local/mediawiki-1.21.2/includes/db/Database.php(983): DatabasePostgres->reportQueryError('ERROR: invalid...', '22P02', 'UPDATE "image"...', 'LocalFile::upgr...', false)
/usr/local/mediawiki-1.21.2/includes/db/Database.php(1840): DatabaseBase->query('UPDATE "image"...', 'LocalFile::upgr...')
/usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(546): DatabaseBase->update('image', Array, Array, 'LocalFile::upgr...')
/usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(495): LocalFile->upgradeRow()
/usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(454): LocalFile->maybeUpgradeRow()
/usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(349): LocalFile->loadFromRow(Object(stdClass))
/usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(464): LocalFile->loadFromDB()
/usr/local/mediawiki-1.21.2/includes/filerepo/file/LocalFile.php(716): LocalFile->load()
/usr/local/mediawiki-1.21.2/includes/filerepo/FileRepo.php(366): LocalFile->exists()
/usr/local/mediawiki-1.21.2/includes/filerepo/RepoGroup.php(146): FileRepo->findFile(Object(Title), Array)
/usr/local/mediawiki-1.21.2/includes/GlobalFunctions.php(3542): RepoGroup->findFile(Object(Title), Array)
/usr/local/mediawiki-1.21.2/extensions/ProofreadPage/ProofreadPage.body.php(195): wfFindFile(Object(Title))
/usr/local/mediawiki-1.21.2/extensions/ProofreadPage/ProofreadPage.body.php(429): ProofreadPage::load_index(Object(Title))
/usr/local/mediawiki-1.21.2/extensions/ProofreadPage/ProofreadPage.body.php(397): ProofreadPage::preparePage(Object(OutputPage), Array, false)
[internal function]: ProofreadPage::onBeforePageDisplay(Object(OutputPage), Object(SkinVector))
/usr/local/mediawiki-1.21.2/includes/Hooks.php(255): call_user_func_array('ProofreadPage::...', Array)
/usr/local/mediawiki-1.21.2/includes/GlobalFunctions.php(3883): Hooks::run('BeforePageDispl...', Array)
/usr/local/mediawiki-1.21.2/includes/OutputPage.php(2031): wfRunHooks('BeforePageDispl...', Array)
/usr/local/mediawiki-1.21.2/includes/Wiki.php(572): OutputPage->output()
/usr/local/mediawiki-1.21.2/includes/Wiki.php(458): MediaWiki->main()
/usr/local/mediawiki-1.21.2/index.php(62): MediaWiki->run()
{main}
unsigned
comment by
198.84.186.69
talk
) 00:49, 3 October 2013‎.
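Error code 22P02 is PostgreSQL's "invalid text representation", which fits the report above of a plain XML string being rejected by a bytea column. As background only, the Python sketch below shows what PostgreSQL's hex input format for bytea looks like. It is not the extension's code or a proposed patch; the helper names and the sample string are hypothetical, and in practice a database driver's parameter binding would do this encoding.

```python
def to_bytea_hex(text):
    """Encode text as UTF-8 and return it in PostgreSQL's hex bytea format (\\x...)."""
    return "\\x" + text.encode("utf-8").hex()


def from_bytea_hex(literal):
    """Decode a hex-format bytea value back into text."""
    if not literal.startswith("\\x"):
        raise ValueError("not a hex-format bytea value")
    return bytes.fromhex(literal[2:]).decode("utf-8")


# Hypothetical metadata string; the backslash is the kind of character
# that a text-to-bytea conversion in escape format can reject.
sample = '<meta note="a\\b"/>'
encoded = to_bytea_hex(sample)
print(encoded)
print(from_bytea_hex(encoded) == sample)  # True
```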
Recto/verso numbering through ?
A lot of early books were numbered by leaf instead of by page-side, meaning when you open the book only the right-side page has a number. The page numbering is usually expressed in the same way as manuscripts, i.e. page number and
recto or verso
(see the image to the right). At the moment, it seems like there's no support for this style of numbering in the <pagelist /> function, and the numbering has to be entered by hand for each page, like this:

Is there some way to achieve this that I'm missing? Or can "folio" (and "folioroman") styles be added so we can just mark them like this?

Cross-posted with
Extension talk:Proofread Page
because I'm not sure where the best place to request this is.
Michael Chidester
talk
19:20, 2 October 2014 (UTC)
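Until such a style exists, the hand-entered labels can at least be generated. A small Python sketch of recto/verso numbering follows; the function names are hypothetical, and the `N="label"` attribute form it prints is an assumption about the hand-written syntax rather than something confirmed in this thread.

```python
def folio_labels(leaves, first_leaf=1):
    """Return labels 1r, 1v, 2r, 2v, ... for the given number of leaves."""
    labels = []
    for leaf in range(first_leaf, first_leaf + leaves):
        labels.append(f"{leaf}r")  # recto: front of the leaf
        labels.append(f"{leaf}v")  # verso: back of the leaf
    return labels


def pagelist_attributes(labels, first_scan=1):
    """Pair each label with its scan position, starting at first_scan."""
    return " ".join(
        f'{first_scan + offset}="{label}"' for offset, label in enumerate(labels)
    )


labels = folio_labels(2)
print(labels)                                     # ['1r', '1v', '2r', '2v']
print(pagelist_attributes(labels, first_scan=5))  # 5="1r" 6="1v" 7="2r" 8="2v"
```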
The best place for this request is on
Bugzilla
. Feel free to
open a bug on it
Tpt
talk
08:54, 6 October 2014 (UTC)
I need a help for using ProofreadPage at Korean wikisource
Hello there. I need help using ProofreadPage at Korean Wikisource. I tried to contact a local admin, but nobody answered me. First, please see this page
ko:색인:殺人書秘話 박문 제1집, 1938.10, 12-13 (2 pages).pdf
, which doesn't work well. I think something is wrong with it. What should I do to use ProofreadPage at Korean Wikisource?
HappyMidnight
talk
02:31, 11 January 2015 (UTC)
Could you remove space characters between pages in Thai, Chinese, Japanese etc ?
Hi. This extension creates space characters between pages. Although many European languages put spaces between words, some Asian languages such as Thai, Mandarin, Cantonese and Japanese do not. Could you create a parameter to turn the insertion of space characters between pages on or off for these languages? Thank you for your maintenance. --
Akaniji
talk
10:02, 15 October 2015 (UTC)
Akaniji: There is an old Phabricator bug concerning this problem. IMO, the only thing we can do is ping Tpt to set a higher priority for this problem in his long queue. Or, maybe, prepare a ProofreadPage patch for this ourselves and send it to Phe as a ready-to-implement solution?
Ankry
talk
11:46, 18 October 2015 (UTC)
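The requested behaviour could be sketched in Python as a language-dependent separator when page texts are joined. This is only an illustration of the idea, not the extension's code: the function name `join_pages` and the set of language codes are assumptions.

```python
# Languages written without spaces between words (assumed list of codes:
# Thai, Chinese, Cantonese, Japanese).
NO_SPACE_LANGS = {"th", "zh", "yue", "ja"}


def join_pages(pages, lang):
    """Join page texts with a space, or with nothing for no-space languages."""
    separator = "" if lang in NO_SPACE_LANGS else " "
    return separator.join(page.strip() for page in pages)


print(join_pages(["the body against", "the downward pull"], "en"))
print(join_pages(["日本", "語"], "ja"))  # 日本語
```

A per-wiki or per-tag parameter, as asked for above, would simply replace the hard-coded set.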
Ignore: Test edit
Testing of the new anti-spam filter.
Varlaam
talk
18:08, 11 December 2015 (UTC)
Building an index page for multiple files (jpg/png/...)
It has been indicated that we do not cover situations like where we have a string of individual images, eg. jpg, and we need to demonstrate the building of Page:...jpg links that string together to match the individual images. We talk about how they string together in but not how to build the page. Thoughts?
billinghurst
sDrewth
03:50, 22 July 2016 (UTC)
Arabic alphabet
Hi. Is there any possibility that the Arabic alphabet could be OCRed? We really need it for the Persian wikisource. --
Yoosef Pooranvary
talk
09:24, 3 June 2017 (UTC)
Pages cannot be listed more than once on an index page
I create 'Table of Contents' with links to the corresponding pages. If I have two headings on one page, then when I write the index, I get an error 'Pages cannot be listed more than once on an index page'.
Why?
This is not logical. One page can have two headings. For example, a title and subtitle or two (or more) works.
Karaby
talk
12:30, 1 June 2020 (UTC)
Transcluded page not showing as expected
Is there support for this extension? Installed on MediaWiki 1.38.2, PHP 7.4.30 (fpm-fcgi), ProofreadPage – (05f73cd) 05:02, 26 October 2022, PDF Handler, LabeledSectionTransclusion – (67e3ec4) 05:08, 6 July 2022. All looks fine in the Page namespace, except that when using the pages tag to transclude from Page to Main, the page in the main namespace does not show the expected transcluded page, only a thin stripe; hovering the mouse over it, it reads: 1 validated page, 0 only proofread pages and 0 not proofread pages. The template used for MediaWiki:Proofreadpage pagenum is a template from Wikisource. Is something missing somewhere?
Pspviwki
talk
16:39, 2 November 2022 (UTC)