MinT - MediaWiki
From mediawiki.org
MinT
(Machine in Translation) is a machine translation service based on open-source neural machine translation models.
The service is hosted in the Wikimedia Foundation infrastructure, and it runs translation models that have been released by other organizations with an open-source license.
An open machine translation service can be a key piece of the
essential infrastructure of the ecosystem of free knowledge.
This page captures the initiatives to scale the service and make this infrastructure more widely available.
You can try MinT as part of projects such as
Content Translation
and
translatewiki.net
, or directly
in a test instance.
Overview of MinT initiatives
Machine translation can be useful in different contexts.
As more products make use of MinT for different purposes, it is useful to differentiate those different contexts.
This way, when users report a bug, it is clearer where it needs to be fixed.
MinT Service.
The backend service running open-source neural machine translation models.
MinT test instance.
A basic interface to try the different translation models.
MinT for Translators.
Initiative to integrate the MinT Service with tools that support other machine translation services such as Content Translation and the Translate Extension.
MinT Client for Content Translation.
Client exposing the MinT Service as one of the machine translation services available in Content Translation.
MinT Client for Translate extension.
Client exposing the MinT Service as one of the machine translation services available in the Translate extension.
MinT for Wiki Readers.
Product to enable readers to use machine translation to read contents from other languages on a wiki.
You can read more below about each of the MinT initiatives.
Get involved
Feel free to share any feedback
on the discussion page.
Planned improvements are captured in Phabricator (more info). You can report wrong behavior or propose feature enhancements, track the progress of any task, and share your perspective on it.
For completed work you can also check
the status updates below.
MinT Service
The MinT Service is designed to provide translations from multiple machine translation models.
Currently, it uses the following models:
NLLB-200.
The latest model from the
No Language Left Behind project
by a research team at Meta. This model supports translation across
200 languages
, including many that are not supported by other vendors.
OpusMT.
The
OPUS (Open Parallel Corpus) project
from the University of Helsinki compiles multilingual content with a free license to train
the OpusMT translation models
. Anyone can easily help improve the translation quality by participating in the different projects that contribute data to OPUS. For example, when using
Content Translation
to create translations of Wikipedia articles,
the data on published translations
will be incorporated as a new resource to improve the translation quality for the next version of the model. Another quick way to contribute is to provide sentence translations with
Tatoeba.
IndicTrans2.
The IndicTrans2 project
provides
translation models
to support
over 20 Indic languages
. These models were developed by AI4Bharat@IIT Madras, a research group at the
Indian Institute of Technology, Madras.
Softcatalà.
Softcatalà is a non-profit organization whose goal is to improve the use of Catalan in digital products. As part of the
Softcatalà Translation project
, translation models used in
their translator service
to translate 10 languages to and from Catalan
have been released.
MADLAD-400.
MADLAD-400
is a multilingual machine translation model by Google Research that supports 419 languages.
MinT
supports over 200 languages
, with more than 70 languages not supported by other services (including 27 languages for which there is no Wikipedia yet).
You can
read more about the initial release of MinT
and check some frequently asked questions in
the summary page for the service
Technical details
The translation models have been optimized for performance using the OpenNMT CTranslate2 library in order to avoid the need for GPU acceleration.
This makes it easier for organizations and individuals to build and run their own instances.
For more details you can check the following:
the source code
GitHub mirror
the API spec
a test instance
MinT provides a platform to run multiple translation models.
In order to support different initiatives, aspects such as sentence segmentation, language detection, pre/post-processing of contents, and rich format support have been developed on top of the plain-text based models.
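The layering described above (pre-processing, sentence segmentation, per-sentence translation, post-processing) can be illustrated with a minimal sketch. This is not MinT's code: the function names `split_sentences`, `translate_text`, and `fake_model` are hypothetical, the segmentation rule is deliberately naive, and the "model" is a stand-in that only upper-cases its input.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_text(text: str, translate_sentence: Callable[[str], str]) -> str:
    """Run a plain-text sentence translator over a longer text."""
    # Pre-processing: collapse runs of whitespace (including new lines).
    normalized = " ".join(text.split())
    # Segmentation: the model only ever sees one sentence at a time.
    sentences = split_sentences(normalized)
    translated = [translate_sentence(s) for s in sentences]
    # Post-processing: re-join the translated sentences.
    return " ".join(translated)


def fake_model(sentence: str) -> str:
    """Hypothetical stand-in for a real translation model call."""
    return sentence.upper()


if __name__ == "__main__":
    print(translate_text("Hello  world.\nHow are you?", fake_model))
```

Keeping the model call behind a simple callable is one way to let the same surrounding steps serve several translation models; whether MinT structures it this way is an assumption.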
Test instance
The
MinT test instance
is a basic interface to try the different translation models.
It allows users to translate content across the selected language pairs and to choose the preferred translation model when more than one is available.
This allows different communities to check how well the models support their language.
This instance is intended for testing, so performance and availability may be reduced compared to other MinT-based products.
You can check
the availability status of the MinT test instance.
MinT for translators
Mobile translation using MinT
Translation is a common way to contribute in the Wikimedia ecosystem for multilingual users.
Machine translation can provide a useful initial translation for users to review and improve.
The Language team has developed tools that support translators in their workflows and that can integrate different machine translation services to speed up the process.
Once MinT was available, integrating it with these tools was a logical next step to amplify their impact.
MinT is available in the following projects:
Content Translation.
Content Translation
provides guidance to create a translation of a Wikipedia article into another language. Content Translation
integrates several translation services
to provide an initial translation. You can check
which languages supported by MinT are available in Content Translation.
Localization infrastructure.
The
Translate extension
provides the infrastructure used to translate our software and multilingual pages. Communities of translators use it on translatewiki.net, Wikimedia Meta-wiki, MediaWiki.org, and more.
MinT for wiki readers
Tracked in
Phabricator
Task T341196
The number of topics and the amount of information a reader can learn about from Wikipedia and other wikis depends on the languages they speak.
Machine translation can help people to learn more about their topics of interest when the content is not available in their language.
This initiative explores how to surface the machine translation support from MinT in Wikipedia articles in a way that:
Allows readers to learn more about the topics of interest from other languages.
Clearly differentiates automatically generated content from community-created content.
Encourages readers to access and contribute to community-created content when possible.
At the moment the Language team is working on
the initial implementations
for this initiative based on
the research
and
the designs.
Learnings based on data and community input will determine the next steps for the initiative.
MinT more widely available
Working on the previous initiatives will help to polish and solidify the system.
For now, the MinT API is only available for Wikimedia products.
As the system gets ready, we'll consider a wider exposure.
Providing a service that can be used by communities in innovative ways can be a very powerful tool.
New initiatives to make MinT more widely available will be captured here in the future.
Meanwhile, feel free to configure your own MinT instance to experiment with it.
Disclaimer
Accuracy of MinT’s Translations - The accuracy of translations generated by MinT may vary. Translations may not be entirely accurate or may not always convey the intended meaning or context of the original content. Wikimedia makes no representations or warranties regarding the accuracy or adequacy of the automatically translated content.
Limitation of Liability - Wikimedia, its affiliates, and employees are not liable for any direct, indirect, incidental, punitive, or consequential damages, including but not limited to damages for goodwill, use, data, or any other intangible losses arising out of or in connection with the use of MinT or translations generated with MinT.
Creative Commons Compliance - Translations generated with MinT are considered derivative works under the applicable Creative Commons license governing the original content. Users shall comply with the terms of the applicable Creative Commons license when using translated content.
Terms of Use and Privacy Policy - Use of MinT is subject to Wikimedia's Terms of Use and Privacy Policy.
Status updates
February 2024
Adjusted translation limits for Punjabi
after community request to make them less strict due to improved quality of machine translation.
Research on MinT for Wikipedia Readers is complete. Two reports
were published at the research page.
Multi-model support
for the MinT test instance, allowing communities supported by multiple translation models to try them, compare them, and assess their quality to determine which one works best.
January 2024
Infrastructure
updates to benefit from newer Python versions.
December 2023
A new larger instance
has been created for MinT.
Memory quota has been increased
to accommodate the needs for MinT as the usage and models available increase.
New design concepts for exposing MinT to Wikipedia readers have been created
based on the input from initial research. Multilingual prototypes have been updated to learn from the new concepts in the next round of research.
Adjusted exposure of MinT in the translate extension to
avoid showing translation suggestions for contents with wikitext markup.
November 2023
Better wikitext support by
improving error handling
when MinT processes wikitext.
The research plan
is complete and research sessions have started.
Explored a new advanced API for sentence segmentation to support the needs of the EditCheck use case and others.
Improved responsiveness of the MinT test instance by
preventing some translation requests from getting stuck.
MinT
was set as the default translation service in Content Translation for Kurdish (ku) and Sesotho (st)
, languages where it is optional but frequently used.
A new larger instance
has been created for MinT.
Memory quota has been increased
to accommodate the needs for MinT as the usage and models available increase.
New design concepts for exposing MinT to Wikipedia readers have been created
based on
input from the initial round of research.
Published report
analyzing usage of machine translation services.
October 2023
MinT is now supported in Content Translation for
Fon
, a Wikipedia that recently graduated from the Incubator.
Announced sentencex library:
sentencex: Empowering NLP with Multilingual Sentence Extraction
- A Python and JavaScript library to meet the needs of sentence segmentation for all the languages we support.
Proposed model card for language identification
as part of the creation of a LiftWing service to provide those capabilities for MinT and others.
The
new sentence segmentation approach
has been
exposed in Content and Section Translation
to validate it with real contents. Resolved community-reported issues such as the
problems translating court cases
MinT test instance
provides
consistent language names with Wikipedia
by using Wikipedia APIs instead of the limited browser localization capabilities.
Launched the Language Identification service
to automatically detect the language in which a given text is written. The service supports the detection of 201 languages, and anyone can access
the API
to use the service or
read the model card
for more details. The Machine Learning team
completed the last checks
after deploying to LiftWing and evaluating that the service can "easily withstand a high amount of traffic".
Basic support for rich text translation by
supporting transferring of markup
to apply styling, such as words in bold, from the source text to the equivalent words in the machine translation (which lacks formatting because translation models operate on plain text).
Completed the process to
enable MinT for languages with no Wikipedia yet
. Translation models in MinT support 25 languages for which there is no Wikipedia. These can be tested in MinT's
test instance
so that speakers of those languages can assess quality. This also ensures that translation tools are well equipped once wikis are created for those languages (as has been the case with the recent graduation of
Fon Wikipedia
out of the Incubator).
Completed the process to
enable MinT for closely-related languages based on Community input
. For some languages where machine translation is not available, Wikipedia editors have asked for access to machine translation in Content Translation using a related language instead of having no support at all. With this enablement, translators of the Gan (gan) Wikipedia will have machine translation based on the traditional script variant of Chinese as a starting point.
Analysis of translation activity
on
55 languages
for which MinT provides machine translation for the first time shows that (a) translations have doubled since MinT became available, and (b) deletion rates have not increased. Activity levels for these 55 wikis changed from ~500 translations/month to 1K+ translations/month after MinT was enabled. For example, a recent peak of 2.15K translations was published in August 2023, when MinT was available for those languages, a significant increase from 225 translations in August 2022, when MinT was not available for them.
Better visibility of translation quality by
including a tag in translations where unedited machine translation is close to the limits
. This will facilitate analysis about translation quality and limits.
Created prototypes for upcoming research
illustrating 5 concepts on how MinT can be used by Wikipedia readers and supporting the 4 languages in which we will conduct research: Hindi, Chhattisgarhi, Awadhi, and Korean.
Improvements for MinT to
process contents containing new lines more predictably.
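The markup transfer mentioned in the rich text item above can be illustrated with a deliberately naive sketch. It is not MinT's algorithm, which this page does not describe: the helper `transfer_bold`, the toy dictionary `TOY_TRANSLATIONS`, and the `toy_translate` function are all hypothetical. The idea shown is to translate the plain text, translate each bold span on its own, and re-apply the bold tag where the translated span can be found in the translated sentence.

```python
import re
from typing import Callable

# Hypothetical toy "model": a lookup table standing in for a real translator.
TOY_TRANSLATIONS = {
    "The cat sleeps.": "El gato duerme.",
    "cat": "gato",
}


def toy_translate(text: str) -> str:
    """Return the toy translation, or the input unchanged if unknown."""
    return TOY_TRANSLATIONS.get(text, text)


def transfer_bold(source_html: str, translate: Callable[[str], str]) -> str:
    """Translate text containing <b>...</b> and re-apply bold in the output."""
    bold_spans = re.findall(r"<b>(.*?)</b>", source_html)
    # The model operates on plain text, so strip the markup first.
    plain_source = re.sub(r"</?b>", "", source_html)
    translation = translate(plain_source)
    for span in bold_spans:
        translated_span = translate(span)
        # Wrap the first match only; if there is no match, leave the
        # translation unformatted rather than guessing a position.
        if translated_span and translated_span in translation:
            translation = translation.replace(
                translated_span, f"<b>{translated_span}</b>", 1
            )
    return translation


if __name__ == "__main__":
    print(transfer_bold("The <b>cat</b> sleeps.", toy_translate))
```

Exact string matching breaks down when a word changes form in context, so a production approach would likely need something more robust, such as word alignment.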
September 2023
Completed
initial design exploration
to illustrate 5 concepts on how to surface machine-translated contents from other languages for Wikipedia articles.
Completed enablements of MinT in Content Translation
for Ligurian
, where the community requested further clarifications about MinT, and
the last set of 14 languages that could be supported with the NLLB-200 model.
Enabled
MinT for translatable pages
on
test wiki.
Expanded exposure of MinT with the
enablement of Content Translation mobile and desktop experiences as default in 7 Wikipedias supported by MinT
(Cherokee, Tongan, Hungarian, Kazakh, Kyrgyz, Minangkabau, and Sardinian).
Completed the validation for all languages supported by the translation models used by MinT as part of the final QA for
enabling the new translation service.
Santhosh presented at
the 10th Workshop on Asian Translation
emphasizing the need for machine translation to be universal, free, and available in more languages. The message was
well received by the attendees.
Research planning started with an initial draft of the research brief for MinT on Wikipedia
Continuing technical explorations for applying machine translation beyond plain text (what underlying models provide) to support the Wikipedia context: A
new improved approach
for sentence segmentation (with
a demo page to try
) that provides a more accurate way to identify where a sentence ends in different languages, with a preference for not splitting in case of doubt (preferred in the context of machine translation to avoid fragmenting the context of a translation, for example, by misinterpreting the dot of an abbreviation as a full stop).
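The "do not split in case of doubt" preference described above can be sketched as follows. This is an illustration only and does not reproduce the sentencex library or its rules: the function `split_conservatively`, the tiny `ABBREVIATIONS` set, and the lowercase-lookahead heuristic are assumptions made for this example.

```python
import re
from typing import List

# Hypothetical, tiny abbreviation list for illustration only.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc.", "st."}


def split_conservatively(text: str) -> List[str]:
    """Split on sentence-final punctuation, merging chunks when in doubt."""
    candidates = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences: List[str] = []
    buffer = ""
    for i, chunk in enumerate(candidates):
        if not chunk:
            continue
        buffer = f"{buffer} {chunk}" if buffer else chunk
        last_word = buffer.split()[-1].lower()
        next_chunk = candidates[i + 1] if i + 1 < len(candidates) else ""
        # Doubtful boundary: the dot may belong to an abbreviation, or the
        # next chunk starts in lowercase and so may continue the sentence.
        doubtful = last_word in ABBREVIATIONS or next_chunk[:1].islower()
        if not doubtful:
            sentences.append(buffer)
            buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


if __name__ == "__main__":
    for sentence in split_conservatively("Dr. Smith arrived. He was late."):
        print(sentence)
```

Under-splitting hands the translation model a longer input with its context intact, while over-splitting would translate "Dr." in isolation, which is the trade-off the item above describes.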
August 2023
Successful
exploration for the use of MinT to translate structured formats
such as HTML, SVG and markdown.
Completed
the deprecation of Youdao
, an external translation service that had been failing for a long time.
Continued design exploration for MinT on Wikipedia
with new and updated workflows based on feedback.
Identified languages
which can benefit the most from new OpusMT models.
Made
MinT the default translation service for Zulu
in Content Translation.
July 2023
Enabled machine translation with MinT (and communicated with communities) for 75 new languages:
62 languages where the mobile translation experience is available
, and
13 languages where translation quality from other services may not be ideal
based on
the MT usage report
data and/or community feedback.
Validation of previous enablements: identified issues
with Bhojpuri
and
with Latvian
where MinT was not available due to mismatches between the language codes used by Wikipedias, MinT, and the underlying translation models.
Initial design explorations and prototypes
on ways we could integrate MinT in Wikipedia.
Improved MinT translation post-processing to better support languages using the Arabic script by
avoiding extra spaces after full stops.
Completed
the integration of the IndicTrans2 model
by verifying the enablement of all of its 23 supported languages.
Initial analysis of activity
for Wikipedia communities that are supported with MinT for the first time to identify potential pilot wikis for future research and as early adopters.
Enablement of MinT on translatewiki.net
for use in the localization of Wikimedia and other open projects.
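The language code mismatches noted above for Bhojpuri and Latvian can be illustrated with a small lookup sketch. The mapping below is an assumption for illustration and is not taken from MinT's configuration: it relies on the Bhojpuri Wikipedia using the code "bh" while ISO 639-3 uses "bho", and on ISO 639-3 using "lvs" for Standard Latvian while the Latvian Wikipedia uses "lv". The names `WIKI_TO_MODEL_CODE` and `model_code` are hypothetical.

```python
from typing import Dict, Optional, Set

# Assumed, illustrative mapping from wiki language codes to model codes.
WIKI_TO_MODEL_CODE: Dict[str, str] = {
    "bh": "bho",  # Bhojpuri: wiki code vs. ISO 639-3 code
    "lv": "lvs",  # Latvian: wiki code vs. ISO 639-3 code for Standard Latvian
}


def model_code(wiki_code: str, supported: Set[str]) -> Optional[str]:
    """Resolve a wiki language code to a code the model supports, if any."""
    candidate = WIKI_TO_MODEL_CODE.get(wiki_code, wiki_code)
    return candidate if candidate in supported else None


if __name__ == "__main__":
    supported_by_model = {"bho", "lvs", "es"}  # hypothetical model inventory
    for code in ("bh", "lv", "es", "xx"):
        print(code, "->", model_code(code, supported_by_model))
```

Without such a mapping, a lookup for "bh" or "lv" against the model's own codes would fail, and the language would appear unsupported even though a model exists for it.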