Building multimedia, technological, educational & citable resources for South Asian & other low-resourced languages
Fluxx IDR-GS-2601-21440
proposed start date: 2026-07-01
proposed end date: 2027-06-30
amount requested (local currency): 49289.04 CAD
grant type: Individual
funding region: SA
decision fiscal year: 2025-26
funding program round: Round 2
This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.
Applicant information
- Organization name or Wikimedia Username for individuals. (required)
- Psubhashish
- Do you have any approved General Support Fund requests? (required)
- Yes, I have already applied and received a General Support Fund
- You are applying as a(n). (required)
- Individual
- Are your group or organization legally registered in your country? (required)
- N/A
- Do you have a fiscal sponsor?
- Yes
- Fiscal organization name.
- Kiwix
- Please provide links to the following documents if they are available
This documentation can be provided in your local language(s); no translations are required.
- Organizational website
- Detailed financial reporting and/or audits
- Documentation of the governance structure, board list, governance processes
- Documentation of the general assembly decision on your plan
- Website: https://kiwix.org/
- 1. Please state the title of your proposal. This will also be a title for the Meta-Wiki page. (required)
- Building multimedia, technological, educational & citable resources for South Asian & other low-resourced languages
- 2. Do you want to apply for the multi-year base funding for 3 years? (required) (only for returning applicants)
- No
- 2.1. Provide a brief overview of Year 2 and Year 3 of the proposed plan and how this relates to the current proposal and your strategic plan? (required)
The second and third years of our proposed plan will build on the first implementation year of OpenSpeaks Archives. The first year established a pathway for bringing low‑resourced language oral histories from private archives into Wikimedia projects as citable sources, and scaled the pilot from a small set of languages and collections to a network of fellows, GLAM partners, and advisors across South Asia and allied regions. Building on this first phase, which demonstrated the feasibility of subtitling, reviewing, and citing oral history media in multiple languages, the next phase focuses on deepening community ownership and capacity building, strengthening workflows and tools, and formalising partnerships so that oral knowledge can systematically meet Wikimedia’s citation and verifiability standards.
In Year 2, we aim to support OpenSpeaks Fellows who will document, subtitle, and publish oral histories from their own and neighbouring language communities, while refining the FAIR-CARE-aligned framework for assessment, multilingual transcription, and community review described in our current work. Fellows will lead: identification of potential oral history recording areas and existing oral history recordings; community‑led verification and translation; subtitling into at least one neighbouring official language and English; and integration of the resulting media into Wikipedia, Wikidata, Wikimedia Commons and other Wikimedia projects. In parallel, we will improve the documentation of our tools and workflows as open educational resources (OERs) to help archivist‑Wikimedians apply them in their own language contexts, and we will co‑create new infrastructure, such as WikiVoice, a proposed digital language archive emerging from Wikimedia Futures Lab, to allow archivists to host their media and cite it in Wikimedia projects. We will also collaborate with partners within and outside the movement, including our emerging collaboration with Factum in Sri Lanka (a pilot documentation project in Sri Lankan Malay is underway), to generate citable media, tools, educational resources, and capacity‑building pathways.
In Year 3, we will shift our focus towards consolidation, advocacy and broader movement impact: documenting evidence from multiple fellow‑led implementations, jointly publishing learnings with partners, and advocating within Wikimedia movement structures for more equitable citation practices for oral histories. Fellows’ collaboration with strategic partners (GLAM and research institutions) will embed collections in institutional catalogues, test citational workflows, and link to wider initiatives in language documentation and Indigenous data governance. We will also align our tooling strategies (for example, exploring how AI‑assisted speech‑to‑text within our Subtitler and WikiVoice prototypes can assist communities in subtitling) with the needs of communities such as Songhay, which need to digitise and transcribe existing multimedia archives to enhance Wikipedia and other knowledge projects. Across both years, the work remains aligned with our longer‑term strategic plan: to create a sustainable framework that enables communities to record oral histories under FAIR–CARE principles, and for the recordings to be reused, cited, and trusted in Wikimedia projects and beyond.
- 3. Proposed start date. (required)
- 2026-07-01
- 4. Proposed end date. (required)
- 2027-06-30
- 5. Does your organization or group have an Affiliate or Organizational Annual Plan that can help us understand your proposal? If yes, please provide it. (required)
- Yes
- https://docs.google.com/document/d/e/2PACX-1vRF2OdnIdzYfkUiDGOByiJQ3EzIv2RCKjpk4s3wxysVTCg6HCAuHCECSbaZW-z_ZQ/pub
- 6. Does your affiliate, organization or group have a Strategic Plan that can help us understand your proposal? If yes, please provide it. (required)
- Yes
- https://docs.google.com/document/d/e/2PACX-1vTSW4u7HJMmZFDxtp8XLI9z86_ybLe88gEG1h7ihyY5PxuEQGMtfL80uUh8-ia07qBoqUv6jFdr9RMP/pub
- 7. Where will this proposal be implemented? (required)
- International (more than one country across continents or regions)
- India
- Nepal
- Sri Lanka
- Mali
- 8. What are your programs, approaches, and strategies? What are the challenges that you are trying to address and how will your strategies support you in addressing these challenges? (required)
Our programmes support language activists, archivists, and Wikimedians to document, subtitle, and bring to life oral histories in low‑ and medium‑resourced languages, and to make these recordings usable and citable across Wikimedia projects. In this phase, we align with larger movement goals (discussed at Wikimedia Futures Lab) and extend our work through new regional and international collaborations, focusing strongly on community‑relevant themes such as Indigenous biodiversity and nature conservation, on experimental infrastructure (such as WikiVoice from the Wikimedia Futures Lab), and on tooling that responds to the urgent needs of communities whose archives require digitisation and AI‑supported technologies to become accessible. In addition, we are doubling, or otherwise significantly increasing, our engagement both within the Wikimedia community and beyond it.
- Why these communities and these topics?
Oral knowledge documentation is a slow, expensive and trust‑based process, and many communities, particularly those who have experienced knowledge theft, misrepresentation or erasure, fear external interventions. We therefore prioritise communities that have expressed interest in language documentation and capacity building during our prior and ongoing interactions. Together with them, we decide which topics to document, so that the media contribute to community education as well as being used in Wikimedia projects, a win‑win outcome. We aim to build long‑term relationships with such communities: we start by listening to understand how they want their knowledge represented, and we pilot small documentation projects to test whether a larger collaboration is possible and mutually beneficial. Although land, biodiversity, climate variability, cultural practice and language are central to Indigenous and minoritised communities’ own priorities, these topics are often poorly documented in multimedia and under‑represented in Wikimedia projects. This year, we plan to focus on these areas so that the media can serve multiple purposes: preserving traditional knowledge (e.g., supporting youth in learning their language through local stories and knowledge of plants, animals, and lands) while also addressing gaps in Wikimedia’s coverage of diverse epistemic knowledge systems. At the same time, some elders have expressed discomfort with the way Wikipedia and other Wikimedia platforms circulate knowledge, fearing misrepresentation or loss of community control; this feedback shapes where we work and how we design consent, licensing, and access. As we shared at Wiki Workshop in 2024 and 2025, and will share again in 2026 (proposal accepted), our model is also building evidence for ethical, non‑exploitative language documentation that addresses Wikimedia project‑level gaps while prioritising community governance.
- Content and readership gaps
As Nigatu et al. note, both content and readership gaps are substantial for low‑resourced languages and for orally transmitted knowledge. As Nguyen et al. underline, Wikimedia statistics show that Indigenous and other low‑resource languages are poorly documented or entirely missing across projects. Open, affordable tools for community‑led multimedia documentation are also scarce. Research on online knowledge repositories highlights that for many African and other low‑resourced language communities, Wikipedia editions have low article counts, limited local relevance, and a strong dependence on sources and perspectives from dominant languages (Nigatu et al. 2024). Our own practice, OpenSpeaks Archives, has demonstrated that consciously and strategically documented oral histories can enrich Wikimedia projects as citable sources (see the framework). However, the pathway is complex and often unfamiliar to editors. For example, our Kusunda-language work showed that high‑quality, community‑reviewed audiovisual documentation, combined with peer publication and GLAM acquisition, can even be considered for use on Wikisource and Wikipedia. These experiences suggest that there is a significant content gap around low‑resourced language oral histories, and that we also need to build trust and shared understanding between community archivists and Wikimedia projects so that this content can be used.
- Core programs and approaches
- OpenSpeaks Fellows/partners and community content
- Fellows coordinate documentation in five language clusters (eastern, northern and southern India, Nepal and Sri Lanka), working with elders, other knowledge holders and local organisations.
- They identify and record oral histories (including on community‑relevant themes such as biodiversity and conservation), review and subtitle existing material, and work with GLAM partners and Wikimedia communities to publish, peer‑review where appropriate, and integrate these materials into Wikimedia projects.
- Regional collaborations
- In Phase 1 we collaborated with several regional organisations in India and Sri Lanka, which led to strong community collaborations and language documentation, and helped us understand local media archives, workflows, and policy contexts around minoritised languages.
- In this phase, partners in three South Asian countries will help us reach additional low‑resourced and minoritised language communities and co‑design documentation and citation strategies that respect local governance, consent practices and sensitivities.
- WikiVoice and infrastructure for oral citations
- Through the Wikimedia Futures Lab we co‑created a conceptual model called WikiVoice together with Wikimedians from regions including Nigeria and Indonesia: a digital language archive that would enforce a FAIR-CARE principle-based peer-production process, allow archivists to host media, digitise it (speech‑to‑text, captioning, translation) and generate citable, time‑coded references for Wikimedia projects.
- This builds directly on the OpenSpeaks oral history framework (FAIR–CARE, community review, multilingual transcription) and explores what a shared, movement‑wide infrastructure for oral citations could look like, informed by our pilot work (such as the Gyani Maiya documentary), which already connects documentary media to Wikimedia.
- Tooling for language documentation, digitisation and speech‑to‑text
- Communities such as Raji, Tharu, Kumhali and Songhay have existing archival multimedia collections, including citable media from radio archives, that remain largely unusable because of digitisation gaps and the lack of language‑appropriate speech‑to‑text tools.
- We will align our Subtitler (offline captioning and Commons integration) tooling and the WikiVoice concept with these needs, exploring AI‑assisted speech‑to‑text where ethically and technically feasible, while ensuring that community review, consent, and data sovereignty remain central and that communities retain control over their materials.
- Tools, workflows, and open educational resources
- We will refine and document open workflows for subtitling, media inspection, compression, duration calculation, and consent tracking that make oral history projects more feasible in small language communities.
- OpenSpeaks will continue to grow as a set of open educational resources (Wikiversity), providing frameworks and step‑by‑step guidance for archivist‑Wikimedians and linking to emerging infrastructures like WikiVoice so that communities can learn from each other’s practice.
- Community training and capacity building
- Fellows and partners will run remote and in‑person workshops, mentorship circles, and peer‑learning activities that strengthen community capacity for audiovisual documentation, digitisation, subtitling, equitable remuneration, and Wikimedia integration.
- Training will explicitly cover FAIR–CARE principles, community review of oral histories, and practical strategies for both navigating and constructively improving existing Wikimedia citation policies, including by using examples like the Gyani Maiya materials and other OpenSpeaks Archives case studies.
- GLAM collaborations and citational practice
- Strategic partnerships with notable GLAM and research institutions, together with the acquisition and co‑curation of high‑quality language media, will help create reliable sources, embed collections into catalogues, and test workflows for citing time‑coded oral histories within Wikimedia projects.
- Through these collaborations we will produce tangible case studies, including around biodiversity and traditional ecological knowledge, that demonstrate how oral histories can meet verifiability and reliability expectations while respecting community governance and Indigenous data sovereignty.
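Duration calculation, one of the workflow steps listed above, helps estimate subtitling effort and report the size of a collection. The sketch below is a minimal illustration, not an OpenSpeaks tool. It assumes the recordings are uncompressed WAV files in a single folder (compressed formats such as Ogg or WebM would need a media library instead), and the function names are hypothetical.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    # wave.open() expects a str or a file object, so convert the Path.
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def format_hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, truncated to whole seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def collection_report(folder: Path) -> tuple[int, float]:
    """Count the .wav files in a folder and sum their durations in seconds."""
    files = sorted(folder.glob("*.wav"))
    total = sum(wav_duration_seconds(f) for f in files)
    return len(files), total
```

Calling `collection_report(Path("recordings"))` (a hypothetical folder) returns the number of files and their total length in seconds, and `format_hms` turns that total into an H:MM:SS figure for reports.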
- Challenges and how our strategies address them
- Structural citation barriers and epistemic hierarchies
- Challenge: Existing Wikimedia policies often privilege written, institutionally published sources, making it difficult to cite community‑led oral history documentation even when it has been carefully and ethically produced.
- Strategy: Implement and document a multi‑step framework (FAIR–CARE alignment, community assessment, multilingual transcription, time‑coded references) and test it in environments like WikiVoice and specific projects (such as the integration of documented oral histories into Wikisource and Wikipedia), making verification processes legible to editors and policy discussions.
- Under‑representation of low‑resourced and minoritised languages
- Challenge: Many Indigenous and minoritised languages have limited presence on Wikimedia and few citable sources in or about the language.
- Strategy: Use fellowships and partnerships to anchor work in specific language clusters and themes, ensuring that recordings, subtitles, and derived content directly enrich multiple language editions and projects, and that local organisations remain co‑owners of the process.
- Digitisation and tooling gaps
- Challenge: Communities with existing multimedia archives require digitisation and language‑appropriate speech‑to‑text to make their materials usable in Wikimedia, but such tools and resources are scarce.
- Strategy: Co‑design digitisation workflows and AI‑assisted speech‑to‑text experiments within our Subtitler and WikiVoice concepts, ensuring ethical AI practices, community oversight, and careful documentation of successes and limitations to inform the wider movement.
- Limited infrastructure and technical capacity
- Challenge: Many communities lack tools, infrastructure, and institutional support for high‑quality audiovisual documentation, subtitling, and digitisation.
- Strategy: Co‑develop lightweight tools and workflows, document them as open resources, and provide training and mentorship so that archivist‑Wikimedians can adopt them with modest technical capacity, including remote configurations.
- Sustainability and institutional recognition
- Challenge: Community‑led documentation efforts are often short‑term and not embedded within institutional or Wikimedia‑wide infrastructures.
- Strategy: Formalise partnerships and participate in initiatives like Wikimedia Futures Lab so that OpenSpeaks Archives, WikiVoice and related tools inform longer‑term infrastructure and policy around oral citations and low‑resourced language support.
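The time‑coded references that appear throughout the strategies above can be illustrated with a small helper that renders a citation pointing readers to a specific moment in a recording. This sketch assumes English Wikipedia's {{Cite AV media}} template and its people, title, language, url, date and time parameters. Parameter names may differ on other language editions and should be checked against each wiki's template documentation. The OralHistoryClip record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OralHistoryClip:
    """Hypothetical metadata for one time-coded passage in a recording."""
    title: str
    speakers: str
    language: str
    url: str
    date: str
    start_seconds: int


def hhmmss(seconds: int) -> str:
    """Format a whole number of seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def cite_av_media(clip: OralHistoryClip) -> str:
    """Render {{Cite AV media}} wikitext that points to the clip's start time."""
    params = {
        "people": clip.speakers,
        "title": clip.title,
        "language": clip.language,
        "url": clip.url,
        "date": clip.date,
        "time": hhmmss(clip.start_seconds),
    }
    parts = []
    for key, value in params.items():
        if value:  # skip empty fields rather than emitting blank parameters
            # A literal "|" would split a template parameter, so escape it.
            escaped = value.replace("|", "{{!}}")
            parts.append(f"{key}={escaped}")
    return "{{Cite AV media |" + " |".join(parts) + "}}"
```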
- 9. What categories are your main programs and related activities under? Please select all that apply. (required)
| Category | Yes/No |
|---|---|
| Education | No |
| Culture, heritage or GLAM | Yes |
| Gender and diversity | No |
| Community support and engagement | Yes |
| Participation in campaigns and contests | No |
| Public policy advocacy | No |
| Technology (software development) | Yes |
| Other | No |
Culture, heritage or GLAM
- 9.2. Select all your programs and activities for Culture, heritage or GLAM. (required)
- Documenting or incubating languages on Wikimedia projects, Introducing new approaches to underrepresented culture and heritage, e.g. decolonising or reparative work; oral and visual knowledge; outreach to communities of origin, indigenous and first nations self-determination, Partnering with institutions, professional associations, and allied organizations to raise awareness of open culture, ethical sharing, and related issues
- Other programs and activities if any: N/A
Community support and engagement
- 9.4. Select all your programs and activities for Community support and engagement.
- On-wiki training of community members, Off-wiki training of community members, Organizing meetups, conferences, and community events, Supporting community members' participation in events and conferences, Offering micro-funding and other financial support to community members, Offering non financial support and services to community members (equipment, space, books, etc.)
- Other programs and activities if any: N/A {{#ifeq:Yes|Yes|
Technology (software development)
- T1. Describe the technical project(s) or provide relevant links. (required) Include the following information
- Project goals, impact, and product strategy
- Technical approach, integrations, and dependencies
- Milestones, progress tracking and success metrics
- Demand and community consultations
As stated in this proposal, Technology is not a standalone programme or strategy but is a part of our second strategy, which focuses on technological tools, workflows, and open educational resources.
Our long-term roadmap is as follows: WikiVoice will integrate Subtitler as its key subtitling engine; language-specific Automatic Speech Recognition (ASR) models will produce a rough first draft (similar to how Content Translation works); and a Wikimedia translation API will help translate subtitles. However, rather than implementing all of this at once next year, we will proceed slowly and carefully together with language speakers and Wikimedia communities. This proposed project will continue to focus on documentation, community and partnership building, archiving, and building educational resources rather than shifting the focus solely to tools.
Project goals, impact, and product strategy
- Project goals
As a community-based language documentation project, our key priorities are collaboration, documentation, and the creation of a permanent, citable archive of languages. We only bridge the technological gaps that hinder this goal. Our tools are also not focused solely on Wikimedia communities or projects; they serve a range of language communities. This ensures that community archivists can document and archive languages without requiring professional software, server infrastructure, or high-bandwidth connectivity. We build only what is absolutely necessary and where no adequate free alternative exists.
- Product strategy
Based on our experience of rigorously using various tools, we published an early needs assessment and built a prototype in 2024-2025 (pilot phase). As our work grew during 2025-2026 (first phase), we used more tools, identified additional gaps, and began building a utility suite on the Tools page. We collected feedback on our approach while presenting at Wiki Workshop 2025. Early on, we also reached out to the Indic MediaWiki Developers User Group to seek support and learn about potential overlaps. The key tool in this suite is Subtitler, a linear subtitle editor that allows users to create subtitles for audio and video files, offline or online. Since almost all low-resourced languages lack ASR, Subtitler is currently a manual tool.
- Current and planned tools
- OpenSpeaks Subtitler (alpha release live on Toolforge): A browser-based, offline-capable captioning tool. It works with local as well as Commons audio and video files, uses silence detection to generate draft subtitle segments, and allows fetching media and subtitles from Wikimedia Commons via OAuth (with a plan to support uploading subtitles in the future). To our knowledge, no existing tool combines these features. Following Amara's announcement that its public workspace is closing (April 2026), the need for a robust Commons-native subtitling alternative is more urgent than before: although OpenSpeaks Archives can cover an Amara subscription for the grant period, this is not a sustainable option for most communities. The full source code for Subtitler is public in the Wikimedia GitLab repository under an MIT licence. Public documentation is at OpenSpeaks/Tools/Subtitler here on Meta-Wiki, allowing non-developer community members to submit bug reports and feature requests.
- WikiVoice (conceptual/roadmap stage): A proposed digital language archive co-developed with Wikimedians from Nigeria and Indonesia at Wikimedia Futures Lab (Frankfurt, January 2026). We are contributing to its roadmap rather than building it all by ourselves; infrastructure decisions will be made collectively with other movement partners. It would allow archivists to host media, apply speech-to-text processing, generate time-coded, citable references, and integrate them into Wikimedia projects—while enforcing FAIR-CARE principles and other ethical guidelines throughout.
- Speech-to-text (STT) exploration (research/pilot): We are carefully and modestly exploring STT for low-resourced languages rather than building ASR models from scratch. By evaluating existing open models (e.g., Whisper-based fine-tunes for specific language families), we are trying to determine which models will be most helpful for collaborators who are burdened with archival multimedia, such as radio programmes, that needs digitisation for information extraction and citation. For example, media in Songhay remain inaccessible due to digitisation gaps and a lack of language-appropriate STT tools. ASR could benefit subtitling in the same way that Content Translation assists article creation.
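The Subtitler description above says only that draft segments come from silence detection. The following Python sketch shows one common form of that technique, energy‑based (RMS) silence detection. It is not Subtitler's browser code, and the frame size, threshold, and minimum pause and speech lengths are assumed values. The sketch returns speech segments as (start, end) times in seconds and renders them as SRT cues with placeholder text for a human transcriber to fill in.

```python
import math


def draft_segments(samples, sample_rate, frame_ms=20, threshold=0.02,
                   min_silence_ms=300, min_speech_ms=200):
    """Split mono audio into (start_s, end_s) speech segments at pauses.

    samples: sequence of floats in [-1.0, 1.0]; sample_rate in Hz.
    A frame counts as silent when its RMS energy is below `threshold`;
    a run of silent frames lasting at least `min_silence_ms` ends a segment.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))  # samples per frame
    # Ceiling division in integers avoids float rounding at exact boundaries.
    min_silent_frames = max(1, -(-min_silence_ms * sample_rate // (1000 * frame_len)))
    n_frames = math.ceil(len(samples) / frame_len)
    segments, start, silent_run = [], None, 0
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i  # speech begins in this frame
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # The segment ends where the pause began.
                segments.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:
        # Audio ended during speech (or during a pause too short to count).
        segments.append((start, n_frames - silent_run))
    return [
        (round(a * frame_len / sample_rate, 3), round(b * frame_len / sample_rate, 3))
        for a, b in segments
        # Drop blips shorter than min_speech_ms (compared in integer units).
        if (b - a) * frame_len * 1000 >= min_speech_ms * sample_rate
    ]


def srt_time(t):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(t * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments, placeholder="[to be transcribed]"):
    """Render segments as numbered SRT cues with placeholder text."""
    cues = [f"{n}\n{srt_time(a)} --> {srt_time(b)}\n{placeholder}\n"
            for n, (a, b) in enumerate(segments, start=1)]
    return "\n".join(cues)
```

Field recordings with background noise rarely fall to true silence. In practice, the threshold would therefore need tuning for each recording, and every draft cue still requires human review.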
Impact: Files produced through OpenSpeaks Archives (Phase 1) have already enhanced nearly 1000 pages across 127 Wikimedia projects, including approximately 100 Wikipedia language editions, with 166K unique pageviews this month (April 2026) alone. Our proposed tools are designed to make this entire workflow faster, more accessible, and replicable by any community archivist, with or without OpenSpeaks.
Technical approach, integrations, and dependencies
Our approach is to build tools that address the challenges faced by speakers of various languages and to keep those tools open to the broader open source developer community. We continue to learn about software sustainability from close allies such as Lingua Libre. We have started discussing and collaborating with FOSS United, a community-based nonprofit that is arguably building India’s largest open source community, to plan together for long-term sustainability. The FOSS United developer community is slowly beginning to contribute to our tool suite. We currently rely on Wikimedia funding for our programmatic work, but we want to diversify our funding pathways while keeping the door open to anyone interested in helping sustain the software. There is also interest from the Wikimedia community: members have identified overlapping needs, recommended our tools (still in alpha) for subtitling, and shared ideas for improvements and new features. There is a clear need to make Commons videos accessible, and we have been experimenting with ways to ideate, educate and build resources for the accessibility of Wikimedia content since 2017.
- Subtitler: Hosted on Wikimedia Toolforge under standard Toolforge governance. It will remain accessible and maintainable by the broader Wikimedia and open source technical community.
- WikiVoice: A movement-wide collaborative concept, of which we are one of the major co-contributors.
- STT: We are evaluating open models and community-appropriate deployment approaches. No model will be deployed without mandatory community review of outputs.
- All tools are MIT-licensed, publicly documented on Meta-Wiki (OpenSpeaks/Tools) in plain language for non-developers, and designed to be adopted, forked, or built upon by any Wikimedia technical community.
- We actively research and engage with existing tooling before building. Participants at the Wiki Workshops where we have presented (2025 and 2026) flagged other subtitling tools, and we evaluate these before building to avoid duplication. We also plan to compare our tools with others and to check existing Wikimedia infrastructure first.
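The comparative STT assessment planned in this section needs a shared metric, which the proposal does not yet name. Word error rate (WER) is a common choice and is assumed here. It is the word‑level edit distance between a community‑reviewed reference transcript and a model's output, divided by the number of reference words. For scripts where whitespace does not reliably mark word boundaries, the same calculation over characters gives a character error rate (CER). The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution (0 if match)
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, tokenize=str.split):
    """Corpus error rate: total edits / total reference tokens.

    tokenize=str.split gives WER; tokenize=list gives CER.
    """
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference transcript")
    edits = sum(edit_distance(tokenize(r), tokenize(h))
                for r, h in zip(references, hypotheses))
    total = sum(len(tokenize(r)) for r in references)
    if total == 0:
        raise ValueError("reference transcripts are empty")
    return edits / total


def rank_models(references, outputs_by_model, tokenize=str.split):
    """Return (model_name, error_rate) pairs sorted from best to worst."""
    scores = {name: error_rate(references, hyps, tokenize)
              for name, hyps in outputs_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Lower is better: a WER of 0.25 means one edit for every four reference words. Because the scores are only as reliable as the references, the reference transcripts themselves would need community review.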
Milestones, Progress Tracking, and Success Metrics
| Milestone | Timeline | Success metric |
|---|---|---|
| Subtitler: structured user-testing with OpenSpeaks Fellows before a wider public release | Q1–Q2 | Documented feedback from ≥5 Fellows; identified issues resolved |
| Subtitler: ~4-week testing/iteration buffer before each major release | Per release cycle | Changelog and release notes published on Meta-Wiki |
| STT evaluation: comparative assessment of ≥2 open models for target language families | Q2 | Written evaluation shared publicly on Meta-Wiki |
| WikiVoice: participate in ≥1 cross-community roadmap session | Q2–Q3 | Meeting notes and shared roadmap published |
| Tool documentation: updated OER-style guides on Meta and Wikiversity | Q3 | Pages updated; usage tracked via pageviews and Fellow feedback |
| Post-grant maintenance: FOSS United collaboration formalised | Q4 | Written agreement or public commitment from FOSS United |
We do not yet have privacy-respecting usage metrics for the Subtitler (e.g. files processed, sessions, language distribution), as the tool has not yet been widely released. We know the kind of insights we want to document: (a) number of Commons files subtitled using the tool, (b) number of Fellows and archivists trained using it, (c) structured post-use feedback from Fellows, and (d) whether other Wikimedia communities adopt it independently (a proxy for genuine utility beyond our own use). We will develop these metrics during the grant year, in consultation with the Wikimedia technical community, without collecting personal data.
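One way to collect the metrics above without personal data is to store only aggregate counts. This sketch assumes the tool reports a single event per subtitled file, tagged with nothing but a language code. The class name and the threshold of five are illustrative assumptions. Small per‑language counts are folded into an "other" bucket because, for very small language communities, a per‑language total could point to identifiable individuals.

```python
from collections import Counter


class UsageTally:
    """Aggregate-only usage counter: no user names, IP addresses or timestamps."""

    def __init__(self, min_reportable=5):
        self.files_by_language = Counter()
        self.min_reportable = min_reportable

    def record_subtitled_file(self, language_code):
        # The only thing stored per event is the subtitle language code.
        self.files_by_language[language_code] += 1

    def report(self):
        """Return totals, folding small per-language counts into 'other'."""
        by_language, other = {}, 0
        for lang, count in sorted(self.files_by_language.items()):
            if count >= self.min_reportable:
                by_language[lang] = count
            else:
                other += count
        if other:
            by_language["other"] = other
        return {"total_files": sum(self.files_by_language.values()),
                "by_language": by_language}
```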
Demand and Community Consultation
- Native speakers directly identified the absence of an offline subtitling tool as the primary bottleneck in their workflow, and this drove the development of Subtitler (documented in the needs assessment section and further documented at Community_Wishlist/W50).
- Amara's public workspace closure (April 2026) has removed the most widely used free alternative, validating and emphasising the need.
- WikiVoice emerged from Wikimedia Futures Lab with Wikimedians from Nigeria and Indonesia, who independently identified the same structural gap: oral media cannot be cited in Wikimedia because there is no citable, verifiable archive infrastructure for it.
- We have engaged with the Indic MediaWiki Developers User Group (IMDUG) and FOSS United; both have expressed interest in collaboration for volunteer development and long-term maintenance.
- STT exploration directly responds to community requests: the Songhay (large radio archives) and Raji/Tharu (legacy tapes) communities need digitisation support before subtitling is possible.
For context, our workflow is a series of interconnected stages: community engagement, recording, acquiring and re-licensing, reviewing (new and archived) and processing media, publishing through GLAM collaborations, and engaging the Wikimedia community for use. We use a wide range of technological tools in this workflow and continuously seek ways to reduce everyone's workload through technology, so that we can focus on the core functions that require human intervention; this is our broader philosophy. As users of technology, we therefore also notice gaps, some of them major. As mentioned in our earlier response, captioning, subtitling and transcription remain critical to what we publish. They are required: a) for external viewers to understand what is published, which applies to the majority of Wikipedia readers, since our focus languages do not have Wikipedia editions of their own; b) for accessibility, as Wikipedia and other Wikimedia projects must be accessible to people with disabilities; and c) for native speakers who are not very fluent in their language, or in a particular dialect of it, and would like to access the content through subtitles (this is the case for many community members who are more fluent in a dominant language than in their own).
However, lightweight open source tools are integral to our programmatic work, and we provide a full account of that work here for transparency.
- T2. Describe the project team, maintenance, and risk management. (required) Include the following information
- Security and privacy considerations and expertise
- Mitigation of security or privacy risks
- Long-term maintenance, code documentation and licensing
- Team description with expertise, roles, contribution (hours & compensation)
Our technical work is in direct service of the Wikimedia mission: to make the sum of all human knowledge freely available to every person.
Oral knowledge held by millions of speakers of low-resourced and Indigenous languages is currently invisible on Wikimedia: not because it doesn't exist, but because the infrastructure to make it citable, verifiable, and accessible does not. Our tools address this structural gap directly:
- The Subtitler enables community archivists with no server infrastructure, professional software, or reliable high-speed connectivity to subtitle oral history recordings and publish them to Commons in accessible, standards-compliant formats (SRT, VTT and TimedText, informed by international broadcast standards including EBU-TT-D, used by the BBC). This directly enables multimedia content growth on Commons and accessibility across Wikipedia language editions, including relatively smaller South Asian languages (Santali, Odia, Assamese, Maithili), African languages (Igbo, Swahili, Hausa, Malagasy), and beyond.
- WikiVoice, if developed, would provide the first movement-wide infrastructure for hosting, reviewing, and generating citable references from oral history recordings—creating a citational pathway for non-written knowledge within the Wikimedia ecosystem for the first time.
- STT assistance, applied cautiously and with community oversight, could help communities such as Songhay (large radio archives) and Raji/Tharu (legacy tape collections) move from inaccessible analogue archives to subtitled, citable Commons media—similar to what OCR did for written historical texts.
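The Subtitler bullet above mentions SRT and VTT. As a rough illustration of how close these two formats are, the Python sketch below converts SubRip (SRT) text to WebVTT: it adds the mandatory `WEBVTT` header and switches the millisecond separator in cue timings from a comma to a full stop. SRT's numeric cue indices are kept, since WebVTT accepts them as optional cue identifiers. This is a minimal sketch under stated assumptions (well-formed SRT with two-digit hours, no styling or positioning tags), not Subtitler's actual implementation; the function name `srt_to_vtt` is hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
# Assumption: hours are always two digits, as in typical SRT files.
_SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SubRip (SRT) subtitle text to a WebVTT string (hypothetical helper)."""
    # Normalise Windows line endings and trim surrounding blank lines.
    text = srt_text.replace("\r\n", "\n").strip()
    # Only timing lines are rewritten; commas inside subtitle text are untouched.
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", text)
    # A WebVTT file must start with the "WEBVTT" signature and a blank line.
    return "WEBVTT\n\n" + body + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello, world\n"
    print(srt_to_vtt(sample))
```

Because only the timing lines are touched, the subtitle text itself (including any punctuation in the community's language) passes through unchanged.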
All tools are open source (MIT licence), hosted on Wikimedia infrastructure (Toolforge, Commons), documented in plain language for non-developers, and designed to be adopted or forked by any Wikimedia technical community. They are not proprietary assets; they belong to the movement. We also plan to build only tools that are absolutely necessary and for which there is no better alternative.
The demonstrated impact from Phase 1 makes the case concretely: nearly 1000 pages across 127 Wikimedia projects enhanced, over 100 Wikipedia language editions reached, 152,571 unique pageviews in March 2026 alone. These tools are designed to scale this impact to more communities, more languages, and more archivists—without depending on OpenSpeaks as the bottleneck.
- T3. Approximately, how much of the requested budget will you dedicate to technical projects (local currency)? (required)
- 3400 CAD
- 10. Please include a link to or upload a timeline (operational calendar) for your programs and activities. (required)
- Q1 (Months 1–3)
Confirm key personnel (OpenSpeaks Fellows, coordinators and institutional partners) and finalise priority language clusters and themes (including biodiversity and traditional knowledge where communities prioritise them).
Formalise collaboration plans with strategic partners for documentation, tooling, and co‑authored outputs, including partners in India, Nepal and Sri Lanka.
Co‑design the Phase 2 experimentation roadmap for Subtitler and WikiVoice (within or alongside Wikimedia Futures Lab discussions), so that archivists can host media and generate citable, time‑coded references and, where appropriate, explore AI‑assisted speech‑to‑text.
Run onboarding and refresher training for fellows and local collaborators on FAIR–CARE principles, ethical consent, equitable remuneration, and Wikimedia workflows.
Q2 (Months 4–6)
Begin or expand documentation and archival work in each language cluster: new recordings on community‑relevant topics, and inventorying of existing media (for example, Sri Lankan Malay collections, and Songhay radio archives where collaboration is possible).
Start systematic subtitling and translation using Subtitler, piloting AI‑assisted speech‑to‑text where appropriate and with community review built into the workflow.
Support digitisation pilots for legacy materials (e.g. radio programmes and tapes) in at least two communities, in collaboration with local partners and, where relevant, GLAM institutions.
Conduct language‑specific capacity‑building sessions (remote and in‑person) on audiovisual documentation, digitisation, subtitling, and integrating content into Wikimedia projects.
Q3 (Months 7–9)
Scale up documentation, subtitling and community review across all active language clusters, aiming for multiple fully processed recordings per language and topic.
Integrate time‑coded media into Wikimedia Commons, Wikipedia and Wikidata, using and refining citation patterns developed through OpenSpeaks Archives and early WikiVoice experiments, including, where appropriate, links to peer‑review pipelines such as Wikisource.
Test WikiVoice‑style workflows with at least a small group of archivist‑Wikimedians: hosting media, generating citable references, and linking them to Wikimedia content.
Host cross‑community peer learning and exchange sessions, including contributors from Sri Lanka and elsewhere in South Asia, and from communities such as Songhay, to share challenges around digitisation, speech‑to‑text, and ethical use of oral knowledge in Wikimedia.
Q4 (Months 10–12)
Consolidate outputs for this year: complete subtitling and uploads for priority recordings in each language cluster; document at least one complete “pathway” per cluster from local archive (or field recording) to citable Wikimedia use.
Publish updated documentation (Meta pages, Wikiversity resources, blog posts, presentations) on workflows, remuneration, tooling (Subtitler, WikiVoice pilots), and ethical practices, including what we learnt from communities and from working with Wikimedia projects like Wikisource.
Conduct end‑of‑year evaluation with fellows, partners and selected community members focusing on community benefit, data sovereignty, usability of tools, and impact on Wikimedia projects.
Use the evaluation to define priorities for the following year (if funded), including which language clusters to deepen, where to expand, and which tooling or infrastructure strands (for example, WikiVoice or AI STT pilots) to continue or scale.
High‑level plan for the following year (if multi‑year funding becomes possible)
If further funding becomes available, we expect the following year to follow a similar seasonal rhythm (Q1 planning and onboarding; Q2–Q3 documentation, digitisation, subtitling and integration; Q4 consolidation and advocacy), with a greater emphasis on consolidation, mentoring new archivist‑Wikimedians, deepening institutional collaborations, and contributing our learnings on oral citations and infrastructure (such as WikiVoice) to movement‑wide discussions and policies.
- 11. Describe your team. (required)
The team for this phase combines OpenSpeaks Fellows/Coordinators, strategic institutional partners, and individual advisors who together bring community‑based language expertise, technical and archival skills, and global experience in language documentation and digital humanities. Fellows are community members who work either as coordinators for their own and neighbouring languages or as leads for subtitling and reviewing content in their language. They will receive training and educational materials, and will be engaged with clearly scoped responsibilities and deliverables. Strategic partners will collaborate as institutional collaborators (through MOUs or project agreements as appropriate), and advisors will contribute in an honorary capacity, guiding methodology, ethics, and long‑term positioning; all individuals and institutions named here have been or will be informed about their proposed roles before final submission.
OpenSpeaks Fellows (coordinators and language leads):
- Opino Gomango – coordinator for Sora, Juray, Juang, and Gorum/Parengi, overseeing documentation and subtitling for this language cluster and consolidating frameworks for community‑led assessment and review.
- Taukeer Alam – fellow for the Van Gujjari language; nature conservationist, writer, community educator and birder, supporting documentation of Van Gujjar oral histories and coordinating subtitling and community verification related to local plants and birds.
- Uday Raj Aaley – coordinator for Raji, Kumhali, and other northern Nepalese languages, supporting documentation of oral histories and coordinating subtitling and community verification across these under‑documented languages.
- Arun Gour – coordinator for Jaunpuri and Jaunsari, supporting identification, subtitling, and review of oral histories and liaising with local communities working in these languages.
- Kimmi Pal – coordinator for Rongpo, leading subtitling, terminology work, and community review for this severely threatened Himalayan language, building on prior collaborative documentation with elders.
- Sanjib Chaudhary – coordinator for Tharu languages from Nepal (specialising in Indigenous knowledge of local plants and food), leading documentation, review, community engagement, capacity building and the creation of educational materials.
- The language experts above have confirmed their participation. A number of further Fellows and coordinators, including some from the community-based strategic partners, will be recruited soon after the project is finalised.
WikiVoice co-creators:
- Tochi Precious Friday, Igbo-language Wikimedian
- Biyanto R, Wikimedian and Senior Project Manager at the ESEAP Hub
We also plan to recruit an associate and one or more interns to support coordination once the proposal is approved.
Fellows will primarily function as project staff (stipended for their engagement) rather than volunteers, with roles spanning community coordination, subtitling, metadata enrichment, and Wikimedia integration for their respective languages. They may also edit Wikimedia projects in a volunteer capacity.
Technical Team:
- Jnanaranjan Sahu (advisor)
- Ranjith S (Subtitler lead)
- A developer team, mostly from the MediaWiki community, will be recruited for open source tool development.
Potential strategic partners (institutional collaborators):
- Language Archive Cologne, Cologne, Germany – a major archive for endangered language documentation, bringing expertise in long‑term preservation, cataloguing, and standards for audiovisual language data. We will collaborate with them for training of archivists-Wikimedians.
- Design Beku, Bengaluru, India – a design and research collective led by Padmini Ray Murray that focuses on digital justice, critical infrastructure, and community‑centred design, contributing to interface, workflow, and documentation design.
Community-based strategic partners:
- Maee – an initiative in the Indian state of Uttarakhand to promote the Van Gujjari language. We will engage them for topic alignment, documentation and the creation of educational resources.
- Devalsari Environment Protection and Technology Development Society – a non-profit in Uttarakhand working on local biodiversity conservation. We will engage them for identifying important areas for documentation as well as documentation and educational resource-building.
- Factum, Colombo, Sri Lanka – an Asia and Asia Pacific-focused think tank on diplomacy, tech-plomacy and communications, focusing on Sri Lanka and over 23 Asian countries. They will be a focal point for community coordination, documentation, review and subtitling, and capacity building for languages in Sri Lanka.
- Rekhta Foundation, Delhi, India – a key organisation working on South Asian languages and literature. They will advise on digitisation, curation, metadata practices, tools and outreach in relevant language communities.
These partners will engage through collaboration agreements, joint activities (documentation, community consultation, subtitling, capacity building workshops, cataloguing pilots), and co‑authored outputs where appropriate. Specific roles within each institution will be documented in the final proposal once confirmed.
Advisors:
- Mandana Seyfeddinipur – a linguist specialising in audiovisual language documentation and head of the Endangered Languages Archive, with extensive experience in language documentation programmes and ethical frameworks for endangered languages. She will advise on alignment with international best practices in language documentation and on pathways to make community‑review processes legible to broader academic and archival audiences.
- Padmini Ray Murray – founder of Design Beku, a scholar and practitioner in digital humanities and critical infrastructure, advising on the design of equitable, community‑centred workflows, documentation, and partnerships.
Advisors will serve in a voluntary capacity, helping to connect OpenSpeaks Archives to wider conversations in language documentation, digital preservation, and decolonial citational practice. Additional roles, including operational coordination and technical development support, may be added as the proposal is refined; any such positions will be listed as vacant or to‑be‑recruited on Meta and will comply with the Wikimedia transparency expectations.
- 12. Will you be working with any internal (Wikimedia) or external partners? Describe the characteristics of these partnerships and bring a few examples of the most significant partnerships. (required)
Internal partners (potential):
- Odia Wikimedians User Group – co-lead in organising Wiki Loves Languages campaign, key Wikimedian liaisons are User:Aliva Sahoo, User:Chinmayee Mishra and User:Ssgapu22
- Dagbani Wikimedians User Group – co-lead in organising Wiki Loves Languages, key Wikimedian liaison is User:Shahadusadik
- Igbo Wikimedians User Group - co-lead in organising Wiki Loves Languages, key Wikimedian liaison is User:Tochiprecious
- Wikimedians of Santali Language User Group - co-lead in organising Wiki Loves Languages, key Wikimedian liaisons are User:R Ashwani Banjan Murmu and User:Ramjit Tudu
Potential strategic partners (external):
- Language Archive Cologne, Cologne, Germany – a major archive for endangered language documentation, bringing expertise in long‑term preservation, cataloguing, and standards for audiovisual language data. We will collaborate with them for training of archivists-Wikimedians.
- Design Beku, Bengaluru, India – a design and research collective led by Padmini Ray Murray that focuses on digital justice, critical infrastructure, and community‑centred design, contributing to interface, workflow, and documentation design.
Community-based strategic partners (external):
- Maee – an initiative in the Indian state of Uttarakhand to promote the Van Gujjari language. We will engage them for topic alignment, documentation and the creation of educational resources.
- Devalsari Environment Protection and Technology Development Society – a not-for-profit in Uttarakhand working on local biodiversity conservation. We will engage them for identifying important areas for documentation as well as documentation and educational resource-building.
- Factum, Colombo, Sri Lanka – an Asia and Asia Pacific-focused think tank on diplomacy, tech-plomacy and communications, focusing on Sri Lanka and over 23 Asian countries. They will be a focal point for community coordination, documentation, review and subtitling, and capacity building for languages in Sri Lanka.
- ADS Sora development agency, Gajapati district in Odisha – a not-for-profit working in the Sora language cluster on promoting the Sora and Juray languages.
- Lanjia Saora Development Agency (LSDA), Gajapati district, Odisha – a public initiative supporting the Lanjia Sora community.
- Rekhta Foundation, Delhi, India – a key organisation working on South Asian languages and literature. They will advise on digitisation, curation, metadata practices, tools and outreach in relevant language communities.
These partners will engage through collaboration agreements, joint activities (documentation, community consultation, subtitling, capacity building workshops, cataloguing pilots), and co‑authored outputs where appropriate. Specific roles within each institution will be documented in the final proposal once confirmed.
Advisors (external):
- Mandana Seyfeddinipur – a linguist specialising in audiovisual language documentation and head of the Endangered Languages Archive, with extensive experience in language documentation programmes and ethical frameworks for endangered languages. She will advise on alignment with international best practices in language documentation and on pathways to make community‑review processes legible to broader academic and archival audiences.
- Padmini Ray Murray – founder of Design Beku, a scholar and practitioner in digital humanities and critical infrastructure, advising on the design of equitable, community‑centred workflows, documentation, and partnerships.
Advisors will serve in a voluntary capacity, helping to connect OpenSpeaks Archives to wider conversations in language documentation, digital preservation, and decolonial citational practice. Additional roles, including operational coordination and technical development support, may be added as the proposal is refined; any such positions will be listed as vacant or to‑be‑recruited on Meta and will comply with the Wikimedia transparency expectations.
- 13. In what ways do you think your proposal most contributes to the Movement Strategy 2030 recommendations. Select all that apply. (required)
- Increase the Sustainability of Our Movement, Ensure Equity in Decision-making, Coordinate Across Stakeholders, Invest in Skills and Leadership Development, Manage Internal Knowledge, Identify Topics for Impact, Innovate in Free Knowledge
- 14. Please select and fill out Wikimedia Metrics for your proposal. (recommended)
- 14.1. Number of participants, editors, and organizers.
All metrics provided are optional, please fill them out if they are aligned with your programs and activities.
| Metrics name | Target | Description |
|---|---|---|
| Number of all participants | 250 | Participants include external individuals who will join different outreach activities we will organise |
| Number of all editors | 100 | We plan to engage Wikipedians and Wikimedians in three ways. First, by inviting them to participate in events like Wiki Loves Languages, where they will actively improve Wikipedia articles and other Wikimedia project entries related to languages and language-speaker communities. Second, by engaging them in our wider advocacy and resource building on Indigenous and low-resourced language oral history, citation, community and content diversity and inclusion, and epistemic knowledge curation. Third, and most importantly, by inviting them as stakeholders in our core programmes. We will also work closely with some Wikimedians who are developers and can provide technical support in a professional capacity. |
| Number of new editors | N/A | |
| Number of retained editors | N/A | |
| Number of all organizers | 10 | This is a conservative goal. We plan to engage Wikimedia organisers to work together for sprints, edit-a-thons and other similar Wikimedia events. |
| Number of new organizers | N/A |
- 14.2. Number of new content contributions to Wikimedia projects. (recommended)
| Wikimedia project | Created | Edited or improved |
|---|---|---|
| Wikipedia | 15 | 100 |
| Wikimedia Commons | 200 | |
| Wikidata | 200 | 200 |
| Wiktionary | ||
| Wikisource | 10 | |
| Wikimedia Incubator | ||
| Translatewiki | ||
| MediaWiki | ||
| Wikiquote | ||
| Wikivoyage | ||
| Wikibooks | ||
| Wikiversity | ||
| Wikinews | ||
| Wikispecies | ||
| Wikifunctions / Abstract Wikipedia |
- Description for Wikimedia projects contributions metrics. (optional)
The media we create will be uploaded directly to Wikimedia Commons and used in Wikipedia articles in various languages. We will also contribute to Wikidata and Wikisource. All of these contributions will be made in our volunteer capacity as Wikimedians.
- 15. Do you have other quantitative and qualitative targets for your project (other metrics)? (required)
- No
| Other Metrics | Description | Target |
|---|---|---|
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
- 16. Will you have any other revenue sources when implementing this proposal (e.g. other funding, membership contributions, donations)? (required)
- Yes
- 16.1. List other revenue sources. (required)
If this project is approved, we expect donations of between 5,000 and 7,000 USD.
- 16.2. Approximately how much revenue will you have from other sources in your local currency? (required)
- 6823 CAD
- 17. Your local currency. (required)
- CAD
- 18. What is the total requested amount in your local currency? (required)
- 49289.04 CAD
| Year | Amount (local currency) |
|---|---|
| Year 1 | N/A CAD |
| Year 2 | N/A CAD |
| Year 3 | N/A CAD |
- Requested amount in USD
- 35552.18 USD [note 1]
| Year | Amount USD [note 1] |
|---|---|
| Year 1 | N/A USD |
| Year 2 | N/A USD |
| Year 3 | N/A USD |
- Note 1: The amounts in US dollars were calculated by Wikimedia Foundation staff using fixed currency rates. These amounts are approximate and may not reflect the actual exchange rates on the day of submission or distribution. If the application is funded, the funding will be sent in the recipient’s local currency.
- 19. Does this proposal include compensation for staff or contractors? (required)
- Yes
- 19.1. How many paid staff members do you plan to have? (required)
Include the number of staff and contractors during the proposal period. If you have short-term contractors or staff, please include them separately and mention their terms.
- There will be 15–20 professional contributors (individuals, groups or organisations), as set out in the budget. They will be engaged on a need basis, for the exact duration of their tasks.
These are for professional services outside volunteer Wikimedia contributions.
- 19.2. How many FTEs (full-time equivalents) in total? (required)
Include the total FTE of staff and contractors during the proposal period. If you have short-term contractors or staff, please include their FTEs with the terms separately.
- We will not have any full-time staff or consultants. We plan to recruit an associate and one or more interns to support coordination once the proposal is approved, through an open call inviting members of the focus language communities (most preferred), the Wikimedia community (preferred) and others to apply. As last year, these roles will follow a task-based model rather than being full-time positions.
- 19.3. Describe any staff or contractor changes compared to the current year / ongoing General Support Fund if any. (required only for returning grantees)
- We have prepared a detailed budget that caps paid work at 759 hours, nearly double last year's budgeted hours. This estimate is based on the actual time different tasks took, and the expanded scope will require more hours. Since we are expanding to one more country (Sri Lanka) and one more language cluster (South India), we will engage more OpenSpeaks Fellows/Coordinators this year. Many of the Fellows will return as coordinators.
- 20. Please provide an overview of your overall budget categories in your local currency. The budget breakdown should include only the amount requested with this General Support Fund (required).
| Budget category | Amount in local currency |
|---|---|
| Staff and contractor costs | 3232.88 CAD |
| Operational costs | 46056.16 CAD |
| Total | 49289.04 CAD |
- 21. Please upload your budget for this proposal or indicate the link to it. (required)
Additional information
[edit]- 22. In this optional space you can add any other additional information about your proposal or organization that you think can help us when reviewing your proposal. (optional)
- Clarification for question T3 above
This question requires an important clarification. As stated in Q9 of this proposal, Technology (software development) is not a selected programme category. This was a deliberate decision. OpenSpeaks Archives is a culture/GLAM and community engagement project. We develop tools only insofar as they directly enable our programmatic work and where no adequate free alternative exists.
The total requested budget is 49,289.04 CAD (~35,552 USD), broken down as:
| Budget category | Amount (CAD) | Notes |
|---|---|---|
| Staff and contractor costs | 3,232.88 | Covers all paid professional services across the entire project—not technology alone |
| Operational costs | 46,056.16 | Fellows' stipends and reimbursements, media processing, documentation, community engagement, events, GLAM collaboration |
| Total | 49,289.04 |
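The figures in this table can be cross-checked in a few lines. The Python sketch below uses only the amounts stated in the table (the variable names are illustrative, not part of any budget tool we use): it confirms that the two categories sum exactly to the requested total and computes the paid-services share that the review committee's comment, quoted further down, refers to.

```python
from decimal import Decimal

# Amounts in CAD, copied from the budget table; variable names are illustrative.
# Decimal is used instead of float so that currency sums are exact.
staff_and_contractors = Decimal("3232.88")
operational = Decimal("46056.16")
requested_total = Decimal("49289.04")

# The two categories should add up exactly to the requested total.
assert staff_and_contractors + operational == requested_total

# Share of the request spent on paid professional services, rounded to 2 decimal places.
paid_share = (staff_and_contractors / requested_total * 100).quantize(Decimal("0.01"))
print(f"Paid services: {paid_share}% of {requested_total} CAD")  # Paid services: 6.56% of 49289.04 CAD
```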
There is no dedicated "technology development" budget line. This is intentional.
We strongly identify as a community-based language documentation and archiving project. Engineering/product will not be our focus for the foreseeable future. To emphasise: we build only the tools that are absolutely needed but absent, where we cannot assume someone else will eventually build them.
Our technical infrastructure is built on free, open, community-owned platforms:
- Wikimedia Toolforge: hosted by the Wikimedia Foundation and provided to the community free of charge; standard Toolforge governance ensures long-term accessibility independent of OpenSpeaks
- Wikimedia Commons: free media storage
- Wikimedia GitLab: free version control
- ELAR (Endangered Languages Archive) and LAC (Language Archive Cologne): permanent archival hosting of master files with DOIs, covered through partnership agreement (not this budget)
- Volunteer and open source community contributions: FOSS United, IMDUG, and the broader Toolforge developer community
Some development work does require paid engagement. A developer is currently contracted for Subtitler improvements from the ongoing project budget, and future development will be commissioned through open calls for proposals (CFPs). Such work will be funded within the staff/contractor envelope and/or under the second strategy (Tools, workflows, and OER creation), and only to the extent that volunteer contributions do not materialise. If volunteer capacity from IMDUG or FOSS United does materialise, those funds will be redirected to other programmatic activities and reported transparently. We also plan to work more closely with FOSS United to raise further funds for development, to create internship, volunteer and some paid (open-CFP) opportunities, and to invite developers from their community to take them up.
Beyond the grant year, our sustainability strategy for tools includes: (a) FOSS United, India's largest FOSS community foundation, with whom we are building a shared long-term maintenance strategy; (b) Toolforge hosting, which is governed by the Wikimedia community; (c) MIT-licensed, open source, documented code that any developer community can maintain, fork, or improve independently; and (d) ELAR and LAC for permanent archival of all media with citable DOIs, ensuring communities never lose access to their knowledge even in a scenario where OpenSpeaks ceases to operate.
The grant review committee itself noted: "extremely lean and efficient budget—only 6.56% on paid services, with the vast majority going to programmatic work across multiple countries." This is precisely because we believe in building on open infrastructure, with the community, rather than creating proprietary tools or overt ownership that require sustained dedicated funding to survive.
By submitting your proposal/funding request you confirm that you have read and agree to the Application Privacy Statement, WMF Friendly Space Policy, and the Universal Code of Conduct.
- Yes