SIL Converters - SIL Language Technology
SIL Converters
Microsoft Word/COM support for TECkit, CC, ICU, Perl, and Python
SIL Converters — Frequently Asked Questions
SIL Converters — Developer Documentation
About
This package provides tools through which you can change the encoding, font, and/or script of text in Microsoft Word and other Office documents, OpenOffice and LibreOffice documents, XML documents, and SFM text and lexicon documents. It also installs a system-wide repository to manage your encoding converters and transliterators (TECkit, CC, ICU, Perl, or Python based, as well as support for adding custom transduction engines).
For developers, it provides a simple COM interface to select and use a converter from the repository. It is easy to use from VBA, C++, C#, Perl, Python or any .NET/COM enabled language.
The core EncConverters assembly is fully integrated with FLEx (FieldWorks Language Explorer), Speech Analyzer, Phonology Assistant, Adapt It, and OneStory Editor software. It provides the same system-wide registry of installed and available encoding converters for all of these user programs. Additionally, the package includes some extra utilities, such as a clipboard converter for manipulating text between cut and paste operations.
The following picture illustrates the suite of tools, utilities, and applications that are available and how they interact:
Figure 1. SIL Converters Suite
Figure 1 shows the three distinct layers of SIL Converters.
At the top are various client applications. These user-oriented programs use the EncConverters core assembly to provide encoding conversion and other transduction facilities to their users.
The EncConverters core provides an abstraction layer so the client applications can access the various transduction engines without having to implement the interface to each one separately.
The transduction engines are the server applications that provide the actual conversion/text processing capability.
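To make the layering concrete, here is a minimal Python sketch of the idea. It assumes nothing about the real EncConverters interface: the class `ConverterRepository`, its `add`, `names`, and `convert` methods, the `client_convert` function, and the converter name `ToyLegacy>UNICODE` are all hypothetical. The client code only asks for a converter by name, so it stays the same whichever engine does the work behind it.

```python
from typing import Callable, Dict, List

# In this sketch a "transduction engine" is any function from text to text.
Engine = Callable[[str], str]


class ConverterRepository:
    """Hypothetical stand-in for a system-wide converter repository.

    Clients look converters up by name and never call an engine directly.
    """

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def add(self, name: str, engine: Engine) -> None:
        """Make an engine available to clients under a converter name."""
        self._engines[name] = engine

    def names(self) -> List[str]:
        """Return the names of all converters clients can use."""
        return sorted(self._engines)

    def convert(self, name: str, text: str) -> str:
        """Run text through the named converter, whatever engine implements it."""
        if name not in self._engines:
            raise KeyError(f"converter {name!r} is not in the repository")
        return self._engines[name](text)


def client_convert(repo: ConverterRepository, converter: str, paragraphs: List[str]) -> List[str]:
    """A toy 'client application': identical code for every engine type."""
    return [repo.convert(converter, p) for p in paragraphs]


if __name__ == "__main__":
    repo = ConverterRepository()
    # Two unrelated toy engines: a character table and an ordinary function.
    table = str.maketrans({"N": "\u014b", "E": "\u025b"})  # N -> ŋ, E -> ɛ
    repo.add("ToyLegacy>UNICODE", lambda s: s.translate(table))
    repo.add("Upper", str.upper)
    print(repo.names())
    print(client_convert(repo, "ToyLegacy>UNICODE", ["aNE", "NaN"]))
```

The design point is that adding a new engine type only means registering another callable; `client_convert` never changes.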
If you are an end user, you are probably most interested in how to use EncConverters with client applications, for example:
Using the Bulk Word Document Converter to convert the encoding of text in one or more Word documents to Unicode, or
Using the Bulk SFM Converter to convert SFM documents into Unicode (typically texts and lexicons from Shoebox to Toolbox).
If you are a developer, you may be interested in using EncConverters to access the different transduction resources available by writing to the single EncConverters interface. See the Developer Documentation page for details and code snippets.
Downloads
Newest Release
There are two installers: one for users who have a 32-bit Microsoft Office installation and the other for the more commonly used 64-bit Microsoft Office installation. (See the FAQ if you do not know whether you have 32-bit or 64-bit Office.)
Both installers may require an internet connection if you do not have .NET Framework 4.8 installed. The installer will prompt you for any necessary prerequisites.
There are no specific installation instructions for this release. However, you may find the installation instructions in the PDF below helpful, although we realize they are out of date.
See Downloads for Previous Versions if you need an earlier version of SIL Converters.
Quick Installation Overview
Note: You will need Administrator privileges on the computer to install this software.
The Master Setup program runs a series of installers:
Software prerequisites: necessary system updates and add-ons are installed on your computer.
SIL Encoding Converters 4.0 Setup: conversion applications are installed and conversion Maps and Tables are copied to your hard drive.
SIL Converters for Office 2003: currently, this installer only installs an additional operating system update.
Converter Options Installer: a utility that allows you to activate the conversion Maps and Tables you want to use.
Full installation instructions can be found in the SILConverters 4.0 Installation documentation (download below). This document is intended to guide you through the Master Installer installation screens and the initial SIL Converters 4.0 Setup.
Documentation
SILConverters 4.0 Installation documentation
for all platforms
PDF | 359.6 KB | 29 Aug 2011
SILConverters Documentation
for all platforms
PDF | 1.95 MB | 29 Aug 2011
SIL Converters Maps and Tables
This section describes the encodings, font names, and converters contained in the different Maps and Tables packages available in the SIL Converters installer. Check the lists below for the fonts/encodings you are interested in to see which Maps and Tables package to install.
Most end users are interested in only a small number of encodings. Typically, computer support people have created TECkit maps and/or CC tables for the various encodings used in each entity, sparing most end users from having to create their own maps and tables.
Because there are hundreds of possible encoding converters and transliterators that different end users may be interested in, they are packaged into logically related groups of converters and are available via a two-step process.
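Map-based converters such as TECkit maps and CC tables essentially describe how sequences in a legacy encoding correspond to Unicode. The Python below is a hypothetical, greatly simplified sketch of that idea: a substitution table applied with longest-match-first lookup, a common convention for such tables. It does not use TECkit or CC rule syntax, and `longest_match_convert` and the toy table `TOY_RULES` are invented for the example.

```python
from typing import Dict, List


def longest_match_convert(text: str, rules: Dict[str, str]) -> str:
    """Apply a substitution table to text, preferring the longest matching key.

    Characters that no rule covers are copied through unchanged.
    """
    if "" in rules:
        raise ValueError("rule keys must be non-empty")
    max_len = max((len(key) for key in rules), default=0)
    out: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible key at position i first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in rules:
                out.append(rules[chunk])
                i += length
                break
        else:
            # No rule matched here: pass one character through.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical legacy-font table, invented for this example (not a real SIL map).
# "ng" must win over "n", otherwise "ng" would come out as "n" + "g".
TOY_RULES = {"ng": "\u014b", "E": "\u025b", "O": "\u0254", "n": "n"}

if __name__ == "__main__":
    print(longest_match_convert("ngOnE", TOY_RULES))
```

Preferring the longest key is what lets a multi-character legacy sequence map to a single Unicode character without being split up by a shorter rule.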
Steps
1. Use the SIL Converters installer to install the package(s) or converters likely to be useful to you (e.g. based on your entity). During installation, all the converter maps/tables in the selected package(s) are installed into a fixed location on your computer (i.e. C:\Documents and Settings\All Users\Application Data\SIL\SILConverters40\MapsTables on Windows XP or C:\ProgramData\SIL\SILConverters40\MapsTables on Windows 10).
2. Use the Converter Options Installer application to install the few converters you want into the EncConverters’ system repository. They then become available to SILConverters client applications.
Note: Installing maps and tables onto your computer with the SILConverters installer (step 1 above) will not make them available to SILConverters client applications unless you explicitly add them to the system repository using the Converter Options Installer or some other mechanism (see Adding Converters to the System Repository in the Help for SILConverters document).
Select Features — Optional Maps and Tables
The following sections give details about the fonts and encodings in the different Maps and Tables packages:
Basic Converters
Converters and Transliterators common to all SIL. This includes the following:
Converter Name | Encoding Name | Font Names
SIL IPA93<>UNICODE | SIL-IPA93-2001 | SILDoulos IPA93, SILManuscript IPA93, SILSophia IPA93
SIL-IPA-1990<>UNICODE | SIL-IPA-1990 | SILDoulosIPA, SILManuscriptIPA, SILSophiaIPA
SIL Galatia<>UNICODE | SIL-GREEK_GALATIA-2001 | SIL Galatia
ISO-8859<>UNICODE | ISO-8859-1
AMER PHON>UNICODE | (SIL)-Amer_Phon_SILDoulosL3-(2005)
SIL PUA 3.2<>UNICODE 4.1
SIL PUA 3.2<>UNICODE 5.0
SIL PUA 3.2<>UNICODE 5.1
Symbol<>cp1252
UTF8<>UTF16
ReverseString | | For reversing the bytes of a “narrow” (bytes) string
null | | No change to the string, but can be used to apply a different font to some text (e.g. in the Data Conversion Macro)
NFC | | Convert to Unicode Normalization Form C (composed)
NFD | | Convert to Unicode Normalization Form D (decomposed)
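The pass-through and Unicode-housekeeping converters at the end of this table are easy to picture with Python's standard library. This is a hedged sketch of what each one does to text, not SIL's implementation: the function names are hypothetical, and treating the UTF-16 side as little-endian without a byte-order mark is an assumption.

```python
import unicodedata


def nfc(text: str) -> str:
    """Like the NFC converter: compose to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def nfd(text: str) -> str:
    """Like the NFD converter: decompose to Unicode Normalization Form D."""
    return unicodedata.normalize("NFD", text)


def utf8_to_utf16(data: bytes) -> bytes:
    """UTF-8 bytes -> UTF-16 bytes (little-endian, no BOM: an assumption)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """The reverse direction, under the same byte-order assumption."""
    return data.decode("utf-16-le").encode("utf-8")


def reverse_string(data: bytes) -> bytes:
    """Reverse the bytes of a 'narrow' (byte) string."""
    return data[::-1]


def null(text: str) -> str:
    """Return the text unchanged (useful only for re-applying a font)."""
    return text


if __name__ == "__main__":
    decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
    print([f"U+{ord(c):04X}" for c in nfc(decomposed)])  # one code point
    print([f"U+{ord(c):04X}" for c in nfd("\u00e9")])     # two code points
    print(utf8_to_utf16("\u00e9".encode("utf-8")))
    print(reverse_string(b"abc"))
```

NFC and NFD change only the code point sequence, not how the text should look, which is why a document can need one or the other depending on what a downstream tool expects.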
ICU Transliterators
Configuration information for the following ICU transliterators is for Unicode encodings only.
These are not the only transliterators available through the ICU Transliterator transduction engine; they are just a few of the predefined latinizing (or romanizing) transliterators that can be useful in different client applications for different ranges of Unicode.
Devanagari to Latin (aka. Devanagari-Latin)
Bengali to Latin (aka. Bengali-Latin)
Gujarati to Latin (aka. Gujarati-Latin)
Gurmukhi to Latin (aka. Gurmukhi-Latin)
Kannada to Latin (aka. Kannada-Latin)
Malayalam to Latin (aka. Malayalam-Latin)
Oriya to Latin (aka. Oriya-Latin)
Tamil to Latin (aka. Tamil-Latin)
Telugu to Latin (aka. Telugu-Latin)
Arabic to Latin (aka. Arabic-Latin)
Cyrillic to Latin (aka. Cyrillic-Latin)
Greek to Latin (aka. Greek-Latin)
Han to Latin (aka. Han-Latin)
Hangul to Latin (aka. Hangul-Latin)
Hebrew to Latin (aka. Hebrew-Latin)
Hiragana to Latin (aka. Hiragana-Latin)
Katakana to Latin (aka. Katakana-Latin)
Jamo to Latin (aka. Jamo-Latin)
NumericPinyin to Latin (aka. NumericPinyin-Latin)
Any to Latin (aka. Any-Latin)
Note: These transliterators can be daisy-chained together to transliterate between non-Latin scripts using a Compound meta-converter. For example, chaining the Devanagari-Latin transliterator (in the Forward direction) with the Arabic-Latin transliterator (in the Reverse direction) gives a ‘Devanagari-Arabic’ transliterator.
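Conceptually, a Compound meta-converter feeds the output of one step into the next, with each step run in a chosen direction. The Python below is a hypothetical sketch of that chaining: `TableTransliterator`, `compound`, and the two tiny character tables are invented for illustration and are not ICU's Devanagari-Latin or Arabic-Latin rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Step = Callable[[str], str]


@dataclass
class TableTransliterator:
    """Hypothetical character-for-character transliterator usable in both directions."""

    forward_table: Dict[str, str]

    def __post_init__(self) -> None:
        # Reversal is only well defined for a one-to-one map of single characters.
        values = list(self.forward_table.values())
        if any(len(k) != 1 for k in self.forward_table) or any(len(v) != 1 for v in values):
            raise ValueError("keys and values must be single characters")
        if len(set(values)) != len(values):
            raise ValueError("forward map must be one-to-one to be reversible")
        self._reverse_table = {v: k for k, v in self.forward_table.items()}

    def forward(self, text: str) -> str:
        return "".join(self.forward_table.get(ch, ch) for ch in text)

    def reverse(self, text: str) -> str:
        return "".join(self._reverse_table.get(ch, ch) for ch in text)


def compound(*steps: Step) -> Step:
    """Chain steps so each one's output feeds the next, as a compound converter does."""

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


# Toy tables invented for illustration; real ICU transliterators are far richer.
deva_latin = TableTransliterator({"\u0915": "k", "\u092e": "m"})  # क -> k, म -> m
arab_latin = TableTransliterator({"\u0643": "k", "\u0645": "m"})  # ك -> k, م -> m

# Devanagari -> Latin (forward), then Latin -> Arabic (reverse of Arabic-Latin).
deva_arabic = compound(deva_latin.forward, arab_latin.reverse)

if __name__ == "__main__":
    print(deva_arabic("\u0915\u092e"))
```

Latin acts as the shared pivot: any two scripts that each have a Latin transliterator can be bridged this way, although real results depend on how well the two transliterators agree on the Latin side.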
FindPhone to IPA converters
Adds the following converters for dealing with FindPhone encoded data:
FindPhone>SIL IPA93
FindPhone>UNICODE
SAG Indic
Contains encoding converter map(s) for the following encoding/font:
Converter Name | Encoding Name | Font Names
Annapurna<>UNICODE | SIL-ANNAPURNA_05-2002 | Annapurna
SAG IPA<>UNICODE | SIL-SAG-IPA | SAG-IPA SILDoulos, SAG-IPA SILManuscript, SAG-IPA SILSophia
SAG IPA Super<>UNICODE | SIL-SAG-IPA_Super | SAG-IPA Super SILCharis, SAG-IPA Super SILDoulos, SAG-IPA Super SILManuscript, SAG-IPA Super SILSophia
WinDTS Devanagari<>Unicode | SIL-WinDTS | WinDTS Devanagari
TransRoman<>UNICODE | SIL-SAG_TransRoman21-2002 | TransRoman2 Charis, TransRoman2 Doulos, TransRoman2 Manuscript, TransRoman2 Sophia
AkrutiOriSarala99<>UNICODE | Oriya-AkrutiOriSarala-99 | AkrutiOriSarala-99
Cameroon
Contains encoding converter map(s) for the following encoding/fonts:
Converter Name | Encoding Name | Font Names
Cameroon<>UNICODE | Cameroon | Cam Cam SILDoulosL, Cam Cam SILSophiaL, Cam Cam SILManuscriptL, Cam2 Cam2 SILDoulos, Cam2 Cam2 SILSophia, Cam2 Cam2 SILManuscript, Cam Paratext SILDoulos, Cam Paratext SILSophia, Cam Paratext SILManuscript
Central Africa
Contains encoding converter map(s) for the following encoding:
Converter Name | Encoding Name
angb4<>UNICODE | SIL-angb4-2005
MarcelNgbaka<>UNICODE | SIL-MarcelNgbaka-2005
East Africa
Contains encoding converter map(s) for the following encoding/fonts:
Converter Name | Encoding Name | Font Names
Times African<>UNICODE | Times African | Times African
Bantu Und<>UNICODE | Bantu Und | Bantu Und
Eastern Congo Group
Contains encoding converter map(s) for the following encodings:
Converter Name | Encoding Name
Mayogo<>UNICODE | Mayogo
Komo<>UNICODE | Komo
KomoASCII to Unicode | KomoASCII
ECG<>UNICODE | ECG-Unicode(Jan.2005)
BuduASCII<>UNICODE | BuduASCII
BUDU<>UNICODE | BUDU
BheleASCII<>UNICODE | BheleASCII
West Africa
Contains encoding converter map(s) for the following encodings:
Converter Name | Encoding Name
SIL-93linb-2005<>UNICODE | SIL-93linb-2005
UBS-Abidjan-2005<>UNICODE | UBS-Abidjan-2005
Bambara SIL Charis<>UNICODE | Bambara SIL Charis
SIL-BF Font Family-2005<>UNICODE | SIL-BF_Font_Family-2005
SIL-BF_Times-2006<>UNICODE | SIL-BF_Times-2006
X-SIL-Fulfulde<>UNICODE | X-SIL-Fulfulde
SIL-Ghana Doulos-2005<>UNICODE | SIL-Ghana_Doulos-2005
SIL-Mali Standard Font Family<>UNICODE | Mali Standard SILDoulos-2005
RCI Standard Doulos/Sophia/Manuscript<>UNICODE | SIL-RCI Standard-1994
X-SIL-Senufo<>UNICODE | X-SIL-Senufo
SIL-Karaboro-2006<>UNICODE | SIL-Karaboro-2006
SIL Samogho Doulos/Sophia/Manuscript<>UNICODE | SIL-Samogho-2006
SIL-Songhai-2006<>UNICODE | SIL-Songhai-2006
Tombouctou-Dutch<>UNICODE | SIL-Tombouctou-Dutch-2006
Burkina Faso Winye-2003<>UNICODE | SIL-Burkina_Winye_Unknown_Font-2005
Hebrew
Contains encoding converter map(s) for the following encoding/fonts:
Converter Name | Encoding Name | Font Names
SIL Ezra<>UNICODE | SIL-HEBREW_STANDARD-1997 | SIL Ezra
Hebrew Unicode 4.0<>Hebrew Unicode 5.0 | SIL-HEBREW_Unicode_40-2004 | Modifies Unicode Hebrew from 4.0 to 5.0
Indic Converters
ISCII Encodings
The following ISCII encodings are supported:
Converter Name
ISCII Devanagari<>UNICODE
ISCII Bengali<>UNICODE
ISCII Gurmukhi<>UNICODE
ISCII Gujarati<>UNICODE
ISCII Oriya<>UNICODE
ISCII Tamil<>UNICODE
ISCII Kannada<>UNICODE
ISCII Malayalam<>UNICODE
Himalli
The following Himalli encodings are supported:
Converter Name | Encoding Name | Font Names
HimaliNew Devanagari<>UNICODE | Devanagari-HimaliNew | For use with the Himali New font
Himallill Devanagari (Mac)<>UNICODE | Devanagari-HimallillMac-1999 | For use with files that use the Mac version of the Himallill font
Himallill Devanagari (PC 2001)<>UNICODE | Devanagari-HimallillPC-2001 | For use with PC files using the Himallill font named Himallil.ttf, dated 11-Dec-2001
Himalli Devanagari (Mac)<>UNICODE | Devanagari-HimalliMac-1999 | For use with files that use the Mac version of the Himalli font
Himalli Devanagari (PC 1998)<>UNICODE | Devanagari-HimalliPC-1998 | For use with PC files using the PC Himalli font named himalli.ttf, dated 12-May-1998
Himalli Devanagari (PC 2002)<>UNICODE | Devanagari-HimalliPC-2002 | For use with PC files using the PC Himalli font named himalli_.ttf (note the underscore), dated 18-Dec-2002
Miscellaneous TECkit Converters
This package contains TECkit maps for the following Indic encodings:
Converter Name | Font Names
GujaratiLS<>UNICODE | GujaratiLS
KrutiDev010<>UNICODE | KrutiDev010
KrutiDev011<>UNICODE | KrutiDev011
KrutiDev290<>UNICODE | KrutiDev290
Kantipur Devanagari<>Unicode | Kantipur
Preeti Devanagari<>Unicode | Preeti
Shusha<>Unicode | Shusha
Tibetan Modern A<>Unicode | Tibetan Modern A
UniDevanagri<>UniIPA (phonetic) | Transliteration between Unicode Devanagari and Unicode IPA (phonetic) representation
Papua New Guinea
Contains encoding converter map(s) for the following encoding/fonts:
Converter Name | Encoding Name | Font Names
SIL PNG<>UNICODE | SIL-PNG_Fonts-1998 | PNG SILCharis, PNG SILDoulos, PNG SILManuscript, PNG SILSophia Lit, PNG SILCharis Lit, PNG SILSophia CQLit
NLCI (India)
Contains encoding converter map(s) for the following encoding/font:
Converter Name | Encoding Name | Font Names
SL Oriya<>UNICODE | NLCI-SLOriya
Winscript/iLeap Devanagari<>UNICODE | CDAC-ISFOC_DEVANAGARI | DEV Panini, DV-TTYogesh
Winscript/iLeap Gujarati<>UNICODE | CDAC-ISFOC_GUJARATI | GUJ Gir
Winscript Malayalam<>UNICODE | NLCI-Malayalam | MAL Vayalar
Winscript Oriya<>UNICODE | NLCI-Oriya | ORI Asika
Winscript Tamil<>UNICODE | NLCI-Tamil | TAM Thiruvalluvar
Winscript Telugu<>UNICODE | NLCI-Telugu | TEL Nirmal
Downloads for Previous Versions
SIL Converters x86 installer (for 32-bit Office) 5.3.1.0
for Windows
EXE | 70.52 MB | 1 Dec 2024
SIL Converters x64 installer (for 64-bit Office and Paratext Plug-in) 5.3.1.0
for Windows
EXE | 70.99 MB | 1 Dec 2024
SIL Converters x86 installer (for 32-bit Office) 5.3.0.0
for Windows
EXE | 70.51 MB | 19 Sep 2024
SIL Converters x64 installer (for 64-bit Office and Paratext Plug-in) 5.3.0.0
for Windows
EXE | 70.98 MB | 19 Sep 2024
SIL Converters x86 installer (for 32-bit Office) 5.2.8.0
for Windows
EXE | 70.28 MB | 19 Jul 2024
SIL Converters x64 installer (for 64-bit Office and Paratext Plug-in) 5.2.8.0
for Windows
EXE | 70.75 MB | 19 Jul 2024
SIL Converters x86 installer (for 32-bit Office) 5.2.5.0
for Windows
EXE | 65.63 MB | 9 Mar 2024
SIL Converters x64 installer (for 64-bit Office) 5.2.5.0
for Windows
EXE | 70.65 MB | 9 Mar 2024
SIL Converters x86 installer (for 32-bit Office) 5.2.0.0
for Windows
EXE | 57.49 MB | 1 Sep 2023
SIL Converters x64 installer (for 64-bit Office) 5.2.0.0
for Windows
EXE | 62.50 MB | 1 Sep 2023
SIL Converters x86 installer (for 32-bit Office) 5.1.3
for Windows
EXE | 57.45 MB | 13 Apr 2023
SIL Converters x64 installer (for 64-bit Office) 5.1.3
for Windows
EXE | 62.46 MB | 13 Apr 2023
SIL Converters x86 installer (for 32-bit Office) 5.0
for Windows
EXE | 55.48 MB | 13 Oct 2021
SIL Converters x64 installer (for 64-bit Office) 5.0
for Windows
EXE | 60.45 MB | 13 Oct 2021
SIL Converters Standalone installer (includes addons like .NET) for offline installation (EXE file) 4.0
for Windows
EXE | 76.51 MB | 29 Aug 2011
SIL Converters Package only (no addons) for offline installation (EXE file) 4.0
for Windows
EXE | 25.42 MB | 29 Aug 2011
SIL Converters 3.1 (interactive web-based installer) 3.1
for Windows
EXE | 247.3 KB | 30 Jul 2010
SIL Converters 3.1 package only (no addons) for offline installation (EXE file) 3.1
for Windows
EXE | 24.95 MB | 30 Jul 2010
Upgrades in previous versions
For version 5.3.1.0
Includes the ‘Translation Helper’ dialog for use in the Clipboard EncConverter application.
Also includes the new multi-regular-expression .NET converter, which allows multiple .NET regular expressions in a single converter.
Other transducers/translators supported: Python 3.x, NLLB (hosted locally), Azure OpenAI, and Vertex AI Translators (including support for the Gemini™ LLMs, e.g. gemini-1.5-flash and gemini-1.5-pro), as well as the new Paratext Project EncConverter for use in the Paratext BackTranslation Helper dialog (e.g. to use a Serval-generated draft as a resource).
For version 5.2.8.0
Upgraded a NuGet package version to remove a vulnerability present in v5.2.7.0 and v5.2.6.0.
Includes support for Python 3.x, NLLB (hosted locally), Azure OpenAI, and Vertex AI Translators.
For version 5.2.7.0 (this version is no longer available because of a vulnerability; please upgrade)
Bug fixes for Paratext functionality.
For version 5.2.6.0 (this version is no longer available because of a vulnerability; please upgrade)
Includes support for Python 3.x, NLLB (hosted locally), Azure OpenAI, Vertex AI Translators, and the new Paratext Project EncConverter for use in the Paratext BackTranslation Helper dialog (e.g. to use a Serval-generated draft as a resource). See the Paratext Back Translation Helper plug-in tutorial.
For version 5.2.5.0
Includes support for Python 3.x, NLLB (hosted locally), Azure OpenAI, and Vertex AI Translators.
For version 5.2.0
Fixed an issue with the Clear Bible Paratext plugin.
Updated TECkit to support Unicode 15.
For version 5.1
Includes support for the new Bing and DeepL Translators and the Paratext Back Translation Helper Plug-in. See the Paratext Back Translation Helper plug-in tutorial.
Warning: Do not use the Bing or DeepL translator as the ‘Transliterator’ in a Paratext ‘Transliteration (using Encoding Converter)’ project. Instead, use the Back Translation Helper plug-in, which works on a verse-by-verse basis.
For version 5.0
Supports the 64-bit version of Microsoft Office, with continued support for 32-bit versions. See the documentation below for how to determine which bitness of Microsoft Office you have, and therefore which installer you should download.
Supports newer versions of Microsoft Office (including 2019 and 365).
The TECkit mapping editor can now show characters outside the Basic Multilingual Plane (BMP), that is, Unicode characters above U+FFFF.
Updated TECkit maps.
Updated TECkit to support Unicode 13.
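Characters above U+FFFF are the awkward case for tools built around 16-bit characters, because UTF-16 (the string encoding used internally by Windows and .NET) stores each one as a pair of 16-bit code units called a surrogate pair. The Python sketch below works through the standard surrogate-pair arithmetic; it illustrates the Unicode encoding rule only and says nothing about how the TECkit mapping editor is implemented. The function name `utf16_surrogates` is hypothetical.

```python
from typing import Tuple


def utf16_surrogates(code_point: int) -> Tuple[int, int]:
    """Return the UTF-16 (high, low) surrogate pair for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


if __name__ == "__main__":
    cp = 0x1D11E  # MUSICAL SYMBOL G CLEF, outside the BMP
    high, low = utf16_surrogates(cp)
    print(f"U+{cp:04X} -> {high:04X} {low:04X}")  # U+1D11E -> D834 DD1E
    # Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

A tool that counts or edits 16-bit units one at a time can split such a pair, which is why explicit support for characters outside the BMP matters in a mapping editor.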
For version 4.0
This version (4.0) was released to fix various bugs; most significantly, it removed the core EncConverters assembly from the Global Assembly Cache (GAC). From v4.0 onward, client applications redistribute the core assemblies ECInterfaces.dll and SilEncConverters.dll directly with their applications. They can still share the same global system repository of activated converters, but there is less dependency between the various clients in terms of release requirements.
As of v3.1.1, a new transduction engine was added that provides support for the webpage-based converters on the Scientific and Technical Hindi Google Site. The Converters section of that site contains a number of webpage-based encoding converters and transliterators for numerous Indic legacy encodings. The conversion code embedded in these web pages can now be used to convert data with any SIL Converters client application (e.g. the Bulk Word Document Converter) by using the new Technical Hindi Html EncConverter Add-in. To activate this add-in, be sure to check the Maps and Tables > Indic converters feature during installation. Once it is installed, you can read Help for Technical Hindi (Google group) Html Converter Plug-in (in Start > All Programs > SIL Converters > Help) for further instructions.
The Bulk Word Document Converter was also updated to fix a few problems related to converting text that was inserted into a document one character at a time (through the Insert Symbol command).
SILConverters 4.0 uses the same version of the core EncConverters assembly as FieldWorks 7.1 and overcomes the uninstallation problem previously encountered with Speech Analyzer 3.0.1 and Phonology Assistant 3.0. Previously, if you uninstalled any of the applications that used the earlier version of the EncConverters core, that core became unavailable to the other applications that shared it until an installation Repair was done.
The Bulk Word Document Converter has been enhanced with a search feature that searches your hard drive for documents containing specific fonts to be converted.
The TECkit Map Unicode Editor has also been enhanced to show character maps for both the left-hand and right-hand sides of a conversion, so that a point-and-click approach can be used when developing the map.
Related Resources
SIL Converters – much more than meets the eye! — This webinar highlights many valuable applications of the SIL Converters tool.
Slideshow used in the above video
OpenOffice Linguistic Tools — this tool provides similar resources for LibreOffice
TECkit — a Text Encoding Conversion toolkit
To compose or decompose, that is the question: Whether ’tis nobler in the mind to suffer NFC or NFD…
Contact
If you would like to report a problem, you can create an issue in the SIL Converters issue tracker, or you can send an email via the contact form below.
Most SIL software is free to use, modify, and redistribute according to the terms of open licenses such as the SIL Open Font License and the MIT License. Many of our open projects are hosted on our Language Software development, Writing Systems Technology, and Bloom GitHub pages.