mp: A compiler for formal metadata
This program is a compiler that parses formal metadata, checking the
syntax against the FGDC Content Standard for Digital Geospatial
Metadata and generating output suitable for viewing with a web browser
or text editor. It runs on Linux and UNIX systems and on PCs running
all versions of Microsoft Windows (95 and up, including XP). mp
generates a textual report indicating errors in the metadata, primarily
in the structure but also in the values of some of the scalar elements
(that is, those whose values are restricted by the standard).
The compiler, its source code, executables for UNIX (Solaris and Linux)
and Microsoft Windows, and
its
own formal metadata
are available through

A separate document shows the
revision history of mp
(that page is generated automatically by extracting the information from
Process_Step elements found in mp's metadata).
Getting the software
Source: src.tar.gz
    Gzipped tar file containing source code of all programs and documentation
Executables: mp-2.9.50.zip
    Zip package for Microsoft Windows
Web version: Metadata validation service
    No need to download; this may be all you need
[Figure: diagram showing mp's function]
Usage
Basic usage is

    mp [options] input_file

where input_file is the name of a text file containing metadata
encoded as described in the encoding format document, or in SGML or
XML conforming to a specific Document Type Definition (DTD). These
command-line options are available:
    -c cfile    obtains configuration information from cfile
    -e efile    directs syntax errors to efile
    -t tfile    creates text output in tfile
    -h hfile    creates HTML output in hfile
    -f ffile    creates FAQ-style HTML output in ffile
    -s sfile    creates SGML output in sfile
    -d dfile    creates DIF output in dfile
    -x xfile    creates XML output in xfile
    -l code     indicates element names are in the language identified by code
Note that the symbols cfile, ffile, and so forth represent the full
names of files appropriate for your operating system, and may include
directory path separators (on Windows this is the backslash, \, and on
Unix the slash, /).
Syntax error messages indicate the nature of discrepancies between
the input file and the standard, and the line numbers of the relevant
elements in the input file. If -e efile is not specified, syntax errors
are written to stderr, which is usually the console (for MS-DOS) or the
terminal from which the compiler is launched.
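The option list above maps naturally onto a small helper. What follows is a minimal Python sketch, not part of mp itself, that assembles an mp command line from the flags documented here. The function name build_mp_command and its keyword names are hypothetical, chosen only for illustration, and the sketch assumes an executable named mp is on the search path if the result is actually run.

```python
# Hypothetical helper (not part of mp): builds an argument list using only
# the command-line flags documented above.
FLAGS = {
    "config": "-c",
    "errors": "-e",
    "text": "-t",
    "html": "-h",
    "faq": "-f",
    "sgml": "-s",
    "dif": "-d",
    "xml": "-x",
    "language": "-l",
}


def build_mp_command(input_file, **options):
    """Return an argument list in the form mp [options] input_file."""
    command = ["mp"]
    for name, value in options.items():
        if name not in FLAGS:
            raise ValueError(f"unknown option: {name}")
        command += [FLAGS[name], str(value)]
    command.append(input_file)  # the input file follows the options
    return command


if __name__ == "__main__":
    print(build_mp_command("catfish.met", errors="catfish.err", html="catfish.html"))
```

The resulting list could be passed to subprocess.run(command, capture_output=True, text=True). When the errors option is left out, the error report would then be found in the stderr attribute of the result, consistent with the behavior described above.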
Input
mp can read:
Indented text
Since the FGDC Content Standard for Digital Geospatial Metadata, as
the name implies, specifies only the contents of metadata files and
not their encoding, it was necessary to devise a specification for
metadata encoding in order to develop and use this compiler. That
encoding format is purely textual and the fidelity of the compiler
to this format is fanatical.
XML or SGML
The FGDC metadata standard was written before XML existed. At that
time, SGML did exist but its use was not common outside some
specific communities such as the publishing community (remember
Ventura Publisher?). Since then, XML has become more widely
accepted and supported, and mp was modified to read well-formed XML
documents that use specified element names arranged in accord with
the Document Type Definition.
Note: mp does not read word-processor documents; it reads only
plain text, SGML, and XML!
Output
Aside from the error report, output is generated only in the
formats you specifically request (meaning that no other output is
generated by default). Use the command-line options to specify
the output file names and formats that you want.
Examples
In these examples, the name of the input file is catfish.met.

    mp catfish.met -e catfish.err

A report of errors is written to the file catfish.err.

    mp catfish.met -e catfish.err -h catfish.html

In addition to the error report, an outline-style HTML page is
created, named catfish.html.

    mp catfish.met -e catfish.err -h catfish.html -f catfish.faq.html

In addition to the error report and outline-style HTML page, a
FAQ-style HTML page is created, named catfish.faq.html.
Because the command line can become long and complex, you can
simplify the process by using a configuration file.
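When many input files need the same treatment, the naming pattern used in the catfish examples can be generated rather than typed. The Python sketch below is one possible way to do that. The function names output_names and command_for are hypothetical, and the choice to derive every output name from the input file's stem is an assumption drawn from the examples, not a rule that mp imposes.

```python
from pathlib import Path


def output_names(input_file):
    """Derive output file names following the catfish examples above."""
    stem = Path(input_file).with_suffix("")  # catfish.met -> catfish
    return {
        "-e": f"{stem}.err",       # error report
        "-h": f"{stem}.html",      # outline-style HTML
        "-f": f"{stem}.faq.html",  # FAQ-style HTML
    }


def command_for(input_file):
    """Build the third example command for any input file name."""
    command = ["mp", input_file]
    for flag, name in output_names(input_file).items():
        command += [flag, name]
    return command


if __name__ == "__main__":
    for met_file in ["catfish.met", "trout.met"]:  # illustrative file names
        print(" ".join(command_for(met_file)))
```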
Notes on output formats
Text output, if requested, follows the encoding format. This provides a
check of the compiler; any such program should be able to reproduce its
input without significant loss of information.
HTML output, if requested, uses descriptive lists to arrange the
elements hierarchically. The HEAD element of the HTML page contains
META elements corresponding to the Dublin Core.
FAQ-style HTML output, if requested, uses the general arrangement
of information found in Metadata in Plain Language and re-expresses
the metadata in a manner that is easier to read. This format is not
parseable in subsequent software processing. To see how mp writes
standard metadata elements in this output format, consult a specially
built input file, the config file used to process it with mp, and the
FAQ-style output that mp generated for it. Dublin Core elements are
added to the HEAD element as META tags.
NOTE: mp now provides in its HTML output a link to each of
the other output formats that you requested when running mp. These
links are relative to the current directory by default, and will work
correctly when someone retrieves a metadata record directly through
a web server. However, HTML metadata records retrieved through the
Clearinghouse gateway interface come tagged with the URL of the
gateway; consequently, these links will not work by default with HTML
records found through the gateway interface. To make these links work
without regard to the retrieval method, place a BASE tag into the HEAD
element of the output HTML code. As you might guess, mp can do this
for you, but it needs to know the URL where your metadata will be
available as web pages. It gets this information from a config file
entry as follows:
    output:
      html:
        base: URL

So if your metadata records will be served from a URL like
http://www.our-data.org/metadata/, put this into your config file:

    output:
      html:
        base: http://www.our-data.org/metadata/
Obviously you have to use the -c config_file command-line option
for mp, substituting for config_file the name of the actual config
file you'll be using.
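To see why the BASE entry matters, it can help to resolve a relative link by hand. The sketch below uses urljoin from Python's standard urllib.parse module, which applies the usual rules for resolving a relative reference against a base URL. The gateway address is a made-up placeholder, not the address of any real Clearinghouse gateway, and the link name simply reuses the catfish example.

```python
from urllib.parse import urljoin

# A relative link of the kind mp writes into its HTML output
# (file name taken from the catfish examples).
relative_link = "catfish.faq.html"

# Hypothetical gateway address, for illustration only.
gateway_url = "http://gateway.example.org/cgi-bin/search"

# BASE value from the config file example above.
base_url = "http://www.our-data.org/metadata/"

# Without a BASE tag the link is resolved against the gateway's URL.
without_base = urljoin(gateway_url, relative_link)

# With a BASE tag the link is resolved against the metadata directory.
with_base = urljoin(base_url, relative_link)

print(without_base)  # http://gateway.example.org/cgi-bin/catfish.faq.html
print(with_base)     # http://www.our-data.org/metadata/catfish.faq.html
```

The first result points at a file the gateway does not have; the second points at the record on the web site that actually holds the metadata.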
SGML output uses the eight-character tags proposed by the FGDC
Clearinghouse Working Group. The SGML output is designed to work with
an SGML Document Type Definition (DTD) that I have developed and
tested.
XML output uses the eight-character tags given in the 1998
version of the CSDGM. The XML output is designed to work with an
XML Document Type Definition (DTD) that I have developed and tested.
NOTE: the DTD won't be displayed in the main browser window
because it consists of DTD
entries, all of which begin with < and end with >. The web browser
thinks these are some weird kind of HTML tag, so it ignores all of them,
leaving you with nothing. To see the DTD, choose Page Source from the
View menu of the browser. Or if you choose Save As from the File menu,
you can open the DTD in a text editor.
Directory Interchange Format (DIF) output will require editing to fix
inconsistencies between the DIF and FGDC metadata standards, and to add
information required by DIF that is not clearly identified in the FGDC
scheme, such as Entry_ID.
Language support
mp is able to interpret metadata element names written in several
languages. To use this feature, include the -l option on the command
line or the language option within the input section of the config
file. In each case the language is specified using a 2-letter
abbreviation from the following table:
    en    English (default)
    es    Spanish
    ca    Catalan
    id    Indonesian
    fr    French
    de    German
    pt    Portuguese
    tr    Turkish
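A script that drives mp may want to check a language code before passing it along. The following Python sketch copies the codes from the table above. The names LANGUAGES and language_args are hypothetical, and the decision to omit -l entirely for English is an assumption based on English being listed as the default.

```python
# Codes and names copied from the table above.
LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "ca": "Catalan",
    "id": "Indonesian",
    "fr": "French",
    "de": "German",
    "pt": "Portuguese",
    "tr": "Turkish",
}


def language_args(code="en"):
    """Return the -l arguments for mp, or an empty list for the default."""
    code = code.strip().lower()
    if code not in LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    # English is the default, so no option is needed (assumption).
    return [] if code == "en" else ["-l", code]


if __name__ == "__main__":
    print(language_args("es"))
```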
Translations should be regarded as a work in progress. I seek help
with them, especially with languages that are not currently implemented
and with new profiles and extensions. A separate table shows the
current translation of element names in the various
languages supported. Feel free to discuss with me any ways you would
like to help with this effort.
Technical contact:
Peter N. Schweitzer
Mail Stop 954, National Center
U.S. Geological Survey
Reston, VA 20192

Tel: (703) 648-6533
FAX: (703) 648-6252
Email: pschweitzer@usgs.gov