Books by Christian Chiarcos
Contributions to the 3rd Workshop on Linked Data in Linguistics (LDL-2014) and the associated data challenge, held in conjunction with the 9th Language Resources and Evaluation Conference (LREC-2014), May 2014, Reykjavik, Iceland.
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data
Papers of the 2nd Workshop on Linked Data in Linguistics (LDL-2013), held in conjunction with the 6th Conference on Generative Approaches to the Lexicon (GL-2013), Pisa, Italy, Sep 2013.

Linked Data in Linguistics
The explosion of information technology has led to substantial growth of web-accessible linguistic data in terms of quantity, diversity and complexity. These resources become even more useful when interlinked with each other to generate network effects.
The general trend of providing data online is thus accompanied by newly developing methodologies to interconnect linguistic data and metadata. This includes linguistic data collections, general-purpose knowledge bases (e.g., DBpedia, a machine-readable edition of Wikipedia), and repositories with specific information about languages, linguistic categories and phenomena. The Linked Data paradigm provides a framework for interoperability and access management, and thereby makes it possible to integrate information from such a diverse set of resources.
The contributions assembled in this volume illustrate the range of applications of the Linked Data paradigm for representative types of language resources. They cover lexical-semantic resources, annotated corpora, typological databases as well as terminology and metadata repositories. The book includes representative applications from diverse fields, ranging from academic linguistics (e.g., typology and corpus linguistics) through applied linguistics (e.g., lexicography and translation studies) to technical applications (in computational linguistics, Natural Language Processing and information technology).
This volume accompanies the Workshop on Linked Data in Linguistics 2012 (LDL-2012) in Frankfurt/M., Germany, organized by the Open Linguistics Working Group (OWLG) of the Open Knowledge Foundation (OKFN). It assembles contributions of the workshop participants and, beyond this, it summarizes initial steps in the formation of a Linked Open Data cloud of linguistic resources, the Linguistic Linked Open Data cloud (LLOD).
Von der Form zur Bedeutung: Texte automatisch verarbeiten / From Form to Meaning: Processing Texts Automatically
The book contains the papers presented at the 2009 bi-annual conference of the German Society for Computational Linguistics and Language Technology (GSCL), as well as those of the associated 2nd UIMA@GSCL workshop. The main theme of the conference was computational approaches to text processing, and in conjunction with the UIMA (Unstructured Information Management Architecture) workshop, this volume offers an overview of current work on both theoretical and practical aspects of text document processing.

Proceedings of the 10th Biannual Meeting of the German Society for Cognitive Science (KogWis 2010)
As the latest biannual meeting of the German Society for Cognitive Science (Gesellschaft für Kognitionswissenschaft, GK), KogWis 2010 at Potsdam University reflects the current trends in a fascinating domain of research concerned with human and artificial cognition and the interaction of mind and brain. The plenary talks address questions of numerical capacities and human arithmetic (Brian Butterworth), of the theoretical development of cognitive architectures and intelligent virtual agents (Pat Langley), of categorizations induced by linguistic constructions (Claudia Maienborn), and of a cross-level account of the “Self as a complex system” (Paul Thagard). KogWis 2010 integrates a wealth of experimental research, cognitive modelling, and conceptual analysis in 5 invited symposia, over 150 individual talks, 6 symposia, and more than 40 poster contributions. Some of the invited symposia reflect local and regional strengths of research in the Berlin-Brandenburg area: the two largest research fields of the university's Cognitive Sciences Area of Excellence in Potsdam are represented by an invited symposium on “Information Structure” by the Special Research Area 632 (“Sonderforschungsbereich”, SFB) of the same name, of Potsdam University and Humboldt-University Berlin, and by a satellite conference of the research group “Mind and Brain Dynamics”. The Berlin School of Mind and Brain at Humboldt-University Berlin takes part with an invited symposium on “Decision Making” from the perspective of cognitive neuroscience and philosophy, and the DFG Cluster of Excellence “Languages of Emotion” of Free University presents interdisciplinary research results in an invited symposium on “Symbolising Emotions”.

Salience. Multidisciplinary Perspectives on Its Function in Discourse
The volume addresses the role of salience in discourse and provides broad coverage of various perspectives on and functions of discourse salience. The multidisciplinary approaches adopted in the volume differ with regard to the underlying theoretical proposals and foci of research. The topics range from (i) entity-based salience to (ii) discourse-structural salience of utterances to (iii) extra-linguistic factors of salience in discourse. Accordingly, the volume is organized into three sections. Part I focuses on discourse referents and the choice of referring expressions. The contributions cover issues such as salience and demonstrativity in Russian, discourse salience and grammatical voice in the West Siberian language Eastern Khanty, the joined information of syntactic and semantic prominence, and a computational framework of salience metrics. The contributions to Part II are concerned with linguistic structures at or above the clause level. The salience of discourse segments is addressed with respect to the translation of discourse relations and the position of verb arguments in Old High German. Part III extends the scope beyond purely linguistic phenomena and deals with the role of extra-linguistic salience in discourse processing. Visual salience in a situated-dialog context, salience marking by hypertextual links, and extra-linguistic salience derived from a mental representation of the described situation are all discussed here. The notion of salience is relevant to discourse studies in theoretical linguistics, computational linguistics, and psycholinguistics.
Papers by Christian Chiarcos

OntoLex, the dominant community standard for machine-readable lexical resources in the context of RDF, Linked Data and Semantic Web technologies, is currently being extended with a designated module for Frequency, Attestations and Corpus-based Information (OntoLex-FrAC). We propose a novel component for OntoLex-FrAC, addressing the incorporation of corpus queries for (a) linking dictionaries with corpus engines, (b) enabling RDF-based web services to exchange corpus queries and response data dynamically, and (c) using conventional query languages to formalize the internal structure of collocations, word sketches, and colligations. The primary field of application of the query extension is in digital lexicography and corpus linguistics, and we present a proof-of-principle implementation in backend components of a novel platform designed to support digital lexicography for the Serbian language.
Rasprave: Časopis Instituta za Hrvatski Jezik i Jezikoslovlje, Dec 31, 2022
Semantic web, Aug 7, 2015
This paper describes the Ontologies of Linguistic Annotation (OLiA) as one of the data sets currently available as part of the Linguistic Linked Open Data (LLOD) cloud. The OLiA ontologies represent a repository of annotation terminology for various linguistic phenomena across a wide range of languages; they have been used to facilitate interoperability and information integration of linguistic annotations in corpora, NLP pipelines, and lexical-semantic resources.

The explosion of information technology has led to a substantial growth in quantity, diversity and complexity of linguistic data accessible over the internet. The lack of interoperability between linguistic and language resources represents a major challenge that needs to be addressed, in particular if information from different sources is to be combined, such as machine-readable lexicons, corpus data and terminology repositories. For these types of resources, domain-specific standards have been proposed, yet issues of interoperability between different types of resources persist, commonly accepted strategies to distribute, access and integrate their information have yet to be established, and technologies and infrastructures to address both aspects are still under development. The goal of the 2nd Workshop on Linked Data in Linguistics (LDL-2013) has been to bring together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections, including corpora, dictionaries, lexical networks, translation memories, thesauri, etc., infrastructures developed on that basis, their use of existing standards, and the publication and distribution policies that were adopted.
Converting Language Resources into Linked Data
Springer eBooks, 2020
In previous chapters, we discussed how to model linguistic data sets using the Resource Description Framework as a basis to publish them as linked data on the Web. In this chapter, we describe a methodology that can be followed in the transformation of legacy linguistic datasets into linked data. The methodology comprises different tasks, including the specification, modelling, generation, linking, publication and exploitation of the data. We will discuss specific guidelines that can be applied in the transformation of particular types of resources, such as bilingual/multilingual dictionaries, WordNets, terminologies and corpora.

Procesamiento Del Lenguaje Natural, 2008
Abstract: We present SPLICR, a web-based sustainability platform for linguistic corpora and resources. The system is aimed at people working in linguistics or computational linguistics. It consists of an extensive metadata database that can be explored to find linguistic resources suited to the specific needs of a research project. SPLICR also offers a graphical interface that allows users to search and visualize the corpora. The project in which the system was developed aims to sustainably archive approximately sixty linguistic resources that were built through the collaboration of three research centers. Our project has two main goals: (a) to process and archive resources sustainably, so that they remain accessible to the scientific community in five, ten, or even twenty years; (b) to enable researchers to search the resources both at the level of metadata and at the level of linguistic annotations. More generally, our goal is to provide solutions that enable the interoperability, reuse, and sustainability of heterogeneous collections of language resources.
Language, Data and Knowledge, 2019
We introduce AnnoHub, an on-going effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) that they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish them as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de-facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV formats exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from the Universal Dependencies, v.2.3.

Language Resources and Evaluation, May 1, 2018
Since 2013, the thesaurus of the Bibliography of Linguistic Literature (BLL Thesaurus) has been applied in the context of the Lin|gu|is|tik portal, a hub for linguistically relevant information. Several consecutive projects focus on the modeling of the BLL Thesaurus as an ontology and its linking to terminological repositories in the Linguistic Linked Open Data (LLOD) cloud. Those mappings facilitate the connection between the Lin|gu|is|tik portal and the cloud. In the paper, we describe the current efforts to establish interoperability between the language-related index terms and repositories providing language identifiers for the web of Linked Data. After an introduction of Lexvo and Glottolog, we outline the scope, the structure, and the peculiarities of the BLL Thesaurus. We discuss the challenges for the design of scientifically plausible language classification and the linking between divergent classifications. We describe the prototype of the linking model and propose pragmatic solutions for structural or conceptual conflicts. Additionally, we describe the benefits of the envisaged interoperability, both for the Lin|gu|is|tik portal and for the Linked Open Data community in general.
Etymology Meets Linked Data. A Case Study In Turkic
DH, 2016

Trait. Autom. des Langues, 2008
We present a general framework for integrating annotations from different tools and tag sets. When annotating corpora at multiple linguistic levels, annotators may use different expert tools for different phenomena or types of annotation. These tools employ different data models and accompanying approaches to visualization, and they produce different output formats. For the purposes of uniformly processing these outputs, we developed a pivot format called PAULA, along with converters to and from tool formats. Different annotations are not only integrated at the level of data format, but are also joined on the level of conceptual representation. For this purpose, we introduce OLiA, an ontology of linguistic annotations that mediates between alternative tag sets that cover the same class of linguistic phenomena. All components are integrated in the linguistic information system ANNIS: Annotation tool output is converted to the pivot format PAULA and read into a database where the data can be visualized, queried, and evaluated across multiple layers. For cross-tag set querying and statistical
Linguistic Annotation Workshop, Jun 23, 2011
This paper describes the modeling of the morphosyntactic annotations of the MULTEXT-East corpora and lexicons as an OWL/DL ontology. Formalizing annotation schemes in OWL/DL has the advantage of making it possible to formally specify interrelationships between the various features and to draw logical inferences based on the relationships between them. We show that this approach provides us with a top-down perspective on a large set of morphosyntactic specifications for multiple languages, and that this perspective helps to identify and to resolve conceptual problems in the original specifications. Furthermore, the ontological modeling allows us to link the MULTEXT-East specifications with repositories of annotation terminology such as the General Ontology for Linguistic Description or the ISO TC37/SC4 Data Category Registry.
Zenodo (CERN European Organization for Nuclear Research), May 13, 2020
Modelling Linguistic Annotations
Springer eBooks, 2020
This chapter describes how linguistic annotations can be represented in RDF. Web Annotation and NIF provide the means to reference text segments on the web. Yet, representing linguistic annotations requires appropriate vocabularies. We discuss relevant vocabularies and illustrate how they can be applied to support annotation at different levels.
Journal for Language Technology and Computational Linguistics, Jul 1, 2008
This paper describes the development and design of an ontology of linguistic annotations, primarily word classes and morphosyntactic features, based on existing standardization approaches (e.g. EAGLES), a set of annotation schemes (e.g. for German, STTS and morphological annotations), and existing terminological resources (e.g. GOLD). The ontology is intended to be a platform for terminological integration, integrated representation and ontology-based search across existing linguistic resources with terminologically heterogeneous annotations. Further, it can be applied to augment the semantic analysis of a given text with an ontological interpretation of its morphosyntactic analysis.