Papers by Diana Trandabat
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017
This paper presents the participation of #WarTeam in Task 6 of SemEval-2017 with a system that classifies humor by comparing and ranking tweets. The training data consists of annotated tweets from the @midnight TV show. #WarTeam's system uses a neural network (TensorFlow) whose inputs come from a Naïve Bayes humor classifier and a sentiment analyzer.
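The abstract describes a stacked design: the outputs of a Naïve Bayes humor classifier and a sentiment analyzer feed a neural network. A minimal sketch of that idea follows. It assumes each tweet has already been reduced to two hypothetical features, a Naïve Bayes humor log-odds and a sentiment polarity, and it replaces the TensorFlow network with a single logistic unit trained by plain gradient descent before ranking tweets by its output. The function names and toy numbers are illustrative assumptions, not #WarTeam's actual architecture or data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_combiner(features, labels, lr=0.5, epochs=500):
    """Fit a single logistic unit (the smallest neural network) by batch gradient descent.

    features: one vector per tweet, assumed here to be
              (naive_bayes_log_odds, sentiment_polarity) (hypothetical inputs).
    labels:   1 for funny, 0 for not funny.
    """
    n_feat = len(features[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            for i in range(n_feat):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / len(features) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / len(features)
    return w, b


def humor_score(model, x):
    """Probability-like humor score for one feature vector."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def rank_tweets(model, tweet_features):
    """Return tweet ids ordered from funniest to least funny."""
    return sorted(tweet_features,
                  key=lambda t: humor_score(model, tweet_features[t]),
                  reverse=True)


# Made-up training features: (naive_bayes_log_odds, sentiment_polarity)
model = train_combiner([(2.0, 0.5), (1.5, 0.8), (-1.0, -0.2), (-2.0, 0.1)], [1, 1, 0, 0])
print(rank_tweets(model, {"t1": (-1.5, 0.0), "t2": (1.8, 0.6), "t3": (0.3, 0.2)}))
```

A single logistic unit keeps the sketch dependency-free; a deeper network would consume the same two features in the same way.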

Towards Building Knowledge Resources from Social Media Using Semantic Roles
Research and Advanced Technology for Digital Libraries, 2017
Text semantics is a well-hidden treasure whose deciphering requires deep understanding. Artificial intelligence seeks to endow computers with human-like judgment, so decoding the hidden message and sharing it between machines is one of the main challenges that computational linguistics faces today. In an attempt to learn how humans communicate, computers use language models derived from human knowledge. While computers are still far from fully understanding the insinuated messages of political discourse, computer scientists and linguists have joined efforts to model human-like linguistic behavior. This paper introduces the VoxPopuli platform, an instrument that collects user-generated content, analyzes it and generates a map of semantically related concepts in order to capture crowd intelligence.

The paper presents the methodology of the platform “Ethnolinguistic audio-visual atlas of the cultural food heritage of Bacău County – eCULTFOOD Atlas”, the main product of the project The Digitization of the Cultural Food Heritage. The Region of Bacău – eCULTFOOD (PNIII-P2-2.1-BG-2016-0390). The eCULTFOOD Atlas platform is a comprehensive database containing the results of field research and scientific documentation on local cultural food traditions. It includes a representative corpus of audio-visual documents recording the traditional cultural food heritage, based on surveys of the older generation in the rural county of Bacău, Romania. The eCULTFOOD Atlas meets the aspirations of EU policies that regard the digitization of cultural resources as a key factor in improving accessibility and the unimpeded flow of information in a knowledge economy. Once transposed into electronic format, the cultural food heritage of Bacău County may become a resourc...

Procedia Computer Science, 2017
One of the most challenging tasks in human-computer communication is the decomposition of meaning. The theory of semantic frames makes it possible to identify the roles that various constituents play in an event: the doer of the action, the receiver of the action, the person towards whom the action is directed, the means and purposes of an action, etc. In this paper, we propose to introduce semantic frames in eLearning contexts, in the conviction that users may find it easier to learn concepts if these are presented in a semantically related manner. To achieve this, we propose a system that, for every concept searched by the user, offers a network of concepts built by analyzing the semantic relations that appear between concepts. In other words, the proposed system starts with a concept, retrieves sentences containing it from the collection of learning materials, and uses semantic role labeling to identify the semantic relations between that concept and the concepts found in its neighborhood. Additional information is retrieved from the DBpedia knowledge base before the final network of relations is established.
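The concept-network step can be pictured with a small sketch. It assumes a semantic role labeler has already turned each retrieved sentence into a predicate plus a mapping from PropBank-style role labels (ARG0, ARG1, ARGM-LOC, ...) to argument text. The input format and the function name `concept_network` are hypothetical, concept matching is a plain substring test, and the DBpedia enrichment step is omitted.

```python
from collections import defaultdict


def concept_network(concept, frames):
    """Link a searched concept to the other arguments of the predicates it appears in.

    frames: list of (predicate, {role_label: argument_text}) pairs, one per
            predicate found by a semantic role labeler (hypothetical input format).
    Returns {neighbor_argument: {(predicate, concept_role, neighbor_role), ...}}.
    """
    concept = concept.lower()
    network = defaultdict(set)
    for predicate, roles in frames:
        # Roles whose argument mentions the concept (simple substring match).
        concept_roles = [r for r, arg in roles.items() if concept in arg.lower()]
        for role, arg in roles.items():
            if role in concept_roles:
                continue
            for c_role in concept_roles:
                network[arg].add((predicate, c_role, role))
    return dict(network)


frames = [
    ("absorb", {"ARG0": "the leaf", "ARG1": "light", "ARGM-LOC": "in the chloroplast"}),
    ("emit", {"ARG0": "the sun", "ARG1": "light"}),
    ("grow", {"ARG1": "the tree"}),
]
for neighbor, relations in concept_network("light", frames).items():
    print(neighbor, sorted(relations))
```

Each edge keeps the predicate and both role labels, so the network records not only that two concepts are related but how.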
Identifying Semantic Events in Unstructured Text
Lecture Notes in Computer Science, 2015
Semantics has always been considered the hidden treasure of texts, accessible only to humans. Artificial intelligence strives to endow machines with human capabilities, so accessing this treasure and sharing it with computers is one of the main challenges that natural language processing faces today. This paper takes a further step in this direction by proposing an automatic approach that uses semantic role labeling to extract information about events from unstructured texts.
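One way to read "extracting events with semantic role labeling" is to map role labels onto event slots. The sketch below assumes SRL output in a hypothetical (predicate, {role_label: text}) format and uses an illustrative mapping from PropBank-style labels to agent, patient, time and location slots; both the mapping and the filtering rule are assumptions, not the paper's method.

```python
# Hypothetical mapping from PropBank-style role labels to event slots.
ROLE_TO_SLOT = {
    "ARG0": "agent",
    "ARG1": "patient",
    "ARGM-TMP": "time",
    "ARGM-LOC": "location",
}


def extract_events(frames):
    """Turn SRL frames into event records: the predicate plus whichever slots are filled.

    frames: list of (predicate, {role_label: argument_text}) pairs (hypothetical format).
    Frames with neither an agent nor a patient are skipped as too sparse to be events.
    """
    events = []
    for predicate, roles in frames:
        event = {"action": predicate}
        for role, text in roles.items():
            slot = ROLE_TO_SLOT.get(role)
            if slot is not None:
                event[slot] = text
        if "agent" in event or "patient" in event:
            events.append(event)
    return events


frames = [
    ("sign", {"ARG0": "the president", "ARG1": "the treaty",
              "ARGM-TMP": "on Monday", "ARGM-LOC": "in Paris"}),
    ("rain", {"ARGM-TMP": "yesterday"}),
]
print(extract_events(frames))
```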
Natural Language Processing Using Semantic Frames
University Alexandru Ioan Cuza of Iasi, Romania ... Foremost, I would like to thank my advisor, Prof. Dan Cristea, for in-... ence, each of them offering me memorable experiences: Adi Iftene, Ionut Pistol, Maria Husarciuc, Alex Moruz, Marius Răschip, Lucian ...
Proceedings of The 12th International Workshop on Semantic Evaluation, 2018
The Curative Power of Medical Data
Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, 2018
In an era when massive amounts of medical data have become available, researchers working in the biological, biomedical and clinical domains increasingly require the help of language engineers to process large quantities of biomedical and molecular biology literature, patient data or health records. With such a huge number of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, as well as to specialized ontologies, could enable access to and discovery of structured clinical information and foster a major leap in natural language processing and health research.

This paper presents a cross-linguistic analysis of the largest dictionaries currently existing for Romanian, French, and German, and a new, robust and portable method for Dictionary Entry Parsing (DEP), based on Segmentation-Cohesion-Dependency (SCD) configurations. The SCD configurations are applied successively to each dictionary entry to identify its lexicographic segments (the first SCD configuration), to extract its sense tree (the second configuration), and to parse its atomic sense definitions (the third one). Building on previous results for DLR (The Romanian Thesaurus – new format), the present paper adapts and applies the SCD-based technology to four other large and complex thesauri: DAR (The Romanian Thesaurus – old format), TLF (Le Trésor de la Langue Française), DWB (Deutsches Wörterbuch – GRIMM), and GWB (Goethe-Wörterbuch). The experiment, illustrated on significantly large sets of parsed entries from these thesauri, demonstrated the following features: (1) the SCD-based method is a com...
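The second SCD configuration extracts a sense tree from each entry. The abstract does not spell out how, so the following is only a rough illustration of the data structure involved: a sense tree built from an entry's sequence of sense markers using a stack of open ancestors. The marker hierarchy is a hypothetical assumption, and this is not the SCD-based parser itself.

```python
import re

# Hypothetical marker hierarchy, loosely modeled on large thesauri:
# Roman numeral > capital letter > Arabic numeral > lowercase letter.
# Note: "I.", "V." and "X." are read as Roman numerals, so the capital
# letters I, V and X cannot serve as level-1 markers in this sketch.
MARKER_LEVELS = [
    (re.compile(r"^[IVX]+\.$"), 0),
    (re.compile(r"^[A-Z]\.$"), 1),
    (re.compile(r"^\d+\.$"), 2),
    (re.compile(r"^[a-z]\)$"), 3),
]


def marker_depth(marker):
    """Return the nesting level of a sense marker such as 'II.', 'B.', '3.' or 'a)'."""
    for pattern, depth in MARKER_LEVELS:
        if pattern.match(marker):
            return depth
    raise ValueError(f"unknown sense marker: {marker!r}")


def build_sense_tree(markers):
    """Nest sense markers into a tree using a stack of open ancestors."""
    root = {"marker": "ENTRY", "children": []}
    stack = [(-1, root)]  # (depth, node); the root sits above every real level
    for marker in markers:
        depth = marker_depth(marker)
        node = {"marker": marker, "children": []}
        while stack[-1][0] >= depth:  # close siblings and deeper senses
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root


def show(node, indent=0):
    print("  " * indent + node["marker"])
    for child in node["children"]:
        show(child, indent + 1)


show(build_sense_tree(["I.", "A.", "1.", "2.", "a)", "B.", "II."]))
```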

Data, 2019
With the massive amounts of medical data made available online, language technologies have proven indispensable for processing biomedical and molecular biology literature, health data or patient records. With such a huge number of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, as well as to specialized ontologies, could enable access to and the discovery of structured clinical information and could foster a major leap in natural language processing and in health research. The aim of this Special Issue, “Curative Power of Medical Data” in Data, is to gather innovative approaches for exploiting biomedical data with semantic web technologies and linked data, and to foster community involvement in biomedical research. This Special Issue contains four surveys covering a wide range of topics, from the analysis of the writing style of biomedical articles to the automatic generation of tests from medical re...
Proceedings of The 12th International Workshop on Semantic Evaluation, 2018
The "Multilingual Emoji Prediction" task focuses on the ability to predict the corresponding emoji for a given tweet. In this paper, we investigate the relation between words and emojis. To do so, we used supervised machine learning (Naive Bayes) and deep learning (Recursive Neural Network).
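For readers unfamiliar with the Naive Bayes baseline, here is a minimal sketch of a multinomial Naive Bayes emoji predictor with add-one (Laplace) smoothing and whitespace tokenization. The class name, tokenizer, emoji labels and toy tweets are hypothetical, and the paper's actual features and preprocessing may differ.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesEmoji:
    """Multinomial Naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, texts, emojis):
        self.class_docs = Counter(emojis)       # document counts per emoji (priors)
        self.word_counts = defaultdict(Counter)  # token counts per emoji
        for text, emoji in zip(texts, emojis):
            self.word_counts[emoji].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        """Return the emoji with the highest log posterior for the text."""
        total_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for emoji, n_docs in self.class_docs.items():
            counts = self.word_counts[emoji]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in text.lower().split():
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best, best_score = emoji, score
        return best


model = NaiveBayesEmoji().fit(
    ["love you so much", "happy birthday love", "so funny lol", "lol that joke"],
    [":heart:", ":heart:", ":joy:", ":joy:"],
)
print(model.predict("love"), model.predict("funny joke"))
```

Working in log space avoids underflow when many token probabilities are multiplied.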
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 2021
This paper presents a word-in-context disambiguation system. The task focuses on capturing the polysemous nature of words in multilingual and cross-lingual settings, without relying on a fixed inventory of word meanings. The system applies natural language processing algorithms to the datasets of SemEval 2021 Task 2 and identifies the meaning of words in Arabic, Chinese, English, French and Russian without using any additional mono- or multilingual resources.

Précis: The CONO-UC method combines a scale that weights the diagnostic tools with a standardized treatment of CIN2+ lesions, minimizing the risk of inappropriate management by young specialists. Objectives: There is currently a high rate of overtreatment of cervical precursor lesions, which is caused in part by the inexperience of the operator making the decisions. The aim of this study was to develop a standardized method for weighting and judging diagnostic variables and treatment, suitable for use by young specialists, in order to minimize the risk of inappropriate management. Materials and methods: We included 471 patients referred for abnormal cytology and treated by LEEP loop excision. Sensitivity, specificity, predictive values and likelihood ratios for the diagnosis of CIN2+ were calculated for each diagnostic method. Each resident was taught a standardized loop treatment protocol. Once the best predictors had been identified, a scoring scale weighting the variables was built, and the best cut-off point for predicting CIN2+ was defined using a ROC curve. Differences between groups were compared using the chi-square test, ANOVA or t-test. A failure curve was constructed using the 1 − Kaplan–Meier method. Results: The prevalence of CIN2+ in this cohort was 66%. Agreement between diagnostic tests was low, with colposcopy having the worst positive predictive value and the highest risk of overtreatment. The scoring scale included age, cytology, colposcopy (stratified by the extent of involvement per quadrant), punch biopsy and agreement between diagnostic tests.
A score ≥ 9, combined with a standardized protocol, yielded overtreatment rates below 15%, 5-year CIN2+ recurrence rates below 5% and a low rate of suboptimal or complicated procedures (<2%). Conclusions: By combining an integrated scoring system (cut-off point) with a standardized excision protocol, the CONO-UC method minimizes the risk of overtreatment or inappropriate treatment of preinvasive cervical lesions by young specialists, while also reducing the number of unnecessarily indicated procedures and maintaining a high rate of therapeutic success. KEYWORDS: colposcopy, loop electrosurgical excision procedure (LEEP), score, cervical intraepithelial neoplasia (CIN), conization.
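The accuracy measures named in the methods all follow from a 2×2 table of test result versus CIN2+ status. A minimal sketch, treating a score cut-off such as the ≥ 9 threshold as the test, is shown below. The function names and the toy counts are hypothetical and are not the study's data.

```python
import math


def confusion_at_cutoff(scores, has_cin2, cutoff=9):
    """Count TP, FP, FN, TN when 'score >= cutoff' is read as a positive test."""
    tp = fp = fn = tn = 0
    for score, disease in zip(scores, has_cin2):
        positive = score >= cutoff
        if positive and disease:
            tp += 1
        elif positive:
            fp += 1
        elif disease:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, predictive values and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_negative": (1 - sens) / spec if spec > 0 else math.inf,
    }


print(diagnostic_metrics(tp=80, fp=20, fn=10, tn=90))  # made-up counts
```

Sweeping `cutoff` over all observed scores and plotting sensitivity against 1 − specificity gives the ROC curve used to choose the threshold.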

This paper describes the ongoing work carried out within the CoBiLiRo (Bimodal Corpus for Romanian Language) research project, part of ReTeRom (Resources and Technologies for Developing Human-Machine Interfaces in Romanian). Data annotation is increasingly used in speech recognition and synthesis to support learning processes. In this context, a variety of annotation systems for speech and text processing environments have been presented. Although many designs for the data annotation workflow have emerged, the handling of the metadata needed to manage complex user-defined annotations is not sufficiently covered. We propose a format design intended to serve as an annotation standard for bimodal resources, one that facilitates searching, editing and statistical analysis operations. The design and implementation of an infrastructure that houses the resources are also presented. The goal is to widen the dissemination of bimodal corpora for resea...
Eye and Voice Control for an Augmented Reality Cooking Experience
Semantics has always been considered the hidden treasure of texts, accessible only to humans. Artificial intelligence strives to endow machines with human capabilities, so accessing this treasure and sharing it with computers is one of the main challenges that natural language processing faces today. This paper takes a further step in this direction by proposing an automatic approach that uses semantic role labeling to extract information from texts on the web.

To develop a semantic labeling system, the most common methods use supervised learning from an annotated corpus. What if short deadlines and limited human and financial resources prevent us from building such a training corpus for our language? If such a corpus already exists for another language, this paper proposes a method to automatically import it into the language we need. The transfer method is based on translating the existing corpus (or using annotated versions of existing parallel texts), aligning it at word level, and applying a set of mapping functions to import the annotation from one language to another. An import validation interface is also offered for the manual validation of the resulting resource. As an example, the import of semantic roles from the English FrameNet to Romanian is discussed. RÉSUMÉ. In order to develop an automatic semantic labeling system, the most frequent methods use...
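The core of the transfer method, carrying labels across a word alignment, can be sketched as follows. This is a minimal sketch under stated assumptions: labels are attached to individual tokens, the alignment is a list of (source index, target index) pairs produced by some word aligner, and conflicts are resolved by keeping the first projected label. The paper's own mapping functions are not reproduced here, and the sentence pair and alignment are illustrative.

```python
def project_labels(source_labels, alignment, target_length):
    """Copy each labeled source token's label onto its aligned target token(s).

    source_labels: one label (e.g. a FrameNet frame element) or None per source token.
    alignment:     iterable of (source_index, target_index) word-alignment pairs.
    Returns one label or None per target token; the first projected label wins.
    """
    target_labels = [None] * target_length
    for src, tgt in alignment:
        label = source_labels[src]
        if label is not None and target_labels[tgt] is None:
            target_labels[tgt] = label
    return target_labels


source = ["John", "gave", "Mary", "a", "book"]
labels = ["Donor", "Target", "Recipient", "Theme", "Theme"]
target = ["Ion", "i-a", "dat", "Mariei", "o", "carte"]
alignment = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(list(zip(target, project_labels(labels, alignment, len(target)))))
```

One-to-many alignments (here "gave" to "i-a dat") simply copy the label to every aligned target token; the validation interface described above is where such projections would be checked by hand.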
This paper presents the implementation of an educational game as a skill for Amazon’s voice assistant Alexa. The main motivation behind this work is that children generally respond better to learning through games and smart devices. The application generates test questions by extracting and analyzing information from two major knowledge resources, DBpedia and Wikidata. The questions can be automatically adapted to the player's knowledge level through Computer Adaptive Testing (CAT). Because the application is designed specifically as an Alexa skill, with intents, slots and sample utterances, users can interact with it through the Alexa Voice Service or through any smartphone, so the game can be played from home, school or any other place with an Internet connection.
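Computer Adaptive Testing adjusts question difficulty to the player's answers. Full CAT systems usually estimate ability with item response theory; the sketch below swaps in a much simpler up/down staircase rule to show the adaptive loop. The question-bank format, difficulty scale and function names are hypothetical and are not the skill's actual code.

```python
def next_level(level, correct, min_level=1, max_level=5):
    """Step difficulty up after a correct answer and down after a wrong one, within bounds."""
    step = 1 if correct else -1
    return max(min_level, min(max_level, level + step))


def pick_question(bank, level, asked):
    """Return the first unasked question whose difficulty is closest to the current level."""
    candidates = [q for q in bank if q["id"] not in asked]
    if not candidates:
        return None
    return min(candidates, key=lambda q: abs(q["difficulty"] - level))


def run_session(bank, answers, start_level=3):
    """Simulate a session; answers[i] says whether the i-th question was answered correctly."""
    level, asked, log = start_level, set(), []
    for correct in answers:
        question = pick_question(bank, level, asked)
        if question is None:
            break
        asked.add(question["id"])
        log.append((question["id"], level))
        level = next_level(level, correct)
    return log, level


# Hypothetical bank: two questions at each difficulty from 1 to 5.
bank = [{"id": f"q{d}{k}", "difficulty": d} for d in range(1, 6) for k in range(2)]
print(run_session(bank, [True, True, False, False]))
```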