Multi-CAST Kalamang
Eline Visser
2021, Multi-CAST: Multilingual corpus of annotated spoken texts. Haig, Geoffrey & Schnell, Stefan (eds.)
14 pages
Abstract
- six monologic, natural narrative texts
- more than 1000 clauses
- multiple levels of parallel annotations, time-aligned with audio recordings
- GRAID (Grammatical Relations and Animacy in Discourse, Haig & Schnell 2014) annotations
- RefIND (Referent Indexing in Natural Language Discourse, Schiborr et al. 2018) annotations
Kalamang
annotation notes

Eline Visser August 2021
v1.0

Citation for this document
Visser, Eline. 2021. Multi-CAST Kalamang annotation notes. In Haig, Geoffrey & Schnell,
Stefan (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts.
(multicast.aspra.uni-bamberg.de/#kalamang) (date accessed)

Citation for the Multi-CAST collection
Haig, Geoffrey & Schnell, Stefan (eds.). 2015. Multi-CAST: Multilingual corpus of
annotated spoken texts. (multicast.aspra.uni-bamberg.de/) (date accessed)

The Multi-CAST collection has been archived at the University of Bamberg, Germany,
and is freely accessible online at multicast.aspra.uni-bamberg.de/.
The entirety of Multi-CAST, including this document, is published under the Creative
Commons Attribution 4.0 International Licence (CC BY 4.0), unless noted otherwise. The
licence can be reviewed online at creativecommons.org/licenses/by/4.0/.

Multi-CAST Kalamang annotation notes v1.0 last updated 1 August 2021
This document was typeset by NNS with XƎLATEX and the multicast3 class (v3.2.4).

Contents
1 Notes on the GRAID annotations
1.1 The Kalamang predicate and lack of a verb phrase
1.2 Complex verb constructions
1.3 Give-constructions
1.4 Elided predicates
1.5 Complement clauses and indirect speech
1.6 Dislocated topics
1.7 Topic marker me
1.8 Other conventions used

References

Appendices
A List of corpus-specific GRAID symbols
B List of abbreviated morphological glosses

1 Notes on the GRAID annotations
The following comprises selected notes on the GRAID (Haig & Schnell 2014) and RefIND (Schiborr
et al. 2018) annotations of Kalamang. It corresponds to version 2108 of the annotations,
published in August 2021. Unless a more recent version of this document exists, it also applies
to any later versions of the annotations.
The topics discussed in this document are treated in more detail in the Kalamang reference
grammar (Visser 2020). Readers are welcome to contact the author if they have any specific
questions.

1.1 The Kalamang predicate and lack of a verb phrase
Almost anything can be the predicate in Kalamang, so there are instances of predicate NPs (1),
demonstratives (2), and other words like question words (3) or quotatives (4):
(1) padahal wat perun
padahal wat per -un
however coconut water -3po
## other 0:s np:pred rn rn_pro:poss
‘However, (it was) the coconut water.’ [mc_kalamang_yardakdak_0013]

(2) ma yumene!
ma yumene
3g di
## pro:s dem_pro:pred
‘There it is!’ [mc_kalamang_kuawi_0044]

(3) wa me tamandi?
wa me tamandi
po op how
##ds dem_pro:s other other:pred
“‘How did this happen?’” [mc_kalamang_kuawi_0054]

(4) mara me mu he koi eh: …
mara me mu he koi eh
move_towards_land op 3pl iam then o
## 0.h:s v:pred other ## pro.h:s_ds other other other:pred
‘(They) went landwards, and again (they said): ...’ [mc_kalamang_monyet_0207]

Kalamang has no VP, and so the iamitive is annotated not as ⟨lv⟩ but as ⟨other⟩:

(5) ah mera me mu he dodona nauwanonai koyet mu he era.
ah mera me mu he dodon =a nauwanona =i koyet
in then op 3pl iam things =foc tidy =plnk finish
## other other other pro.h:a other np:p =rn lv =lv v:pred ##
mu he era
3pl iam move_up
pro.h:s other v:pred
‘So then, when they packed all the stuff, they went up.’ [mc_kalamang_kuawi_0019]

Aspect and mood markers attach to the predicate, not to the verb, and are consequently marked
as ⟨other⟩.

(6) mat pararte.
ma =at parar =te
3g =obj wake_up =nfin
## 0.h:a pro.h:p =rn v:pred =other
‘Waking her up.’ [mc_kalamang_pitiskiet_0043]
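
The gloss shapes seen in examples (1) to (6) can be taken apart mechanically. The following is a minimal sketch, assuming a simplified gloss syntax of the shape `form.animacy:function` with an optional leading `=` or `-` for bound material; this shape is inferred from the examples above and is not the full GRAID grammar. The names `Gloss`, `parse_gloss` and `predicate_form` are hypothetical helpers, not part of any Multi-CAST tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gloss:
    clitic: str              # "=" or "-" if the gloss is attached to a host, else ""
    form: str                # e.g. "np", "pro", "0", "v", "other", "rn_pro"
    animacy: Optional[str]   # e.g. "h", "d", "1", "2", or None if absent
    function: Optional[str]  # e.g. "s", "a", "p", "pred", "s_ds", or None if absent


def parse_gloss(token: str) -> Gloss:
    """Split one gloss into its parts (simplified; assumed syntax)."""
    clitic = ""
    if token[:1] in ("=", "-"):
        clitic, token = token[0], token[1:]
    head, _, function = token.partition(":")
    form, _, animacy = head.partition(".")
    return Gloss(clitic, form, animacy or None, function or None)


def predicate_form(tier: str) -> Optional[str]:
    """Return the form symbol of the first predicate in a gloss tier."""
    for token in tier.split():
        if token.startswith("#"):  # clause boundary markers carry no form
            continue
        gloss = parse_gloss(token)
        if gloss.function and gloss.function.startswith("pred"):
            return gloss.form
    return None


# Predicates of examples (1), (2), (3) and (6): an NP, a demonstrative,
# a question word (annotated as "other"), and a verb.
for tier in (
    "## other 0:s np:pred rn rn_pro:poss",
    "## pro:s dem_pro:pred",
    "##ds dem_pro:s other other:pred",
    "## 0.h:a pro.h:p =rn v:pred =other",
):
    print(predicate_form(tier))
```

Checking the form of the predicate rather than looking for ⟨v⟩ reflects the point made above that the predicate need not be a verb.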

1.2 Complex verb constructions
Kalamang has complex verb constructions that may include locative or lative NPs at different
positions. Each construction is assigned one head predicate, which may or may not be a
verb. Locatives and latives are marked as goal or location NPs, even though they are technically
part of the verb complex, and the other elements are marked as ⟨lv⟩ or ⟨rv⟩. The locative/lative
case/adposition enclitics are annotated as ⟨rn⟩. This means there can be discontinuous
complexes such as ⟨lv … np:g=rn … v:pred⟩, or seemingly incongruent complexes such
as ⟨lv … np:pred_l=rn⟩, where the NP is the head of the predicate.

(7) ma amdirga bo muapruo.
ma amdir =ka bo muap-ruo
3g garden =la go food-dig
## pro.h:s np:g =rn lv v:pred
‘She went to the garden to dig up [i.e. harvest] food.’ [mc_kalamang_keluer_0014]

(8) ma orko.
ma or =ko
3g back =loc
## pro.d:s np:pred_l =rn
‘He is in the stern [of the boat].’ [mc_kalamang_monyet_0107]

(9) ecieni ruomgo rebaet.
ecie-n =i ruom =go reba =et
return-n =plnk foothill =loc pog =i
## 0.h:s lv =lv np:pred_g =rn other =other
‘She goes back to the foot of the mountain.’ [mc_kalamang_kuawi_0058]

(10) sarieni mindi bo uninsineingga bara.
sarie-n =i mindi bo uninsinei =ngga bara
chase-n =plnk like_that go Teluk_Buruwai =la descend
## 0.n:a 0.n:p lv =lv other lv pn_np:g =rn v:pred
‘(He) chased (it) until (they) came down to Uninsinei.’
[mc_kalamang_kasuari_0017–0018]

If no head verb can be distinguished in symmetrical serial verb constructions, the first verb is
chosen as the head, and the other(s) are marked as ⟨rv⟩ by default.

(11) sarua bo belbel, ma he kuru marua.
sarua bo belbel ma he kuru marua
scrape until sharp 3g iam bring move_towards_sea
## 0.d:a 0:p lv lv v:pred ## pro.d:a other 0:p v:pred rv
‘(He) scraped (it) until (it was) sharp, he brought (it) seawards.’
[mc_kalamang_monyet_0238]
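
The conventions for complex verb constructions can be illustrated with a small check over a single clause. This is a sketch under stated assumptions: the input is the gloss tier of one clause as printed in the examples above, the head is the first gloss whose function starts with `pred`, and a complex counts as discontinuous when anything other than ⟨lv⟩ stands between the first ⟨lv⟩ and the head. The function name `describe_complex` and the returned keys are hypothetical.

```python
def describe_complex(tier: str) -> dict:
    """Summarize the verb complex of one clause (single-clause tiers only)."""
    tokens = [t for t in tier.split() if not t.startswith("#")]
    bare = [t.lstrip("=-") for t in tokens]  # ignore clitic/affix markers

    # Head predicate: first gloss whose function starts with "pred".
    # Raises StopIteration for predicate-less clauses (e.g. those marked #nc).
    head = next(i for i, t in enumerate(bare)
                if t.partition(":")[2].startswith("pred"))

    lv = [i for i, t in enumerate(bare) if t == "lv" and i < head]
    rv = [i for i, t in enumerate(bare) if t == "rv" and i > head]

    # Discontinuous: non-lv material between the first lv and the head.
    span = range(lv[0], head) if lv else range(0)
    discontinuous = any(bare[i] != "lv" for i in span)

    return {
        "head": tokens[head],
        "verbal_head": bare[head].startswith("v:"),
        "lv": len(lv),
        "rv": len(rv),
        "discontinuous": discontinuous,
    }


# Example (10): lv elements, a lative NP, then the verbal head.
print(describe_complex("## 0.n:a 0.n:p lv =lv other lv pn_np:g =rn v:pred"))
# Example (9): the head of the predicate is a locative NP, not a verb.
print(describe_complex("## 0.h:s lv =lv np:pred_g =rn other =other"))
```

For the second clause of example (11), `describe_complex("## pro.d:a other 0:p v:pred rv")` reports one ⟨rv⟩ element, matching the default of taking the first verb as the head.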

1.3 Give-constructions
Kalamang has a zero morpheme for ‘give’, which has been annotated as ⟨0:pred⟩. In the
morphological glossing, this morpheme is indicated as a numeral zero (0). This verb attaches to
the recipient.

(12) keluer met boloni ande.
keluer me =at bolon-i an 0 -de
crab di =obj a_little-qn.obj 1g give -imp
##ds 0.2:a np:p rn =rn rn pro.1:g 0:pred -rv
“‘Give a little bit of that crab to me!’” [mc_kalamang_keluer_0019]
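
Because the zero ‘give’ has its own gloss, give-constructions can be located without reference to the surface string. The sketch below assumes that the recipient carries the function ⟨:g⟩ and precedes ⟨0:pred⟩, as in example (12); that pattern is taken from this single example and may not hold throughout the corpus. The function name `find_give` is hypothetical.

```python
from typing import Optional


def find_give(tier: str) -> Optional[dict]:
    """Locate a zero 'give' predicate and the nearest preceding recipient."""
    tokens = tier.split()
    if "0:pred" not in tokens:
        return None
    pred = tokens.index("0:pred")
    # Walk leftwards from the predicate to the closest gloss with function g.
    for i in range(pred - 1, -1, -1):
        if tokens[i].lstrip("=-").endswith(":g"):
            return {"predicate_index": pred,
                    "recipient": tokens[i],
                    "recipient_index": i}
    return {"predicate_index": pred, "recipient": None, "recipient_index": None}


# Gloss tier of example (12).
print(find_give("##ds 0.2:a np:p rn =rn rn pro.1:g 0:pred -rv"))
```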

1.4 Elided predicates
It is not unusual to elide predicates in Kalamang. Clauses that lack a predicate have been
annotated as ⟨nc⟩.

(13) mu he kaiat
mu he kai =at
3pl iam firewood =obj
#nc nc_pro.d nc nc =rn
‘They (unloaded?) the firewood.’ [mc_kalamang_monyet_0069]
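
Segments marked ⟨#nc⟩ still carry referential information through the ⟨nc_⟩ specifier. A minimal sketch of recovering those form glosses follows, assuming the tier is given as a whitespace-separated string as printed in example (13); the function name `nc_referents` is hypothetical.

```python
from typing import List


def nc_referents(tier: str) -> List[str]:
    """Return form glosses prefixed with nc_ in a segment marked #nc."""
    tokens = tier.split()
    if not tokens or tokens[0] != "#nc":
        return []  # not an unclassified segment
    prefix = "nc_"
    return [t[len(prefix):] for t in tokens[1:] if t.startswith(prefix)]


# Gloss tier of example (13): only the pronoun keeps a form gloss.
print(nc_referents("#nc nc_pro.d nc nc =rn"))
```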

1.5 Complement clauses and indirect speech
Complement clauses (nearly) all contain indirect speech and are annotated as ⟨#cc:other⟩.
Kalamang lacks overt marking of complement clauses, so their exact syntactic status is uncertain.

(14) opa mu toni sabarkadoa iren.
opa mu toni sabar-kado =a iren
earlier 3pl say bow-side =foc ripe
##ds other pro.h:s v:pred #ds_cc:other np.d:s =rn v:pred
“‘They just said the one in the stern was white.’” [mc_kalamang_monyet_0083]
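
Clause boundary markers make it possible to separate a matrix clause from its complement. The following sketch assumes that every token starting with `#` opens a new clause and that a complement clause is one whose marker contains the element `cc` before the colon, as in ⟨#ds_cc:other⟩ in example (14). The helper names `split_clauses` and `is_complement` are hypothetical.

```python
from typing import List, Tuple


def split_clauses(tier: str) -> List[Tuple[str, List[str]]]:
    """Group glosses under the boundary marker that opens their clause."""
    clauses: List[Tuple[str, List[str]]] = []
    for token in tier.split():
        if token.startswith("#"):
            clauses.append((token, []))
        elif clauses:  # glosses before any marker are ignored
            clauses[-1][1].append(token)
    return clauses


def is_complement(marker: str) -> bool:
    """True if the boundary marker contains the clause type 'cc'."""
    name = marker.lstrip("#").partition(":")[0]
    return "cc" in name.split("_")


# Gloss tier of example (14): a speech clause followed by its complement.
tier = "##ds other pro.h:s v:pred #ds_cc:other np.d:s =rn v:pred"
for marker, glosses in split_clauses(tier):
    print(marker, is_complement(marker), glosses)
```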

1.6 Dislocated topics
Dislocated topics have been annotated to indicate the function of the corresponding clause
constituents (⟨:dt_s⟩, ⟨:dt_p⟩, ⟨:dt_poss⟩, ⟨:dt_obl⟩). No dislocated A arguments are
encountered in the corpus.

(15) pemukul me contoun tamandi me?
pemukul me conto -un tamandi me
hit_thing op example -3po how op
##ds np:dt_s other np:s -rn_pro:poss other:pred_other other
“‘This club, what is it like?’” [mc_kalamang_pitiskiet_0091]

(16) sedangkan patin wa me indain alarun yuaba inat na.
sedangkan patin wa me indain alar -un
while wounded po op 1pl.e.alone tool -3po
##ds other np:dt_p rn other pro.1:a rn_np -rn_pro.h:poss
0053 0033

yua =ba in =at na
po =foc 1pl.e =obj consume
rn =rn pro.1:p =rn v:pred
0033
“‘These sores, we did them to ourselves.’” [mc_kalamang_kuawi_0146]

(17) wienar kan inun Duan to.
wienar kan in -un Duan to
parrotfish you_know name -3po Duan to
## np.d:dt_poss other np:s -rn_pro.d:poss np:pred other
‘The parrotfish’s name, you know, is Duan, right.’ [mc_kalamang_keluer_0068–0069]

(18) mungkin enem yua me canama kona mabon ewa reon.
mungkin enem yua me canam =a kon =a ma =bon
maybe woman po op man =foc one =foc 3g =com
## other np.h:dt_obl rn other np.h:s =rn rn =rn pro.h:obl =rn
ewa reon
speak maybe
v:pred other
‘Maybe this woman [i.e. speaker’s spouse], a man is speaking with her maybe.’
[mc_kalamang_yardakdak_0011]

Most Kalamang clauses have a subject–object–predicate constituent order. Objects are marked
with an enclitic =at. Therefore, when the word order diverges, it is still recognizable which
constituent is the subject and which the object, and such cases are hence not treated as dislocation.
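
The ⟨:dt_…⟩ functions name the role of the clause constituent that corresponds to the dislocated topic, so the two can be paired up. The sketch below is a heuristic under stated assumptions: it takes the first later gloss with the same function as the corresponding constituent, which holds for examples (15) to (18) but is not a rule stated in the annotation guidelines. The function name `dislocated_topics` is hypothetical.

```python
from typing import List, Optional, Tuple


def dislocated_topics(tier: str) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each dislocated topic with a later gloss bearing the same function."""
    tokens = [t for t in tier.split() if not t.startswith("#")]
    results = []
    for i, token in enumerate(tokens):
        function = token.partition(":")[2]
        if not function.startswith("dt_"):
            continue
        target = function[len("dt_"):]  # "s", "p", "poss" or "obl"
        later = [t for t in tokens[i + 1:] if t.partition(":")[2] == target]
        results.append((token, target, later[0] if later else None))
    return results


# Gloss tiers of examples (15), (16) and (17).
print(dislocated_topics(
    "##ds np:dt_s other np:s -rn_pro:poss other:pred_other other"))
print(dislocated_topics(
    "##ds other np:dt_p rn other pro.1:a rn_np -rn_pro.h:poss "
    "rn =rn pro.1:p =rn v:pred"))
print(dislocated_topics(
    "## np.d:dt_poss other np:s -rn_pro.d:poss np:pred other"))
```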

1.7 Topic marker me
The topic marker me can follow either NPs or entire clauses. It is annotated as ⟨other⟩.
(19) yalta me, …
yal =ta me
paddle =nfin op
## 0.d:s v:pred =rv other
‘(They) paddled and then...’ [mc_kalamang_monyet_0025]

(20) sedangkan patin wa me indain alarun yuaba inat na, …
sedangkan patin wa me indain alar -un
while wounded po op 1pl.e.alone tool -3po
##ds other np:dt_p rn other pro.1:a rn_np -rn_pro.h:poss

yua =ba in =at na
po =foc 1pl.e =obj consume
rn =rn pro.1:p =rn v:pred
“‘These sores, we did them to ourselves, ...’” [mc_kalamang_kuawi_0146]
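
To check how a particular morph such as me is annotated, the morph tier and the gloss tier have to be aligned first. The sketch below works on the interlinear lines as printed in this document, not on the corpus files themselves. It assumes that clause boundary markers and zero arguments have no counterpart on the morph tier, and that a zero gloss is paired with a morph only when that morph is written as `0`, as with the zero ‘give’. The function name `align` is hypothetical.

```python
from typing import List, Tuple


def align(morphs: str, glosses: str) -> List[Tuple[str, str]]:
    """Pair morphs with glosses, skipping boundary markers and zero arguments."""
    words = morphs.split()
    pairs: List[Tuple[str, str]] = []
    w = 0
    for gloss in glosses.split():
        if gloss.startswith("#"):
            continue  # clause boundary marker, no morph
        is_zero = gloss.lstrip("=-").startswith("0")
        if is_zero and (w >= len(words) or words[w] != "0"):
            continue  # zero argument without a written morph
        if w >= len(words):
            raise ValueError("more glosses than morphs")
        pairs.append((words[w], gloss))
        w += 1
    if w != len(words):
        raise ValueError("more morphs than glosses")
    return pairs


# Example (19): the clause-final topic marker.
pairs = align("yal =ta me", "## 0.d:s v:pred =rv other")
print([gloss for morph, gloss in pairs if morph == "me"])

# First line of example (20): the topic marker after an NP.
pairs = align("sedangkan patin wa me indain alar -un",
              "##ds other np:dt_p rn other pro.1:a rn_np -rn_pro.h:poss")
print([gloss for morph, gloss in pairs if morph == "me"])
```

In both cases the gloss paired with me is ⟨other⟩, whether it follows a clause or an NP.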

1.8 Other conventions used
- Prohibitive clauses are treated like (i.e. annotated as) negative clauses.
- Case/adpositions and predicate morphology that are segmented together with the form
  they attach to are not explicitly annotated. The only exceptions are locative and lative
  case/adposition enclitics.
- Before (se) koyet ‘(iam) finish’ a dummy zero subject has been added. This expression
  could literally mean ‘it’s finished’, but is more generally used as a conjunction ‘after that’.
  At the end of some stories, the zero subject refers to the story as a whole.

(21) koyet.
koyet
finish
## 0:s v:pred
0023
‘The end.’ [mc_kalamang_keluer_0102]

References
Haig, Geoffrey & Schnell, Stefan. 2014. Annotations using GRAID (Grammatical Relations and Animacy in
Discourse): Introduction and guidelines for annotators (version 7.0).
(https://multicast.aspra.uni-bamberg.de/#annotations) (Accessed 2019-03-08).
Haig, Geoffrey & Schnell, Stefan (eds.). 2016. Multi-CAST: Multilingual Corpus of Annotated Spoken Texts.
(https://multicast.aspra.uni-bamberg.de/) (Accessed 2019-03-08).
Schiborr, Nils N. & Schnell, Stefan & Thiele, Hanna. 2018. RefIND — Referent Indexing in Natural-language
Discourse: Annotation guidelines (v1.1). University of Bamberg. Unpublished manuscript.
(https://multicast.aspra.uni-bamberg.de/#annotations) (Accessed 2019-03-08).
Visser, Eline. 2020. A grammar of Kalamang: The Papuan language of the Karas Islands. Lund: Lund
University Ph.D. dissertation.
Visser, Eline. 2021. Multi-CAST Kalamang. In Haig, Geoffrey & Schnell, Stefan (eds.), Multi-CAST: Multilingual
Corpus of Annotated Spoken Texts. (https://multicast.aspra.uni-bamberg.de/#kalamang)
(Accessed 2021-05-28).

Appendices
A List of corpus-specific GRAID symbols
The following is a list of the non-standard GRAID symbols used in the annotation of the Multi-CAST
Kalamang corpus. Please refer to the GRAID manual (Haig & Schnell 2014: 54–55) for an inventory
of basic GRAID symbols.

Form symbols and specifiers
⟨dem_pro⟩ demonstrative pronoun
⟨pn_np⟩ proper name

Function symbols and specifiers
⟨_ds⟩ specifier: subject of a verb of speech; attaches to ⟨:s⟩, ⟨:a⟩, and ⟨:ncs⟩

Other symbols
⟨.n⟩ animacy symbol used for non-human, non-anthropomorphized animates
(i.e. animals)
⟨nc_⟩ specifier: marks form glosses with RefIND indices in segments otherwise
not considered (i.e. those marked ⟨#nc⟩)
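
The corpus-specific animacy symbol ⟨.n⟩ can be counted alongside the standard ones. The sketch below assumes the readings of the standard symbols ⟨1⟩, ⟨2⟩, ⟨h⟩ and ⟨d⟩ as given in the GRAID manual (first person, second person, human, anthropomorphized); the wording of the labels is my own paraphrase. The names `ANIMACY`, `animacy_symbol` and `tally_animacy` are hypothetical.

```python
from collections import Counter
from typing import Optional

# Labels are paraphrases; "n" is the corpus-specific symbol listed above.
ANIMACY = {
    "1": "first person",
    "2": "second person",
    "h": "human",
    "d": "anthropomorphized non-human",
    "n": "non-human, non-anthropomorphized animate",
}


def animacy_symbol(gloss: str) -> Optional[str]:
    """Return the animacy symbol of a gloss, or None if it has none."""
    head = gloss.lstrip("=-").partition(":")[0]
    symbol = head.partition(".")[2]
    return symbol if symbol in ANIMACY else None


def tally_animacy(tier: str) -> Counter:
    """Count animacy symbols in a gloss tier, ignoring boundary markers."""
    symbols = (animacy_symbol(t) for t in tier.split() if not t.startswith("#"))
    return Counter(s for s in symbols if s is not None)


# Example (10): both zero arguments refer to animals.
counts = tally_animacy("## 0.n:a 0.n:p lv =lv other lv pn_np:g =rn v:pred")
for symbol, count in counts.items():
    print(symbol, ANIMACY[symbol], count)
```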

B List of abbreviated morphological glosses

an.la animate lative la lative (combined ablative
an.loc animate locative and allative)
ana anaphoric demonstrative nfin non-final
a attributive ph placeholder
don elevational ‘down’ plnk predicate linker
emph emphatic qn.obj quantifier object
(quantifier modifying an
ecl exclusive
object)
ei existential
ed reduplication
f.di far distal im similative
fil filler p interjection of surprise
he hesitation ag confirmation-seeking
iam iamitive (‘already’) interjection
incl inclusive p elevational ‘up’
in interjection ol volitional
in.e interjection of the form e
in intensifier nc not classified

multicast.aspra.uni-bamberg.de/

2021 CC-BY 4.0 Department of General Linguistics, University of Bamberg