HPSG: Background and Basics

Ivan A. Sag
Stanford University

1. Introductory Remarks[1]

We want to emphasize the extent to which HPSG is intellectually indebted to a wide range of recent research traditions in syntax (principally nonderivational approaches such as categorial grammar (CG), generalized phrase structure grammar (GPSG), arc pair grammar (APG), and lexical-functional grammar (LFG)), semantics (especially situation semantics), and computer science (data type theory, knowledge representation, unification-based formalisms).

The phenomena with which P&S-94 is concerned are among those which have occupied center stage within syntactic theory for well over thirty years: the control of 'understood' subjects, long-distance dependencies conventionally treated in terms of wh-movement, and syntactic constraints on the relationship between various kinds of pronouns and their antecedents. Within that time period, detailed accounts of these phenomena, and of the relationships among them, have been developed within the research framework established by Noam Chomsky and known in its successive stages as the 'standard' theory, the 'extended standard' theory, the 'revised extended standard' theory, and 'government-binding' theory (GB, or the 'principles-and-parameters' approach). But given the widespread acceptance of that framework as a standard in recent years, especially among an extensive community of syntacticians in the United States and much of continental western Europe, it is incumbent upon the proponents of a competing framework to explicate the sense and extent to which the proposed alternative addresses the concerns of that community. For that reason, we will try to make clear in what respects our accounts resemble those provided within GB theory, and, more importantly, in what respects they differ.

A number of similarities between GB theory and the theory advocated here will be apparent.
For example, in both theories structure is determined chiefly by the interaction between highly articulated lexical entries and parametrized universal principles of grammatical well-formedness, with rules reduced to a handful of highly general and universally available phrase structure (or immediate dominance) schemata. A number of key GB principles (such as principles A, B, and C of the binding theory, subjacency, and the empty category principle) have more or less direct analogs in HPSG; and two other HPSG principles (the head feature principle and the subcategorization principle) play a role in the theory roughly comparable to that of the projection principle in GB. Moreover, in both GB and HPSG, there are assumed to be several distinct 'levels' (or, as we will call them, attributes or features) of linguistic structure.

At the same time, however, there are a great many differences between the two theories, with respect to both global theory architecture and matters of technical detail. One key architectural difference is the absence from HPSG of any notion of transformation. Unlike GB levels (at least as they are most commonly explicated), the attributes of linguistic structure in HPSG are related not by movement but rather by structure sharing, i.e. token identity between substructures of a given structure in accordance with lexical specifications or grammatical principles (or complex interactions between the two).[2] In common with a number of linguistic theories, then (including those commonly referred to as 'unification-based'), HPSG is nonderivational, in contradistinction to nearly all variants of GB and its forebears, wherein distinct levels of syntactic structure are sequentially derived by means of transformational operations (e.g. move-α).

[1] There may be minor inconsistencies in this document, because it was cobbled together from other things in a hasty fashion. Sections 1 and 2 are adapted from C. Pollard and I. A. Sag, 1994, Chapter 1.
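The contrast between movement and structure sharing can be made concrete in a few lines of code. The following is a deliberately minimal sketch (the class and attribute names are invented simplifications, not HPSG's actual feature geometry): structure sharing is modelled as token identity, i.e. one and the same object reachable by two paths within a single structure, rather than two objects related by a derivational operation.

```python
# Minimal sketch: structure sharing as token identity.
# The names here are invented simplifications, not HPSG's real geometry.

class FeatureStructure(dict):
    """A feature structure: a mapping from features to values."""

class Index(dict):
    """A referential index (e.g. person/number/gender information)."""

# One index object ...
index = Index(person=3, number="sing", gender="fem")

# ... shared between a 'filler' and a 'gap', i.e. reachable by two
# distinct paths within the same structure.  This is identity, not copying.
sentence = FeatureStructure(
    filler=FeatureStructure(index=index),
    gap=FeatureStructure(index=index),
)

# Both paths lead to the very same token:
assert sentence["filler"]["index"] is sentence["gap"]["index"]

# Consequently, information supplied along one path is automatically
# present along the other -- no movement operation relates the positions.
sentence["filler"]["index"]["case"] = "acc"
print(sentence["gap"]["index"]["case"])   # prints "acc"
```

Nothing is "moved" here: there is simply one substructure with two occurrences, which is the sense in which HPSG attributes are related nonderivationally.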
We will argue that, far from being a matter of indifference or mere notational variance, the derivational/nonderivational distinction has important empirical consequences.

A second essential difference between GB and HPSG has to do with the number and nature of structural levels posited. Although both theories posit multiple levels of structure, the inventory is somewhat different. A sign (i.e. a word or phrase, the HPSG analog of an expression in GB) is assumed to have (at least) the attributes PHONOLOGY (PHON), SYNTAX-SEMANTICS (SYNSEM), and (in the case of phrases) DAUGHTERS (DTRS). Here PHON and DTRS can be regarded as rough analogs of the GB levels PF (phonetic form) and S-structure. But the SYNSEM attribute does not correspond directly to any one level of GB syntactic structure. Rather, it in turn has (at least) three attributes of its own, called CATEGORY (CAT), CONTENT (CONT), and CONTEXT. Here CAT plays a role roughly analogous to that of D-structure in GB; CONTENT, on the other hand, is concerned principally with linguistic information that bears directly on semantic interpretation (and is therefore most closely analogous to GB's level of LF (logical form)).[3]

It should also be emphasized here that, unlike the situation in GB theory, where only sentences are assumed to have the levels of representation PF, LF, S-structure, and D-structure, in HPSG it is assumed that all signs, be they sentences, subsentential phrases, or words (i.e. lexical signs), have the attributes PHON and SYNSEM, and that all phrasal signs have the attribute DTRS as well.

Technical detail, of course, is what most work in HPSG consists of. Just a few salient respects in which HPSG differs from GB will be mentioned here, to give something of the flavor of the theory; all will be discussed in full in the chapters to come.
Perhaps most characteristically, in HPSG tree-configurational notions such as government and c-command are not regarded as linguistically significant; instead, their role is taken over by the relation of relative obliqueness that obtains between syntactic dependents of the same head. For example, in HPSG the subject is defined not in terms of a D-structure configurational position, but rather as the least oblique complement of the relevant head, where relative obliqueness is modelled by position on the list which forms the SUBCATEGORIZATION (SUBCAT) value of that head.[4] Another example: in HPSG, principle A (which constrains the possible antecedents of anaphors) makes no reference to c-command or government, but merely requires that an anaphor be coindexed with some less oblique argument (provided such exists). We will try to show that such nonconfigurational formulations are not only conceivable alternatives, perhaps to be preferred on grounds of simplicity and conceptual clarity, but are also superior with respect to conformity with the facts.

[2] The notion of structure sharing has a somewhat obscure origin in modern linguistics. As noted by Johnson and Postal (1980: 479-483), it has played a central role (under various names, e.g., 'loops', 'vines', 'multiattachment' and 'overlapping arcs') in various theoretical frameworks. (See especially the formulation in Johnson and Postal 1980 and the references cited therein.)

[3] The CONTEXT attribute contains linguistic information that bears on certain context-dependent aspects of semantic interpretation.
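The obliqueness-based formulation of principle A is simple enough to state as executable pseudocode. The sketch below is purely illustrative (the encoding of SUBCAT elements and of indices is invented for the example): a SUBCAT list is ordered from least oblique (the subject) to most oblique, and an anaphor must be coindexed with some earlier element on that list, provided such an element exists.

```python
# Illustrative sketch of an obliqueness-based principle A.
# SUBCAT is ordered from least oblique (subject, first) to most oblique;
# the dict encoding of arguments and indices is invented for this example.

def satisfies_principle_a(subcat):
    """An anaphor must be coindexed with some less oblique argument,
    provided a less oblique argument exists at all."""
    for i, arg in enumerate(subcat):
        if arg.get("anaphor"):
            less_oblique = subcat[:i]          # everything earlier on the list
            if less_oblique and not any(a["index"] == arg["index"]
                                        for a in less_oblique):
                return False
    return True

# "Kim saw herself": the anaphor is coindexed (index 1) with the less
# oblique subject -- well-formed.
ok = [{"index": 1}, {"index": 1, "anaphor": True}]

# "*Kim saw herself" with disjoint indices: a less oblique argument
# exists, but none is coindexed with the anaphor -- ill-formed.
bad = [{"index": 1}, {"index": 2, "anaphor": True}]

print(satisfies_principle_a(ok))    # True
print(satisfies_principle_a(bad))   # False
```

Note that no tree configuration is inspected anywhere: the constraint is stated entirely over list position, which is the point of the nonconfigurational formulation.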
As mentioned above, although HPSG does not employ movement, the account that we propose for phenomena traditionally treated under the rubric of wh-movement does resemble the GB account inasmuch as phonetically null constituents, traces, are assumed to occupy the 'gap' position;[5] however, we will argue that the relationship between the gap and its 'filler' is more clearly understood as a matter of structure sharing than as one of movement.[6] To put it another way, we deny that transformations themselves model anything in the empirical domain (and therefore HPSG shares the property of 'nonderivationality' with CG, GPSG, APG and LFG, in contradistinction to GB and its derivational kin). Similarly, raising will be treated in terms of structure sharing between a matrix argument and the complement's SUBCAT specification corresponding to the complement subject. In this case, however, there is no need to posit an actual constituent (e.g. NP-trace) corresponding to that specification, and hence the complement will simply be a VP, not an S.[7] Thus HPSG has no analog of GB's 'extended' projection principle, which appears to us to have been introduced by Chomsky (1982) essentially without argument: lexical requirements (as expressed in SUBCAT lists) do not always have to be satisfied on the surface (i.e. in the DAUGHTERS attribute).

Another GB assumption explicitly denied in HPSG is the principle, proposed by Chomsky (1981), that every (nonsubject) subcategorized element must be assigned a semantic role.[8] Thus there is no obstacle to a 'raising-to-object' analysis of sentences like Kim believes Sandy to be happy. In HPSG this amounts to structure sharing between the matrix object and the subject specification on the complement's SUBCAT list. Thus raising to subject and raising to object are handled in entirely parallel fashion: by sharing of structure between the complement subject and the matrix controller at the 'level' of subcategorization.
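The raising-to-object analysis can likewise be sketched directly (again with invented, radically simplified representations): the matrix object's SYNSEM is the very same object as the unsaturated subject specification on the complement VP's SUBCAT list, so no trace or null constituent corresponding to that specification need be posited.

```python
# Sketch of "Kim believes Sandy to be happy" as raising-to-object.
# The dict representations are invented simplifications for illustration.

sandy_synsem = {"category": "NP", "index": 2}   # one SYNSEM object

# The complement is just a VP whose SUBCAT still records an
# unsaturated subject requirement ...
complement_vp = {
    "category": "VP",
    "vform": "inf",
    "subcat": [sandy_synsem],   # subject slot: not realized as a daughter
}

# ... and the matrix verb shares its object SYNSEM with that slot.
believes_clause = {
    "head": "believes",
    "object": sandy_synsem,     # the SAME object, not a copy
    "complement": complement_vp,
}

# Token identity: one SYNSEM serves both as matrix object and as the
# complement's subject specification.  No NP-trace, and the complement
# is a VP rather than an S.
print(believes_clause["object"] is complement_vp["subcat"][0])   # True
```

Raising to subject works the same way, with the shared SYNSEM sitting on the matrix verb's own subject specification instead, which is the sense in which the two constructions are handled in parallel.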
[4] NB: In the version of HPSG based on Chapter 9, the feature SUBCAT (in its function of indicating relative obliqueness of a head's arguments) is renamed ARGUMENT-STRUCTURE (ARG-S). The 'valence' features SUBJ and COMPS take over the role of specifying the particular elements that the lexical head actually must combine with.

[5] But we will propose an alternative, traceless analysis in Chapter 9.

[6] The proposal to treat extraction phenomena in terms of structure sharing (or 'overlapping arcs', in their terms) was first made, we believe, by Johnson and Postal (1980). Our proposals for the analysis of extraction, coreference and a variety of other linguistic phenomena, though differing in many points of detail from those of Johnson and Postal, nonetheless share the important feature of being based on structure sharing, rather than derivational processes.

[7] Moreover, since passive is handled by lexical rule rather than within the syntax (see below), the necessity for an analog of NP-trace is obviated altogether.

[8] Postal and Pullum (1988) argue persuasively that this assumption, though conventional, is justified by neither empirical nor GB-internal theoretical considerations.

As we have seen, the closest HPSG analog of movement is structure sharing with either a phonetically null constituent (unbounded dependencies) or with a SUBCAT element that is not realized as a constituent at all (raising). But not all instances of movement in GB correspond to structure sharing in HPSG; passive, for example, as mentioned above, is not treated in the syntax at all but rather by lexical rule. Another case in which movement in GB has a 'non-movement' (i.e. non-structure-sharing) account in HPSG is that of 'head movement', as manifested (for example) in VSO word order or in English 'subject-auxiliary inversion'.
On our account, such structures simply arise from the existence of a phrase-structure schema, utilized (like all schemata) to different extents by different languages, that permits the realization of all complements (including the subject) as sisters of the lexical head (P&S-87, sec. 6.2); the orderings are the consequence of independently motivated language-specific constituent ordering principles (P&S-87, sec. 7.2).

The other core case of head movement in GB, viz. movement of the head of VP into INFL, does not require any treatment at all in HPSG, for HPSG does not posit an independent category INFL to serve as a repository of tense and subject agreement features. Instead, subject agreement features (like object agreement features, in languages which have object agreement) occur within the corresponding SUBCAT element of the verb; and the role of the tense element of INFL is taken over by the head feature VERB-INFLECTIONAL-FORM (VFORM). Thus whether or not the verb is tensed is simply a question of whether the VFORM value is finite (fin) or some other (nonfinite) value; and the independent question of whether or not the verb is an auxiliary (and therefore can license VP deletion, contracted negation, etc.) is treated in terms of another (binary) head feature AUXILIARY (AUX).

Indeed, from the point of view of HPSG, Chomsky's rule move-α must be seen as a kind of Procrustean bed. On our account, the phenomena which have been relegated to it are a heterogeneous assemblage, each of which deserves a more comfortable resting place of its own, be it in the lexicon (passive and verb inflection), in the phrase structure schemata (verb-object nonadjacency), or in structure sharings that accord with different kinds of interactions between lexical specifications and universal principles (raising and unbounded dependencies).[9]

2. The Nature of Linguistic Theory

Let us begin by making explicit some methodological assumptions.
In any mathematical theory about an empirical domain, the phenomena of interest are modelled by mathematical structures, certain aspects of which are conventionally understood as corresponding to observables of the domain. The theory itself does not talk directly about the empirical phenomena; instead, it talks about, or is interpreted by, the modelling structures. Thus the predictive power of the theory arises from the conventional correspondence between the model and the empirical domain.

An informal theory is one that talks about the model in natural language, say a technical dialect of English, German, or Japanese. But as theories become more complicated and their empirical consequences less straightforwardly apparent, the need for formalization arises. In cases of extreme formalization, of course, the empirical hypotheses are cast as a set of axioms in a logical language, where the modelling structures serve as the intended interpretations of expressions in the logic.

In our view, a linguistic theory should bear exactly the same relation to the empirical domain of natural language, viz. the universe of possible linguistic objects, as a mathematical theory of celestial mechanics should bear to the possible motions of n-body systems. Thus we insist on being explicit as to what sorts of constructs are assumed (i.e. what ontological categories of linguistic objects we suppose to populate the empirical domain), and on being mathematically rigorous as to what structures are used to model them. Moreover, we require that the theory itself actually count as a theory in the technical sense of precisely characterizing those modelling structures which are regarded as admissible or well-formed (i.e. corresponding to those imaginable linguistic objects which are actually predicted to be possible ones).

[9] For an analogous critique of the notion of metarule employed in GPSG, see Pollard 1985.
This does not mean that the empirical hypotheses must be rendered in a formal logic as long as their content can be made clear and unambiguous in natural language (the same holds true in mathematical physics), but in principle they must be capable of being so rendered. Unless these criteria are satisfied, an enterprise purporting to be a theory cannot have any determinate empirical consequences. We emphatically reject the currently widespread view which holds that linguistic theory need not be formalized. Our position is the same as the one advocated by Chomsky (1957:5):

    Precisely constructed models for linguistic structure can play an important role, both negative and positive, in the process of discovery itself. By pushing a precise but inadequate formulation to an unacceptable conclusion, we can often expose the exact source of this inadequacy and, consequently, gain a deeper understanding of the linguistic data. More positively, a formalized theory may automatically provide solutions for many problems other than those for which it was explicitly designed. Obscure and intuition-bound notions can neither lead to absurd conclusions nor provide new and correct ones, and hence they fail to be useful in two important respects. I think that some of those linguists who have questioned the value of precise and technical development of linguistic theory have failed to recognize the productive potential in the method of rigorously stating a proposed theory and applying it strictly to linguistic material with no attempt to avoid unacceptable conclusions by ad hoc adjustments or loose formulation.

In HPSG, the modelling domain, the analog of the physicist's flows, is a system of sorted feature structures (Moshier 1988, Pollard and Moshier 1990), which are intended to stand in a one-to-one relation with types of natural language expressions and their subparts.
The role of the linguistic theory is to give a precise specification of which feature structures are to be considered admissible; the types of linguistic entities which correspond to the admissible feature structures constitute the predictions of the theory.

A further methodological principle, shared by the scientific community at large, is that of ontological parsimony: insofar as it is possible without doing violence to the simplicity and elegance of the theory, we do not posit constructs that do not correspond to observables of the empirical domain. Of course, all scientific theories contain such constructs. An obsolete example is the phlogiston that used to form the basis for the theory of combustion; a contemporary one is the quarks that are posited to account for the observed variety of subatomic particles. But the parsimony principle with respect to nonobservable constructs dictates: use only as needed.

Perhaps phrase structure itself (variously manifested as, e.g., GB's S-Structure, LFG's c-structure, and HPSG's DAUGHTERS attribute) is the nonobservable linguistic construct that enjoys the widest acceptance in current theoretical work. Surely the evidence for it is far less direct, robust, and compelling than that for phonological structure (e.g. GB's PF, HPSG's PHONOLOGY), logical predicate-argument structure (GB's LF, HPSG's CONTENT), or underlying grammatical relations (GB's D-Structure, HPSG's SUBCATEGORIZATION attribute, LFG's f-structure). But for all that a theory that successfully dispensed with a notion of surface constituent structure is to be preferred (other things being equal, of course), the explanatory power of such a notion is too great for many syntacticians to be willing to relinquish it. But if phrase structures are current syntactic theory's quarks, move-α, as Koster (1987) has remarked, might well be regarded as its phlogiston.
As we hope to have made clear by now, we regard transformational operations between levels as constructs that are not motivated by empirical considerations. What we observe, albeit indirectly, is sharing of certain subparts (e.g. between a filler and a gap, between an anaphor and a binder, between an 'understood' subject and a controller). But such sharing is straightforwardly and neutrally accounted for as simple identity; attributing it to derivational processes at best contributes nothing to the theory, and at worst introduces complications and confusions (e.g. ordering paradoxes) of a completely artifactual nature.[10]

There is a further condition of decidability that we impose upon a linguistic theory. That is, we require that for a substantial fragment of candidate expressions (i.e. expressions and non-expressions) for a given language under study, it must be determinable by algorithm whether each candidate expression is assigned a well-formed structure by the theory, and if so what that structure is. The condition of decidability is the theory's reflection of two fundamental facts about language use: first, that the structures of linguistic expressions are capable in principle of being computed by the resource-bounded information-processing organisms which successfully employ them in a communicative function; and second, that language users are able to render judgments as to the well-formedness of candidate expressions (generally taken as the primary data to be accounted for by the theory).

Of course, decidability of this sort, in and of itself, is a modest criterion to impose on a linguistic theory. If the grammars offered by a linguistic theory are to be embedded into a theory of human language processing, then there are a variety of properties of language processing that might be expected to inform the design of grammar.
For example, even the most superficial observation of actual language use makes plain the fact that language processing is typically highly incremental: speakers are able to assign partial interpretations to partial utterances (and quite rapidly, in fact). Thus, other things being equal, a theory of grammar which provides linguistic descriptions that can be shown to be incrementally processable should be regarded as superior to one which does not.

Similarly, we know that language processing is highly integrative: information about the world, the context, and the topic at hand is skillfully woven together with linguistic information whenever utterances are successfully decoded. For example, it is the encyclopedic fact that books don't fit on atoms, integrated mid-sentence, that allows the correct modification of the prepositional phrase on the atom to be determined well before word-by-word processing of a sentence like (1) is complete.[11]

(1) After finding the book on the atom, Kim decided that the library really wasn't as bad as people had been claiming.

Without such nonlinguistic sources of constraint, the interpretation of even the most mundane of utterances can become highly indeterminate. So profound, in fact, is this indeterminacy (and the concomitant reliance of language on situational information) that the very fact that communication is possible using natural language acquires an air of considerable mystery.

[10] For further arguments in support of the view that grammars should be formulated as declarative systems of constraints rather than derivational processes, see Johnson and Postal 1980 and Langendoen and Postal 1984.
Although we lack at present any well-developed scientific theory of how linguistic and nonlinguistic information are brought together to resolve such indeterminacy, it is nonetheless clear that we must prefer a linguistic theory whose grammars provide partial linguistic descriptions of a sort that can be flexibly integrated with nonlinguistic information in a model of language processing.

In addition to the incremental and integrative nature of human language processing, we may also observe that there is no one order in which information is consulted that can be fixed for all language use situations. In fact, an even stronger claim can be justified. In examples like (2), early accessing of morphological information allows the cardinality of the set of sheep under discussion to be determined incrementally, and well before the world knowledge necessary to select the 'fenced enclosure' sense of pen, rather than its 'writing implement' sense.[12]

(2) The sheep that was sleeping in the pen stood up.

In (3), on the other hand, the relevant information about the world (the information, however represented, that allows a hearer to determine that sheep might fit inside a fenced enclosure, but not inside a writing implement) seems to be accessed well before the relevant morphological information constraining the cardinality of the set of sheep.

(3) The sheep in the pen had been sleeping and were about to wake up.

What contrasts like these suggest is that the order in which information is accessed in language understanding, linguistic or otherwise, is tied fairly directly to the order of the words being processed. Assuming then that it is the particular language process that will in general dictate the order in which linguistic (and other) information is consulted, a grammar, if it is to play the role, as we assume, of information that fits directly into a model of processing, should be unbiased as to order.
Grammars that are to fit into realistic models of processing should be completely order-independent.

Finally, we know that linguistic information, in the main, functions with like effect in many diverse kinds of processing activity, including comprehension, production, translation, playing language games, and the like. By 'like effect', we mean, for example, that the set of sentences potentially producible by a given speaker-hearer is quite similar to, in fact bears a natural relation (presumably proper inclusion) to, the set of sentences that that speaker-hearer can comprehend. This might well have been otherwise. The fact that there is so close and predictable a relation between the production activity and the comprehension activity of any given speaker of a natural language argues strongly against any theory where production grammars are independent from comprehension grammars, for instance. Rather, this simple observation suggests that the differences between, say, comprehension and production should be explained by a theory that posits different kinds of processing regimes based on a single linguistic description: a process-neutral grammar of the language that is consulted by the various processors that function in linguistic activity. The fact that production is more restricted than comprehension can then be explained within a theory of comprehension that allows certain kinds of linguistic constraints to be relaxed, or even word-by-word processing to be suspended, when situational information is sufficient to signal partial communicative intent. Suspension of word-by-word processing clearly cannot enter into production in the same way (though incomplete sentences sometimes achieve communicative success).

[11] Example (1) is an adaptation of an example of Graeme Hirst's (see Hirst 1987).

[12] We owe this sort of example to Martin Kay.
Hence, if we appeal to differences of process, not differences of grammar, there is at least the beginning of a natural account for why production should lag behind comprehension. Speakers who stray very far from the grammar of their language run serious risk of not being understood; yet hearers who allow grammatical principles to relax when necessary may understand more than those who do not. There is thus a deep functional motivation for why the two kinds of processing might differ as they appear to.

Observations of this sort about real language use and language processing are quite robust. Yet, given our current understanding, it is not completely clear how to convert such intuitive observations into criteria for evaluating linguistic theories. The problem is in essence that our understanding of language processing lags well behind our understanding of linguistic structure. Whereas it is reasonable to expect that further research into human language processing will produce specific results that inform the minute details of future linguistic theories, we do not yet know how to bring these considerations to bear.

Despite this uncertainty, the foregoing observations about human language processing suggest certain conclusions about the design of grammar. Grammars whose constructs are truly process-neutral, for example, hold the best hope for the development of processing models. And the best known way to ensure process-neutrality is to formulate a grammar as a declarative system of constraints.[13] Such systems of constraints fit well into models of processing precisely because all the information they provide is on an equal footing. To see this, consider a theory of grammar that does not meet this criterion. A grammar of the sort proposed by Chomsky (1965), for example, embodies transformational rules whose application is order-dependent. The fixed order imposed on such rules is one that is more compatible with models of production than models of comprehension.
This is so because production models may plausibly be closely associated with the application of transformations, and the information that must be accessible to determine transformational applicability is localized within a single structural description (a phrase marker) at some level in the transformational derivation. Comprehension models based on transformational grammar, by contrast, seem ineluctably saddled with the problem of systematically applying transformations in reverse, and this is a problem that no one, to our knowledge, has ever solved.

Declaratively formulated grammars like those developed within HPSG exhibit no biases toward one mode of processing rather than another. Because each partial linguistic description is to be viewed denotatively, i.e. as being satisfied by a certain set of linguistic structures (see above), the constructs of such grammars (e.g. words, rules, or principles) can be consulted in whatever order a process may dictate: the constructs are all constraints which, by their very nature, are order-independent and which allow themselves to be processed in a monotonic fashion. Given the current state of our knowledge of language use, a constraint-based architecture of this sort would seem to be the most plausible choice for the design of the theory of language, at least if the goal of embedding that theory within a model of language processing is ever to be realized.

In our concern for processing issues like those we have touched on briefly here, we have accepted the conventional wisdom that linguistic theory must account for linguistic knowledge (a recursively definable system of linguistic types) but not necessarily for processes by which that knowledge is brought to bear in the case of individual linguistic tokens.

[13] A similar point is made by Bresnan and Kaplan (1982). See also Halvorsen 1983, Sag et al. 1985, and Fenstad et al. 1987.
Indeed, we take it to be the central goal of linguistic theory to characterize what it is that every linguistically mature human being knows by virtue of being a linguistic creature, viz. universal grammar. And a theory of a particular language, a grammar, characterizes what linguistic knowledge (beyond universal grammar) is shared by the community of speakers of that language. Indeed, from the linguist's point of view, that is what the language is. But what does language consist of? One thing that it certainly does not consist of is individual linguistic events or utterance tokens, for knowledge of these is not what is shared among the members of a linguistic community. Instead, what is known in common, that makes communication possible, is the system of linguistic types. For example, the type of the sentence I'm sleepy is part of that system, but no individual token of it is.

1 Phrases and Schemata

For expository purposes, HPSG is often presented in terms of the familiar trappings of generative grammar, where syntactic rules or schemata are formal devices that 'generate' word-terminated structures like (4):

(4) [S [NP Felix] [VP [V chased] [NP [DET the] [N' dog]]]]

But this presentation is in fact a distortion of HPSG, where phrases are treated in essentially the same way as words: as feature structures that serve as models of utterance types. The most fundamental sort of utterance recognized in the version of HPSG developed in Pollard and Sag (1994) is the sign, of which there are two subsorts: the lexical sign (or word) and the phrasal sign (or phrase). So, just as lexical entries are descriptions of families of words (more precisely, types of word utterances), schemata are descriptions of families of phrases (types of phrase utterances). And parochial and universal principles are just further descriptions, i.e. additional constraints that the phrases of the language in question must satisfy.
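The idea that lexical entries, schemata, and principles are all just descriptions, i.e. constraints that structures must satisfy, can be rendered as a small constraint-checking sketch. The encoding below is invented for illustration (HPSG's actual sorts and features are far richer): a grammar is a set of constraint functions, and a candidate feature structure is admissible iff every constraint is true of it.

```python
# Sketch: grammar = a set of declarative constraints; a structure is
# admissible iff it satisfies all of them.  The dict encoding and the
# particular constraints are invented for this illustration.

def head_constraint(phrase):
    """Headed phrases share their HEAD value with the head daughter."""
    if "head_dtr" not in phrase:
        return True                # applies only to headed phrases
    return phrase["head"] is phrase["head_dtr"]["head"]

def phon_constraint(phrase):
    """Here, mother PHON is the concatenation of the daughters' PHON."""
    if "dtrs" not in phrase:
        return True
    expected = [w for d in phrase["dtrs"] for w in d["phon"]]
    return phrase["phon"] == expected

GRAMMAR = [head_constraint, phon_constraint]

def admissible(structure):
    # Order-independent: the constraints may be checked in any order.
    return all(constraint(structure) for constraint in GRAMMAR)

verb_head = {"pos": "verb", "vform": "fin"}
vp = {
    "phon": ["chased", "the", "dog"],
    "head": verb_head,
    "head_dtr": {"phon": ["chased"], "head": verb_head},
    "dtrs": [{"phon": ["chased"]}, {"phon": ["the", "dog"]}],
}

print(admissible(vp))   # True
```

Because each constraint is a pure test on a structure, nothing in the grammar dictates a processing order, which is the declarative, process-neutral character argued for in the previous section.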
Words are feature structures like those described below, where certain phonological, morphological, syntactic and semantic features are assumed to be appropriate and are further hypothesized to be organized according to a particular feature geometry. Phrases, on the other hand, have some appropriate features of their own, as well as their own feature geometry. The feature geometry of phrases is what is normally discussed in terms of phrase structure, but the feature structure description of phrase structure looks somewhat different. Instead of a tree structure like (4), we have a feature structure like the one described in (5):

(5) [ PHON   <Felix, chased, the, dog>
      SYNSEM 'S'
      DTRS [ hd-subj-struc
             SUBJ-DTR < [ PHON   <Felix>
                          SYNSEM 'NP' ] >
             HEAD-DTR [ PHON   <chased, the, dog>
                        SYNSEM 'VP'
                        DTRS [ hd-comp-struc
                               HEAD-DTR  [ PHON <chased> ]
                               COMP-DTRS < [ PHON   <the, dog>
                                             SYNSEM 'NP' ] > ] ] ] ]

It may not be obvious whether there is any significant difference between the two conceptions of linguistic structure. However, there are several noteworthy advantages to this 'sign-based' approach. First, the explicit mention of heads, subjects and complements allows constraints about linear order, feature 'percolation', etc. to be stated without the introduction of ancillary mechanisms. Second, the bundling of syntactic, semantic and even contextual information into each SYNSEM value makes such information available at each level of phrase structure. This flexible access to contextual information is of considerable value in the treatment of focus placement and focus inheritance, as demonstrated by Engdahl and Vallduvi (1994), who exploit this flexibility crucially in explaining differences between the focus systems of, inter alia, English and Catalan.
Third, the tree-based conception of phrase structure is a special case of the sign-based approach -- one that uses only concatenation to relate the PHON values of mother and daughters. But generalizing such operations to include wrapping[14] or other operations that permit interleaving (e.g. Reape's sequence union operation (Reape 1990, in press)) has proven to be an extremely interesting and successful approach to the analysis of many problems of word order variation, extraposition, and coordination.[15] Finally, since the sign-based approach involves classifying phrases hierarchically, it is possible to express generalizations about phrasal signs using the same multiple inheritance techniques that have proven so useful in the analysis of lexical signs (see below).

3. Universal Grammar

HPSG is thus a constraint-based theory of grammatical competence. All of its representations -- lexical entries, rules, and even universal principles -- are partial descriptions of constructs used to model types of linguistic utterances.[16] Hence HPSG linguistic descriptions are declarative, order-independent, and reversible, making them ideally suited for the description of linguistic performance.

Many of the central constructs of HPSG are motivated by its adherence to strict lexicalism, a thesis that entails that syntactic operations cannot operate on or make reference to internal properties of lexical items. Any lexically based theory necessarily employs rich lexical representations, and HPSG's UG is a small set of principles that allow the grammar of phrases to be projected from the particular information encoded in lexical heads. One might think of the core of HPSG theory as an attempt to simplify both grammatical structures and their grammar, deriving effects equivalent to those of head movement, functional categories and the projection principle all from the interaction of X'-theory and strict lexicalism.
All X'-theories embody some variant of the following principle, whose specific formulation presumes that HEAD is a feature taking a feature structure complex as its value:

(6) The Head Feature Principle (HFP)
    The HEAD value of a headed phrase is identified with that of its head daughter.

This familiar principle guarantees that certain grammatical properties, e.g. part-of-speech, case, and form class, are systematically projected onto X'-phrases from lexical items, and from X'-phrases onto maximal phrases. The HEAD value of a word thus contains only information that phrasal projections inherit in virtue of the HFP.

[14] Various kinds have been investigated. See, for example, Bach 1979 and Pollard 1984.
[15] See, e.g., Reape in press, Kathol and Levine 1993, Kathol to appear.
[16] The idea that such a uniform characterization of linguistic theory is possible is due to Martin Kay.

As in Categorial Grammar, phrase maximality is described not in terms of bar level, but rather via combinatoric saturation. That is, a lexical entry bears certain specifications that determine what elements it combines with syntactically. Such specifications are stated in terms of the valence features SUBJ (SUBJECT), COMPS (COMPLEMENTS), and SPR (SPECIFIER).[17] A headed phrase is well-formed only if it satisfies the following principle:

(7) The Valence Principle (VALP)
    For each valence feature F, the F value of a headed phrase is the head daughter's F value minus the realized non-head daughters (e.g. Subj-Dtr, Complement-Dtrs, Spr-Dtr).

The Valence Principle thus plays a role within HPSG much like that of the category cancellation associated with function application in Categorial Grammar. Although such principles are often described informally in terms of a bottom-up phrase generation procedure, notice that (7) is a static constraint on headed phrases.

Universal grammar makes available a small set of schemata which specify partial information about universally available types of phrases.
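Before turning to the schemata, the two principles just stated lend themselves to a direct, if toy, encoding. The following Python sketch checks a candidate phrase against both; the attribute names come from the text, while the representation of phrases as dictionaries is purely an illustrative assumption.

```python
# Toy encoding of the HFP (6) and the Valence Principle (7) as static
# constraints on a headed phrase. A phrase is a dict; token identity of
# HEAD values is modeled by Python object identity (`is`).

def satisfies_hfp(phrase):
    """(6): the phrase's HEAD value is identified with (token-identical
    to) that of its head daughter."""
    return phrase["HEAD"] is phrase["HEAD-DTR"]["HEAD"]

def satisfies_valp(phrase, feature, realized):
    """(7): the phrase's value for valence feature F equals the head
    daughter's F value minus the realized non-head daughters.
    (The list difference here is naive but adequate for the sketch.)"""
    remaining = [x for x in phrase["HEAD-DTR"][feature] if x not in realized]
    return phrase[feature] == remaining

# The VP `chased the dog': the head daughter selects a subject and an
# NP complement; only the complement is realized within the VP.
verb_head = {"POS": "verb", "VFORM": "fin"}
vp = {
    "HEAD": verb_head,        # one object, shared with the head daughter
    "SUBJ": ["NP[nom]"],
    "COMPS": [],
    "HEAD-DTR": {"HEAD": verb_head, "SUBJ": ["NP[nom]"], "COMPS": ["NP"]},
}

assert satisfies_hfp(vp)
assert satisfies_valp(vp, "COMPS", realized=["NP"])  # complement realized
assert satisfies_valp(vp, "SUBJ", realized=[])       # subject still owed
```

Note that, as the text emphasizes, nothing here is procedural: both functions merely check that a given structure satisfies a constraint.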
As in GPSG, these schemata not only abstract away from the principles of X' theory just enumerated, but also from the order of daughter elements, leaving such matters to more general constituent ordering principles. Three schemata of relevance are illustrated in (8).

(8) a. Schema 1: X -> Head-Dtr:phrase[COMPS < >], Subj-Dtr
    b. Schema 2: X -> Head-Dtr:lexical, Comp-Dtrs
    c. Schema 3: X -> Head-Dtr:phrase[COMPS < >], Spr-Dtr

Schema 1 licenses phrases consisting of a phrasal head daughter and a subject daughter; the phrases licensed by Schema 2 consist of a lexical head daughter and any number of complement daughters; Schema 3 allows a phrasal head to combine with an appropriate specifier. Because of X' theory, the head daughter's HEAD information is maximally projected in any given phrase (by the HFP), and the head's valence information determines the elements that the maximal projection contains (in accordance with the VALP). Thus each subtree in the following structure satisfies one of the schemata and all of the principles of UG:

[17] This follows innovations in HPSG theory due to Robert Borsley (1989), specifically as adapted by Pollard and Sag (1994: chap. 9).

(9)  S [HEAD [3], SUBJ < >, COMPS < >]
     |-- [1] NP
     |       Felix
     |-- VP [HEAD [3], SUBJ <[1]>, COMPS < >]
          |-- V [HEAD [3], SUBJ <[1]>, COMPS <[2]>]
          |       chased
          |-- [2] NP [HEAD [4], SPR < >, COMPS < >]
               |-- [5] DET
               |       the
               |-- N' [HEAD [4] noun, SPR <[5]>, COMPS < >]
                       dog

The boxed integers in these tree diagrams are variables used to `tag' certain feature values within the structure as being token identical, as required by the HFP or the VALP. Thus the part-of-speech information (tagged [4] in (9)) specified in the lexical entry for dog is identified with that of the NP it projects, in accordance with the HFP. (The same would be true for CASE specifications in a language whose nouns were systematically inflected for case.)
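The token identity marked by tags like [4] in (9) is stronger than mere equality of copies, and the difference can be made concrete with shared references. In the following sketch (names illustrative throughout), one Python object reachable from two paths plays the role of a tag.

```python
# Structure sharing as token identity: tag [4] in (9) names a single
# object reachable from two paths, not two equal copies.

head_4 = {"POS": "noun"}            # the HEAD value tagged [4]
n_bar = {"HEAD": head_4}            # the N' projection of `dog'
np = {"HEAD": n_bar["HEAD"]}        # the NP shares, rather than copies

assert np["HEAD"] is n_bar["HEAD"]  # token identity, as the HFP requires

a_copy = {"POS": "noun"}
assert a_copy == head_4 and a_copy is not head_4   # equal but distinct

# Because the structure is shared, information instantiated via one
# path is visible via the other -- e.g. a CASE value on the noun:
n_bar["HEAD"]["CASE"] = "nom"
assert np["HEAD"]["CASE"] == "nom"
```

The last two lines illustrate the parenthetical remark about CASE: in a case-inflecting language, a CASE value specified on the lexical head is automatically a property of the whole projection, with no copying or percolation mechanism.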
In like fashion, the lexical entry for chased specifies the part-of-speech verb, which the HFP ensures is also the part-of-speech of the VP and the S. (The reader should continue to bear in mind that the tree structure shown in (9) is used solely for expository convenience. We're all used to thinking in terms of phrase structure trees, after all. The tree in (9) depicts (albeit in more detail) the very same phrasal sign that we illustrated earlier in feature structure notation.)

The lexical entry for chased selects for an NP complement and hence may combine by Schema 2 with the phrase the dog, whose grammatical information (tagged [2]) is identified with the complement selected by chased. Chased similarly selects lexically for an NP subject, and this specification is also part of the VP (in accordance with the Valence Principle). Hence this VP combines with the subject NP by Schema 1 to form a saturated phrase, i.e. one all of whose valence specifications are empty.

4. The Hierarchical Lexicon

Once schemata and universal principles are formulated in this modular fashion, it becomes possible to reduce the vast complexity of phrasal types to the diversity of lexical entries, each of which will project its own particular kind of phrase in virtue of its specifications for HEAD and the various valence features and their interaction with the principles and schemata just illustrated. Now a word is just another kind of feature structure which, if presented in all its detail, would be described in the fashion of (10).
(10)  word
      [ PHON    <chased>
        SYNSEM  synsem
         [ LOCAL
            [ CATEGORY
               [ HEAD  verb[VFORM fin]
                 SUBJ  < [1] synsem[LOC|CAT|HEAD noun[CASE nom]] >
                 COMPS < [2] synsem[LOC|CAT|HEAD noun[CASE acc]] >
                 ARG-S < [1], [2] > ]
              CONTENT  ...
              CONTEXT  [ CONX-INDICES  [ SPEAKER ...
                                         HEARER  ... ]
                         BACKGROUND ... ] ]
           NONLOCAL  [ SLASH { }
                       ... ] ] ]

But since these descriptions quickly become unwieldy, we will systematically simplify them. A few abbreviated entries are sketched in (11).

(11) a. chases
        [ HEAD  verb[fin]
          SUBJ  < [1] NP[nom]3s >
          COMPS < [2] NP >
          ARG-S < [1], [2] > ]
     b. picture
        [ HEAD  noun
          SPR   < [1] Det >
          COMPS < ([2] PP[of]) >
          ARG-S < [1], [2] > ]
     c. of
        [ HEAD  prep[of]
          SUBJ  < >
          COMPS < [1] NP >
          ARG-S < [1] > ]

Note here that the SUBJ and COMPS lists (or SPR and COMPS lists) `add up' to the list value of another feature called ARGUMENT-STRUCTURE (ARG-S). ARG-S values correspond to the hierarchical argument structure of a word (relevant, for example, to binding theory -- see Pollard and Sag (1992)), while the valence features specify the word's combinatoric potential.

Lexical entries such as these contain much information that can in fact be consolidated within an explanatory theory of lexical structure and organization. Indeed, considerable research within HPSG has been concerned with the development of just such theories, namely those which allow complex lexical information to be factored in various ways to reflect appropriate linguistic generalizations.
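The `add up' relation just noted between the valence lists and ARG-S amounts to list concatenation, and can be stated as an explicit check. The sketch below abbreviates entries to the shape used in (11); the dictionary encoding and the helper function are illustrative assumptions, not HPSG machinery.

```python
# The valence lists `add up' to ARG-S: for entries like those in (11),
# ARG-S is the concatenation of SUBJ (or SPR) and COMPS. Strings stand
# in here for the tagged synsem objects of the full theory.

def argument_structure_ok(entry):
    """Check that SUBJ (or SPR) + COMPS equals ARG-S for an entry."""
    first = entry.get("SUBJ", entry.get("SPR", []))
    return first + entry["COMPS"] == entry["ARG-S"]

chases = {"SUBJ": ["NP[nom]3s"], "COMPS": ["NP"],
          "ARG-S": ["NP[nom]3s", "NP"]}
picture = {"SPR": ["Det"], "COMPS": ["PP[of]"],
           "ARG-S": ["Det", "PP[of]"]}

assert argument_structure_ok(chases)
assert argument_structure_ok(picture)
```

In the full theory the members of ARG-S are token-identical to the members of the valence lists (the tags [1], [2] in (11)), not merely equal strings as here.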
Central to this line of inquiry has been the concept of hierarchical classification -- essentially an assignment of words to categories, and an assignment of those categories to superordinate categories. With each category (or sort), certain attributes are specified to be appropriate and certain constraints are stated that hold for all members of that category. Without stipulation, a word inherits all the features and constraints of the (atomic) sort it is assigned to and, via the technique of hierarchical inheritance, all such features and constraints declared for supersorts of that atomic sort are also associated with the word in question. In HPSG, words are modelled by feature structures of a particular sort. And because particular words (atomic lexical feature structures) are multiply classified, i.e. have more than one nonatomic supersort, it is possible to express cross-cutting generalizations about words in an elegant, deductive fashion. (See, e.g., Flickinger 1987, Flickinger and Nerbonne 1992, Riehemann 1993, Davis in progress.) Consider the three verbs in (12).

(12) a. chases
        [ HEAD  verb[fin]
          SUBJ  < [1] NP[nom]3s >
          COMPS < [2] NP >
          ARG-S < [1], [2] > ]
     b. continues
        [ HEAD  verb[fin]
          SUBJ  < [1] NP >
          COMPS < [2] VP[inf, SUBJ <[1]>] >
          ARG-S < [1], [2] > ]
     c. dies
        [ HEAD  verb[fin]
          SUBJ  < [1] NP[nom]3s >
          COMPS < >
          ARG-S < [1] > ]

None of the feature specifications indicated in (12) needs to be stipulated ad hoc. Words are assigned to sorts that are subordinate to various others. With each sort come certain constraints stating general properties that are true of all elements belonging to that sort. Thus, by establishing hierarchical relations among sorts, an individual word inherits all properties (constraints) associated with all its sorts and all supersorts of those sorts.[19] The lexical descriptions in (12) are thus a logical consequence of the appropriate sortal classification of English verbs, which might appear as in (13).
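The deduction of entries like (12) from a sort hierarchy can be mimicked in miniature with Python's multiple inheritance, where attribute lookup plays the role of constraint inheritance and (matching the assumption made below for conflicting constraints) more specific sorts override more general ones. The sort names and attribute values are illustrative.

```python
# Sorts as classes; constraints as class attributes. A word's entry
# inherits everything declared on its sorts and their supersorts, with
# subordinate (more specific) constraints overriding general ones.

class Verb:                         # ISA word
    HEAD = "verb"
    SUBJ = ["NP"]

class TransVerb(Verb):
    COMPS = ["NP"]                  # an NP complement

class StrictTransVerb(TransVerb):
    pass

class StrictIntranVerb(Verb):
    COMPS = []

class FiniteVerb(Verb):
    VFORM = "fin"
    SUBJ = ["NP[nom]"]

class ThirdPersonVerb(FiniteVerb):
    SUBJ = ["NP[nom]3sg"]           # overrides the supersort's SUBJ

class Chases(StrictTransVerb, ThirdPersonVerb):
    pass                            # two atomic supersorts, cross-cutting

class Dies(StrictIntranVerb, ThirdPersonVerb):
    pass

# Nothing below is stipulated on Chases or Dies directly:
assert Chases.HEAD == "verb" and Chases.VFORM == "fin"
assert Chases.SUBJ == ["NP[nom]3sg"] and Chases.COMPS == ["NP"]
assert Dies.COMPS == []
```

The multiple classification of Chases is what lets it pick up its COMPS value from one supersort and its SUBJ value from another, the cross-cutting generalization the text describes.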
[19] Up to consistency. I will assume here without argument that subordinate conflicting constraints `override' more general superordinate constraints.

(13) SORT                 CONSTRAINTS                                  ISA
     verb                 [HEAD verb, SUBJ <NP>]                       word
     trans-verb           [COMPS <NP, ...>]                            verb
     subj-raising-verb    [SUBJ <[1]>, COMPS <XP[SUBJ <[1]>], ...>]    verb
     strict-intran-verb   [COMPS < >]                                  verb
     obj-raising-verb     [COMPS <[1], XP[SUBJ <[1]>]>]                trans-verb
     strict-trans-verb    [COMPS <X>]                                  trans-verb
     finite-verb          [HEAD [VFORM fin], SUBJ <NP[nom]>]           verb
     3rd-person-verb      [SUBJ <NP3sg>]                               finite-verb
     base-verb            [HEAD [VFORM base]]                          verb
     passive-verb         [HEAD [VFORM pass]]                          verb

The sort names at the end of each line in (13) specify `is a' relations among the sorts, i.e. they indicate each sort's immediately superordinate sort(s). The resulting inheritance hierarchy thus allows the particular properties of the lexical entries in (12) to be derived, i.e. deduced, from the sort assignments in (14):

(14) chases:    strict-trans-verb & 3rd-person-verb
     continues: subj-raising-verb & 3rd-person-verb
     dies:      strict-intran-verb & 3rd-person-verb

Thus a 3rd-person-verb form like chases is assigned to two distinct atomic sorts, strict-trans-verb and 3rd-person-verb, each of which specifies a different subset of the information that chases inherits, as shown in (14). Multiple inheritance is thus an essential feature of lexical organization in a theory like HPSG. It is a fundamental mechanism for expressing common properties of lexical items that are divergent in other respects.

Further generalizations about lexical entries are expressed by lexical rules.[20] As in early work in LFG (Bresnan, ed.
1982), these systematically expand the set of basic (or `canonical') lexical entries, specifying only particular noncanonical properties that hold of the output forms. Among these is the passive lexical rule, sketched in (15).

(15) Passive Lexical Rule (PLR):

     trans-verb                       passive-verb
     [ SUBJ  < NP_i >         ==>    [ SUBJ  < [2] >
       COMPS < [2], ... > ]            COMPS < ..., (PP_i) > ]

[20] For recent attempts to eliminate lexical rules in HPSG in favor of a hierarchically organized theory of morphological structure, see Riehemann 1994, Kim 1994, Kathol 1994, and Malouf 1994.

Within a lexical rule, all properties of the input (e.g. semantic role assignment) that are not explicitly modified remain unchanged in the corresponding output. Thus, in virtue of (15), the base form of the lexeme chase (which looks similar to (12a) above, but is base rather than finite) gives rise to the appropriately specified passive form chased:

(16) chased
     [ HEAD  verb[pass]
       SUBJ  < [2] NP >
       COMPS < ([3] PP[by]) >
       ARG-S < [2], [3] > ]

This form may then serve as the lexical head of a passive verb phrase, e.g. chased by the police. This follows from the interaction of the lexicon, Schema 2, the HFP and the Valence Principle without the need for any passive-specific machinery.

References

Bach, Emmon. 1979. Control in Montague Grammar. Linguistic Inquiry 10: 515-531.

Borsley, Robert. 1989. Phrase-Structure Grammar and the Barriers Conception of Clause Structure. Linguistics 27: 843-863.

Bresnan, Joan, ed. 1982. The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press.

Bresnan, Joan and Ronald M. Kaplan. 1982. Introduction. In Bresnan, ed. 1982. Pp. xvii-lii.

Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton.

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris.

Chomsky, Noam. 1982. Some Concepts and Consequences of the Theory of Government and Binding.
Cambridge, MA: MIT Press.

Davis, Anthony. In progress. Linking and the Hierarchical Lexicon. Doctoral dissertation, Stanford University.

Engdahl, Elisabet and Enric Vallduví. 1994. Information Processing in Constraint-Based Grammars. Unpublished paper: University of Edinburgh.

Fenstad, Jens-Erik, Per-Kristian Halvorsen, Tore Langholm and Johan van Benthem. 1987. Situations, Language, and Logic. Dordrecht: Reidel.

Flickinger, Daniel. 1987. Lexical Rules in the Hierarchical Lexicon. Doctoral dissertation, Stanford University.

Flickinger, Daniel P. and John Nerbonne. 1992. Inheritance and Complementation: A Case Study of Easy Adjectives and Related Nouns. Computational Linguistics 18: 269-309.

Halvorsen, Per-Kristian. 1983. Semantics for Lexical-Functional Grammar. Linguistic Inquiry 14: 567-616.

Hirst, Graeme. 1987. Semantic Interpretation and the Resolution of Ambiguity. Cambridge: Cambridge University Press.

Johnson, David and Paul Postal. 1980. Arc Pair Grammar. Princeton: Princeton University Press.

Kathol, Andreas. To appear. Doctoral dissertation, Ohio State University.

Kathol, Andreas, and Robert Levine. 1993. Inversion as a Linearization Effect. In Proceedings of NELS 23. GLSA, Amherst.

Koster, Jan. 1987. Domains and Dynasties. Dordrecht: Foris.

Langendoen, D. Terrence, and Paul Postal. 1984. The Vastness of Natural Language. Oxford: Basil Blackwell.

Moshier, Drew. 1988. Extensions to Unification Grammars for the Description of Programming Languages. Doctoral dissertation, University of Michigan, Ann Arbor.

Pollard, Carl. 1984. Generalized Context-Free Grammars, Head Grammars and Natural Language. Doctoral dissertation, Stanford University.

Pollard, Carl and Drew Moshier. 1990. Unifying Partial Descriptions of Sets. In Information, Language and Cognition, Volume 1 of Vancouver Studies in Cognitive Science. Vancouver: University of British Columbia Press. PAGES??

Pollard, Carl and Ivan A. Sag. 1992. Anaphors in English and the Scope of Binding Theory.
Linguistic Inquiry 23: 261-303.

Pollard, Carl and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press and Stanford: CSLI Publications.

Pollard, Carl. 1985. Phrase Structure Grammar without Metarules. In Proceedings of the Fourth West Coast Conference on Formal Linguistics, eds. Jeffrey Goldberg, Susannah MacKaye and Michael Wescoat. Stanford: Stanford Linguistics Association. Pp. 246-261.

Postal, Paul and Geoffrey K. Pullum. 1988. Expletive Noun Phrases in Subcategorized Positions. Linguistic Inquiry 19: 635-670.

Reape, Mike. 1990. A Theory of Word Order and Discontinuous Constituency in West Continental Germanic. In Elisabet Engdahl and Mike Reape, eds., Parametric Variation in Germanic and Romance: Preliminary Investigations. ESPRIT Basic Research Action 3175 DYANA, Deliverable R1.1.A. Edinburgh: Centre for Cognitive Science and Department of Artificial Intelligence, 25-39.

Reape, Mike. In press. Getting Things in Order. In A. Horck and Harry Bunt, eds., Discontinuous Constituency. Berlin: Mouton de Gruyter.

Riehemann, Susanne. 1993. Word Formation in Lexical Type Hierarchies: A Case Study of bar-Adjectives in German. M.A. thesis, University of Tübingen.

Sag, Ivan A., Ronald Kaplan, Lauri Karttunen, Martin Kay, Carl Pollard, Stuart Shieber and Annie Zaenen. 1985. Unification and Grammatical Theory. In Proceedings of the Fifth West Coast Conference on Formal Linguistics, eds. Mary Dalrymple, Jeffrey Goldberg, Kristin Hanson, Michael Inman, Chris Piñón, and Stephen Wechsler. Stanford: Stanford Linguistics Association. Pp. 238-254.