ORIGINAL RESEARCH ARTICLE
published: 25 March 2014
doi: 10.3389/fnbeh.2014.00097

BEHAVIORAL NEUROSCIENCE

Temporal relation between top-down and bottom-up processing in lexical tone perception

Lan Shuai 1* and Tao Gong 2*

1 Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
2 Department of Linguistics, University of Hong Kong, Hong Kong, China

Edited by: Leonid Perlovsky, Harvard University and Air Force Research Laboratory, USA

Reviewed by: Ryan Giuliano, University of Oregon, USA; I-Fan Su, The University of Hong Kong, Hong Kong; Wentao Gu, Nanjing Normal University, China

*Correspondence: Lan Shuai, Department of Electrical and Computer Engineering, Johns Hopkins University, North Charles Street 3400, Baltimore, MD 21218, USA, e-mail: [email protected]; Tao Gong, Department of Linguistics, University of Hong Kong, Pokfulam Road, Hong Kong, China, e-mail: [email protected]

Speech perception entails both top-down processing that relies primarily on language experience and bottom-up processing that depends mainly on the instant auditory input. Previous models of speech perception often claim that bottom-up processing occurs in an early time window, whereas top-down processing takes place in a late time window after stimulus onset. In this paper, we evaluated the temporal relation of both types of processing in lexical tone perception. We conducted a series of event-related potential (ERP) experiments that recruited Mandarin participants and adopted three experimental paradigms, namely dichotic listening, lexical decision with phonological priming, and semantic violation. By systematically analyzing the lateralization patterns of the early and late ERP components observed in these experiments, we discovered that auditory processing of pitch variations in tones, as a bottom-up effect, elicited greater right hemisphere activation, whereas linguistic processing of lexical tones, as a top-down effect, elicited greater left hemisphere activation. We also found that both types of processing co-occurred in both the early (around 200 ms) and late (around 300–500 ms) time windows, which supported a parallel model of lexical tone perception. Unlike the previous view that language processing is special and performed by dedicated neural circuitry, our study has elucidated that language processing can be decomposed into general cognitive functions (e.g., sensory and memory) and share neural resources with these functions.

Keywords: lexical tone, ERP, lateralization, serial model, parallel model

INTRODUCTION

Perception in general comprises two types of processing, bottom-up (or data-based) processing and top-down (or knowledge-based) processing, which are based, respectively, on incoming data and prior knowledge (Goldstein, 2009). For speech perception, the TRACE model (McClelland and Elman, 1986) claims that both types of processing are necessary, and the auditory sentence processing model (Friederici, 2002) proposes that the cognitive processes involved in speech perception proceed in a series of steps. Following these models, bottom-up processing, such as acoustic processing of the incoming signal and generalization of speech features, happens first, whereas top-down processing, such as recognition based on knowledge of phonemes, semantics, or syntax, takes effect at a later stage of perception. These models, as well as other theories or models of speech perception (e.g., Liberman and Mattingly, 1985; Fowler, 1986; Stevens, 2002; Diehl et al., 2004; Hickok and Poeppel, 2007), are based primarily on evidence from non-tonal languages. Two issues remain to be explored: (a) whether the processing of tonal languages, which make up about 60–70% of the world's languages (Yip, 2002), follows the same cognitive processes; and (b) what the role of lexical tone perception is in a general model of speech perception. In addition, considering that the lexical tone attached to a syllable is carried mainly by the vowel nucleus of the syllable, the temporal dimension of the cognitive processes underlying lexical tone perception is of special interest.

In this paper, we discussed the temporal relationship between bottom-up and top-down processing in lexical tone perception, with the purpose of not only examining the underlying mechanisms of lexical tone perception but also shedding valuable light on general models of speech perception concerning tonal languages. Lexical tone is a primary use of pitch variations to distinguish lexical meanings (Wang, 1967). Noting that pitch perception belongs to general auditory perception, which is also shared by other animals (Hulse et al., 1984; Izumi, 2001; Yin et al., 2010), whereas word semantics are acquired primarily through language learning, lexical tone perception also entails both bottom-up and top-down processing. In our study, we defined bottom-up processing as the auditory processing and feature extraction of incoming acoustic signals, which referred specifically to pitch contour perception. By contrast, we defined top-down processing as the recognition and comprehension of incoming signals according to language knowledge, which referred specifically to the influence of language experience on recognizing and comprehending a certain syllable in a tonal language. The recognition of a pitch contour as a certain tonal category was also ascribed to top-down processing.

Many available studies on lexical tone perception focused on the lateralization patterns of lexical tone processing (e.g., Van Lancker and Fromkin, 1973; Baudoin-Chial, 1986; Hsieh et al., 2001; Wang et al., 2001; Gandour et al., 2002, 2004; Tervaniemi and Hugdahl, 2003; Luo et al., 2006; Zatorre and Gandour, 2008; Li et al., 2010; Krishnan et al., 2011; Jia et al., 2013), and reported mixed results even under the same experimental paradigms.
For example, by employing the dichotic listening (DL) paradigm and materials from Mandarin, Baudoin-Chial (1986) reported no hemisphere advantage of lexical tone perception, but Wang et al. (2001) found a left hemisphere advantage. Using fMRI, Gandour and colleagues compared lexical tone processing with intonation or vowel processing. The study of lexical tone and intonation (Gandour et al., 2003b) revealed a left hemisphere advantage in the frontal lobe, whereas the study of lexical tone and segments (Li et al., 2010) discovered a right hemisphere advantage in the fronto-parietal area for the perception of tones. A right lateralization of lexical tone perception was also reported in an ERP (Luo et al., 2006) and a DL experiment (Jia et al., 2013).

The inconsistent laterality effects could be due to different experimental conditions in these studies. For example, in the DL experiments reporting a left hemisphere advantage (Van Lancker and Fromkin, 1973; Wang et al., 2001), tonal language speakers performed more difficult tasks than non-tonal language speakers, and the heavier load of these tasks (e.g., hearing trials at a faster pace) might enhance the left hemisphere advantage in tonal language speakers. By contrast, there was no hemisphere advantage in the study that had no task differences between tonal and non-tonal language speakers (Baudoin-Chial, 1986). In addition, in the DL tasks that involved meaningless syllables and hums, which could direct participants' attention toward pitch contours only, a right hemisphere advantage was shown (Jia et al., 2013). More importantly, whether language-related tasks are involved is the primary noticeable difference between studies showing an explicit right hemisphere advantage (e.g., Luo et al., 2006) and those reporting a left hemisphere advantage of lexical tone perception (Van Lancker and Fromkin, 1973; Hsieh et al., 2001; Wang et al., 2001; Gandour et al., 2002, 2004). For example, Luo et al. (2006) conducted a passive listening task in which participants were engaged in a silent movie, whereas the other studies carried out explicit language tasks such as lexical tone identification. Accordingly, the right lateralization reported in Luo et al. (2006)'s study could be attributed to a pure bottom-up effect without top-down influence, whereas the other studies did not address the underlying mechanisms of lexical tone perception. This could lead to the inconsistent results between these studies. These mixed results also reflect a multifaceted perspective on lexical tone processing and hemispheric lateralization. As stated in Zatorre and Gandour (2008)'s review of tonal processing, "it appears that a more complete account will emerge from consideration of general sensory-motor and cognitive processes in addition to those associated with linguistic knowledge."

To our knowledge, among the available studies, there was only one work (Luo et al., 2006) that discussed these two types of processing in lexical tone perception, and a few that examined the cognitive processes involved in lexical tone perception (Ye and Connie, 1999; Schirmer et al., 2005; Liu et al., 2006; Tsang et al., 2010). In Luo et al. (2006)'s study, a serial model of lexical tone processing was proposed, which suggested that bottom-up processing (i.e., pitch perception) took effect in an early time window around 200 ms and top-down processing (i.e., semantic comprehension) happened in a late time window around 300–500 ms. The first half of this model was based on their experimental result that phonemes with slow-changing (lexical tone) and fast-changing (stop consonant) acoustic properties induced, respectively, right and left lateralization patterns of the MMN (Mismatch Negativity) component. The second half was proposed to address the conflict between their results and the previous literature that showed a general left hemisphere advantage of lexical tone perception. They proposed that during the late stage a left lateralization should be shown in the semantics-associated late ERP component, N400 (Kutas and Hillyard, 1980).

This serial model associated the right hemisphere advantage with bottom-up processing of lexical tones, and the left hemisphere advantage with top-down processing. In terms of lexical tone perception, there exists ample evidence in support of such an association between the two types of processing and the two types of hemisphere advantage. For example, in studies of language experience and prosody, Gandour et al. (2004) dissociated linguistic processing in the left hemisphere and acoustic processing in the right hemisphere, by locating a left lateralization in certain brain regions in tonal language speakers and a right lateralization in non-tonal language speakers during speech prosody processing. Pitch processing has a right hemisphere advantage, as shown in behavioral experiments such as DL (Sidtis, 1981), PET studies (Zatorre and Belin, 2001), and later fMRI studies (Boemio et al., 2005; Jamison et al., 2006); for a review, see Zatorre et al. (2002). By contrast, compared to non-tonal language speakers, tonal language speakers have greater left hemisphere activity during lexical tone perception (Gandour et al., 1998, 2004; Hsieh et al., 2001; Wang et al., 2004), and multiple brain regions in the left hemisphere were believed to be the primary source of N400 (Lau et al., 2008). Noting these findings, we also adopted the lateralization pattern in our study to investigate top-down and bottom-up processing of lexical tones.

In addition, in Luo et al. (2006)'s study, there was insufficient direct evidence to manifest the top-down effect at the late stage of processing, because this study only explored the acoustic factor without involving explicit language-related tasks or any linguistic factor. Therefore, it is hard to comprehensively evaluate Luo et al. (2006)'s serial model. Considering this, in order to make sure that language knowledge (top-down) would take effect, we adopted a number of explicit language-related tasks, including DL, lexical decision with phonological priming, and semantic violation. Meanwhile, we manipulated both the acoustic (requiring bottom-up processing) and semantic (requiring top-down processing) factors in the experimental design, and analyzed the ERP components at both the early (around 200 ms) and late (around 300–500 ms) processing stages to explore the temporal relationship of the two types of processing during lexical tone perception.

Our experimental results showed that both bottom-up (acoustic) processing and top-down (semantic) processing exist in both the processing stage around 200 ms and that around 300–500 ms, which inspired a parallel model of top-down and bottom-up processing in lexical tone perception. In the rest of the paper, we described the two ERP components traced in our experiments of Mandarin lexical tone perception (section ERP Components Reflecting Bottom-up and Top-down Processing), reported these experiments and their findings (section ERP Experiments of Lexical Tone Perception), discussed the lateralization patterns of the ERP components shown in these experiments and the derived parallel model of lexical tone perception (section General Discussions), connected language processing with general cognitive functions (section Language Processing and General Cognitive Functions), and finally, concluded the paper (section Conclusion).

ERP COMPONENTS REFLECTING BOTTOM-UP AND TOP-DOWN PROCESSING

We examine two ERP components in our experiments, namely the auditory P2 and the auditory N400, which occur, respectively, in the early and late time windows after stimulus onset.

Auditory P2 is the second positive-going ERP component. It usually has a central topographic distribution, and peaks in the early time window around 200 ms (Luck, 2005). The lateralization of P2 is subject to both acoustic properties and tasks (e.g., categorizing emotional words, Schapkin et al., 2000). The corresponding MEG component is P2m or M200. Previous research reported a general left lateralization of P2m in language-related tasks (e.g., perceiving consonants and vowels, Liebenthal et al., 2010), but acoustic properties of incoming signals also affect the lateralization of P2m (e.g., the voice onset time of consonants, Ackermann et al., 1999).

As a negative-going potential, the auditory N400 appears in a late time window (around 250–550 ms) when the target sound stimulus is incongruent with the context (Kutas and Federmeier, 2011). The semantic violation paradigm can elicit N400 (Kutas and Hillyard, 1980). The phonological priming paradigm can also elicit N400, when comparing the control condition with the priming condition (Praamstra and Stegeman, 1993; Dumay et al., 2001). The auditory N400 usually has a more frontal topographic distribution than the visual N400 (Holcomb and Anderson, 1993; Kutas and Federmeier, 2011). In young populations, the auditory N400 tends to have a frontal distribution (Curran et al., 1993; Tachibana et al., 2002). The source of N400 is believed to lie in the frontal and temporal brain areas (Maess et al., 2006; Lau et al., 2008), starting from 250 ms in the posterior half of the left superior temporal gyrus, migrating forward and ventrally to the left temporal lobe by 365 ms, and then moving to the right anterior temporal lobe and both frontal lobes after 370 ms (Kutas and Federmeier, 2011).

ERP EXPERIMENTS OF LEXICAL TONE PERCEPTION

We designed three ERP experiments to explore the temporal relation of bottom-up and top-down processing in lexical tone perception. These experiments recruited Mandarin participants and traced the above two ERP components in three tasks, respectively, at the syllable, word, and sentence levels, which cover aspects of acoustic-phonetic, phonological, and semantic processing. According to the serial models (e.g., Friederici, 2002), these types of processing could be reflected by different ERP components shown at the early and late stages. However, a parallel model would predict a co-existence of these types of processing at both the early and late stages of lexical tone perception.

These experiments were designed primarily for the following two reasons. First, we were interested in clarifying whether top-down effects could happen at the early stage of a "lower-level" processing. To this purpose, we designed Experiment 1 using the DL task. Apart from the bottom-up effect on phoneme identification, we introduced a semantic factor to see whether a top-down effect induced by this factor could exist in the early stage of perception and whether such an effect could be reflected by the early ERP components (e.g., P2).

Second, we were interested in identifying bottom-up effects at the late stage of a "high-level" processing. To this purpose, we designed Experiment 2 using an auditory lexical decision task, which entailed a top-down semantic processing and a bottom-up processing induced by various types of phonological primes. We also designed Experiment 3 using a semantic violation task. In this task, semantic integration could be reflected by the late ERP component (e.g., N400). Meanwhile, phonemes bearing different acoustic properties could also induce bottom-up acoustic processing at this stage.

Experiment 1 involved a DL task, which is a widely-adopted paradigm in behavioral and ERP studies examining lateralization in the auditory modality. For example, Eichele et al. (2005) adopted a DL task using stop consonants as stimuli, and discovered that the latencies of the ERP waveforms in the left hemisphere were shorter than those in the right hemisphere, thus reflecting a quicker response of the left hemisphere in perceiving stop consonants.
Wioland et al. (1999) explored pitch perception in a DL task, and found that the ERP waveforms had higher amplitudes when the tone change happened in the left ear than in the right ear, thus indicating that the right hemisphere was dominant in pitch discrimination. In our experiment, we adopted the DL paradigm to explore tone lateralization, and used the amplitude of the auditory P2 as a temporal indicator to reflect hemispheric specialization, rather than the ear advantages used in previous studies with contradictory behavioral results (Van Lancker and Fromkin, 1973; Baudoin-Chial, 1986). We compared the lateralization patterns under tones and stop consonants in both words and non-words. In terms of acoustic properties, the stop consonants have fast-changing properties, whereas the lexical tones in Mandarin have slow-changing properties.

We expected an increase in the activity of the hemisphere responsible for a certain type of processing when there was a heavier load of information for the corresponding hemisphere. For example, in dichotic trials containing two different lexical tones, there would be a relatively greater right hemisphere advantage (equivalent to a lesser left hemisphere advantage) than in dichotic trials containing two different stop consonants but the same lexical tones. Similarly, dichotic trials containing words should generate greater left hemisphere activity than dichotic trials containing non-words. In line with previous literature (Wioland et al., 1999; Luo et al., 2006), we examined the ERP waveforms in the C3 and C4 electrode groups.

Experiment 2 involved a lexical decision task with phonological priming. Priming refers to the phenomenon of acceleration in response after repetition. An early study of child language acquisition (Bonte and Blomert, 2004) adopted such a task. It used Dutch words and non-words as testing materials, and discovered different N400 reduction patterns in different language groups. Our experiment adopted a similar design, but used consonants and tones, as well as Chinese words and non-words, as testing materials. In Experiment 2, consonant or tone primes appeared before target words, and we examined the auditory P2 and auditory N400 under the tone or consonant priming paradigm. Unlike the enhancement effect of DL in Experiment 1, we expected that there would be a reduction of ERP components (smaller amplitude) due to the priming effect, and that semantic violation would induce a reduction of ERP amplitudes (as shown by a smaller amplitude in the positive component and a greater amplitude in the negative component). These reductions could be greater in the hemisphere related to a certain type of processing. For example, the reduction caused by lexical tone priming should have a greater right hemisphere advantage (equivalent to a lesser left hemisphere advantage) compared to that of consonants, whereas the reduction caused by non-words should be greater in the left hemisphere compared to that of words. Considering the topographic distributions of the auditory P2 and N400 (Curran et al., 1993; Tachibana et al., 2002; Luck, 2005), as well as the auditory brain regions involved in tone priming tasks (Wong et al., 2008), in Experiment 2 we examined the ERP waveforms in the posterior (P3, P4) and frontal (F3, F4) electrode groups.

Experiment 3 involved a semantic violation task in sentences. N400 has been one of the most widely-explored ERP components in such studies, and we adopted the semantic violation paradigm to explore whether acoustic property affected the lateralization of the auditory N400 occurring in the late time window during sentence comprehension, which is a high-level linguistic task. The violation was induced by changing either the stop consonant or the tone of the target syllable in a sentence. We expected a greater right lateralization of the N400 induced by lexical tone violation compared to consonant violation. Here we set the central (C3, C4) electrode groups as the regions of interest.

PARTICIPANTS AND SETTINGS

All these experiments were approved by the College Research Ethics Committee (CREC) of Hong Kong. Thirty-two university students (16 females, 16 males) volunteered for these experiments (age range: 19–29, mean = 27, SD = 4.2). In Experiment 1, data from all participants were analyzed. In Experiment 2, the data of one participant were excluded due to excessive eye movements, thus leaving 31 participants (age range: 19–29, mean = 25, SD = 2.3). In Experiment 3, the data of three participants were excluded, thus leaving 29 participants (age range: 19–29, mean = 26, SD = 2.8).

All these participants were native Mandarin speakers with no musical training. They had normal hearing (below 25 dBHL) in both ears, with less than 10 dBHL difference between the two ears at 125, 250, 500, 750, and 1000 Hz, according to a pure tone audiometry (PTA) test. They were all right-handed according to the Edinburgh handedness test (Oldfield, 1971), and reported no history of head damage or mental illness. They signed informed consent forms before each of these experiments, and received compensation at a rate of 50 HKD per hour after completing these experiments.

These experiments were conducted on three separate days, in a dimly lit, quiet room. During each experiment, participants were seated comfortably in front of a computer monitor, and the sound stimuli were presented via ER-3A air-conducting insert earphones, which diminished the environmental noise by 20–30 dB. The sound pressure level, measured by a sound level meter, was set to 75 dB SPL during the experiment. The sound materials in these experiments were recorded from a female, native Mandarin speaker. The recording was conducted in a sound-proof booth using a Shure SM10A microphone and a Sony PCM-2700A audio recorder. The adjustments on the recorded sound materials were implemented by the PSOLA (pitch-synchronous overlap-add) algorithm in Praat (Boersma and Weenink, 2013), the experimental procedures were implemented using E-Prime (Psychology Software Tools, Pittsburgh, PA), and the statistical analyses were conducted using the SPSS software (version 18.0, SPSS Inc., Chicago, IL).
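PSOLA-based pitch editing of this kind can be scripted. Below is a minimal sketch using Parselmouth, a Python interface to Praat; it is not the authors' actual script, and the input file name and the flat 200 Hz target contour are illustrative assumptions.

# Minimal sketch of PSOLA-style pitch adjustment in Praat via Parselmouth.
# "syllable.wav" and the 200 Hz level-tone target are hypothetical values.
import parselmouth
from parselmouth.praat import call

sound = parselmouth.Sound("syllable.wav")

# Build a Manipulation object (10 ms time step, 75-600 Hz pitch range).
manipulation = call(sound, "To Manipulation", 0.01, 75, 600)

# Replace the original pitch contour with a flat 200 Hz contour (a level tone).
pitch_tier = call(manipulation, "Extract pitch tier")
call(pitch_tier, "Remove points between", 0, sound.duration)
call(pitch_tier, "Add point", sound.duration / 2, 200.0)
call([pitch_tier, manipulation], "Replace pitch tier")

# Resynthesize with the overlap-add (PSOLA) method and save the result.
resynthesized = call(manipulation, "Get resynthesis (overlap-add)")
resynthesized.save("syllable_level_tone.wav", "WAV")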
EEG DATA RECORDING AND ERP PROCESSING

The EEG (electroencephalography) data were collected by a 128-channel EEG system with a Geodesic Sensor Net (EGI Inc., Eugene, OR, USA) (see Figure 1). The impedances of all electrodes were kept below 50 kΩ at the beginning of the recording. In all three experiments, participants were encouraged to avoid blinking or moving their body parts at certain points. Eye blinks and movements were monitored through electrodes located above and below each eye and outside of the outer canthi. The original reference point was the vertex. The ERPs were re-referenced to the average of all 129 scalp channels during data processing (average reference). During recording, signals were sampled at 250 Hz with a 0.01–100 Hz band-pass filter.

FIGURE 1 | Electrode positions of the EGI 128-channel Geodesic Sensor Net, in which the key electrodes (F3, F4, C3, C4, P3, and P4) used to form the electrode groups for tracing the ERP waveforms and components of interest are marked. This figure is available at: ftp.egi.com/pub/documentation/placards/gsn200_128_map.pdf.

During offline ERP processing, the recorded continuous data were filtered by a 40 Hz low-pass filter and segmented from −100 to 900 ms relative to the stimulus onset. Segments with an amplitude change exceeding 100 μV in the vertical eye channels or in any electrode, or with a voltage fluctuation exceeding 50 μV in the horizontal eye channels, were excluded from the analyses. In each experiment, at least half of the total trials were preserved for analysis in each condition for every participant. Baseline correction was conducted from −100 to 0 ms.
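For orientation, the offline steps above map onto standard EEG tooling. The following is a sketch in MNE-Python under stated assumptions: the raw file name, the event extraction, and the mapping of the rejection rule onto peak-to-peak thresholds are ours, not the authors' (the original processing did not use MNE).

# Sketch of the offline pipeline: 40 Hz low-pass, -100 to 900 ms epochs,
# artifact rejection (~100 uV EEG, ~50 uV EOG peak-to-peak, an approximation
# of the rule described above), baseline correction, and average reference.
import mne

raw = mne.io.read_raw_egi("subject01.raw", preload=True)  # 128-channel EGI net
raw.filter(l_freq=None, h_freq=40.0)                      # 40 Hz low-pass

events = mne.find_events(raw)                             # stimulus onsets
epochs = mne.Epochs(
    raw, events,
    tmin=-0.1, tmax=0.9,                 # -100 to 900 ms around stimulus onset
    baseline=(-0.1, 0.0),                # baseline correction from -100 to 0 ms
    reject=dict(eeg=100e-6, eog=50e-6),  # drop segments exceeding thresholds
    preload=True,
)
epochs.set_eeg_reference("average")      # re-reference to the channel average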
In the following sections, we reported the materials, procedures, and results of these three experiments.

EXPERIMENT 1: MANDARIN TONE DICHOTIC LISTENING TASK

Materials

The recorded stimuli included Mandarin real- and pseudo-syllables, which were formed by two stop consonants (/p/ and /t/ in the IPA notation), two diphthongs (/au/ and /ua/ in the IPA notation), and two Mandarin tones (tone 1, the high level tone; and tone 2, the high rising tone). Eight syllables were constructed using these phonemes and tonemes, among which four were real syllables, having corresponding Chinese characters, whereas the other four were pseudo-syllables, made of valid consonants and diphthongs but having no corresponding Chinese characters. All these stimuli were cut to 350 ms based on their intensity profiles. Figure 2 shows their waveforms and spectrograms, on which the pitch contours are also marked.

FIGURE 2 | Sound waveforms and spectrograms of the eight Mandarin syllables in Experiment 1. The pitch contours of the syllables having level tones have stable level portions throughout the duration, whereas those of the syllables having rising tones start with a level portion (about half of the duration), followed by a rising portion (about half of the duration). The x-axis represents time (0–350 ms) and the y-axis represents frequency (75–5000 Hz) in the spectrograms. The blue curve represents the pitch contour, at a different scale (50–500 Hz), superimposed on the spectrograms.

Procedure

In the DL task, participants simultaneously heard two distinct syllables, respectively, in their left and right ears, and were asked to report, according to the Chinese character (left or right) shown on the screen, both the consonant and the tone of the corresponding side of the auditory input, by pressing the corresponding keys on the response pad.

We adopted a two-by-two design, with word and non-word as the two levels of the lexicality factor, and stop consonant and tone as the two levels of the acoustic contrast factor. The eight syllables formed four experimental conditions: the word, consonant condition; the word, tone condition; the non-word, consonant condition; and the non-word, tone condition, each containing two syllables (see examples in Table 1). In the two word conditions, the words were formed by meaningful real syllables in Chinese; in the two non-word conditions, the non-words were formed by pseudo-syllables having no meanings in Chinese. In the two consonant conditions, the two syllables had the same diphthong and tone, but different initial consonants; in the two tone conditions, however, the two syllables had the same initial consonant and diphthong, but different tones. To balance the two syllables, respectively, played to the left and right ears of participants, as well as the two directions in participants' responses (left or right), each of these four conditions corresponded to four DL trials. In total, there were 16 DL trials.

Table 1 | Experimental conditions in Experiment 1, each containing two syllables and four DL trials.

Conditions    Word                                        Non-word
Consonant     /pau1/ (包, "bag"), /tau1/ (刀, "knife")      /tua2/, /pua2/
Tone          /pau2/ (雹, "hail"), /pau1/ (包, "bag")       /pua1/, /pua2/

Words have corresponding Chinese characters (shown in brackets, together with their meanings). The pronunciations are annotated with IPA characters, the numbers in which denote Mandarin tones.

In each trial, a fixation first appeared on the center of the screen and remained there. After 400 ms, the two syllables in a DL trial were simultaneously played to the left and the right ears of participants, respectively. Participants were encouraged not to blink or move their body parts during the appearance of the fixation. After 1000 ms, the fixation on the screen was replaced by the Chinese character that indicated left or right, and accordingly, participants reported the consonant and the tone of the syllable heard by the corresponding ear. The purpose of letting participants hear the stimuli before seeing the indication (left or right) was to avoid inducing a prior bias in their attention. The indication stayed on the screen for 2000 ms, during which participants gave their responses. The presentation sequence of the stimuli was randomized, and the order of choices between the two consonants and between the two tones on the response box was counter-balanced across participants.

Participants first went through a practice session (16 trials) to familiarize themselves with the experimental paradigm. In the experimental session, a total of 256 trials were presented to participants, each lasting around 5 s. The experiment consisted of four blocks, each having 64 trials and lasting about 5 min. In each block, the 16 DL trials were randomly repeated four times. Participants could take a 2-min break after each block, and the whole experiment lasted approximately 30 min.
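At the stimulus level, a dichotic trial is a two-channel sound file with a different syllable on each channel. The sketch below shows one way to assemble such a file in Python; the file names are hypothetical, and the actual experiment delivered the stimuli through E-Prime.

# Sketch: build a dichotic trial by routing one syllable to the left channel
# and a different syllable to the right channel (mono input files assumed).
import numpy as np
import soundfile as sf

left, fs = sf.read("pau1.wav")    # e.g., /pau1/ to the left ear
right, _ = sf.read("tau1.wav")    # e.g., /tau1/ to the right ear

n = max(len(left), len(right))    # pad to equal length (stimuli were 350 ms)
left = np.pad(left, (0, n - len(left)))
right = np.pad(right, (0, n - len(right)))

sf.write("dichotic_trial.wav", np.column_stack([left, right]), fs)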
Data analysis and results

As for the behavioral data, the overall rate of response was 95.7%. A three-way repeated-measures ANOVA of the rates of correct responses, with lexicality (word vs. non-word), acoustic contrast (consonant vs. tone), and hemisphere (left vs. right) as three factors, revealed a significant three-way interaction [F(1, 31) = 5.981, p < 0.024, η2p = 0.239]. In addition, the post-hoc analysis revealed a significant left hemisphere (right ear) advantage in the non-word, consonant condition [t(19) = −2.280, p < 0.034]. Since participants needed to respond to both the consonant and the tone of the syllable in one ear, we did not analyze the reaction time.

As for the ERP data, considering the central distribution of the auditory P2 (Luck, 2005) and previous literature (Luo et al., 2006), we averaged the data recorded by the four homolog pairs of adjacent central electrodes including C3 and C4 [electrodes 37 (C3), 38, 42, and 43 in the left hemisphere, and 105 (C4), 88, 104, and 94 in the right hemisphere, according to the EGI system] for analysis. Since the P2 peak appeared between 180 and 200 ms, we averaged the amplitude of P2 within this time range. A three-way repeated-measures ANOVA of the P2 amplitudes, with lexicality, acoustic contrast, and hemisphere as three factors, revealed two significant interactions, one between acoustic contrast and hemisphere [F(1, 31) = 7.744, p < 0.0091, η2p = 0.200] and the other between lexicality and hemisphere [F(1, 31) = 12.687, p < 0.0012, η2p = 0.290], as well as two main effects, hemisphere [F(1, 31) = 14.393, p < 0.0006, η2p = 0.317] and acoustic contrast [F(1, 31) = 22.024, p < 0.0001, η2p = 0.415].
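The electrode-group averaging and the three-way repeated-measures ANOVA can be sketched as follows. This is not the authors' SPSS procedure: the EGI channel labels, the condition tags, and the `all_epochs` container (one preprocessed mne.Epochs object per participant, as in the earlier sketch) are assumptions made for illustration.

# Sketch of the P2 analysis: mean amplitude of the left/right central electrode
# groups in the 180-200 ms window, then a three-way repeated-measures ANOVA.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

groups = {"left": ["E37", "E38", "E42", "E43"],      # C3 group (labels assumed)
          "right": ["E105", "E88", "E104", "E94"]}   # C4 group (labels assumed)

rows = []
for subject, epochs in all_epochs.items():           # {subject_id: mne.Epochs}
    for cond in ["word/consonant", "word/tone",
                 "nonword/consonant", "nonword/tone"]:
        evoked = epochs[cond].average()
        lex, contrast = cond.split("/")
        for hemi, chans in groups.items():
            window = evoked.copy().pick(chans).crop(tmin=0.18, tmax=0.20)
            rows.append({"subject": subject, "lexicality": lex,
                         "contrast": contrast, "hemisphere": hemi,
                         "p2": window.data.mean() * 1e6})  # in microvolts

df = pd.DataFrame(rows)
res = AnovaRM(df, depvar="p2", subject="subject",
              within=["lexicality", "contrast", "hemisphere"]).fit()
print(res.anova_table)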
Figure 3 shows the average ERP waveforms of the C3 and C4 electrode groups, Figure 4 shows the topographies of the ERP component contrasts, and Figure 5 shows the average P2 amplitudes between 180 and 200 ms in the different conditions.

FIGURE 3 | Average ERP waveforms of the C3 and C4 electrode groups under the four conditions of Experiment 1. (A) Word, consonant condition; (B) Non-word, consonant condition; (C) Word, tone condition; (D) Non-word, tone condition.

FIGURE 4 | Topographies of P2 (180–200 ms) contrasts in different conditions of Experiment 1. (A) Consonant vs. tone condition; (B) Word vs. non-word condition.

FIGURE 5 | Average P2 amplitudes of the C3 and C4 electrode groups in Experiment 1.

The greater left lateralization of P2 shown by comparing the word conditions with the non-word conditions reflected top-down processing in the early time window around 200 ms. This indicates the involvement of language experience in the process. In order to differentiate words from non-words, participants needed prior language knowledge, which acted as a top-down effect, regardless of whether this effect belonged to word form recognition (Friederici, 2002) or semantic processing.

The greater left lateralization of P2 shown by comparing the consonant conditions with the tone conditions also reflected bottom-up processing in the same time window. The difference between these conditions was the speed of changes in acoustic cues. In line with previous results (Jamison et al., 2006), we found a greater left lateralization in perceiving fast-changing acoustic cues (the formant transitions in stop consonants) and a lesser left lateralization in perceiving relatively slow-changing acoustic cues (the pitch changes in tones). We expected that the lesser left lateralization of tone processing compared to consonant processing was due to the greater right lateralization of tone processing compared to consonant and rhyme processing in certain brain regions, as reported in Li et al. (2010).

EXPERIMENT 2: LEXICAL DECISION TASK WITH PHONOLOGICAL PRIMING

Materials

The recorded stimuli included monosyllabic words as primes and disyllabic words as targets. These words could be real words or non-words in Chinese. The mean duration of the primes was 383.06 ms (range: 251–591 ms, SD = 49.17), and that of the targets was 619.58 ms (range: 510–751 ms, SD = 52.05). There was no significant difference in either the prime duration [F(5, 354) = 1.098, p = 0.3609, η2p = 0.015] or the target duration [F(5, 354) = 1.244, p = 0.2878, η2p = 0.017] between conditions. The onset asynchrony between the primes and the targets was fixed at 1000 ms. The sound intensity of the primes was set to 55 dB, and that of the targets to 75 dB. The purpose of presenting primes at a lower intensity was to maximize the priming effect (Lau and Passingham, 2007).

Procedure

In the lexical decision task, participants were asked to judge whether the heard disyllabic words (targets) were words or non-words. They were instructed to ignore the monosyllabic words (primes) played before the disyllabic words and to focus on the latter.

Similar to Experiment 1, we adopted a two-by-two design, with word and non-word as the two levels of the lexicality factor, and stop consonant and tone as the two levels of the priming condition factor. The materials in Table 2 formed six experimental conditions. In the two consonant conditions, the syllable in the prime shared the initial consonant with the first syllable of the target; in the two tone conditions, the syllable in the prime shared the tone with the first syllable of the target; and in the two control conditions, the syllable in the prime and the first syllable of the target shared no phonemes or tonemes.

Table 2 | Example materials and experimental conditions in Experiment 2.

Conditions                      Prime          Target
Word, consonant priming         /t y 2/ ( )    /t i4//t 1/ ( , "vehicle")
Non-word, consonant priming     /t i 4/ ( )    /t i 1//p n3/ (* )
Word, tone priming              /x 1/ ( )      / i 1//k 3/ ( , "Hong Kong")
Non-word, tone priming          /t yn2/ ( )    /ku 2// m n4/ (* )
Word, control                   /fa1/ ( )      / y 2//t 3/ ( , "scholar")
Non-word, control               /lun2/ ( )     /ji 3//f 4/ (* )

The pronunciations are annotated with IPA characters, the numbers in which denote Mandarin tones. As for the primes, the Chinese characters are shown in brackets. As for the targets, the Chinese words are shown in brackets, together with their meanings. Each non-word includes two real syllables (shown in brackets), but their combination does not form a meaningful disyllabic word in Chinese (marked by *).
In each trial, participants first heard the prime. After 600 ms, a fixation appeared on the center of the screen and remained there. After another 400 ms, participants heard the target. After another 1000 ms, the fixation disappeared, and participants had 3000 ms to give their response, by pressing one of the two keys marked "yes" and "no" on the keyboard. Participants were encouraged not to blink or move their body parts during the appearance of the fixation, and not to respond until the fixation disappeared. One half of the participants responded to the "yes" key with their left index finger and the "no" key with their right index finger. The other half did the reverse. The left or right response order was randomly assigned to participants.

There were in total 360 trials, with 60 trials in each of the six conditions. Each trial lasted around 5 s. Participants first went through a practice session (36 trials) to familiarize themselves with the experimental paradigm.
The experiment consisted of six blocks, each having 60 trials and lasting about 5 min. Trials were arranged in a random order. Participants could take a 2-min break after each block, and the whole experiment lasted approximately 40 min.

Data analysis and results

As for the behavioral data, the response correctness was 96.0%. The average reaction time was 962.72 ms (SD = 159.99). Outliers greater than three times the standard deviation from the mean were replaced with the mean value in each participant. A marginally significant priming effect in the word, consonant priming condition was observed [t(30) = −1.993, p < 0.055], while the tone priming conditions showed interference effects [for the word, tone priming condition, t(30) = 1.822, p = 0.078; for the non-word, tone priming condition, t(30) = 2.524, p < 0.017].

As for the ERP data, considering both the P2 topography (Luck, 2005) and the brain regions involved in tone priming (Wong et al., 2008), we averaged four homolog pairs of adjacent posterior electrodes including P3 and P4 [electrodes 53 (P3), 61, 54, and 38 in the left hemisphere, and 87 (P4), 79, 80, and 88 in the right hemisphere, according to the EGI system] for analysis. We calculated the priming effects of tones or consonants by subtracting the ERP waveforms in the word or non-word control conditions from those in the word or non-word experimental conditions. Similar to Experiment 1, based on a three-way repeated-measures ANOVA with lexicality (word vs. non-word), priming condition (consonant vs. tone), and hemisphere (left vs. right) as three factors, we found that the average P2 amplitude in the early time window (200–220 ms; the P2 peak values were within this time range) showed two significant interactions, one between lexicality and hemisphere [F(1, 30) = 5.618, p < 0.0244, η2p = 0.158] and the other between priming condition and hemisphere [F(1, 30) = 8.515, p < 0.0066, η2p = 0.221], and a main effect of lexicality [F(1, 30) = 8.242, p < 0.0074, η2p = 0.216]. Similar to Experiment 1, these results indicated that both semantics and acoustic properties affected lateralization.
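The priming effect described above is a standard difference wave. A minimal sketch, assuming the same hypothetical condition tags and `epochs` object as in the earlier sketches:

# Sketch: derive the tone priming effect for words as (priming - control).
import mne

evoked_priming = epochs["word/tone_priming"].average()
evoked_control = epochs["word/control"].average()

# Weighted combination: 1 * priming + (-1) * control.
priming_effect = mne.combine_evoked([evoked_priming, evoked_control],
                                    weights=[1, -1])
priming_effect.plot(picks=["E53", "E61", "E54", "E38"])  # left posterior group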
Apart from P2, we conducted another analysis of the ERP waveforms in the late time window (500–550 ms), based on four homolog pairs of adjacent frontal electrodes including F3 and F4 [electrodes 25 (F3), 28, 29, and 35 in the left hemisphere, and 124 (F4), 123, 118, and 117 in the right hemisphere, according to the EGI system]. Rather than deriving the auditory N400 by contrasting the non-word and word conditions, we analyzed these conditions separately, in order to preserve the lexicality factor and to keep the factors in the statistical analysis consistent with the previous analysis based on P2, though the time windows of interest in these two analyses were different. The data for this statistical analysis were all from the priming conditions, without subtracting the control conditions. By examining the same three factors as in the previous analysis, this analysis showed three main effects: priming condition [F(1, 30) = 6.564, p < 0.0157, η2p = 0.180], lexicality [F(1, 30) = 7.892, p < 0.0087, η2p = 0.208], and hemisphere [F(1, 30) = 9.193, p < 0.0050, η2p = 0.235]. Priming condition interacted significantly with lexicality [F(1, 30) = 5.636, p < 0.0242, η2p = 0.158]. More importantly, there was a significant interaction between lexicality and hemisphere [F(1, 30) = 10.729, p < 0.0027, η2p = 0.263].

Figure 6 shows the average ERP waveforms of the P3 and P4 electrode groups, and Figure 7 shows those of the F3 and F4 electrode groups. Figure 8 shows the topographies of the contrasts of the P2 component around 200 ms (200–220 ms), and Figure 9 shows those of the late ERP component around 500 ms (500–550 ms). Figure 10 shows the average amplitudes of P2, and Figure 11 shows those of the late component in the different conditions.

FIGURE 6 | Average ERP waveforms of the P3 and P4 electrode groups under the six conditions of Experiment 2. (A) Word, consonant priming condition; (B) Non-word, consonant priming condition; (C) Word, tone priming condition; (D) Non-word, tone priming condition; (E) Word, control condition; (F) Non-word, control condition.

FIGURE 7 | Average ERP waveforms of the F3 and F4 electrode groups under the six conditions of Experiment 2. (A) Word, consonant priming condition; (B) Non-word, consonant priming condition; (C) Word, tone priming condition; (D) Non-word, tone priming condition; (E) Word, control condition; (F) Non-word, control condition.

FIGURE 8 | Topographies of P2 (200–220 ms) contrasts in different conditions of Experiment 2. (A) Consonant priming effect vs. tone priming effect; (B) Non-word vs. word condition.

FIGURE 9 | Topographies of the contrasts of the late ERP component (500–550 ms) in different conditions of Experiment 2. (A) Non-word vs. word experimental conditions; (B) Non-word vs. word control conditions.

FIGURE 10 | Average P2 amplitudes within 200 and 220 ms in the P3 and P4 electrode groups in Experiment 2. "WCon" denotes the comparison between the word, consonant priming condition and the word, control condition; "NCon" that between the non-word, consonant priming condition and the non-word, control condition; "WTon" that between the word, tone priming condition and the word, control condition; and "NTon" that between the non-word, tone priming condition and the non-word, control condition.

FIGURE 11 | Average amplitudes of the late ERP component around 500–550 ms in the F3 and F4 electrode groups in Experiment 2.

The lateralization pattern of P2 could be interpreted as follows. The greater the priming effect, the lower the amplitude of P2, due to the repetition effect. Since there was no main effect of priming condition, the priming effects of consonants and tones were not much different from each other. However, the left and right hemispheres showed different trends in these priming effects, due to the significant interaction between hemisphere and priming condition. By examining the amplitude difference between the priming effects of consonants and tones (see Figure 8A) and that between words and non-words (see Figure 8B), we found that a stronger priming effect of consonants than of tones was shown in the left hemisphere compared to the right hemisphere around the centro-parietal region, along with a greater left hemisphere advantage in processing words compared to non-words. These results showed that the left hemisphere responded significantly differently in the consonant priming and tone priming conditions, as well as in the word and non-word conditions. In line with Experiment 1, these results illustrated that both bottom-up and top-down processing took place around 200 ms. The left hemisphere responded to the fast-changing acoustic cues more strongly than to the slow-changing acoustic cues, and it also responded to word semantics more strongly than to non-words that had no meanings.

By comparing the word and non-word conditions, we found that the amplitudes in the frontal region at the late component (around 500 ms) were lower in the non-word conditions than in the word conditions (consistent with the main effect of lexicality), and this difference was right lateralized (consistent with the interaction between lexicality and hemisphere) (see Figure 9). Such a right lateralization was also shown in Experiment 3 when comparing the tone-induced N400 with the consonant-induced N400.

EXPERIMENT 3: SEMANTIC VIOLATION IN SENTENCES

Materials

The recorded stimuli included a number of Chinese sentences. Each sentence consisted of 11 syllables, and the last two were always a verb and its object. Semantic violation was induced by changing the tone or the consonant of the last syllable of a sentence. The average duration of these sentences, from the onset of the first syllable to the offset of the last one, was 3206.1 ms (SD = 156.1). There was no significant difference in these durations between conditions [F(2, 177) = 0.697, p = 0.4996, η2p = 0.008]. The intensity of these sentences was adjusted to 75 dB. Table 3 shows examples of such sentences.

Table 3 | Example sentences and experimental conditions in Experiment 3.

Conditions            Example sentences
Control               ("Once getting up in the morning, he combs his hair.")
Consonant violation   ("With no class in the afternoon, all boys go playing oil [football].")
Tone violation        ("After obtaining the degree, he immediately returns to his pass [motherland].")
The last syllable of each sentence is the target word, and all the previous ones are the context. In each condition, the Chinese transcription of the sentence and its meaning are shown. The pronunciations of the last two syllables are annotated with IPA characters, the numbers in which denote Mandarin tones. In the violation conditions, the last syllable induces the violation, and the syllable after the change is still a real syllable in Chinese. For comparison, the syllables within the square brackets are consistent with the context.

Procedure

In the sentence comprehension task, participants were asked to judge whether the last syllable in a sentence was consistent with the context or not. Since the violation appeared toward the end of the sentence, participants were encouraged not to blink or move their body parts toward the end of the sentences.

There were three experimental conditions (see Table 3): in the control condition, there was no semantic violation; in the consonant violation condition, the violation was induced by changing the initial consonant of the last syllable of the sentence; and in the tone violation condition, the violation was induced by changing the tone of the last syllable of the sentence.

In each trial, a fixation first appeared on the screen and remained there. After 400 ms, one of the sentences was presented to participants. The fixation disappeared 1000 ms after the onset of the last syllable in the sentence, and then participants had 2000 ms for their response, by pressing one of the two keys marked "yes" and "no" on the keyboard. One half of the participants pressed the "yes" key with their left index finger and the "no" key with their right index finger. The other half did the reverse. The left or right response order was randomly assigned to participants.

There were in total 180 testing sentences, with 60 sentences in each condition. We also added ten filler sentences, each having the same length as the testing sentences, no semantic violation, and a free structure. The purpose of incorporating filler sentences was to make the yes and no responses have equal chances. The average length of each trial was 6606.1 ms (SD = 156.1). Participants first went through a practice session (30 trials) to familiarize themselves with the experimental paradigm. The experiment consisted of six blocks, each containing 40 sentences. These sentences included ten randomly chosen sentences from each of the three conditions, and ten filler sentences. The order of these sentences was randomized. Each block lasted about 4 min. Participants could take a 2-min break between blocks. The whole experiment lasted about 30 min.

Data analysis and results

As for the behavioral data, the response correctness was 94.3%. The average reaction time was 854.13 ms (SD = 186.61). Outliers greater than three times the standard deviation from the mean were replaced by the mean value in each participant.
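The per-participant outlier rule can be written compactly; the toy data frame and its column names below are illustrative, not the authors' actual data.

# Sketch: replace reaction times more than three standard deviations from a
# participant's mean with that participant's mean.
import pandas as pd

def replace_outliers(rt: pd.Series) -> pd.Series:
    mean, sd = rt.mean(), rt.std()
    return rt.where((rt - mean).abs() <= 3 * sd, mean)

# Toy example: one row per trial, with "subject" and "rt" (in ms) columns.
trials = pd.DataFrame({"subject": ["s1"] * 20,
                       "rt": [900.0] * 19 + [3000.0]})
trials["rt_clean"] = trials.groupby("subject")["rt"].transform(replace_outliers)
print(trials)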
As for the ERP data, we referred to the data recorded by the four homolog pairs of adjacent electrodes including C3 and C4 [electrodes 37 (C3), 38, 42, and 43 in the left hemisphere, and 105 (C4), 88, 104, and 94 in the right hemisphere, according to the EGI system] for analysis. A two-way repeated-measures ANOVA, with violation type (consonant vs. tone) and hemisphere (left vs. right) as two factors, revealed a main effect of violation type [F(1, 28) = 9.622, p < 0.0044, η2p = 0.256] and a significant interaction between hemisphere and violation type [F(1, 28) = 9.573, p < 0.0044, η2p = 0.255]. A post-hoc t-test revealed a significant right-lateralized N400 [t(28) = 2.164, p < 0.0391] in the tone violation condition, and no significant lateralization in the consonant violation condition. Similar to Experiment 2, this analysis considered the control conditions, by subtracting the ERP waveforms in them from those in the experimental conditions.
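The post-hoc lateralization test amounts to a paired comparison of the left- and right-group N400 amplitudes across participants. A sketch with placeholder arrays standing in for the per-subject means (n = 29):

# Sketch: paired t-test of left (C3 group) vs. right (C4 group) mean N400
# amplitude (300-350 ms) in the tone violation condition. The arrays here are
# placeholders; the real input would be the 29 per-subject mean amplitudes.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
left_amp = rng.normal(-1.0, 1.0, size=29)               # C3-group means (uV)
right_amp = left_amp - rng.normal(0.5, 0.8, size=29)    # more negative right

t_stat, p_val = ttest_rel(left_amp, right_amp)
print(f"t(28) = {t_stat:.3f}, p = {p_val:.4f}")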
Figure 12 shows the average ERP waveforms of the C3 and C4 electrode groups, Figure 13 shows the topographies and differences of the auditory N400, and Figure 14 shows the average amplitudes of N400 (300–350 ms) in the different conditions.

FIGURE 12 | Average ERP waveforms of the C3 and C4 electrode groups in Experiment 3. (A) C3 electrode group; (B) C4 electrode group.

FIGURE 13 | Topographies of: (A) the N400 induced by consonant violation; (B) the N400 induced by tone violation; (C) the N400 induced by summing up consonant and tone violation; and (D) the difference between the N400 in (B) and (A) in Experiment 3.

FIGURE 14 | Average N400 (300–350 ms) amplitudes of the C3 and C4 electrode groups in Experiment 3. "Consonant" and "tone" denote the consonant and tone violation conditions.

The significant interaction between hemisphere and violation type reflected bottom-up processing. Noticeably, there was a right lateralization in the difference between the N400 induced by tone violation and that induced by consonant violation, which supported the view that bottom-up (acoustic) processing also existed in the late stage of perception.

GENERAL DISCUSSIONS

To sum up, in Experiment 1 and Experiment 2, we discovered both top-down (semantic) and bottom-up (acoustic) processing in the early time window around 200 ms. In the late stage around 300–500 ms, we found only the top-down effect in Experiment 2, probably because the phonological primes were presented too early and the bottom-up effect could not last long enough. However, in Experiment 3, the bottom-up effect was reflected by N400 in the late stage. As indicated by the late component in Experiment 2 and Experiment 3, we suggested that both top-down and bottom-up processing existed at the late stage. The N400 component had a shorter latency in Experiment 3 than in Experiment 2 because of the context effect, and the topography of the earlier N400 in Experiment 3 had a more central distribution compared to the late frontal N400 in Experiment 2, which is consistent with the description of N400 in the time and spatial domains (Kutas and Federmeier, 2011).

RELATION BETWEEN TOP-DOWN AND BOTTOM-UP PROCESSING DURING LEXICAL TONE PERCEPTION

During lexical tone perception, the prior knowledge formed by language experience helps match a large variety of pitch contours onto clear tonal categories, and the semantic representation requires combining tonal categories with their carrying syllables. Therefore, the prior knowledge of tonal categories and the lexical semantics linking tonal categories with carrying syllables become the primary top-down factors during lexical tone perception. Since semantic and categorical information of phonemes is processed dominantly in the left hemisphere (McDermott et al., 2003; Liebenthal et al., 2005), a general left lateralization pattern during lexical tone perception reflects top-down processing. Similarly, the primary acoustic cue of lexical tone is pitch variation, and the processing of pitch variations is bottom-up. Since it is widely accepted that the right hemisphere is dominant for pitch processing (Sidtis, 1981; Tenke et al., 1993), a relative right lateralization pattern during lexical tone perception also reflects bottom-up processing.

By tracing the auditory ERP components in the early and late time windows, our experiments explore the general and relative lateralization patterns in conditions with or without lexical semantics and with slow- or fast-changing acoustic cues. Though involving distinct tasks, these experiments reveal two consistent lateralization patterns of the early (P2) and late (N400) ERP components: (a) manipulation of the linguistic information in words modulates lateralization: meaningful words tend to generate a greater left lateralization; and (b) manipulation of the physical properties of the auditory input modulates lateralization: faster changing cues generate a greater left lateralization. These two patterns reflect top-down (lexical semantic) and bottom-up (acoustic phonetic) processing, respectively. Since lexical tone perception concerns both acoustic properties and lexical semantics, the observed lateralization patterns are never a simple dichotomy of purely left or right lateralization, as observed in the previous studies focusing on only one aspect of lexical tone perception. The modulation effects on lateralization at both the early and late time windows suggest that both top-down and bottom-up processing exist at different stages of perception, which supports a parallel model of top-down and bottom-up processing.

THREE-STAGE, PARALLEL LEXICAL TONE PROCESSING MODEL

Previous explorations revealed that lexical tone differs from segmental cues (Ye and Connie, 1999; Lee, 2007), and that the lateralization of lexical tone processing differs from that of segment processing (Li et al., 2010). Neuroimaging studies of lexical tone processing also revealed that separate brain regions were involved in perceiving lexical tones compared to segments (Gandour et al., 2003a). Apart from perception, differences between tone and segment processing were also found in lexical tone production (Liu et al., 2009). However, all these explorations did not disentangle language experience as a top-down factor and acoustic cues as bottom-up factors. Treating the processing of pitch information and its semantic role as one cohort process makes it difficult to figure out the cognitive processes during tone processing (Zatorre and Gandour, 2008), since lexical tone processing involves both acoustic and linguistic factors.

In our study, we regard pitch processing as bottom-up processing in lexical tone perception, since it concerns acoustic cues, and semantic processing as top-down processing, since it involves language experience. In this way, we separate these two cognitive functions called for by tone perception. By exploring the temporal relation between bottom-up and top-down processing in the time windows around 200 ms and around 300–500 ms after stimulus onset, we confirm that both types of processing participate in tone perception during these early and late time periods.

Apart from these findings, there was also evidence showing a greater left lateralization of contour tones than of level tones, as well as a general left lateralization of Cantonese lexical tone perception, in the N1 component around 100 ms after stimulus onset (Ho, 2010; Shuai et al., in press).
THREE-STAGE, PARALLEL LEXICAL TONE PROCESSING MODEL
Previous explorations revealed that lexical tone differs from segmental cues (Ye and Connine, 1999; Lee, 2007), and that the lateralization of lexical tone processing differs from that of segment processing (Li et al., 2010). Neuroimaging studies of lexical tone processing also revealed that separate brain regions are involved in perceiving lexical tones compared to segments (Gandour et al., 2003a). Apart from perception, differences between tone and segment processing were also found in lexical tone production (Liu et al., 2009). However, none of these explorations disentangled language experience, as a top-down factor, from acoustic cues, as bottom-up factors. Treating the processing of pitch information and of semantic role as one cohort process makes it difficult to figure out the cognitive processes during tone processing (Zatorre and Gandour, 2008), since lexical tone processing involves both acoustic and linguistic factors.

In our study, we regard pitch processing as bottom-up processing in lexical tone perception, since it concerns acoustic cues, and semantic processing as top-down processing, since it involves language experience. In this way, we separate the two cognitive functions called for by tone perception. By exploring the temporal relation between bottom-up and top-down processing in the time windows around 200 ms and around 300–500 ms after stimulus onset, we confirm that both types of processing participate in tone perception during these early and late time periods.

Apart from these findings, there was also evidence showing a greater left lateralization of contour tones than of level tones, as well as a general left lateralization of Cantonese lexical tone perception, in the N1 component around 100 ms after stimulus onset (Ho, 2010; Shuai et al., in press). The former result of Cantonese tone perception reflected a bottom-up acoustic effect at around 100 ms, whereas the general left lateralization was consistent with the lateralization of the top-down effect observed in our experiments on Mandarin tone perception.

Based on the findings in our experiments and those previous studies, we propose a detailed, three-stage, parallel model of lexical tone processing. The three stages are defined based on the occurrences of different ERP components in our and previous experiments (e.g., N1, around 100 ms after stimulus onset; P2, around 100–300 ms; and N400, after 300 ms; among these components, N1 and P2 belong to the early stage and N400 belongs to the late stage).

At the first stage (before and around 100 ms after stimulus onset, as in Ho, 2010 and Shuai et al., in press), syllable initials are processed to provide the basic structure of the syllable. At this stage, if the syllable starts with a vowel or a sonorant consonant, pitch information is available; if it starts with a voiceless consonant, there is no pitch information. In either case, the tonal category is not yet formed, since the recognition of a pitch variation or pattern, as a slow-varying acoustic cue, requires a time window longer than 100 ms. At this stage, top-down linguistic processing also occurs, no matter whether there is contextual information before the syllable initial. With contextual information, top-down processing becomes stronger, though it may play roles distinct from those of bottom-up processing at this stage.

At the second stage (100–300 ms after stimulus onset, as in our experimental conditions), predictions about pitch patterns and following segmental information are made, based on the information gathered at the first stage and prior language experience.
At this stage, semantic information at a gross level is also activated via top-down processing, initiated based on the information gathered at the first stage. Since the lexical tone is not yet fully recognized, bottom-up processing is still ongoing. According to the previous literature (Kaan et al., 2007, 2008), pitch variation in the middle portion of a syllable is the most important cue for native speakers of tonal languages to recognize contour tones. At this stage, listeners keep integrating the incoming pitch information with the information gathered at the first stage in order to recognize the tone and the whole syllable. In this sense, both top-down prediction of tonal categories and bottom-up generalization of tonal categories take place at this stage.

At the third stage (after 300 ms, as in our experimental conditions), top-down processing becomes more prominent, helping listeners recognize the tone, the whole syllable, and its meaning, based on prior language experience and the information gathered at the first two stages. Detailed semantic information is recognized at this stage. However, bottom-up processing keeps taking effect, helping to confirm the recognized tone and syllable.
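To make the parallel claim explicit, the model can be written down as a small data structure in which every stage carries both a bottom-up and a top-down process; a serial model, by contrast, would assign only one stream per stage. This is our schematic rendering of the three stages described above, with stage labels and field names of our own invention.

```python
# Schematic rendering (our illustration, not code from the study) of the
# three-stage, parallel model: both streams are listed as concurrently
# active at every stage.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    window_ms: tuple   # approximate post-onset time window
    erp: str           # ERP component indexing the stage
    bottom_up: str     # concurrent bottom-up process
    top_down: str      # concurrent top-down process

PARALLEL_MODEL = [
    Stage("early-1", (0, 100), "N1",
          "process syllable initial; pick up pitch if the onset is voiced",
          "context-based expectation of the incoming syllable"),
    Stage("early-2", (100, 300), "P2",
          "integrate the unfolding pitch contour with stage-1 information",
          "predict pitch pattern, following segments, and gross semantics"),
    Stage("late", (300, 500), "N400",
          "confirm the recognized tone and syllable",
          "recognize the tone, the whole syllable, and detailed meaning"),
]

def active_processes(t_ms):
    """Return the processes running t_ms after stimulus onset."""
    for s in PARALLEL_MODEL:
        if s.window_ms[0] <= t_ms < s.window_ms[1]:
            return {"bottom_up": s.bottom_up, "top_down": s.top_down}
    return None

print(active_processes(250))  # both streams are active around the P2
```

A serial model would be recovered by emptying the top_down field of the early stages; the parallel claim is precisely that no stage has an empty stream.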
Compatible with our findings in the series of experiments involving various levels of language processing, this parallel model of lexical tone perception can shed important light on general speech perception models in several respects.

First, this parallel model, as a cognitive model, proposes that top-down processing is available in both the early and the late stages of lexical tone perception, especially when contextual information is available.

Second, this parallel model refutes the claim that semantic processing (top-down) always occurs after acoustic processing (bottom-up), as in Friederici's general auditory processing model and Luo et al.'s serial model. The influence of previous language experience always exists during speech perception, especially in the case of auditory sentence processing. There are various types of cues and ample information that can serve as context for perceiving incoming syllables, and the human neural system constantly makes predictions. Even in the case of single-syllable perception, if the task is linguistically relevant, top-down processing based on language experience is inevitable. As shown in our experiments, the greater left lateralization of the P2 and N400 under the semantic conditions explicitly reflects such language-relevant, top-down processing.

Third, the bilateral processing of lexical tones also complements the neuroimaging models of speech perception. For example, in the dorsal-ventral pathway hypothesis of speech perception (Hickok and Poeppel, 2007), phoneme perception was regarded as involving only the left hemisphere, since that generalization did not involve tonal languages. Considering that 60–70% of the world's languages are tonal languages (Yip, 2002), a speech perception model leaving out tones is incomplete.

TOP-DOWN PROCESSING AT THE PREATTENTIVE STAGE
Humans often predict incoming signals based on experience. Therefore, top-down processing could accompany the whole process of speech perception. In addition, information generated by bottom-up processing is also used to match the predictions coming from top-down processing. In this sense, top-down processing is a pre-determined process, preparing the relevant hemisphere or brain regions for the forthcoming task. When stimuli come in, they evoke responses from the corresponding hemisphere or brain regions, and these responses may adjust or even alter the degree of lateralization. A similar effect is shown in the attention or memory modulation of lateralization in dichotic listening (DL) tasks (Hugdahl, 2005; Saetrevik and Hugdahl, 2007). For example, when participants are asked to attend to stimuli in either the left or the right ear (Hugdahl, 2005), the degree of lateralization is adjusted in favor of the attended side.

Even though long-term language experience keeps affecting automatic processing at the preattentive stage, it is generally hard to observe online top-down processing at this stage; only recently has automatic top-down processing gained researchers' attention (Kherif et al., 2011; Wager et al., 2013). Considering that the automatic preattentive processing discovered via the MMN paradigm can be induced by either acoustic properties or long-term language experience, the MMN component is able to reflect not only bottom-up processing, but also top-down processing relevant to semantics and syntax at the preattentive stage (Pulvermüller, 2001; Pulvermüller et al., 2001a,b; Pulvermüller and Shtyrov, 2006; Penolazzi et al., 2007; Shtyrov and Pulvermüller, 2007; Gu et al., 2012). Top-down processing was also found as early as around 200 ms after stimulus onset with attention (Bonte et al., 2006). However, one MMN study of tone perception (Luo et al., 2006) found only a general right lateralization of tone perception. Two factors may account for this. First, many of these experiments only concern acoustic processing, i.e., bottom-up processing at the preattentive stage (e.g., Luo et al., 2006). Second, without recruiting linguistic factors, top-down processing, which is dominant primarily in the left hemisphere and in the case of semantic processing, would have the least influence at the preattentive stage (e.g., Xi et al., 2010).

In our study, we consider both semantic roles that require linguistic top-down processing and pitch variations that require acoustic bottom-up processing in active, language-relevant tasks, which distinguishes our experiments from those MMN experiments. Via a series of tasks that involve different levels of language processing, we explicitly address top-down processing at both the early and late stages of lexical tone perception, and gather consistent evidence of the co-occurrence of top-down and bottom-up processing at those stages.

LANGUAGE PROCESSING AND GENERAL COGNITIVE FUNCTIONS
Separating lexical tone perception into bottom-up and top-down processing not only decomposes this language-specific function into general cognitive functions such as sensation and memory, but also reveals that speech processing shares similar mechanisms with other cognitive functions. For example, there are bottom-up attention (an automatic attention shift to an unexpected event, without requiring any sort of executive processing or any active engagement beforehand) and task-related top-down attention (Connor et al., 2004; Buschman and Miller, 2007; Pinto et al., 2013), both of which take part in information processing.

On the one hand, although previous work on speech perception focuses mainly on the left hemisphere, there are ample findings arguing against the existence of a centralized "core" in the left hemisphere dedicated exclusively to language processing. For example, the language function of intonation shows a right hemisphere advantage (Gandour et al., 2003b). Following a decompositional view, such a right hemisphere advantage can be ascribed to consistent right hemisphere advantages of general cognitive components, including the perception of slow-varying cues and emotions. Although lexical tone perception is special in the sense that it involves advantageous components in both the left hemisphere (semantic processing) and the right hemisphere (pitch processing), we can apply the same view to it. Similarly, this decompositional view can also be extended to aspects of semantics and syntax. Rather than arguing that no brain region is specific for language processing, what the decompositional view emphasizes is that language must be supported by many general functions and must share or recruit computational resources similar to those of such general functions.

On the other hand, it is not uncommon to conceptualize complex cognitive functions like language as a combination of general functions in terms of cognitive models and neural circuitry (Dehaene and Cohen, 2007; Hurley, 2007; Anderson, 2010). Take attention as an example: there is a heated debate on whether our attention is drawn voluntarily by top-down, task-dependent factors or involuntarily by bottom-up, saliency factors (Theeuwes, 1991; Buschman and Miller, 2007). Similarly, language processing also involves domain-general functions (Yip, 2002; Hurford, 2007; Fitch, 2010; Arbib, 2012). Although examining top-down and bottom-up mechanisms is already prevalent in the study of cognitive functions, such a separation has not been commonly practiced in the previous language processing literature. Noting this, unlike previous research that puts much emphasis on discovering domain-specific cognitive or neural mechanisms for language processing, we advocate that decomposing language functions into more basic components (Fitch, 2010) and locating the neural networks that systematically marshal these functions (Sporns, 2011) can lead to rigorous views about the essential commonalities between language and other cognitive functions.

CONCLUSION
In this paper, we reported three ERP experiments that collectively illustrated that both bottom-up processing and top-down processing during lexical tone perception co-occurred in both the early (around 200 ms) and late (around 300–500 ms) time windows of processing. Based on these findings, we proposed a parallel lexical tone processing model that entails both types of processing throughout the various processing stages. This experimental study discussed not only the temporal relation between bottom-up and top-down processing during tone perception, but also the similarities between language processing and other cognitive functions; the latter points out an important direction for future research on language processing and general cognition.
ACKNOWLEDGMENTS
This work is supported in part by the Seed Fund for Basic Research of the University of Hong Kong. The preliminary results in this paper were first presented at the 3rd International Symposium on Tonal Aspects of Languages (TAL 2012). We thank William S.-Y. Wang for his generous support of this work. We are also grateful to the three anonymous reviewers for their useful comments.

REFERENCES
Ackermann, H., Lutzenberger, W., and Hertrich, I. (1999). Hemispheric lateralization of the neural encoding of temporal speech features: a whole-head magnetoencephalography study. Brain Res. Cogn. Brain Res. 7, 511–518. doi: 10.1016/S0926-6410(98)00054-8
Anderson, M. L. (2010). Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313. doi: 10.1017/S0140525X10000853
Arbib, M. (2012). How the Brain Got Language: The Mirror System Hypothesis. Oxford: Oxford University Press.
Baudoin-Chial, S. (1986). Hemispheric lateralization of modern standard Chinese tone processing. J. Neurolinguist. 2, 189–199. doi: 10.1016/S0911-6044(86)80012-4
Boemio, A., Fromm, S., Braun, A., and Poeppel, D. (2005). Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nat. Neurosci. 8, 389–395. doi: 10.1038/nn1409
Boersma, P., and Weenink, D. (2013). Praat: doing phonetics by computer [Computer program]. Version 5.3.51. Available online at: http://www.praat.org
Bonte, M., and Blomert, L. (2004). Developmental changes in ERP correlates of spoken word recognition during early school years: a phonological priming study. Clin. Neurophysiol. 115, 409–423. doi: 10.1016/S1388-2457(03)00361-4
Bonte, M., Parviainen, T., Hytonen, K., and Salmelin, R. (2006). Time course of top-down and bottom-up influences on syllable processing in the auditory cortex. Cereb. Cortex 16, 115–123. doi: 10.1093/cercor/bhi091
Buschman, T. J., and Miller, E. K. (2007). Top-down versus bottom-up control of attention in the prefrontal and posterior parietal cortices. Science 315, 1860–1862. doi: 10.1126/science.1138071
Connor, C. E., Egeth, H. E., and Yantis, S. (2004). Visual attention: bottom-up versus top-down. Curr. Biol. 14, R850–R852. doi: 10.1016/j.cub.2004.09.041
Curran, T., Tucker, D. M., Kutas, M., and Posner, M. I. (1993). Topography of the N400: brain electrical activity reflecting semantic expectancy. Electroencephalogr. Clin. Neurophysiol. 88, 188–209. doi: 10.1016/0168-5597(93)90004-9
Dehaene, S., and Cohen, L. (2007). Cultural recycling of cortical maps. Neuron 56, 384–398. doi: 10.1016/j.neuron.2007.10.004
Diehl, R. L., Lotto, A. J., and Holt, L. L. (2004). Speech perception. Annu. Rev. Psychol. 55, 149–179. doi: 10.1146/annurev.psych.55.090902.142028
Dumay, N., Benraiss, A., Barriol, B., Colin, C., Radeau, M., and Besson, M. (2001). Behavioral and electrophysiological study of phonological priming between bisyllabic spoken words. J. Cogn. Neurosci. 13, 121–143. doi: 10.1162/089892901564117
Eichele, T., Nordby, H., Rimol, L. M., and Hugdahl, K. (2005). Asymmetry of evoked potential latency to speech sounds predicts the ear advantage in dichotic listening. Cogn. Brain Res. 23, 405–412. doi: 10.1016/j.cogbrainres.2005.02.017
Fitch, W. T. (2010). The Evolution of Language. Cambridge: Cambridge University Press.
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. J. Phonetics 14, 3–28.
Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends Cogn. Sci. 6, 78–84. doi: 10.1016/S1364-6613(00)01839-8
Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., et al. (2003b). Temporal integration of speech prosody is shaped by language experience: an fMRI study. Brain Lang. 84, 318–336. doi: 10.1016/S0093-934X(02)00505-9
Gandour, J., Tong, Y., Wong, D., Talavage, T., Dzemidzic, M., Xu, Y., et al. (2004). Hemispheric roles in the perception of speech prosody. Neuroimage 23, 344–357. doi: 10.1016/j.neuroimage.2004.06.004
Gandour, J., Wong, D., and Hutchins, G. (1998). Pitch processing in the human brain is influenced by language experience. Neuroreport 9, 2115–2119. doi: 10.1097/00001756-199806220-00038
Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N., Tong, Y., et al. (2002). A crosslinguistic fMRI study of spectral and temporal cues underlying phonological processing. J. Cogn. Neurosci. 14, 1076–1087. doi: 10.1162/089892902320474526
Gandour, J., Xu, Y., Wong, D., Dzemidzic, M., Lowe, M., Li, X., et al. (2003a). Neural correlates of segmental and tonal information in speech perception. Hum. Brain Mapp. 20, 185–200. doi: 10.1002/hbm.10137
Goldstein, B. E. (2009). Sensation and Perception, 8th Edn. Pacific Grove, CA: Wadsworth Inc.
Gu, F., Li, F., Wang, X., Hou, Q., Huang, Y., and Chen, L. (2012). Memory traces for tonal language words revealed by auditory event-related potentials. Psychophysiology 49, 1353–1360. doi: 10.1111/j.1469-8986.2012.01447.x
Hickok, G., and Poeppel, D. (2007). The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402. doi: 10.1038/nrn2113
Ho, J. P.-K. (2010). An ERP Study on the Effect of Tone Features on Lexical Tone Lateralization in Cantonese. Master Thesis, The Chinese University of Hong Kong.
Holcomb, P. J., and Anderson, J. E. (1993). Cross-modal semantic priming: a time-course analysis using event-related brain potentials. Lang. Cogn. Proc. 8, 379–411. doi: 10.1080/01690969308407583
Hsieh, L., Gandour, J., Wong, D., and Hutchins, G. (2001). Functional heterogeneity of inferior frontal gyrus is shaped by linguistic experience. Brain Lang. 76, 227–252. doi: 10.1006/brln.2000.2382
Hugdahl, K. (2005). Symmetry and asymmetry in the human brain. Eur. Rev. 13, 119–133. doi: 10.1017/S1062798705000700
Hulse, S. H., Cynx, J., and Humpal, J. (1984). Absolute and relative pitch discrimination in serial pitch perception by birds. J. Exp. Psychol. Gen. 113, 38–54. doi: 10.1037/0096-3445.113.1.38
Hurford, J. R. (2007). The Origin of Meaning. Oxford: Oxford University Press.
Hurley, S. (2007). The shared circuits model: how control, mirroring and simulation can enable imitation, deliberation, and mindreading. Behav. Brain Sci. 31, 1–22. doi: 10.1017/S0140525X07003123
Izumi, A. (2001). Relative pitch perception in Japanese monkeys (Macaca fuscata). J. Comp. Psychol. 115, 127–131. doi: 10.1037/0735-7036.115.2.127
Jamison, H. L., Watkins, K. E., Bishop, D. V., and Matthews, P. M. (2006). Hemispheric specialization for processing auditory nonspeech stimuli. Cereb. Cortex 16, 1266–1275. doi: 10.1093/cercor/bhj068
Jia, S., Tsang, Y.-K., Huang, J., and Chen, H.-C. (2013). Right hemisphere advantage in processing Cantonese level and contour tones: evidence from dichotic listening. Neurosci. Lett. 556, 135–139. doi: 10.1016/j.neulet.2013.10.014
Kaan, E., Barkley, C. M., Bao, M., and Wayland, R. (2008). Thai lexical tone perception in native speakers of Thai, English and Mandarin Chinese: an event-related potentials training study. BMC Neurosci. 9:53. doi: 10.1186/1471-2202-9-53
Kaan, E., Wayland, R., Bao, M., and Barkley, C. M. (2007). Effects of native language and training on lexical tone perception: an ERP study. Brain Res. 1148, 113–122. doi: 10.1016/j.brainres.2007.02.019
Kherif, F., Josse, G., and Price, C. J. (2011). Automatic top-down processing explains common left occipito-temporal responses to visual words and objects. Cereb. Cortex 21, 103–114. doi: 10.1093/cercor/bhq063
Krishnan, A., Gandour, J., Ananthakrishnan, S., Bidelman, G., and Smalt, C. (2011). Functional ear (a)symmetry in brainstem neural activity relevant to encoding of voice pitch: a precursor for hemispheric specialization? Brain Lang. 119, 226–231. doi: 10.1016/j.bandl.2011.05.001
Kutas, M., and Federmeier, K. D. (2011). Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu. Rev. Psychol. 62, 621–647. doi: 10.1146/annurev.psych.093008.131123
Kutas, M., and Hillyard, S. A. (1980). Reading senseless sentences: brain potentials reflect semantic incongruity. Science 207, 203–208. doi: 10.1126/science.7350657
Lau, E. F., Phillips, C., and Poeppel, D. (2008). A cortical network for semantics: (de)constructing the N400. Nat. Rev. Neurosci. 9, 920–933. doi: 10.1038/nrn2532
Lau, H. C., and Passingham, R. E. (2007). Unconscious activation of the cognitive control system in the human prefrontal cortex. J. Neurosci. 27, 5805–5811. doi: 10.1523/JNEUROSCI.4335-06.2007
Lee, C.-Y. (2007). Does horse activate mother? Processing lexical tone in form priming. Lang. Speech 50, 101–123. doi: 10.1177/00238309070500010501
Li, X., Gandour, J. T., Talavage, T., Wong, D., Hoffa, A., Lowe, M., et al. (2010). Hemispheric asymmetries in phonological processing of tones vs. segmental units. Neuroreport 21, 690–694. doi: 10.1097/WNR.0b013e32833b0a10
Liberman, A. M., and Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition 21, 1–36. doi: 10.1016/0010-0277(85)90021-6
Liebenthal, E., Binder, J. R., Spitzer, S. M., Possing, E. T., and Medler, D. A. (2005). Neural substrates of phonemic perception. Cereb. Cortex 15, 1621–1631. doi: 10.1093/cercor/bhi040
Liebenthal, E., Desai, R., Ellingson, M. M., Ramachandran, B., Desai, A., and Binder, J. R. (2010). Specialization along the left superior temporal sulcus for auditory categorization. Cereb. Cortex 20, 2958–2970. doi: 10.1093/cercor/bhq045
Liu, L., Deng, X., Peng, D., Cao, F., Ding, G., Jin, Z., et al. (2009). Modality- and task-specific brain regions involved in Chinese lexical processing. J. Cogn. Neurosci. 21, 1473–1487. doi: 10.1162/jocn.2009.21141
Liu, L., Peng, D., Ding, G., Jin, Z., Zhang, L., Li, K., et al. (2006). Dissociation in the neural basis underlying Chinese tone and vowel production. Neuroimage 29, 515–523. doi: 10.1016/j.neuroimage.2005.07.046
Luck, S. (2005). An Introduction to the Event-Related Potential Technique. Cambridge, MA: MIT Press.
Luo, H., Ni, J.-T., Li, Z.-H., Li, X.-O., Zhang, D.-R., Zeng, F.-G., et al. (2006). Opposite patterns of hemisphere dominance for early auditory processing of lexical tones and consonants. Proc. Natl. Acad. Sci. U.S.A. 103, 19558–19563. doi: 10.1073/pnas.0607065104
Maess, B., Herrmann, C. S., Hahne, A., Nakamura, A., and Friederici, A. D. (2006). Localizing the distributed language network responsible for the N400 measured by MEG during auditory sentence processing. Brain Res. 1096, 163–172. doi: 10.1016/j.brainres.2006.04.037
McClelland, J. L., and Elman, J. L. (1986). The TRACE model of speech perception. Cogn. Psychol. 18, 1–86. doi: 10.1016/0010-0285(86)90015-0
McDermott, K. B., Petersen, S. E., Watson, J. M., and Ojemann, J. G. (2003). A procedure for identifying regions preferentially activated by attention to semantic and phonological relations using functional magnetic resonance imaging. Neuropsychologia 41, 293–303. doi: 10.1016/S0028-3932(02)00162-8
Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113. doi: 10.1016/0028-3932(71)90067-4
Penolazzi, B., Hauk, O., and Pulvermüller, F. (2007). Early semantic context integration and lexical access as revealed by event-related potentials. Biol. Psychol. 74, 374–388. doi: 10.1016/j.biopsycho.2006.09.008
Pinto, Y., van der Leij, A. R., Sligte, I. G., Lamme, V. A. F., and Scholte, S. H. (2013). Bottom-up and top-down attention are independent. J. Vis. 13:16. doi: 10.1167/13.3.16
Praamstra, P., and Stegeman, D. F. (1993). Phonological effects on the auditory N400 event-related brain potential. Cogn. Brain Res. 1, 73–86. doi: 10.1016/0926-6410(93)90013-U
Pulvermüller, F. (2001). Brain reflections of words and their meaning. Trends Cogn. Sci. 5, 517–524. doi: 10.1016/S1364-6613(00)01803-9
Pulvermüller, F., Assadollahi, R., and Elbert, T. (2001a). Neuromagnetic evidence for early semantic access in word recognition. Eur. J. Neurosci. 13, 201–205. doi: 10.1046/j.0953-816X.2000.01380.x
Pulvermüller, F., Kujala, T., Shtyrov, Y., Simola, J., Tiitinen, H., Alku, P., et al. (2001b). Memory traces for words as revealed by the mismatch negativity (MMN). Neuroimage 14, 607–616. doi: 10.1006/nimg.2001.0864
Pulvermüller, F., and Shtyrov, Y. (2006). Language outside the focus of attention: the mismatch negativity as a tool for studying higher cognitive processes. Prog. Neurobiol. 79, 49–71. doi: 10.1016/j.pneurobio.2006.04.004
Saetrevik, B., and Hugdahl, K. (2007). Priming inhibits the right ear advantage in dichotic listening: implications for auditory laterality. Neuropsychologia 45, 282–287. doi: 10.1016/j.neuropsychologia.2006.07.005
Schapkin, S. A., Gusev, A. N., and Kuhl, J. (2000). Categorization of unilaterally presented emotional words: an ERP analysis. Acta Neurobiol. Exp. 60, 17–28. Available online at: http://www.ane.pl/pdf/6003.pdf
Schirmer, A., Tang, S. L., Penney, T. B., Gunter, T. C., and Chen, H. C. (2005). Brain responses to segmentally and tonally induced semantic violations in Cantonese. J. Cogn. Neurosci. 17, 1–12. doi: 10.1162/0898929052880057
Shtyrov, Y., and Pulvermüller, F. (2007). Early MEG activation dynamics in the left temporal and inferior frontal cortex reflect semantic context integration. J. Cogn. Neurosci. 19, 1633–1642. doi: 10.1162/jocn.2007.19.10.1633
Shuai, L., Gong, T., Ho, J. P.-K., and Wang, W. S.-Y. (in press). "Hemispheric lateralization of perceiving Cantonese contour and level tones: an ERP study," in Studies on Tonal Aspect of Languages (Journal of Chinese Linguistics Monograph Series, No. 25), ed W. Gu (Hong Kong: Journal of Chinese Linguistics).
Sidtis, J. J. (1981). The complex tone test: implications for the assessment of auditory laterality effects. Neuropsychologia 19, 103–112. doi: 10.1016/0028-3932(81)90050-6
Sporns, O. (2011). Networks of the Brain. Cambridge, MA: MIT Press.
Stevens, K. N. (2002). Toward a model of lexical access based on acoustic landmarks and distinctive features. J. Acoust. Soc. Am. 111, 1872–1891. doi: 10.1121/1.1458026
Tachibana, H., Minamoto, H., Takeda, M., and Sugita, M. (2002). Topographical distribution of the auditory N400 component during lexical decision and recognition memory tasks. Int. Congr. Ser. 1232, 197–202. doi: 10.1016/S0531-5131(01)00818-4
Tenke, C. E., Bruder, G. E., Towey, J. P., Leite, P., and Sidtis, J. J. (1993). Correspondence between brain ERP and behavioral asymmetries in a dichotic complex tone test. Psychophysiology 30, 62–70. doi: 10.1111/j.1469-8986.1993.tb03205.x
Tervaniemi, M., and Hugdahl, K. (2003). Lateralization of auditory-cortex functions. Brain Res. Rev. 43, 231–246. doi: 10.1016/j.brainresrev.2003.08.004
Theeuwes, J. (1991). Exogenous and endogenous control of attention—the effect of visual onsets and offsets. Percept. Psychophys. 49, 83–90. doi: 10.3758/BF03211619
Tsang, Y.-K., Jia, S., Huang, J., and Chen, H.-C. (2010). ERP correlates of pre-attentive processing of Cantonese lexical tones: the effects of pitch contour and pitch height. Neurosci. Lett. 487, 268–272. doi: 10.1016/j.neulet.2010.10.035
Van Lancker, D., and Fromkin, V. A. (1973). Hemispheric specialization for pitch and "tone": evidence from Thai. J. Phonetics 1, 101–109.
Wager, E. E., Peterson, M. A., Folstein, J. R., and Scalf, P. E. (2013). Automatic top-down processes mediate selective attention. J. Vis. 13:137. doi: 10.1167/13.9.137
Wang, W. S.-Y. (1967). Phonological features of tone. Int. J. Am. Linguist. 33, 93–105. doi: 10.1086/464946
Wang, Y., Behne, D., Jongman, A., and Sereno, J. (2004). The role of linguistic experience in the hemispheric processing of lexical tone. Appl. Psycholinguist. 25, 449–466. doi: 10.1017/S0142716404001213
Wang, Y., Jongman, A., and Sereno, J. A. (2001). Dichotic perception of Mandarin tones by Chinese and American listeners. Brain Lang. 78, 332–348. doi: 10.1006/brln.2001.2474
Wioland, N., Rudolf, G., Metz-Lutz, M. N., Mutschler, V., and Marescaux, C. (1999). Cerebral correlates of hemispheric lateralization during a pitch discrimination task: an ERP study in dichotic situation. Clin. Neurophysiol. 110, 516–523. doi: 10.1016/S1388-2457(98)00051-0
Wong, P. C. M., Warrier, C. M., Penhune, V. B., Roy, A. K., Sadehh, A., Parrish, T. B., et al. (2008). Volume of left Heschl's gyrus and linguistic pitch learning. Cereb. Cortex 18, 828–836. doi: 10.1093/cercor/bhm115
Xi, J., Zhang, L., Shu, H., Zhang, Y., and Li, P. (2010). Categorical perception of lexical tones in Chinese revealed by mismatch negativity (MMN). Neuroscience 170, 223–231. doi: 10.1016/j.neuroscience.2010.06.077
Ye, Y., and Connine, C. M. (1999). Processing spoken Chinese: the role of tone information. Lang. Cogn. Proc. 14, 609–630. doi: 10.1080/016909699386202
Yin, P., Fritz, J. B., and Shamma, S. A. (2010). Do ferrets perceive relative pitch? J. Acoust. Soc. Am. 127, 1673–1680. doi: 10.1121/1.3290988
Yip, M. (2002). Tone. Cambridge: Cambridge University Press. doi: 10.1017/CBO9781139164559
Zatorre, R., and Gandour, J. T. (2008). Neural specializations for speech and pitch: moving beyond the dichotomies. Philos. Trans. R. Soc. Lond. B Biol. Sci. 363, 1087–1104. doi: 10.1098/rstb.2007.2161
Zatorre, R. J., and Belin, P. (2001). Spectral and temporal processing in human auditory cortex. Cereb. Cortex 11, 946–953. doi: 10.1093/cercor/11.10.946
Zatorre, R. J., Belin, P., and Penhune, V. B. (2002). Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6, 37–46. doi: 10.1016/S1364-6613(00)01816-7
Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 15 July 2013; paper pending published: 11 December 2013; accepted: 09 March 2014; published online: 25 March 2014.

Citation: Shuai L and Gong T (2014) Temporal relation between top-down and bottom-up processing in lexical tone perception. Front. Behav. Neurosci. 8:97. doi: 10.3389/fnbeh.2014.00097

This article was submitted to the journal Frontiers in Behavioral Neuroscience.

Copyright © 2014 Shuai and Gong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.