Language Teaching (2019), 52, 188–200
doi:10.1017/S0261444819000028

RESEARCH TIMELINE

Lexical coverage and profiling

Ulugbek Nurmukhamedov¹ and Stuart Webb²
¹Northeastern Illinois University, USA and ²The University of Western Ontario, Canada
Corresponding author. Email: [email protected]

Introduction

Technological innovations in computer software and advancements in corpus linguistics have made it easier to analyse the vocabulary in a text. Corpus-driven analysis of multi-million-word corpora has enabled researchers to create word frequency lists, and these have been used together with software to profile the distribution of words at different frequency levels within texts. This analysis of vocabulary provides an indication of the difficulty level of a text, the potential for vocabulary learning through reading or listening to the text, as well as the vocabulary knowledge required to understand different types of spoken and written discourse. This area of research, known as LEXICAL COVERAGE or LEXICAL PROFILING, tends to fall into a number of different categories. The following themes highlight some of the major areas of lexical coverage research.

A. How does lexical coverage affect comprehension?
B. How many words do you need to know to understand written or spoken discourse?
C. To what extent do different text types provide opportunities for incidental vocabulary learning through reading and listening?
D. Which tools and methods can be used to investigate lexical coverage?

Studies in Theme A are experimental in nature and findings are typically based on the percentage of words that learners know in reading or listening passages and their comprehension scores for those passages. This area investigates the amount of lexical coverage needed to reach ‘adequate’ discourse comprehension. It is important to note that ‘adequate comprehension’ is not a clearly defined construct and what is considered adequate in one study might be quite different in another. This line of research indicates that 95% to 98% coverage is needed to establish adequate comprehension of different types of discourse.

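The coverage figure that Theme A studies relate to comprehension is, at its simplest, the percentage of running words in a passage that a learner knows. A minimal Python sketch of that calculation follows. Everything in it is an assumption made for illustration and not part of the original studies: the tokenizer, the function names `tokenize` and `lexical_coverage`, the toy learner vocabulary, and the sample passage. It also counts raw word forms, whereas the studies in this timeline typically count word families.

```python
import re


def tokenize(text):
    """Lowercase the text and return its running words (tokens)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_coverage(text, known_words):
    """Return the percentage of tokens in `text` found in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return 100.0 * known / len(tokens)


# Hypothetical learner vocabulary and passage, for illustration only.
known_words = {"the", "cat", "sat", "on", "a", "mat", "and", "looked", "at", "dog"}
passage = "The cat sat on the mat and looked at a ferocious dog."

coverage = lexical_coverage(passage, known_words)
print(f"coverage = {coverage:.1f}%")

# Compare against the 95% and 98% figures discussed in the text.
for threshold in (95, 98):
    status = "meets" if coverage >= threshold else "falls below"
    print(f"{status} the {threshold}% threshold")
```

In this toy passage 11 of the 12 running words are known, so coverage is about 91.7% and falls below both thresholds.
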
For reading comprehension, minimal (95%) and optimal (98%) thresholds have been suggested (Laufer & Ravenhorst-Kalovski, 2010*) while for listening comprehension, 95% lexical coverage is recommended for good comprehension (e.g. van Zeeland & Schmitt, 2013*). These findings indicate that a reasonable comprehension level may be achieved at a density of lexical coverage above 90%, suggesting that learners need to know a very high percentage of words in a text in order to understand it.

*Indicates full reference appears in the subsequent timeline.
© Cambridge University Press 2019

Building on the findings in Theme A, the second area of lexical coverage research (Theme B) uses word frequency lists and a representative corpus of a domain to examine the number of words necessary to reach the coverage figures associated with adequate comprehension. For example, 3,000 word families plus proper nouns (e.g. ‘Harry’, ‘London’) and interjections (e.g. ‘oh’, ‘hey’) are sufficient to reach 95% coverage of movies (Webb & Rodgers, 2009a*), TV programs (Webb & Rodgers, 2009b*), and conversation (Nation, 2006*). Findings from this line of research give an indication of the coverage of high-, mid-, and low-frequency words, as well as academic vocabulary in different types of discourse. This is particularly useful in distinguishing the relative values of words in different types of discourse. For example, West’s (1953)* General Service List (GSL) was developed to represent the most useful 2,000 words in the English language and provides around 70–90% coverage of discourse, with the amount of coverage dependent on the type of text.

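The Theme B procedure described above (adding frequency bands one at a time until a corpus reaches a target coverage) can be sketched in Python as follows. This is a hedged illustration and not the method of any study cited here: the function name `bands_needed`, the toy corpus, and the three tiny bands are hypothetical, and the bands hold word forms for brevity, whereas the studies above work with bands of 1,000 word families. The `assumed_known` argument mirrors the practice of counting proper nouns and interjections as known.

```python
from collections import Counter


def bands_needed(tokens, bands, target, assumed_known=frozenset()):
    """Return (number_of_bands, coverage_percent) at the first point where
    cumulative coverage reaches `target`, or (None, final_coverage) if the
    bands never reach it.

    `bands` is a list of sets of words, ordered from most to least frequent.
    `assumed_known` holds items counted as known from the start.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0

    seen = set(assumed_known)
    covered = sum(n for word, n in counts.items() if word in seen)
    coverage = 100.0 * covered / total
    if coverage >= target:
        return 0, coverage

    for number, band in enumerate(bands, start=1):
        new_words = band - seen          # avoid counting a word twice
        covered += sum(counts[word] for word in new_words)
        seen |= new_words
        coverage = 100.0 * covered / total
        if coverage >= target:
            return number, coverage
    return None, coverage


# Hypothetical ten-token corpus and frequency bands, for illustration only.
tokens = "harry saw the dog and the cat and the heron".split()
bands = [{"the", "and"}, {"saw", "dog", "cat"}, {"heron"}]

print(bands_needed(tokens, bands, 90, assumed_known={"harry"}))
print(bands_needed(tokens, bands, 90))
```

In this toy corpus, treating the proper noun as known lets two bands reach 90% coverage, while without it all three bands are needed, which illustrates why the treatment of proper nouns matters for the vocabulary size a study reports.
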
Coxhead’s (2000)* Academic Word List (AWL) covers around 10% of academic written texts and around 4% of academic spoken discourse (Dang & Webb, 2014*).

Studies in Theme C examine the number of encounters with words to determine the potential for incidental vocabulary learning through reading, listening, or viewing the text. These studies draw on the lexical profiling research in Theme B by investigating the potential for learning words incidentally that are deemed to be unknown due to their word frequency level. For example, Webb and Rodgers (2009b*) looked at repetition of vocabulary that was less frequent than the 3,000 word level to determine the potential to incidentally learn lower frequency words through watching television programs. Cobb (2007)* examined the potential for readers to learn the most frequent 3,000 word families through reading different types of text. Studies investigating incidental vocabulary learning have shown that the more often words are encountered, the more likely they are to be learned (e.g. Horst, Cobb, & Meara, 1998; Rott, 1999; Waring & Takaki, 2003; Webb, 2007). By examining the extent to which text types such as graded readers, listening passages, and classroom-based teacher talk create the opportunity for incidental vocabulary acquisition, we can better understand where vocabulary learning is more likely to occur.

Studies under Theme D relate to lexical profiling methodology. These include discussions of issues that should be considered when designing and evaluating studies of lexical coverage. For example, Reynolds (2013*) questions whether the unit of counting in lexical profiling studies should be the ‘word family’. Word families are made up of a headword (e.g. ‘access’), its inflections (‘access’, ‘accesses’, ‘accessing’, ‘accessed’), and derivations (‘accessible’, ‘accessibility’, ‘inaccessible’, ‘inaccessibility’) and are the typical unit of counting in the studies found in this timeline (e.g. Nation, 2006*; Webb & Paribakht, 2015*). The rationale behind using word families is that if a learner knows the headword, he or she may understand the meaning of the inflections and derivations when they are encountered in context. Currently, there is a debate in lexical profiling studies over whether to use word families or lemmas as the unit of counting. Although there are some proponents of the lemma-based approach (e.g. Gardner & Davies, 2014), there is little research investigating this issue and an answer to this question remains to be determined. Thus, further research needs to be done in this area (see Nation, 2016 for discussion*). Another issue is the vocabulary size at which proper nouns (e.g. Baltimore, Harry) should be considered to be understood by learners. Research under this theme also discusses tools (e.g. software, frequency lists) that are commonly used to perform the coverage analysis.

In addition to the four themes discussed in this manuscript, we were tempted to include studies of the lexical frequency profiles of learner language use (e.g. Laufer & Nation, 1995; Meara, 2005). Although these studies do have a similar methodology to those of Theme B, their focus on productive knowledge contrasts with the research in this timeline, which is all focused on receptive vocabulary knowledge. Lexical frequency profiling of learner language measures the extent of lexical sophistication and thus might be better incorporated in timelines that are devoted to lexical richness.

Another theme not included in this timeline is word lists (e.g. high-/low-frequency lists, academic and technical word lists). There has been ample research on word frequency lists since the early 1950s that has resulted in the creation of lists such as West’s (1953) GSL and Brezina and Gablasova’s (2015) New GSL. Similarly, there are several lists of academic vocabulary such as Coxhead’s (2000*) AWL and Gardner and Davies’ (2014) Academic Vocabulary List.
In addition, there are lists of academic collocations (Ackermann & Chen, 2013), phrasal expressions (Martinez & Schmitt, 2012), and lexical frames common in academic and spoken English (Gray & Biber, 2013). In this timeline, frequency lists such as the GSL (West, 1953*) and Nation’s (2012*) British National Corpus (BNC) word lists are included, together with the British National Corpus/Corpus of Contemporary American English (COCA) word lists that have been used in analyses within the articles summarized in the timeline. However, there was not sufficient space to include all of the available word lists, and ideally these studies would be detailed in a separate timeline.

Overall, our timeline reveals a relatively steady number of studies published on lexical coverage from the 1980s. Research might have been spurred by the introduction of technological tools that reduce the challenges of corpus-driven studies. The availability of the RANGE program (Heatley, Nation, & Coxhead, 2002*) and the development of the BNC word lists (Nation, 2006*) have likely led to an increase in the number of studies conducted in recent years. With a relatively small number of studies investigating the relationship between lexical coverage and comprehension, further research is still warranted. However, there is value in researching each of these themes in more depth (Schmitt et al., 2017*). Moreover, as advances in lexical profiling tools occur, there will likely be additional innovation regarding how these lines of research are conducted.

Endnotes

1 Gilner, L. (2011). A primer on the General Service List. Reading in a Foreign Language, 23(1), 65–83.
2 Nagy, W., Anderson, R., Schommer, M., Scott, J., & Stallman, A. (1989). Morphological families in the internal lexicon. Reading Research Quarterly, 24(3), 263–282.
3 Xu, G., & Nation, I. S. P. (1984). A university word list. Language Learning and Communication, 3(2), 215–229.
4 Krashen, S. (1989). We acquire vocabulary and spelling by reading: Additional evidence for the input hypothesis. Modern Language Journal, 73(4), 440–464.
5 Schmitt, N., Schmitt, D., & Clapham, C. (2001). Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55–88.
6 Nation, I. S. P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7), 9–13.
7 Nation, I. S. P., & Webb, S. (2011). Researching and analysing vocabulary. Boston, MA: Cengage Learning.
8 Nation, I. S. P. (2016). Making and using word lists for language learning and testing. Amsterdam: John Benjamins.
9 Macalister, J., & Webb, S. (2013). A Response to Reynold’s (2013) comments on ‘Is text written for children useful for L2 extensive reading?’ TESOL Quarterly, 47(4), 852–855.

References

Ackermann, K., & Chen, Y. (2013). Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12, 235–247.
Brezina, V., & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the New General Service List. Applied Linguistics, 36(1), 1–22.
Gardner, D., & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 35(3), 305–327.
Gray, B., & Biber, D. (2013). Lexical frames in academic prose and conversation. International Journal of Corpus Linguistics, 18(1), 109–135.
Horst, M., Cobb, T., & Meara, P. (1998). Beyond A Clockwork Orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11(2), 207–223.
Laufer, B., & Nation, I. S. P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16, 307–322.
Meara, P. (2005). Lexical frequency profiles: A Monte Carlo analysis. Applied Linguistics, 26, 32–47.
Martinez, R., & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33(3), 299–320.
Rott, S. (1999). The effect of exposure frequency on intermediate language learners’ incidental vocabulary acquisition and retention through reading. Studies in Second Language Acquisition, 21(4), 589–619.
Waring, R., & Takaki, M. (2003). At what rate do learners learn and retain new vocabulary from reading a graded reader? Reading in a Foreign Language, 15, 1–27.
Webb, S. (2007). The effects of repetition on vocabulary knowledge. Applied Linguistics, 28, 46–65.

Ulugbek Nurmukhamedov is an Assistant Professor of TESOL in the M.A. TESOL program at Northeastern Illinois University (Chicago, Illinois). His research focuses on second language (L2) vocabulary instruction and computer-assisted language learning. He has published in these areas in journals including TESOL Journal, ELT Journal, Writing & Pedagogy and International Journal of Lexicography.

Stuart Webb is a Professor of Applied Linguistics in the Faculty of Education at the University of Western Ontario. His research interests include vocabulary studies, extensive reading and listening, and language learning through watching television. His articles have been published in journals such as Applied Linguistics and Language Learning. His latest book (with Paul Nation) is How vocabulary is learned (2017, Oxford University Press).

Cite this article: Nurmukhamedov, U., & Webb, S. (2019). Lexical coverage and profiling. Language Teaching, 52(2), 188–200. https://doi.org/10.1017/S0261444819000028

https://doi.org/10.1017/S0261444819000028 Downloaded from https://www.cambridge.org/core. IP address: 98.220.114.49, on 07 Jun 2019 at 14:34:02, subject to the Cambridge Core terms of use, available at Year References Annotations Theme 1953 West, M. (1953). A general service list of English West’s widely used GSL contains 2,000 word families and provides high lexical coverage (∼70–90%) D words. London: Longmans, Green & Co. of English. The words in the list were selected using both objective criteria (frequency of occurrence and range) as well as subjective criteria (usefulness of words for English learners), thus some words were manually identified and added to the list. The GSL has been a great resource for material writers and vocabulary researchers, and has contributed to research in vocabulary profiling partly because of its integration into vocabulary tools such as RANGE and VocabProfile. Despite its popularity, the GSL was put together almost 60 years ago, thus researchers question whether it is representative of current English. A recent attempt was made to update the list to reflect more current English. Gilner (2011)1 provides an excellent historical and methodological overview of the GSL. 1956 Schonell, F., Meddleton, I., & Shaw, B. (1956). A In this pioneering study of lexical coverage, Schonell et al. examined the number of words B study of the oral vocabulary of adults: An necessary to engage in conversation. To do so, they recorded interviews and spontaneous investigation into the spoken vocabulary of the conversation of about 2,800 unskilled and semi-skilled workers in Australia. The analysis showed Australian worker. Brisbane: University of that around 2,000 headwords would cover 99% of informal conversation. This work is significant Queensland Press. because it was carried out when electronic corpora were not available. 1985 Liu, N., & Nation, I. S. P. (1985). 
Factors affecting Liu & Nation investigated the effect of lexical coverage on the ability to guess the meanings of C guessing vocabulary in context. RELC Journal, unknown words. Participants were presented with a low-density (one unknown word in 25) and a 16(1), 33–42. high-density text (one unknown word in ten). More accurate guesses were made with the low-density texts compared to the high-density texts indicating that lexical coverage affects successful guessing. This is the only study examining how coverage affects vocabulary learning. More research in this area is warranted. 1988 Wodinsky, M., & Nation, I. S. P. (1988). Learning Wodinsky & Nation analysed the recurrence of words in two different graded readers and the C from graded readers. Reading in a Foreign potential for incidental vocabulary learning in each text. The number of encounters necessary for Language, 5(1), 155–161. incidental vocabulary learning to occur was set at ten. They found that learners would encounter 498 out of the 1,100 word families ten or more times if both graded readers were read. Based on the finding, the authors suggest that learners need to read a large number of graded readers to learn Language Teaching all of the new words encountered in a level. 1989 Laufer, B. (1989). What percentage of text lexis is This important study was the first to examine the effect of lexical coverage on L2 reading A essential for comprehension? In C. Lauren & comprehension. Laufer operationalized ‘adequate’ reading comprehension as scores of 55% or M. Nordman (Eds.), Special language: From higher because this was a passing grade at the institution where the participants were recruited. humans thinking to thinking machines (pp. 316– Learners who achieved a reading comprehension score of 55% or above were found to have a 323). Clevedon: Multilingual Matters. minimum of 95% lexical coverage. 
Based on this finding, she suggested that the lexical threshold for reasonable reading comprehension is at around 95% coverage of a text and recommended that this lexical coverage could be achieved with knowledge of 5,000 words. (Continued ) 191 Note. Authors’ names are shown in small capitals when the study referred to appears in this timeline. https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0261444819000028 Downloaded from https://www.cambridge.org/core. IP address: 98.220.114.49, on 07 Jun 2019 at 14:34:02, subject to the Cambridge Core terms of use, available at (Continued) 192 Year References Annotations Theme Ulugbek Nurmukhamedov and Stuart Webb 1991 Meara, P. M. (1991). BBC English Core Curriculum: Meara investigated the lexical profiles of BBC radio broadcasts as a means to determine whether B The lexicon. London: BBC English. they might be understood by L2 learners. He found that knowledge of the most frequent 2,500 word families provided around 90% coverage of the broadcasts. 1992 Hirsh, D., & Nation, I. S. P. (1992). What Hirsh & Nation attempted to find the number of words needed for pleasurable reading in English. B vocabulary size is needed to read unsimplified They found that the most frequent 5,000 word families provided 97–98% lexical coverage of texts for pleasure? Reading in a Foreign unsimplified texts designed for teenagers who are native speakers of English. They suggested that Language, 8(2), 689–696. L2 learners target this vocabulary size as a goal to provide comprehension of these texts. 1993 Bauer, L., & Nation, I. S. P. (1993). Word families. Bauer & Nation’s discussion and categorization of word families has had an impact on most studies D International Journal of Lexicography, 6(4), 253– of lexical coverage because word families are the most commonly used unit of counting in lexical 279. coverage research. 
Because there is empirical support for the use of word families as the unit of counting in first language (L1) studies (e.g. Nagy et al., 1989)2 but there is little L2 research validating its use, some scholars question the validity of using word families as the unit of counting in L2 studies (e.g. BROWN, 2013; REYNOLDS, 2013). 1994 Carver, R. P. (1994). Percentage of unknown This study indirectly examined the lexical coverage of a text by investigating the relationship A vocabulary words in text as a function of the between comprehension of relatively difficult reading passages and the number of unknown words relative difficulty of the text: Implications for in the same passages. In this two-part study, schoolchildren in grades 3–6 and graduate-level instruction. Journal of Reading Behavior, 26(4), students were asked to read passages of varying levels of difficulty and draw a line under each 413–433. unknown word. Carver suggested that the relative difficulty of a text increases, as the number of unknown words in the text increases. He claimed that around 98–99% coverage would make texts suitable for readers. 1994 Sutarsyah, C., Nation, I. S. P., & Kennedy, This study compared the vocabulary in small corpora made up of related and unrelated texts. An B G. (1994). How useful is EAP vocabulary for ESP? ESP corpus included a single long economics text while an EAP corpus included short texts A corpus based case study. RELC Journal, 25(2), comprising diverse academic disciplines (e.g. education, engineering). The analysis showed that the 34–50. number of words in the unrelated academic texts was much higher than in the single economics text (12,744 vs. 5,438) indicating that reading materials made up of narrowly focused (related) topics require readers to know less vocabulary than those from a variety of different (unrelated) topics. 1999 Nation, I. S. P., & Wang, K. (1999). 
Graded readers This study examined the lexical coverage of graded readers and the amount of reading required to C and vocabulary. Reading in a Foreign Language, learn the high-frequency words found in a graded reader series. The findings indicated that 40% of 12(2), 355–380. the new words in a level would be encountered ten or more times (and potentially learned) when learners read seven books. 1999 Ward, J. (1999). How large a vocabulary do EAP Ward identified the most frequent 2,000 word families (EngList) from an engineering corpus and B Engineering students need? Reading in a Foreign then compared its coverage with other frequency lists such as the GSL (WEST, 1953) and the UWL (Xu Language, 12(2), 309–324. & Nation, 1984).3 The results showed that the most frequent 2,000 words (EngList) provided over 95% coverage of the engineering texts, which was better coverage than the GSL and GSL plus the UWL. He suggested that knowing EngList would provide students with sufficient vocabulary knowledge to read university-level engineering texts. https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0261444819000028 Downloaded from https://www.cambridge.org/core. IP address: 98.220.114.49, on 07 Jun 2019 at 14:34:02, subject to the Cambridge Core terms of use, available at 2000 Bonk, W. (2000). Second language lexical Bonk (2000) was the first to study the relationship between vocabulary knowledge and L2 listening A knowledge and listening comprehension. comprehension. Bonk used a dictation task to measure EFL students’ vocabulary knowledge and a International Journal of Listening, 14(1), 14–31. free-recall test to assess their listening comprehension. He found that the relationship between coverage and listening comprehension was complex as there was a great deal of variation in the comprehension scores at the same levels of coverage. He suggested that with good coping strategies learners may achieve good listening comprehension when coverage is less than 95%. 
VAN ZEELAND & SCHMITT (2013) followed up this study but recommended that 95% coverage should be the target for good listening comprehension of informal texts. 2000 Coxhead, A. (2000). A new academic word list. Using a 3.5-million-word corpus of written academic English, Coxhead created the AWL, comprising D TESOL Quarterly, 34(2), 213–238. 570 word families that account for ∼10% coverage of written academic texts. AWL has been common in lexical profiling studies because researchers explored the AWL’s coverage in different types of academic discourse and in specialized/technical vocabulary lists. It is important to note that AWL was put together in response to shortcomings of an earlier academic word list: the UWL (Xu & Nation, 1984). More recently, used a 120-million-word corpus to create a new Academic Vocabulary List, which has not been widely used in lexical profiling research yet. 2000 Hu, M., & Nation, I. S. P. (2000). Unknown This study follows up LAUFER’S (1989) investigation of the relationship between reading A vocabulary density and reading comprehension. comprehension and lexical coverage. In contrast to the findings of the earlier study, Hu & Nation Reading in a Foreign Language, 13(1), 403–430. found that L2 readers needed to know 98% of the words in a text to have adequate reading comprehension of a relatively easy text. 2002 Heatley, A., Nation, I. S. P., & Coxhead, A. (2002). The RANGE software has been the most commonly used computer program in studies of lexical D Range: A program for the analysis of vocabulary profiling. It lists all the words that occur in a text according to where they are found in word lists, in texts. Retrieved from http://www.victoria.ac.nz/ and shows how many times they were used. Currently, RANGE can be downloaded freely together lals/about/staff/paul-nation with three sets of lists: the GSL/AWL lists (2,000, plus 570 word families), the BNC lists (14,000), and the BNC/COCA lists (25,000). 
RANGE also allows researchers to use their own word frequency lists instead of those that come with the program. 2003 Adolphs, S., & Schmitt, N. (2003). Lexical coverage Building on the work of SHONELL ET AL. (1956), Adolphs & Schmitt examined the vocabulary size B of spoken discourse. Applied Linguistics, 24(4), necessary for everyday (informal spoken) communication. In contrast to the earlier study, they 425–438. reported that 2,000 word families was insufficient to cover 95% of informal everyday conversation, and that 3,000 word families was a more appropriate target as the vocabulary size necessary to Language Teaching understand spoken discourse. 2004 Gardner, D. (2004). Vocabulary input through This study investigated the vocabulary in expository and narrative texts written for 10- and B, C extensive reading: A comparison of words found 11-year-olds. Gardner analysed the proportion of general high-frequency words from WEST’S (1953) in children’s narrative and expository reading GSL list and academic words from Xu & Nation’s (1984) UWL that were in the texts. Gardner found materials. Applied Linguistics, 25(1), 1–37. that these lists covered about 89% of his corpus. Because there was greater repetition of words in expository texts, he reported that expository texts might contribute to greater incidental vocabulary learning than narrative texts. (Continued ) 193 https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0261444819000028 Downloaded from https://www.cambridge.org/core. IP address: 98.220.114.49, on 07 Jun 2019 at 14:34:02, subject to the Cambridge Core terms of use, available at (Continued) 194 Year References Annotations Theme Ulugbek Nurmukhamedov and Stuart Webb 2004 Adolphs, S., & Schmitt, N. (2004). Vocabulary Building on their earlier work (see ADOLPHS & SCHMITT 2003), Adolphs & Schmitt examined a number B coverage according to spoken discourse context. 
of factors such as the purpose of interaction or the degree of familiarity between two interlocutors In P. Bogaards & B. Laufer (Eds.), Vocabulary in a in spoken discourse and their influence on lexical coverage. The findings indicated that spoken second language (pp. 39–49). Amsterdam: John interaction requires a larger vocabulary than was previously reported (ADOLPHS & SCHMITT, 2003). They Benjamins. also found that a larger number of word forms was necessary for friendship-based (intimate) interactions compared to professional/business-based (transactional) conversations. 2006 Nation, I. S. P. (2006). How large a vocabulary is In this influential paper, Nation investigated the number of words necessary to understand B, D needed to reading and listening? The Canadian different types of spoken and written text. Nation focused on the vocabulary size necessary to reach Modern Language Review, 63(1), 59–82. 98% lexical coverage of written and spoken texts. He found that knowledge of the most frequent 8,000–9,000 word families was necessary to reach 98% coverage of written discourse while knowing the most frequent 6,000–7,000 word families was necessary to reach 98% coverage of spoken discourse. One of the most important contributions of this study is that it introduced the 14 BNC 1,000-word lists that were used in much of the subsequent research examining lexical coverage. Nation’s BNC word lists are made up of the 14,000 most frequent word families and also include additional lists of proper nouns and marginal words. 2007 Cobb, T. (2007). Computing the vocabulary This study was conceived in response to Krashen’s (1989)4 claim that reading alone may provide L2 C demands of L2 reading. Language Learning and learners with sufficient lexical input necessary for unassisted reading. Cobb analysed the Technology, 11(3), 38–63. vocabulary found in three subcorpora: press, academic and fiction. 
He found that not all words at the 3,000-word level were encountered a sufficient number of times (six or more) in the subcorpora to master that level. Based on this finding, Cobb suggested that reading alone may not be sufficient to improve L2 learners’ vocabulary knowledge for reading. 2008 Stæhr, L. S. (2008). Vocabulary size and the skills This study examined the relationship between EFL learners’ vocabulary size and their reading, A of listening, reading and writing. Language listening and writing skills. The participants’ vocabulary size was calculated based on their scores Learning Journal, 36(2), 139–152. on the Vocabulary Levels Test (Schmitt, Schmitt, & Clapham 2001).5 Stæhr reported that the learners’ vocabulary size showed fairly high correlations with reading comprehension (.83), writing production (.73) and listening comprehension (.69). The results corroborate findings on the importance of lexical coverage for L2 comprehension (LAUFER, 1989; BONK, 2000; SCHMITT ET AL., 2011). 2008 Webb, S., & Nation, I. S. P. (2008). Evaluating the Webb & Nation explain how two tools, (1) RANGE (HEATLEY ET AL., 2002) and (2) the Vocabulary Size B, D vocabulary load of written text. TESOLANZ, 16, Test (Nation & Beglar, 2007),6 can be used to help English language teachers identify level 1–10. appropriate texts for their students. Identifying suitable texts for students was based on the degree to which students have sufficient lexical coverage of texts. 2009 Webb, S., & Rodgers, M. P. H. (2009). The lexical Webb & Rodgers analysed a corpus comprising 2,841,887 words that was made up of 314 American B coverage of movies. Applied Linguistics, 30(3), and British movies. Their findings showed that a vocabulary of 3,000 word families plus knowledge 407–427. of proper nouns and marginal words provided 95% coverage of the corpus. They suggested that this vocabulary size may be sufficient for comprehension of movies. 
van Zeeland & Schmitt’s (2013) study of the lexical coverage necessary to understand spoken discourse supported this recommendation. https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0261444819000028 Downloaded from https://www.cambridge.org/core. IP address: 98.220.114.49, on 07 Jun 2019 at 14:34:02, subject to the Cambridge Core terms of use, available at 2009 Webb, S., & Rodgers, M. P. H. (2009). Vocabulary Webb & Rodgers analysed the vocabulary in 88 English language TV programs. They found that B, C demands of television programs. Language knowledge of the most frequent 3,000 word families and proper nouns and marginal words Learning, 59(2), 235–366. provided 95% coverage of TV programs. These results were similar to the findings of WEBB & RODGERS’ (2009) study of the lexical coverage of movies. Together, the results of the two studies suggest that 95% coverage of video may usually be achieved when viewers know the most frequent 3,000 word families. They also examined the reoccurrence of lower frequency words and suggested that regular TV viewing had the potential to contribute to considerable incidental vocabulary learning. 2009 Stæhr, L. S. (2009). Vocabulary knowledge and This study examined the relationship between vocabulary size and depth on listening A advanced listening comprehension in English as comprehension. Based on learners’ scores on the Vocabulary Levels Test Stæhr indirectly a foreign language. Studies in Second Language calculated the degree of lexical coverage of the audio passages. His findings suggested that 98% Acquisition, 31(4), 577–607. coverage provided adequate L2 listening comprehension. This is partially supported by VAN ZEELAND & SCHMITT (2013) who found that 98% coverage provided high-level listening comprehension. 2010 Brown, D. (2010). An improper assumption? 
The treatment of proper nouns in text coverage counts. Reading in a Foreign Language, 22(2), 355–361. Theme: B, D
In many studies of lexical coverage, there is an assumption that proper nouns have a minimal learning burden and are easily understood (e.g. NATION, 2006; WEBB & RODGERS, 2009). Brown questions this assumption. He suggests that research is needed to determine the extent to which learners of different vocabulary sizes can understand the proper nouns encountered in text.

2010 Horst, M. (2010). How well does teacher talk support incidental vocabulary acquisition? Reading in a Foreign Language, 22(1), 161–180. Theme: B, C
Horst investigated the lexical coverage of in-class teacher talk addressed to English as a second language (ESL) learners and examined the extent to which it may contribute to the incidental acquisition of newly encountered words. The findings indicated that 4,000 word families were necessary to reach 98% coverage of teacher talk, suggesting that it does not require as much vocabulary as is needed to comprehend other spoken materials such as TV shows (WEBB & RODGERS, 2009b), movies (NATION, 2006), and academic lectures (DANG & WEBB, 2014). The findings showed that ESL teacher talk in a classroom setting offers minimal opportunities for incidental vocabulary acquisition.

2010 Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learner’s vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30. Theme: A, B
Laufer & Ravenhorst-Kalovski examined the relationship between learners’ vocabulary size, lexical coverage of academic texts and reading comprehension scores of academic English. They found that 4,000–5,000 word families plus proper nouns were sufficient to reach 95% lexical coverage and 6,000–8,000 word families plus proper nouns would enable learners to reach 98% coverage of the texts.
They suggested that 95% coverage may provide minimum comprehension while 98% coverage may provide optimal reading comprehension. Their calculation of the target vocabulary size necessary to reach 98% coverage of written text supports NATION’S (2006) earlier findings.

2010 Matsuoka, W., & Hirsh, D. (2010). Vocabulary learning through reading: Does an ELT course book provide good opportunities? Reading in a Foreign Language, 22(1), 56–70. Theme: B, C
Matsuoka & Hirsh examined the lexical profile of an English language teaching (ELT) course book and the potential for vocabulary learning through completing the text. They found that knowledge of the most frequent 2,000 words and the AWL would provide around 95% coverage. They also reported that few lower frequency and academic words were likely to be learned in the ELT course book and suggested that knowledge of the low-frequency words should be strengthened through studying additional suitable ELT texts.

2010 Webb, S. (2010). Using glossaries to increase the lexical coverage of television programs. Reading in a Foreign Language, 22(1), 201–221. Theme: B, C
This study examined the lexical coverage of glossaries made up of low-frequency words found in television programs. Webb created glossaries comprising low-frequency words (beyond the most frequent 3,000 words) that occurred ten or more times in television programs. The findings indicated that the lexical coverage of the glossaries was 1.31% and 2.26%.
In both cases, the amount of coverage was greater than that of the most frequent 3,001–4,000 word families, indicating that coverage provided by glossaries may be an efficient way to increase lexical coverage, and in turn comprehension.

2010 Webb, S. (2010). A corpus driven study of the potential for vocabulary learning through watching movies. International Journal of Corpus Linguistics, 15(1), 497–519. Theme: B, C
Webb examined the number of encounters with low-frequency words in single movies and sets of movies. In this study, low-frequency words were operationalized as word families belonging to the 4,000–14,000 word family lists from NATION’S (2006) BNC lists. Webb found that the number of low-frequency words occurring ten or more times increased as the number of analysed movies increased. This suggested that watching movies regularly may create many opportunities for incidental learning of low-frequency words.

2010 Webb, S. (2010). Pre-learning low-frequency vocabulary in second language television programmes. Language Teaching Research, 14(4), 501–515. Theme: B, C
Based on the finding that 3,000 word families plus proper nouns and marginal words would provide 95% coverage of TV programs (see WEBB & RODGERS, 2009b), Webb examined the extent to which knowledge of low-frequency topic-related words encountered in TV programs would boost lexical coverage. The coverage of ten word families ranged from 0.70% to 3.91% in different programs. This tended to provide greater coverage than learning 1,000 word families at the next level (e.g. 3,000–3,999 or 4,000–4,999).

2011 Hsu, W. (2011). The vocabulary thresholds of business textbooks and business research articles for EFL learners. English for Specific Purposes, 30(4), 247–257. Theme: B
This study examined the vocabulary sizes necessary to reach 95% and 98% coverage of Business English in two corpora: business textbooks and business research articles. Hsu found that knowledge of 5,000 and 8,000 word families plus proper nouns was necessary to reach 98% coverage of business textbooks and business research articles, respectively. Further analyses revealed that the vocabulary demands of business textbooks and research articles varied according to subject (e.g. business policy, marketing, accounting).

2011 Rodgers, M. P. H., & Webb, S. (2011). Narrow viewing: The vocabulary in related television programs. TESOL Quarterly, 45(4), 689–717. Theme: C
This study looked at the potential for vocabulary learning through related and unrelated television programs. The results indicate that there are fewer different words in related programs and the words within these programs reoccur more often than in unrelated programs. These findings support the results of Sutarsyah, Nation, & Kennedy’s (1994) earlier study of academic text.

2011 Schmitt, N., Jiang, X., & Grabe, W. (2011). The percentage of words known in a text and reading comprehension. Modern Language Journal, 95(1), 26–43. Theme: A
Schmitt et al. examined the relationship between lexical coverage and reading comprehension. This comprehensive study of lexical coverage addressed the methodological limitations of LAUFER (1989) and HU & NATION (2000) and included representative samples of English language learners (N = 661) from eight different countries. Schmitt et al. found a linear relationship between coverage and comprehension when coverage was 90% and higher. At these levels of coverage, knowing more of the words in a text led to greater reading comprehension. The results support the findings of HU & NATION (2000).

2012 Coxhead, A., & Walls, R. (2012).
TED Talks, vocabulary, and listening for EAP. TESOL ANZ Journal, 20, 55–65. Theme: B
This study examined the vocabulary coverage of TED Talks, an online platform which contains recordings of short academic spoken presentations given by experts at annual TED conferences, and explored whether the coverage of the AWL in TED Talks was similar to written academic texts. Coxhead & Walls found that knowledge of 8,000–9,000 word families plus proper nouns was necessary to reach 98% coverage of TED Talks. This is similar to NATION’S (2006) findings for written discourse at 98% coverage. Their analysis also revealed that the AWL accounted for 3.90% of TED Talks, similar to the AWL’s coverage (4.41%) of academic spoken English (DANG & WEBB, 2014), indicating that the AWL’s coverage of academic spoken lectures is around half that of written academic texts (10%).

2012 Nation, I. S. P. (2012). The BNC/COCA word family lists. Retrieved from http://www.victoria.ac.nz/lals/about/staff/paul-nation Theme: D
Nation’s 25 BNC/COCA 1,000-word lists are the most recently created lists being used in lexical profiling studies. They were created to better represent general service vocabulary at the 1,000- and 2,000-word-frequency levels, and were sourced primarily from spoken discourse at these levels. The rest of the lists were also designed to more accurately reflect international English and were therefore sourced from American and British supercorpora. There are four additional lists that include (1) a list of proper names, (2) a list of marginal words, (3) a list of abbreviations, and (4) a list of transparent compounds. Both sets of lists are freely available at Paul Nation’s website. Information about the evaluation of the BNC/COCA word lists can be found in NATION & WEBB (2011: chapter 8)7 and NATION (2016: chapter 13).8

2012 O’Loughlin, R. (2012).
Tuning in to vocabulary frequency in coursebooks. RELC Journal, 43(2), 255–269. Theme: C
This study provides a second evaluation of the vocabulary learning potential of English language course books (see MATSUOKA & HIRSH, 2010). O’Loughlin evaluated three levels of a course book series that contained reading as well as listening passages in addition to supplementary texts. He found that only about 1,500 of the 2,000 most frequent words were encountered in the combined texts (reading, listening, supplementary). He suggests that other sources of meaning-focused input are likely necessary to learn high-frequency vocabulary.

2013 Brown, D. (2013). Types of words identified as unknown by L2 learners when reading. System, 41, 1043–1055. Theme: D
This study examined the types of words that might cause problems for learners in reading. After the participants marked inflected and derived forms of high-frequency words as unknown, Brown concluded that even L2 learners with a larger vocabulary size might have limited knowledge of derivational affixes at the higher frequency levels. Based on this finding, Brown suggests that the lemma may be an appropriate unit of counting (an alternative to word families) for lexical profiling studies.

2013 Laufer, B. (2013). Lexical thresholds for reading comprehension: What they are and how they can be used for teaching purposes. TESOL Quarterly, 47(4), 867–872. Theme: B, D
Because of contrasting recommendations of lexical thresholds for reading comprehension (see LAUFER, 1989; NATION, 2006; SCHMITT ET AL., 2011 etc.), Laufer suggests an OPTIMAL threshold that requires knowledge of 8,000 word families plus proper nouns to reach 98% coverage and a minimum threshold that requires around 5,000 word families including proper nouns to reach 95% coverage. She then describes a number of tools (complementary to those recommended in WEBB & NATION, 2008) that can help teachers identify a text’s lexical coverage and learners’ vocabulary size.
Laufer suggests that measuring learners’ vocabulary size and their readiness for a text early on will enable teachers to identify potentially unknown words that might then be addressed via explicit word-focused instruction.

2013 Reynolds, B. L. (2013). Comments on Stuart Webb and John Macalister’s Is text written for children useful for L2 extensive reading? TESOL Quarterly, 47(4), 849–852. Theme: B, D
Reynolds questioned the validity of the word family as the unit of counting in studies of lexical coverage. He argued that it was L1 rather than L2 research that provided support for using word families as the unit of counting. In their response, Macalister & Webb (2013)9 agreed on the need for L2 research to investigate the extent to which different units of counting are appropriate for learners at different proficiency levels.

2013 Webb, S., & Nation, I. S. P. (2013). Computer-assisted vocabulary load analysis. In C. Chappelle (Ed.), The encyclopaedia of applied linguistics (pp. 1–10). London: Wiley-Blackwell. Theme: B, D
Webb & Nation provided an overview of lexical profiling studies. They also discussed how the RANGE program can be used to carry out lexical profiling studies and offered suggestions on how to interpret RANGE-based output for lexical analysis.

2013 van Zeeland, H., & Schmitt, N. (2013). Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics, 34(4), 457–479. Theme: A
van Zeeland & Schmitt’s study is the most comprehensive research examining the relationship between lexical coverage and listening comprehension. They investigated L1 and L2 learners’ listening comprehension of informal narratives. They found that listening comprehension may largely depend on the level of lexical coverage, and suggested that although 98% coverage may provide a high level of comprehension, 95% may be sufficient for good comprehension of informal narratives.

2013 Webb, S., & Macalister, J. (2013). Is text written for children useful for L2 extensive reading? TESOL Quarterly, 47(2), 300–322. Theme: B
This study examined the vocabulary size necessary to comprehend texts written for L1 children, L1 adults and L2 learners, and aimed to determine the numbers of words needed to reach the 98% coverage point that is associated with adequate reading comprehension (see HU & NATION, 2000 and SCHMITT ET AL., 2011). Webb & Macalister found that 10,000 word families plus knowledge of proper nouns and marginal words were necessary to reach 98% coverage of texts written for L1 children and L1 adults, while 3,000 word families plus knowledge of proper nouns and marginal words were necessary to reach 98% coverage of texts written for L2 learners. They suggested that texts published for L1 children (and L1 adults) may be challenging for L2 learners to understand and might not be suitable for L2 extensive reading programs.

2014 Anthony, L. (2014). AntWordProfiler (Version 1.4.1) [Computer Software]. Tokyo: Waseda University. Retrieved from http://www.laurenceanthony.net/ Theme: D
AntWordProfiler is freely available software created by Laurence Anthony as an alternative to the RANGE (HEATLEY ET AL., 2002) and Vocabprofile (COBB, n.d.) lexical profilers. Unlike RANGE, which is only compatible with Windows operating systems, AntWordProfiler is compatible with Windows, Macintosh and Linux.

2014 Dang, T., & Webb, S. (2014). The lexical profile of academic spoken English. English for Specific Purposes, 33, 66–76. Theme: B
This study examined the lexical profile of academic spoken English. The British Academic Spoken English corpus was analysed in the study. Dang & Webb found that learners need a vocabulary of around 4,000 word families plus proper nouns and marginal words to reach 95% coverage of academic spoken text. Their analysis also revealed that COXHEAD’S (2000) AWL accounted for 4.41% of academic spoken English. Knowledge of the AWL reduced the lexical burden of academic spoken English; learners who knew the AWL would only need a vocabulary of 3,000 word families plus proper nouns and marginal words to reach 95% coverage of academic spoken English.

2014 Kaneko, M. (2014). Is the vocabulary level of the reading section of the TOEFL Internet-based test beyond the lexical level of Japanese senior high school students? Vocabulary Learning and Instruction. doi: http://dx.doi.org/10.7820/vli.v03.1.kaneko Theme: B
This study analysed the vocabulary coverage of the reading section of the TOEFL iBT test by the Educational Testing Service. It was found that between 12,000 and 13,000 word families, including proper nouns, were necessary to reach 98% coverage. This is supported by the findings of WEBB & PARIBAKHT (2015) showing that more vocabulary is required to reach 98% coverage of the reading sections of proficiency tests compared to other written texts.

2014 Nation, P. (2014). How much input do you need to learn the most frequent 9,000 words? Reading in a Foreign Language, 26(2), 1–16. Theme: C
Nation investigated the amount of authentic text required to encounter words a sufficient number of times to potentially acquire certain vocabulary sizes. He found that if learners read 200,000 words, they could potentially learn the most frequent 2,000 words incidentally, and if they read 3 million words, they may be able to learn the most frequent 9,000 words.

2015 Kaneko, M. (2015).
Vocabulary size required for the TOEFL iBT Listening Section. The Language Teacher, 39(1), 9–14. Theme: B
This study examined the vocabulary size necessary for comprehension of the listening section of the TOEFL iBT test. Using NATION’S (2006) BNC lists, Kaneko reported that 3,000 and 6,000 word families (plus proper nouns and interjections) were necessary to reach 95% and 98% coverage, respectively. Kaneko suggests that NATION’S (2012) BNC/COCA lists might yield more accurate text-coverage results because TOEFL iBT contains some words common in American English.

2015 Webb, S., & Paribakht, T. S. (2015). What is the relationship between the lexical profile of test items and performance on a standardized English proficiency test? English for Specific Purposes, 38, 34–43. Theme: A, B
This study examined the lexical profiles of listening, reading, and cloze test passages from an English proficiency test (CanTEST) used for university admission purposes in Canada. The findings revealed that for reading comprehension passages, 6,000 and 14,000 word families (plus proper nouns and interjections) were necessary to reach 95% and 98% coverage, respectively. Listening comprehension passages required 4,000 and 10,000 word families (plus proper nouns and interjections) to reach 95% and 98% coverage, respectively. Webb & Paribakht found either no correlation or small correlations between the amounts of lexical coverage of the test passages and scores on those passages. This led them to suggest that larger vocabulary sizes rather than lexical coverage may be most important for comprehension.

2016 Nation, P. (2016). Types, lemmas, and word families. In P. Nation (Ed.), Making and using word lists for language learning and testing (pp. 23–39). Amsterdam: John Benjamins. Theme: D
In response to considerable debate about the choice between lemmas, types and word families when deciding on the unit of counting (e.g. BROWN, 2013; REYNOLDS, 2013), Nation suggests that researchers ‘need to look at the reasons why the counting is being done and who the lists will be used with’ (p. 23).

2017 Schmitt, N., Cobb, T., Horst, M., & Schmitt, D. (2017). How much vocabulary is needed to use English? Replication of van Zeeland & Schmitt (2012), Nation (2006) and Cobb (2007). Language Teaching, 50(2), 212–226. Theme: A, B, D
Schmitt et al. reviewed three influential studies of lexical coverage by COBB (2007), NATION (2006), and VAN ZEELAND & SCHMITT (2013). Schmitt et al. recommended that further research on lexical coverage was needed and proposed that approximate replications of the studies be completed to establish more accurate lexical coverage figures.

2017 Tegge, F. (2017). The lexical coverage of popular songs in English language teaching. System, 67, 87–98. Theme: B
Tegge examined the vocabulary demands of pop songs found in Billboard charts as well as teacher-selected songs for language teaching purposes. She found that teacher-selected songs had a lower lexical demand than regular pop songs.

n.d. Cobb, T. Web Vocabprofile [accessed June 18, 2016 from http://www.lextutor.ca/vp/], an adaptation of Heatley, Nation & Coxhead’s (2002) RANGE. Theme: D
The Compleat Lexical Tutor is a free online website created and maintained by Tom Cobb. It offers a number of useful resources to language teachers and researchers. One of them is VocabProfile – a web-based version of RANGE (HEATLEY ET AL., 2002). Similar to RANGE, VocabProfile classifies words in
a text by their frequency. VocabProfile allows users to analyse texts with several different word lists including WEST’S (1953) GSL lists plus COXHEAD’S (2000) AWL, NATION’S (2006) BNC lists, and NATION’S (2012) BNC/COCA lists.
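
The kind of frequency-band classification that the profilers in this timeline perform can be illustrated with a minimal Python sketch. It is not the implementation of RANGE, VocabProfile or AntWordProfiler. The function names, the crude tokenizer and the tiny `TOY_BANDS` word list are all assumptions invented for illustration. The sketch matches individual word forms rather than word families, and it gives no special treatment to proper nouns or marginal words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_profile(text, band_of):
    """Count tokens per frequency band; tokens missing from band_of are off-list (key None)."""
    tokens = tokenize(text)
    counts = Counter(band_of.get(tok) for tok in tokens)
    return len(tokens), counts


def cumulative_coverage(total, counts):
    """Return [(band, cumulative % of tokens covered)] for the numbered bands, in order."""
    covered = 0
    rows = []
    for band in sorted(b for b in counts if b is not None):
        covered += counts[band]
        rows.append((band, 100.0 * covered / total))
    return rows


def band_needed(rows, target):
    """Smallest band whose cumulative coverage reaches the target percentage, else None."""
    for band, pct in rows:
        if pct >= target:
            return band
    return None


# Hypothetical band assignments (1 = first 1,000 words, 2 = second 1,000, ...).
# These are invented for the example and are not taken from any published list.
TOY_BANDS = {
    "the": 1, "cat": 1, "sat": 1, "on": 1, "a": 1, "and": 1, "was": 1,
    "mat": 2, "quiet": 2,
    "feline": 4,
}

sample = "The cat sat on the mat and the feline was quiet"
total, counts = lexical_profile(sample, TOY_BANDS)
rows = cumulative_coverage(total, counts)
for band, pct in rows:
    print(f"bands 1-{band}: {pct:.1f}% of {total} tokens")
print("band needed for 95% coverage:", band_needed(rows, 95.0))
```

With real word-family lists in place of `TOY_BANDS`, the same cumulative tally is what allows a statement such as "the most frequent 3,000 word families provide 95% coverage" to be read off a profile. Off-list tokens never count toward coverage here, so `band_needed` returns `None` when the listed bands cannot reach the target.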