Teaching and Teacher Education 174 (2026) 105426

Research paper

Teacher continuing professional development in formative assessment: A pathway to enhanced student achievement

Jiayi Li (City University of Macau, Avenida Padre Tomás Pereira, Taipa, Macau, China)
Peter Yongqi Gu* (Victoria University of Wellington, von Zedlitz Building, 26/28 Kelburn Parade, Wellington, New Zealand)

Keywords: Formative assessment; Student learning; English language achievement; Teacher continuing professional development

Abstract: Formative assessment (FA) is widely recognized as integral to effective pedagogy. Research has largely confirmed its potential to improve learning outcomes. This study investigates how a structured 12-week continuing professional development (CPD) program in FA for English as a foreign language (EFL) teachers influenced student achievement in English language learning. Focusing on five secondary school teachers across two schools in China, the research addresses the question: To what extent does FA-focused teacher CPD translate into measurable improvements in students' English language performance? Data were collected from 509 students (238 in CPD classes, 271 in non-CPD classes) through pre- and post-tests of language proficiency. Quantitative analysis revealed statistically significant gains in post-test performance among CPD-class students compared to their non-CPD counterparts. The findings not only affirm the value of integrating FA into teacher development frameworks but also highlight its potential as a strategy for improving language education outcomes in similar contexts.
1. Formative assessment for student achievement

Formative assessment (FA) is defined as a process through which evidence of student learning is elicited, interpreted, and used by teachers and learners to improve instruction and learning and enhance self-regulated learning (SRL). While this pedagogical approach is widely endorsed for its theoretical benefits, a critical question remains regarding the extent to which FA, especially teacher professional development in FA, translates into measurable gains in student achievement, particularly within specific educational contexts. This issue is especially pronounced in China's English as a Foreign Language (EFL) sector, where the effective implementation of FA practices is a persistent challenge. Despite FA being emphasized in curriculum standards for over two decades (Ministry of Education of the People's Republic of China (MOE), 2003, 2020), there is a lack of research examining whether FA-focused teacher professional development demonstrably improves student achievement within this specific context.

This study addresses this gap by investigating the impact of a carefully designed teacher continuing professional development (CPD) program aimed at enhancing classroom-based FA. Drawing on both theoretical insights and practical applications, the research explores how improving teachers' FA literacy can lead to measurable improvements in student learning outcomes. In doing so, it contributes not only to the growing body of knowledge on FA, but also to the ongoing conversation about how best to support teachers in translating policy into effective pedagogical practice.

1.1. Conceptualizing and operationalizing FA

Educational assessment is typically divided into formative assessment (FA), which emphasizes the learning process and ongoing improvement, and summative assessment, which is intended to measure learning outcomes for accountability purposes and is commonly associated with standardized tests (Leung & Mohan, 2004).
FA is receiving increasing attention for its significance in enhancing instructional quality and student learning outcomes. Researchers conceptualize FA as an ongoing process that involves clarifying: (1) where learners are going, (2) where they currently are, and (3) how to close the gap between the two (Black & Wiliam, 2009, 2012; Ramaprasad, 1983; Sadler, 1989). Rather than being a one-time test, FA entails the continuous elicitation and interpretation of evidence of student learning, followed by tailored feedback and instructional adjustments to improve outcomes (Broadfoot et al., 1999; Heritage, 2007; Popham, 2008).

* Corresponding author: P.Y. Gu.
https://doi.org/10.1016/j.tate.2026.105426
Received 29 May 2025; Received in revised form 29 January 2026; Accepted 30 January 2026
0742-051X/© 2026 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

FA serves both informing and forming functions (Davison & Leung, 2009). Gu and Lam (2023) provided an overview of FA by combining the process and function views of FA. FA informs teachers and students about learning progress through evidence, and enables action via feedback and adaptive strategies. Through collaborative dialogue between teachers and students, FA serves as a teaching and learning tool rather than solely an evaluative mechanism (Black & Wiliam, 2009; Chappuis, 2009; Sadler, 1989). Each step (eliciting, interpreting, providing feedback, and acting) is aligned with learning goals, ensuring assessment is meaningfully integrated into instruction to enable genuine progress.

Operationalizing FA becomes particularly important so that the concept can be translated into classroom FA practice. Gu (2021) operationalizes the concept of FA into two forms (Fig. 1): minimum FA, which focuses on the FA process, and ideal FA, which includes both function and process. Minimum FA requires the completion of at least one cycle of FA practice, including five components: (1) clarifying targets of teaching/learning/assessment, (2) eliciting evidence of student learning or understanding, (3) interpreting the evidence, (4) providing feedback, and (5) acting on the feedback. Each of the four steps (corresponding to components 2–5) is oriented towards the achievement of the learning target in question. Only a complete cycle can make an assessment practice formative, moving students closer to their learning targets.
Depending on the task, it often takes more than one cycle to help students reach their target. The spiraling cycles of FA only stop when the target is reached or when a judgement is made to postpone or abandon the target. Ideally, FA starts with a formative purpose (the intent of an assessment event being primarily diagnosis, guidance, and support for ongoing learning), goes through FA practice, and achieves a formative effect (the outcome where the assessment process successfully leads to reaching or approaching the teaching and learning target). However, it is difficult to find ideal FA in real teaching contexts, because teachers may not have an explicit and conscious formative purpose before enacting assessment in a classroom, and the FA practice may not lead to the formative effect of improving teaching and learning.

This operationalization turns the concept of FA into practical chunks, making FA less of a concept on paper and more of an actionable exercise. The spiraling cycles provide teachers with a clear idea of what FA practice looks like, and of which classroom practices are formative and which are not. Moreover, it gives researchers concrete units for analyzing FA (Gu & Yu, 2020). Therefore, it is used in the present study to construct the framework for CPD in FA.

1.2. Effects of FA on student learning

It has been claimed that FA can "double the speed of student learning" (Wiliam, 2007, p. 37). In Black and Wiliam's (1998b) original review, the effect sizes of FA ranged between d = .4 and d = .7. Empirical studies have demonstrated significant gains in student achievement when teachers integrate FA in the classroom (Black & Wiliam, 1998b; Earl, 2003; Gardner, 2006; Willis, 2010). For example, Black, Wiliam, and their colleagues (Black et al., 2003; Wiliam et al., 2004) designed a six-month project to help secondary school teachers embed FA principles into classroom practice.
At the end of the project, students in the experimental group achieved significantly higher test results than those in the control group, with a medium effect size (d = .32). Wiliam et al. (2004) noted that "teachers' practices were slow to change, and that most of the changes in practice … occurred towards the end of the year, so that the actual size of the effects found are likely to be underestimates of what could be achieved when teachers are emphasizing formative assessment as an integral part of their practice" (p. 56). Andersson and Palm's (2017) teacher professional development program in FA revealed that the intervention group of students significantly outperformed their control-group peers in a post-test after controlling for pre-test scores, with a medium effect size (d = .66).

In recent years, early conclusions on the effectiveness of FA such as Black and Wiliam (1998b) have been challenged. Bennett (2011) contended that most of Black and Wiliam's (1998b) claims of effectiveness were "a mischaracterization that has essentially become the educational equivalent of urban legend" (p. 12). Kingston and Nash (2011) reviewed over 300 research studies and found that only 13 studies (with 42 independent effect sizes) met their inclusion criteria for a meta-analysis. The overall weighted mean effect size was small (d = .20). The authors categorized the FA interventions in these studies into five types based on the primary focus of the implementation:

1) Professional Development (N = 23): Studies where the core intervention involved teachers dedicating time to learning about and planning the implementation of various FA techniques.
2) Curriculum-Embedded Assessments (N = 7): Studies implementing open-ended assessments at key points within the curriculum, primarily aimed at understanding student learning processes.
3) Computer-Based Formative Systems (N = 6): Studies utilizing online systems to administer short, benchmark-like tests and provide score reports to teachers.
4) Specific Use of Student Feedback (N = 3): Studies in which providing students with detailed, criterion-referenced feedback to guide their learning was the central intervention being tested.
5) Other (N = 3): Studies focusing on other distinct practices, namely teacher-led classroom assessment activities, structured student self-reflection, or assessment dialogues embedded in classroom talk.

[Fig. 1. Operationalization of FA (Gu, 2021, p. 3).]

Notably, the two implementation approaches yielding the highest mean effect sizes were professional development (d = .30) and computer-based systems (d = .28) (Kingston & Nash, 2011).

A few issues have been identified as resulting in the lack of conclusive evidence. Dunn and Mulvenon (2009) noted the lack of consensus on definitions by pointing out that "without a clear understanding of what is being studied, empirical evidence supporting formative evidence will more than likely remain in short supply" (p. 2). Additionally, methodological gaps in existing research, such as insufficient details about training duration or intervention design, have obscured the relationship between specific training characteristics and outcomes (Dignath & Büttner, 2008). Kingston and Nash (2011) further underscored these limitations by pointing out that only a small percentage of their reviewed studies provided sufficient information for calculating effect sizes. These challenges collectively emphasize the need for quantitative investigations into the impact of FA on student achievement, which leads to the focus of this study.

Recent meta-analyses and systematic reviews have refined our understanding. Lee et al. (2020) analyzed 33 K–12 studies in the U.S.
and found a small overall effect (d = .29), with stronger impacts when FA emphasized student-initiated self-assessment (d = .61), formal feedback (d = .40), and medium-cycle timing (d = .52). Yao et al. (2024), in a global meta-analysis of 118 studies, confirmed a small but consistent effect (g = .25; g = .22 in the U.S.), with no significant differences across FA types (teacher, peer, self) or subjects. However, effects were larger in high school and in regions outside North America and Western Europe.

Not all FA interventions yield positive results. Boström and Palm (2023) reported no significant effect of a comprehensive FA practice on Grade 7 math achievement, despite teacher participation in a 96-h professional development program. They attributed this null result to superficial implementation. Their teachers focused on teacher-led strategies (e.g., exit tickets) but rarely activated students as peer or self-assessors, missing the "spirit" of FA (Marshall & Drummond, 2006). Maskos et al. (2025) echoed this, finding mixed results in mathematics education, with consistent gains only for interactive computer-based assessments and interventions implemented over three months with clear learning intentions.

Teacher capacity and context critically mediate FA effectiveness. Schildkamp et al. (2020) and Yan et al. (2021) identified three categories of prerequisites: (1) knowledge and skills (e.g., data literacy, pedagogical content knowledge), (2) psychological factors (e.g., self-efficacy, positive attitudes), and (3) social supports (e.g., collaboration, leadership). Yan et al. (2021) further distinguished between teacher intentions and actual implementation, revealing an "intention–behavior gap" shaped by working conditions, policy pressures, and student characteristics.

Another critical factor mediating the effectiveness of FA is teachers' topical knowledge, which directly shapes their interpretation of instructional targets.
This factor fundamentally determines the success or failure of FA practices in the classroom. For instance, in an ESL context, if the monthly instructional focus is developing students' ability to read informational texts, yet classroom FA activities predominantly emphasize grammatical accuracy, the targeted reading competency is unlikely to develop meaningfully. The problem of "missing disciplinary substance" in FA research was identified as early as 2011 (Coffey et al., 2011); however, considerably more research is needed to fully understand how content-specific knowledge influences FA implementation and outcomes.

Within the ESL field, FA research has expanded rapidly in recent years. A Scopus search for empirical studies examining the impact of formative assessment yielded 30 publications from the past five years. Notably, only one of these studies (Taye, 2025) explicitly quantified FA's impact, and only one of the thirty focused on teacher professional development (Singh & Mahapatra, 2022). Taye (2025) examined writing accuracy among 41 tertiary-level EFL students in Ethiopia. Over an eight-week intervention, participants received teacher feedback, engaged in peer review and self-assessment, and followed a structured writing process. Post-intervention assessments revealed significant improvements in grammar, punctuation, spelling, and sentence structure. In Singh and Mahapatra's (2022) study, five in-service secondary ESL teachers in India completed a 45-day online professional development program focused on self- and peer-assessment, delivered via Google Meet. Data on teachers' perceived impact were gathered primarily through semi-structured interviews. The authors concluded that the findings support the use of self- and peer-assessment as effective formative assessment tools.
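The standardized effect sizes reported throughout this literature (Cohen's d, Hedges' g) express the difference between treatment and control group means in pooled standard deviation units. As a minimal illustration of how such a value is computed (the scores below are invented for illustration, not data from any study reviewed here):

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardized mean difference between two independent groups,
    divided by the pooled standard deviation (Cohen's d)."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Hypothetical post-test scores for an FA class and a comparison class.
fa_class = [72, 75, 78, 80, 83, 85]
comparison_class = [70, 71, 74, 76, 78, 79]
print(round(cohens_d(fa_class, comparison_class), 2))  # prints 0.97
```

By the conventional benchmarks used in the reviews above, values near .2 are read as small, .5 as medium, and .8 as large; Hedges' g applies a small-sample correction to the same quantity.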
It should be noted that many existing studies have examined specific aspects or components of FA practice, such as questioning (Jiang, 2014) and feedback (Hattie & Timperley, 2007; Price et al., 2010; Shute, 2008), as well as assessment tools or strategies such as self-assessment (Ross, 2006), peer assessment (Rohrbeck et al., 2003; Van Zundert et al., 2010), and portfolio assessment (Burner, 2014). These studies confirmed their potential in facilitating student learning. However, regardless of the importance of any specific component of FA, it cannot represent the whole process of FA practice. Likewise, assessment strategies can provide teachers and students with detailed learning information, but it is whether and how this information is used that determines the formativeness of an assessment strategy. The quality of assessment information influences how effectively it can be used to support learning. Information obtained through multiple observations and interpretations helps ensure that subsequent judgments and decisions are accurate and reliable (Gu & Li, 2020a, 2020b). Therefore, this study focuses on analyzing the impact of the whole process of teacher FA practice on student achievement.

1.3. Teacher continuing professional development

Teacher Continuing Professional Development (CPD) represents a cornerstone of educational improvement, encompassing both the incidental learning that occurs through daily practice and the conscious, planned activities aimed at enhancing the quality of classroom instruction (Day, 1999). It involves teachers, whether working individually or collaboratively, in a continuous process of critical reflection and growth, through which they revitalize their commitment to the moral purposes of teaching while systematically developing their professional knowledge, planning, and instructional techniques (Day & Sachs, 2005).
In its most impactful form, effective teacher CPD is precisely defined as "structured professional learning that results in changes to teacher knowledge and practices, and improvements in student learning outcomes" (Darling-Hammond et al., 2017, p. 2), a definition that explicitly links teacher learning to tangible benefits for pupils.

The conceptualization of how CPD should be structured has evolved significantly. Timperley et al. (2007) synthesized a substantial body of research to propose the "teacher inquiry and knowledge-building cycle", a model that positions the analysis of student learning needs and corresponding teacher learning needs as the essential starting point for any development activity. This model reconceives CPD as a series of collaborative and self-regulatory learning cycles, heavily emphasizing teacher agency and integrating the core features of professional learning communities with the disciplined inquiry of collaborative action research. Building on this, Darling-Hammond et al. (2017) conducted a comprehensive review of 35 methodologically robust studies over three decades, resulting in a detailed set of seven criteria characterizing high-quality CPD. According to their synthesis, effective professional development: (1) is content-focused, (2) incorporates active learning, (3) supports collaboration, (4) uses models of effective practice, (5) provides coaching and expert support, (6) offers feedback and reflection, and (7) is of sustained duration.

It is critical to understand the pathway through which CPD exerts its influence. Desimone (2009) provided a foundational conceptual framework for studying this process, positing a core causal sequence: (1) teachers experience high-quality professional development; (2) this experience increases their knowledge and skills and/or changes their attitudes and beliefs; (3) teachers then apply these new acquisitions to improve their instructional content or pedagogical approach; and (4) these instructional improvements subsequently foster increased student learning.

However, this seemingly logical chain is not guaranteed. A crucial finding from the literature is that the relationship between teacher CPD and student outcomes is fundamentally indirect and can be broken. For instance, Darling-Hammond et al. (2017) noted that six of the 35 studies they reviewed resulted in measurable teacher learning but no corresponding student learning gains. Similarly, Yoon et al. (2007) found that only a handful of high-quality studies provided clear evidence that CPD contributed to the complete sequence of improved teacher knowledge, better teaching practices, and enhanced student achievement. These results serve as a critical reminder that changing classroom practice is a complex endeavor, and that positive effects on student learning cannot be assumed simply because teachers have participated in training.

A recent contribution that helps to explain these inconsistencies comes from Sims et al. (2025), who conducted a systematic review and meta-analysis of 104 randomized controlled trials on teacher PD. Acknowledging that previous meta-analyses have consistently found only small average effects on pupil test scores (around .06), their study sought to move beyond documenting whether PD works to explain why some programs are more effective than others.
They proposed a new theoretical framework arguing that for PD to be effective, it must successfully address four distinct purposes: instilling Insight (I) into teaching and learning, building Motivation (M) to change, developing Techniques (T) for implementation, and embedding changes in Practice (P). From this "IMTP" framework, they identified 14 specific, causally active mechanisms (e.g., managing cognitive load, goal setting, modelling, feedback, and action planning) drawn from robust evidence in adjacent fields such as cognitive science. Their meta-analytic tests provided qualified support, finding that PD programs incorporating more of these mechanisms tended to have a greater impact, and that "balanced" programs addressing all four IMTP purposes showed particularly promising effect sizes (d = .15). This theory offers a more granular and explanatory account of effective PD design, shifting the focus from checklists of surface-level features to the underlying causal mechanisms that actively support teacher change.

A synthesis of this evolving body of research allows for a more nuanced conclusion about the conditions under which CPD in FA succeeds, and subsequently, when FA itself has a positive impact on student learning. The effects are highly conditional and operate through a multistage causal chain that can break down at several points. First, CPD in FA is most likely to succeed in changing teacher knowledge, beliefs, and practices when its design is explicitly mechanism-rich, moving beyond a mere "tick-box" exercise of including recommended features. Successful CPD does more than transmit information about FA; it is engineered to actively support the difficult process of transforming understanding into sustained, habitual practice. This requires a deliberate selection and combination of mechanisms aligned with the IMTP framework (Sims et al., 2025).
The CPD must provide a clear, evidence-based Insight into the principles and value of FA, often from a credible source like the work of Black and Wiliam (1998b), while carefully managing cognitive load to avoid overwhelming participants. It must also consciously build Motivation through processes like explicit goal setting and reinforcement. Crucially, and where many programs fail, it must then develop practical Techniques through direct instruction, modelling of exemplary practice, and opportunities for low-stakes rehearsal accompanied by constructive feedback. Finally, and perhaps most importantly for sustained impact, it must include dedicated mechanisms to embed Practice, such as supported action planning, prompts/cues, and context-specific repetition within teachers' own classrooms (Sims et al., 2025). This mechanism-rich approach is typically housed within the broader, supportive structures identified by earlier research: sustained duration, opportunities for collaboration, and coaching support (Darling-Hammond et al., 2017; Timperley et al., 2007). When CPD fails to change actual classroom practice, as evidenced by the studies that showed teacher learning but no student learning (Darling-Hammond et al., 2017), it is often because the program was strong on building Insight but critically weak on the mechanisms for developing Technique and embedding Practice, resulting in the well-documented "knowing-doing gap".

Second, FA itself only succeeds in improving student achievement when two key conditions are met: the CPD has first been effectively translated into skilled teacher practice, and that practice is implemented with depth and coherence. The impact of FA is not an automatic consequence of teachers learning about it; it is mediated by the quality of implementation. Positive effects on student achievement, as documented in studies like Black et al. (2003) and Andersson and Palm (2017), occur under specific conditions.
Teachers must possess a deep, operational, and flexible understanding of FA that prevents rigid or misguided application. Furthermore, FA must be implemented not as a bag of tricks but as a holistic, integrated, and cyclic pedagogical process that is deeply embedded in the daily classroom culture. The mixed results and outright failures reported in the literature (Boström & Palm, 2023; De Lisle, 2015; Schneider & Randel, 2010) can often be traced to a breakdown in this chain: either the preceding CPD was insufficient to bring about genuine change in teacher practice, or the resulting FA practice was superficial, fragmented, or poorly adapted to the specific subject domain (e.g., EFL) and local contextual constraints, such as a dominant examination culture.

This synthesis clarifies that the pathway from CPD to student achievement is a sequential, contingent process. Effective, mechanism-rich CPD is a necessary prerequisite for changing teacher practice in profound and lasting ways. This skillfully and coherently changed practice, in turn, creates the necessary classroom conditions for FA to exert a moderate, positive effect on student learning, an effect that navigates a middle ground between unrealistically optimistic and unduly pessimistic claims.

In recent years, numerous empirical studies have demonstrated that it is possible to help teachers develop their FA practice through thoughtfully designed CPD, with documented successes across a range of subjects and international contexts (Andersson & Palm, 2017; Black et al., 2003; De Neve et al., 2022; DeLuca et al., 2015; Oo et al., 2023). However, focused CPD for enhancing FA as holistic classroom practice that includes both informing and forming functions among English as a Foreign Language (EFL) teachers remains a relatively limited and under-explored area. Therefore, the study reported next is deliberately designed to build upon the synthesized insights from this review.
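Several of the studies reviewed above (e.g., Andersson & Palm, 2017) compare post-test performance after controlling for pre-test scores, so that groups are judged on achievement beyond what their starting level predicts. A minimal sketch of that general idea using residualized gains; the scores and group split below are invented for illustration and do not represent any study's data:

```python
from statistics import mean

def residualized_gains(pre, post):
    """Fit a simple least-squares regression of post-test on pre-test,
    then return each student's residual: the part of the post-test
    score not predicted by the pre-test."""
    mx, my = mean(pre), mean(post)
    slope = (sum((x - mx) * (y - my) for x, y in zip(pre, post))
             / sum((x - mx) ** 2 for x in pre))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]

# Hypothetical pre/post scores; the first five students are in the
# intervention classes, the last five in the comparison classes.
pre = [60, 62, 65, 68, 70, 61, 63, 66, 69, 71]
post = [74, 75, 79, 82, 85, 68, 70, 72, 75, 78]
resid = residualized_gains(pre, post)

intervention_gain = mean(resid[:5])
comparison_gain = mean(resid[5:])
print(round(intervention_gain - comparison_gain, 2))  # pre-test-adjusted difference
```

A positive adjusted difference indicates that the intervention classes gained more than their pre-test scores alone would predict; full ANCOVA models used in the cited studies extend this idea with a group term and significance testing.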
The study addresses the question: To what extent does FA-focused teacher CPD translate into measurable improvements in students' English language performance? The study will implement a CPD program and incorporate specific mechanisms to develop teacher insight, motivation, technique, and embedded practice specifically in FA for the EFL context. The study will then empirically investigate the extent to which this carefully designed intervention successfully translates into improved teacher assessment practices and, ultimately, measurable gains in students’ English language performance. 2. This study This study explores the extent to which a teacher CPD program aimed to develop teacher FA literacy (Li & Gu, 2023) will lead to student improvement in English language achievement. Following Timperley et al.’s (2007) “teacher inquiry and knowledge-building cycle”, we designed and implemented a 12-week program for a group of senior secondary EFL teachers in Tangshan, China. This study adopts Gu's (2021) framework of FA which focuses on both the steps of FA practice and the central role of targets in guiding each step of FA towards the 4 J. Li and P.Y. Gu Teaching and Teacher Education 174 (2026) 105426 individual interactions conducted over this period. Week 3: Sharing and goal setting In week three, a 4-h group meeting was held where teachers shared insights on FA and its components. Discussions revealed limited familiarity with curriculum standards, prompting a review of the General Senior High School Curriculum Standards. Based on baseline findings and teacher input, the group established three CPD goals: deepening understanding of FA cycles, developing skills to design and implement FA tasks, and fostering continuous reflection on FA practices. Stage 2: Round 1 of action research (Weeks 4–7) This stage conducted the first round of action research that lasted four weeks. 
At the beginning of this stage, the teacher professional development group collaboratively designed a classroom-based formative assessment (CBFA) task. In the subsequent week, each of the five teachers individually implemented this FA task in their respective classes and video-recorded one lesson that focused on this task. Selfreflection and group reflection sessions focused on the video-recorded lessons took place in the following two weeks. Week 4: Joint planning of FA task In the fourth week of the CPD program, the teacher development group held a collaborative planning session to design a CBFA task. A major issue we found in the baseline findings was that the teachers lacked awareness of the need to set concrete, realistic learning targets or clarify success criteria for students (Li & Gu, 2023). In response, the first action cycle prioritized defining long-, mid-, and short-term goals aligned with curriculum standards. The teachers were asked to link FA tasks to short-term goals, which contributed to broader objectives. A key strategy was analyzing the curriculum goals of developing student “Subject Core Competencies”, including language ability, cultural awareness, thinking capacity, and learning ability (Ministry of Education of the People's Republic of China, 2020), to design relevant classroom FA tasks. The session began with a discussion of the curriculum standards that teachers had examined the previous week. Participants shared their perspectives on the curriculum targets, highlighting the emphasis on developing students' core competencies in English, while also noting concerns about the limited specific guidance regarding FA within the standards. Prior to the collaborative task planning, the first author presented an illustrative lesson plan focused on “giving advice” as a model of a wellstructured FA task. This example provided teachers with a concrete understanding of key design principles and implementation strategies. 
Following this demonstration, the group collaboratively developed a CBFA task to be implemented and recorded the following week. Teachers selected the Reading & Thinking module from Unit 3: “Diverse Cultures” in the Chinese General Senior Secondary English Compulsory Textbook 3 (2019 edition), which features a travel journal by a Chinese student describing her visit to San Francisco. The team designed a task focusing on a reading strategy aligned with curriculum standards (see Appendix A for task plan), establishing the learning target as “developing a reading strategy to classify and organize information for understanding a travel journal” and defining corresponding success criteria. The teachers then detailed each component of the FA practice with their students’ needs in mind. For evidence elicitation, they planned to gather information on student learning primarily through observation of task performance and facilitated discussions. The group anticipated three potential levels of student performance and developed differentiated feedback and follow-up actions accordingly. Feedback strategies were designed to be responsive to achievement levels, ranging from praise and affirmation to targeted reminders and suggested answers. Corresponding follow-up actions included specific consolidation activities and additional task assignments to address learning gaps and extend understanding. Week 5: Implementation and Recording of FA task During this week, all five participating teachers implemented the planned CBFA task in their individual classes and video-recorded the achievement of targets. Five teachers from two schools (with School 1 generally outperforms School 2) teaching Level 1 senior secondary classes (equivalent to Grade 10) in the second semester of the 2020–2021 school year were invited to participate in this study. 
The participating teachers taught parallel classes and represented diverse backgrounds in terms of age (ranging from their 20s to their 50s), teaching experience (5-20 years), and academic qualifications (four held bachelor's degrees and one a master's degree). It is important to note that none had previously received formal training in language assessment, either during their university studies or throughout their teaching careers. Despite their lack of assessment-specific preparation, all demonstrated strong enthusiasm and motivation to develop their classroom assessment literacy in FA through this CPD program. While the usefulness of the CPD for the participating teachers' FA knowledge, beliefs, and practices and the effect of the CPD on the improvement of the students' SRL abilities have been shown in other reports (Li & Gu, 2023, 2024), this paper zooms in on the impact of the CPD in FA on students' English language achievement.

2.1. Research process

This study was conducted in two stages: a baseline phase and a CPD phase. The baseline phase aimed to capture the initial state of teacher FA literacy, including teachers' assessment knowledge, beliefs, and classroom FA practices, among five senior secondary EFL teachers. This phase preceded their participation in the CPD program during the subsequent CPD phase.

From the baseline phase, it was found that the teachers had little knowledge of FA. Their lack of assessment knowledge was influenced by the prevalent exam-oriented culture in which they were educated, coupled with the absence of educational background and teacher training related to language assessment. Most teachers uncritically endorsed FA as a trending concept without substantive understanding. While many classroom assessment events included steps such as elicitation, interpretation, and feedback, these practices were not formative due to the absence of explicit learning targets and actionable steps to address learning gaps.
Only a few assessment events incorporated the four steps of elicitation, interpretation, feedback, and follow-up action. However, even these efforts were directionless, as teachers failed to align assessments with clear objectives, leaving students uncertain about learning expectations. Additionally, the few follow-up actions were haphazard and disconnected from the initial assessment focus. These findings underscore the need for structured teacher CPD to implement target-oriented FA practices in the classroom.

Following the baseline phase, we designed and implemented the 12-week CPD program to develop FA literacy among the same cohort of five EFL teachers. The CPD program was structured using Timperley et al.'s (2007) CPD model, which combined the features of communities of practice and collaborative action research. In this phase, the five teachers and the first author formed a community of learning and practice to engage in the CPD program, which was organized into four stages: 1) Preparation, 2) Round 1 of action research, 3) Round 2 of action research, and 4) Program evaluation.

Stage 1: Preparation (Weeks 1-3)

The preparation stage of the CPD program spanned three weeks and primarily involved reading, sharing, and discussing materials related to FA and targets for teaching, learning, and assessment. This stage ended with goal-setting for the CPD program.

Weeks 1-2: Reading

During the first two weeks, teachers engaged in reading selected materials tailored for Chinese secondary EFL teachers. These readings introduced principles and practical components of FA. A WeChat group was created to facilitate discussions and resource sharing, supplemented by a Chinese-language guide with reflective questions to aid comprehension. Teachers completed the first two readings in week one and the remaining two in week two, with 3 h of group discussions and 4 h of lessons.
These video recordings were subsequently shared within the professional development group to serve as primary resources for both self-reflection and collaborative group reflection sessions over the following two weeks.

Week 6: Self-reflection for Round 1 action

During this week, the five teachers conducted self-reflections based on their video-recorded lessons, concentrating specifically on the implementation of the planned FA task. A set of guiding questions was designed to support their reflection process: (1) How effectively was the learning target communicated to students? (2) Was sufficient evidence of student learning elicited to support informed judgments? (3) Was the interpretation of the collected evidence aligned with the established success criteria? (4) What types of feedback were provided to students? (5) To what extent were students supported in bridging learning gaps and achieving the learning target? (6) How did the actual implementation of the FA cycle align with or deviate from the planned approach? (7) What strengths and areas for improvement were evident in your practice? Teachers were encouraged to use these questions to reflect deeply on their FA practices and to document their insights in reflective journals.

Week 7: Group reflection and evaluation for Round 1 action

This week, the teacher professional development group held an online session to reflect on and evaluate the first round of action research. The meeting was structured around the following guiding questions: (1) How effectively was each component of the planned FA cycle implemented? (2) Which aspects of the first action research round were successful and should be continued in the next round? (3) What challenges or difficulties arose during implementation? (4) What specific improvements can be made before the second round of action research? The teachers began by collectively reviewing and summarizing their experiences implementing each part of the FA cycle.
They then shared successes and areas for improvement in their FA practices. Participants also highlighted effective strategies they had observed in each other's video-recorded lessons, noting how peer examples provided valuable insights. The group openly discussed challenges encountered during implementation. Issues affecting only one or two teachers were addressed with targeted suggestions, while common difficulties were analyzed together to identify root causes. These common challenges were designated as focal points for the next round of task planning. In conclusion, the teachers collaboratively identified several areas for refining their FA practices in the second action research cycle. Additionally, they decided to revisit selected readings to further support their professional development.

Stage 3: Round 2 of action research (Weeks 8-11)

In this phase, teachers built upon reflections from the first action research cycle to design and implement a new CBFA task. As in the previous stage, each teacher delivered and video-recorded one lesson for subsequent analysis.

Week 8: Joint planning for the FA task

The professional development group held a 3-h collaborative session to design another CBFA task, focused on a curriculum-aligned reading strategy. The group used the "Reading & Thinking" module from Unit 4: "Space: The Final Frontier" (Book 3) as the basis for the task, which centered on summarizing main ideas (see Appendix B). Prior to planning, the group reaffirmed the importance of clear learning targets and the dual informing and forming functions of FA. They established the learning target as "developing a strategy for summarizing the main idea of each paragraph to enhance comprehension of the text on space exploration." Success criteria were also defined, emphasizing the accurate identification of topic sentences or key information and the concise summarization of main ideas.
For evidence elicitation, teachers continued to use observation and questioning, methods they had found effective in gathering evidence of learning, while also deciding to take notes during observations and engage in one-on-one student interactions. This round incorporated self-assessment and peer assessment alongside teacher assessment, encouraging collaborative interpretation and use of assessment evidence among teachers and students. The professional development group anticipated three levels of student performance and designed differentiated feedback and follow-up actions: (1) Students meeting learning objectives received affirmative and encouraging feedback, followed by advancement to subsequent tasks. (2) Those struggling to summarize in their own words received targeted feedback and opportunities to address learning gaps. (3) If tasks exceeded student capabilities, teachers planned to adjust difficulty, provide scaffolding, re-teach, and reassess. A comprehensive set of feedback types and follow-up actions was developed, tailored to diverse classroom contexts. Outside the group session, several teachers also designed full-lesson FA tasks independently.

Week 9: Implementation and Recording of the FA task

This week, all five teachers implemented the planned CBFA task in their respective classes and recorded the lessons. All teachers received their video recordings for use in the self- and group-reflection sessions scheduled over the following two weeks.

Week 10: Self-reflection for Round 2 action

The teachers conducted self-reflections based on their video-recorded lessons from the second action research round, using the same seven guiding questions from the first cycle to evaluate their implementation of the FA task. To support peer learning, teachers shared their videos in the WeChat group, allowing members to observe and analyze each other's assessment practices.
Week 11: Group reflection and evaluation for Round 2 action

The professional development group held a 3-h online session to reflect on and evaluate the second round of action research. The discussion followed the same guiding questions used in the first round. In addition to the formal meeting, teachers continued exchanging feedback via the WeChat group. Some also initiated one-on-one discussions with the first author, while teachers at the same school held smaller face-to-face meetings to further explore insights and experiences.

Stage 4: Program evaluation (Week 12)

This one-week stage was dedicated to a comprehensive evaluation of the entire CPD program. All five teachers participated in guided reflections on their professional growth and the implementation of FA practices. The session encouraged teachers to openly discuss their experiences of developing assessment literacy and FA practices. Subsequently, the evaluation of the CPD program followed a structured approach, focusing on questions such as "What worked? What did not work? What changed? What did not change?" Furthermore, the group collectively summarized any implementation challenges they had encountered. Finally, the teachers provided suggestions for future CPD initiatives.

Throughout the CPD, the teachers systematically integrated four steps into their FA practices. 1) Elicitation: involving teachers, students, and their peers as assessors; shifting focus from classroom functions (e.g., classroom management, requests or directions, and learning checks) to the curriculum targets of developing subject core competencies; and using a combination of elicitation techniques (such as targeted questioning, structured classroom tasks, and interactive activities) to elicit meaningful learning evidence. 2) Interpretation: making on-the-spot judgments about students' subject core competencies; involving students in self-/peer assessment; and using explicit success criteria.
3) Feedback: providing task-specific feedback; engaging students in providing descriptive and actionable peer feedback; differentiating feedback to suit individual student performance and learning needs; and providing customized feedback, taking into account the emotional impact of assessment. 4) Follow-up action: offering students opportunities for improvement through scaffolded support and tailored tasks; and involving students in the process of acting on the feedback and closing their learning gaps.

As the program progressed, teachers progressively emphasized the alignment of all FA steps with curriculum targets. Learning objectives were explicitly tied to the four subject core competencies, fostering not only language skills but also students' SRL abilities. This alignment ensured that every step of FA practice cohesively supported the agreed-upon objectives.

2.2. Data collection and data analysis

2.2.1. Data collection

The study collected test results from a total of 10 classes of 509 students: five classes of 238 students (the CPD group) taught by the five teachers who participated in the CPD program, and five parallel classes of 271 students (the non-CPD group) taught by teachers who were not part of the CPD program. The key difference between the groups was participation in the FA-focused CPD program described in this study. The number of students in the CPD and non-CPD groups in each school is shown in Table 1.

Table 1. CPD and non-CPD groups of students.
            CPD group                        Non-CPD group
            No. of classes  No. of students  No. of classes  No. of students
School 1    2               108              2               111
School 2    3               142              3               169

To maintain ecological validity, the study utilized the schools' existing examination structure rather than introducing external tests. Data were drawn from four regular semester exams, timed to align with the four stages of the CPD program. The examinations in both schools were developed based on the shared curriculum and textbook content. They were designed by experienced teachers and district-level teacher-researchers, following the format of the university entrance exam, which includes listening, reading, language use, and writing sections. This approach captured students' language proficiency at key intervals while minimizing disruptions to standard classroom routines, ensuring that findings reflected authentic learning contexts.

The first monthly exam, taken during the baseline phase, served as a pre-test to establish students' initial English proficiency levels. The subsequent three exams (the mid-term exam, the second monthly exam, and the final exam) served as three post-tests, or indicators of students' English learning achievement at each of the three ensuing stages: the end of Round 1 of action research, the end of Round 2, and about four weeks afterwards at the conclusion of the whole CPD program.

To assess the impact of teacher CPD on student achievement, statistical comparisons were conducted between the CPD and non-CPD groups of students. These analyses aimed to determine whether the FA-focused CPD program translated into measurable differences in learning outcomes. Due to differences in assessment instruments between the two schools, separate analyses were performed for each school using SPSS 29. This approach allowed results from one school to serve as an external validation check for the other, strengthening the robustness of the findings.

2.2.2. Data analysis

Although both schools utilized examinations developed from the same curriculum framework and following comparable formats, the specific instruments differed between schools. Our analysis therefore focused on measuring within-school progress across the four time points rather than making direct between-school comparisons. First, descriptive statistics were calculated to identify trends by comparing the CPD and non-CPD groups within each school. Key metrics included sample sizes, mean scores, and standard deviations for all tests, categorized by group.

Next, a one-way Multivariate Analysis of Covariance (MANCOVA) was employed to examine potential differences in exam performance between the CPD and non-CPD groups within each school. In MANCOVA, statistical differences are evaluated across multiple continuous dependent variables based on an independent grouping variable, while controlling for the influence of the covariate (Pituch & Stevens, 2015). This analysis considered three dependent variables: post-test 1 (the exam after Round 1 of action research), post-test 2 (the exam after Round 2 of action research), and post-test 3 (post-CPD). By employing the pre-test (baseline exam) as the covariate, the MANCOVA mitigated the potential influence of students' baseline language proficiency, revealing the impact of the program on the three exams by partialing out pre-existing differences. Prior to conducting the MANCOVA, we verified compliance with key statistical assumptions, including multivariate normality, absence of outliers, homogeneity of variance, linearity, and homogeneity of regression slopes (Pituch & Stevens, 2015).

Once these criteria were satisfied, we proceeded with the MANCOVA to evaluate whether the FA-focused CPD program significantly influenced student learning outcomes, statistically adjusting for baseline proficiency (pre-test scores). First, multivariate tests were employed to assess overall differences in mean scores between the CPD and non-CPD groups across all three post-tests after controlling for pre-test performance. Second, tests of between-subjects effects were performed to evaluate group differences for each post-test individually, identifying the specific junctures at which the CPD program affected outcomes while controlling for pre-test scores. The significance level was set at p = .05. Partial eta squared (partial η2) was examined to gauge effect sizes, interpreted using Cohen's (1988) guidelines: partial η2 between .0099 and .0588 indicated a small effect size, partial η2 between .0588 and .1379 a medium effect size, and partial η2 larger than .1379 a large effect size.

3. Results

3.1. School 1

In School 1, the CPD group comprised two classes (N = 103 students) taught by the two teachers participating in the CPD program, while the non-CPD group included two parallel classes (N = 107 students) led by teachers who had not gone through the CPD program. Descriptive statistics for the exam results are shown first, followed by the statistical results of the MANCOVA analysis.

3.1.1. Descriptive statistics

Table 2 presents the descriptive statistics comparing the performance of the CPD and non-CPD groups in School 1, with each exam scored out of a maximum of 150 points. Fig. 2 further visualizes the mean score trajectories of both student groups across the four exams.

Table 2. Descriptive statistics for the tests by group in School 1.
              Group     N     Mean      Std. Deviation
Pre-Test      CPD       103   96.1456   12.19222
              Non-CPD   107   101.2243  14.03957
Post-Test 1   CPD       103   96.0874   13.34156
              Non-CPD   107   95.9439   16.71774
Post-Test 2   CPD       103   106.8932  11.49545
              Non-CPD   107   106.1776  15.37476
Post-Test 3   CPD       103   115.8641  9.77722
              Non-CPD   107   113.8178  13.98810

Fig. 2. Mean scores of two groups of students in School 1.

In the baseline phase (pre-test), the CPD group scored a mean of 96.1, 5.1 points lower than the non-CPD group's mean of 101.2. This reflected initial disparities in English proficiency between the two groups of students. As the CPD program unfolded, however, the participating teachers implemented FA practices in their classes and developed their FA knowledge and beliefs; the CPD group's progress in learning was reflected in their exam results. Following Round 1 of action research (post-test 1), the CPD group's mean score exceeded that of the non-CPD group. This trend continued after Round 2 (post-test 2), with the CPD group maintaining a higher mean score than the non-CPD group. By the program's conclusion (post-test 3), the CPD group achieved a mean score of 115.9, surpassing the non-CPD group's 113.8 by 2.1 points. These results suggest that the CPD program positively influenced students' English language achievement in School 1.

3.1.2. MANCOVA

To determine whether statistically significant differences in exam performance existed between the CPD and non-CPD groups in School 1, a one-way MANCOVA was conducted. Key assumptions were met. The analysis included three dependent variables: post-test 1 (after Round 1 of action research), post-test 2 (after Round 2), and post-test 3 (post-CPD program), with pre-test scores as the covariate. The multivariate test results are shown in Table 3. As highlighted in the table, the multivariate tests revealed statistically significant differences between the CPD and non-CPD groups across the combined post-tests after adjusting for baseline proficiency, Wilks' Λ = .902, F(3, 205) = 7.4, p < .001, partial η2 = .098. The partial eta squared value indicated a medium effect size (Cohen, 1988). It can be concluded that the CPD program effectively contributed to enhancing the students' English language achievement in School 1.

Table 3. Multivariate tests in School 1.

Table 4. Estimated marginal means in School 1.
                                                    95% Confidence Interval
Dependent Variable  Group     Mean      Std. Error  Lower Bound  Upper Bound
Post-Test 1         CPD       98.135a   1.098       95.971       100.299
                    Non-CPD   93.973a   1.077       91.850       96.095
Post-Test 2         CPD       108.835a  .934        106.994      110.675
                    Non-CPD   104.308a  .916        102.503      106.114
Post-Test 3         CPD       117.539a  .858        115.848      119.230
                    Non-CPD   112.206a  .841        110.547      113.864
a. Covariates appearing in the model are evaluated at the following values: Pre-Test = 98.7333.
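As a numerical sanity check, the reported Wilks' Λ, F, and multivariate partial η2 values are mutually consistent under the standard conversion for a two-group (1-df) effect, F = ((1 − Λ)/Λ) × (df_error/df_hypothesis) and partial η2 = 1 − Λ. The sketch below is our own illustrative Python, not the study's analysis code (which used SPSS 29); the function names are assumptions for illustration.

```python
# Illustrative sketch: convert a reported Wilks' Lambda to its exact F
# statistic and multivariate partial eta-squared for a two-group MANCOVA
# (hypothesis df = 1), using F = ((1 - L) / L) * (df_error / df_hypothesis).

def wilks_to_f(wilks_lambda, df_num, df_den):
    """Exact F for Wilks' Lambda when the effect has a single df (two groups)."""
    return (1 - wilks_lambda) / wilks_lambda * (df_den / df_num)

def multivariate_partial_eta_sq(wilks_lambda):
    """Multivariate partial eta-squared for a 1-df effect: 1 - Lambda."""
    return 1 - wilks_lambda

# School 1: Wilks' Lambda = .902 with F(3, 205)
print(round(wilks_to_f(0.902, 3, 205), 1))           # prints 7.4
print(round(multivariate_partial_eta_sq(0.902), 3))  # prints 0.098
```

Applying the same conversion to School 2's Λ = .870 with F(3, 294) gives approximately 14.6, close to the reported 14.682 once the rounding of Λ to three decimals is taken into account.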
Table 4 shows the estimated marginal means for the CPD and non-CPD groups in School 1, adjusted for pre-test scores. It is evident that the CPD group consistently outperformed the non-CPD group across all post-tests during the program, and that the performance gap widened by the conclusion of the CPD program.

Results from the tests of between-subjects effects (Table 5) present the sustained positive impact of the CPD program on each post-test individually, controlling for baseline proficiency. As highlighted in the table, statistically significant differences were found between the two groups in each of the three post-tests at different stages of the CPD program, controlling for the covariate of pre-test. In post-test 1, after Round 1 of action research, the CPD group scored significantly higher than the non-CPD group (p = .008, partial η2 = .034), a small effect size. In post-test 2, after Round 2 of action research, a highly significant difference emerged (p < .001, partial η2 = .054), again a small effect size. In post-test 3, after the program's conclusion, the difference remained highly significant (p < .001, partial η2 = .085), a medium effect size. These findings reflect the CPD group's progressive improvement in English achievement compared to their non-CPD peers, with effect sizes growing over time. Collectively, these findings provide strong evidence that the CPD program effectively enhanced students' English language achievement in School 1, highlighting the positive influence of the FA practices implemented by the participating teachers.

Table 5. Tests of between-subjects effects in School 1.

3.2. School 2

In School 2, the CPD group of students comprised three classes (N = 135), while the non-CPD group included three parallel classes (N = 164). The findings present descriptive statistics for exam results, followed by the outcomes of the MANCOVA.

3.2.1. Descriptive statistics

Table 6 summarizes the comparative performance of the CPD and non-CPD groups in School 2, detailing the number of students, mean scores, and standard deviations for each exam by group. Fig. 3 illustrates the mean scores of the CPD and non-CPD groups across the four exams.

Table 6. Descriptive statistics for the tests by group in School 2.
              Group     N     Mean      Std. Deviation
Pre-Test      CPD       135   111.2852  12.56558
              Non-CPD   164   113.2439  12.45489
Post-Test 1   CPD       135   103.9222  17.25153
              Non-CPD   164   105.3994  13.79541
Post-Test 2   CPD       135   106.4185  14.85166
              Non-CPD   164   98.4299   13.77903
Post-Test 3   CPD       135   109.5296  13.79511
              Non-CPD   164   102.8506  16.23846

Fig. 3. Mean scores of two groups of students in School 2.

The CPD group scored lower than the non-CPD group in the pre-test and post-test 1, but outperformed the non-CPD group in post-test 2 and post-test 3. In the pre-test, the mean score of the CPD group (111.3) was lower than that of the non-CPD group (113.2), showing the initial differences between the two groups. In Round 1 of action research, some teachers encountered practical implementation challenges while attempting to apply FA principles, such as departing from established teaching routines and adjusting to students' emotional responses to reduced praise and increased corrective feedback (see Li & Gu, 2023). During this period, the CPD group had not yet made significant progress in integrating these new practices effectively. Their mean score (103.9) was slightly lower than that of the non-CPD group (105.4) in post-test 1. However, in Round 2 of action research, when the teachers successfully implemented FA practices, the CPD group's mean score (106.4) surpassed that of the non-CPD group (98.4) by 8 points, showing a substantial improvement in their English language achievement. The positive trend continued in post-test 3, with the CPD group maintaining a higher mean score (109.5) than the non-CPD group (102.9) after the conclusion of the CPD program.

3.2.2. MANCOVA

To explore whether statistically significant differences in exam performance existed between the CPD and non-CPD groups in School 2, a one-way MANCOVA was conducted, adjusting for baseline language proficiency. Statistical assumptions were verified before the analysis. As highlighted in Table 7, the multivariate tests revealed highly significant differences between the groups across the combined post-tests, after adjusting for the covariate of pre-test, Wilks' Λ = .870, F(3, 294) = 14.682, p < .001, partial η2 = .130. The partial eta squared value (η2 = .130) indicated a medium effect size, approaching Cohen's (1988) threshold for a large effect size (η2 > .1379). This showed the practical significance of the research outcome. These results confirm that the CPD program meaningfully enhanced students' English language achievement in School 2.

Table 7. Multivariate tests in School 2.

Table 8 shows the estimated marginal means for the CPD and non-CPD groups in School 2, adjusted for baseline proficiency. The adjusted means reveal that the CPD group outperformed the non-CPD group in post-test 2 and post-test 3.

Table 8. Estimated marginal means in School 2.
                                                    95% Confidence Interval
Dependent Variable  Group     Mean      Std. Error  Lower Bound  Upper Bound
Post-Test 1         CPD       104.001a  1.338       101.369      106.634
                    Non-CPD   105.435a  1.212       103.050      107.821
Post-Test 2         CPD       106.400a  1.195       104.047      108.752
                    Non-CPD   98.077a   1.083       95.945       100.209
Post-Test 3         CPD       109.510a  1.312       106.929      112.091
                    Non-CPD   102.730a  1.189       100.391      105.069
a. Covariates appearing in the model are evaluated at the following values: Pre-Test = 112.3595.

The results of the tests of between-subjects effects (Table 9) show the effect of the CPD program on each post-test individually. As highlighted in the table, there was no statistically significant difference between the two groups in post-test 1 (p = .421). However, highly significant differences were found in post-test 2 (p < .001, partial η2 = .082) and post-test 3 (p < .001, partial η2 = .048). The partial eta squared values indicated a medium effect size in post-test 2 and a small effect size in post-test 3. The results from School 2 showed that initial challenges in teachers' implementation of FA practices corresponded with minimal gains in students' English achievement. However, as teachers developed their assessment literacy and effectively implemented FA practices, a statistically significant improvement in student outcomes was observed.

Collectively, findings from Schools 1 and 2 suggest that the FA-focused teacher CPD positively impacted students' English proficiency. Despite lower initial baseline scores, CPD-group students exhibited substantial progress, ultimately outperforming their non-CPD peers. After statistically controlling for pre-test scores, post-test results revealed significant differences between groups in both schools, with medium effect sizes. The larger effect size observed in School 2, compared to School 1, suggests that the impact of the CPD intervention may be mediated by specific contextual or implementation factors. Our related research into teacher trajectories within this program indicates that teachers' engagement with and growth in FA practices can vary significantly based on individual factors such as career stage, pre-existing beliefs and practices, and personal agency.
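The covariate-adjusted (estimated marginal) means reported in Tables 4 and 8 can be understood as each group's raw post-test mean shifted along the pooled within-group regression of post-test on pre-test, evaluated at the grand pre-test mean. The following is a minimal, self-contained sketch; the toy numbers and function names are ours for illustration, not the study's data or code.

```python
# Sketch of covariate adjustment behind estimated marginal means:
# adjusted_mean_g = mean(y_g) - b_w * (mean(x_g) - grand_mean(x)),
# where b_w is the pooled within-group slope of post-test (y) on pre-test (x).

def pooled_within_slope(groups):
    """Pooled within-group slope b_w = sum_g Sxy_g / sum_g Sxx_g."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def adjusted_means(groups):
    """Group post-test means adjusted to the grand pre-test mean."""
    all_x = [x for xs, _ in groups for x in xs]
    grand_x = sum(all_x) / len(all_x)
    b_w = pooled_within_slope(groups)
    return [sum(ys) / len(ys) - b_w * (sum(xs) / len(xs) - grand_x)
            for xs, ys in groups]

# Toy example: group_a starts lower on the pre-test than group_b.
group_a = ([90, 95, 100], [100, 105, 110])   # (pre-test, post-test) scores
group_b = ([100, 105, 110], [104, 109, 114])
print([round(m, 1) for m in adjusted_means([group_a, group_b])])  # prints [110.0, 104.0]
```

In this toy case the lower-baseline group has the lower raw post-test mean (105 vs. 109) but the higher adjusted mean, mirroring how the CPD groups' standing improves once baseline proficiency is partialed out.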
For instance, early-career teachers may demonstrate rapid adoption of new practices, while mid-career teachers might undergo a more complex process of 'de-learning' prior misconceptions. Meanwhile, late-career teachers, committed to lifelong learning, may integrate new knowledge at a different pace, potentially leading to substantive but later-blooming gains. These divergent professional learning trajectories, fostered within the same collaborative community of practice, likely lead to differences in the pace, pattern, and depth of instructional change. Consequently, this variation in teacher development may translate into differential effects on student learning outcomes across contexts, which could help explain the disparity in effect sizes observed between the two schools in this study. A detailed analysis of these individual and contextual dynamics is the focus of a separate manuscript.

Table 9. Tests of between-subjects effects in School 2.
Source           Dependent Variable  Type III Sum of Squares  df   Mean Square  F        Sig.   Partial Eta Squared
Corrected Model  Post-Test 1         167.598                  2    83.799       .350     .705   .002
                 Post-Test 2         6755.313                 2    3377.657     17.098   <.001  .104
                 Post-Test 3         3504.441                 2    1752.221     7.596    <.001  .049
Intercept        Post-Test 1         38900.521                1    38900.521    162.416  <.001  .354
                 Post-Test 2         22689.262                1    22689.262    114.854  <.001  .280
                 Post-Test 3         35528.519                1    35528.519    154.018  <.001  .342
Pre-Test         Post-Test 1         6.026                    1    6.026        .025     .874   .000
                 Post-Test 2         2029.765                 1    2029.765     10.275   .001   .034
                 Post-Test 3         201.264                  1    201.264      .872     .351   .003
Group            Post-Test 1         155.775                  1    155.775      .650     .421   .002
                 Post-Test 2         5190.657                 1    5190.657     26.275   <.001  .082
                 Post-Test 3         3411.083                 1    3411.083     14.787   <.001  .048
Error            Post-Test 1         70895.497                296  239.512
                 Post-Test 2         58474.282                296  197.548
                 Post-Test 3         68280.708                296  230.678
Total            Post-Test 1         3350759.500              299
                 Post-Test 2         3178270.000              299
                 Post-Test 3         3422874.500              299
Corrected Total  Post-Test 1         71063.095                298
                 Post-Test 2         65229.595                298
                 Post-Test 3         71785.149                298

4. Discussion

This study has shown the effectiveness of FA in boosting student learning outcomes through a teacher CPD program, an approach recognized as one of the commonly implemented and effective FA intervention designs (Kingston & Nash, 2011). The results are in line with empirical findings such as Black et al. (2003), Wiliam et al. (2004), and Andersson and Palm (2017). These findings indicate that the effect of FA on student achievement may fall between the overly optimistic viewpoint of Wiliam (2007), who suggested that FA can "double the speed of student learning" (p. 37), and the pessimistic perspective of Kingston and Nash (2011), who claimed that FA has only a small impact on student learning. Moreover, these results from the two participating schools independently show that enhancing teacher assessment literacy in FA can effectively improve students' learning outcomes even in a context dominated by a prevailing examination culture and the high stakes of the university entrance examination (Gu, 2014). In multiple reviews of FA, the literature consistently points to a scarcity of high-quality and comparable studies from which generalizable conclusions can be drawn (Bennett, 2011; Kingston & Nash, 2011). The findings of the present study therefore add to this much-needed empirical evidence regarding the impact of FA on student achievement and the effectiveness of related interventions.

The teachers came to understand that effective FA is not a matter of frequency but rather a matter of necessity and intentionality. This understanding empowered the teachers to implement FA effectively, which must be a major reason for the improved student learning outcomes. This understanding also underscores why FA must be viewed as a holistic and integrated process rather than a collection of isolated techniques. When FA is reduced to a single component, such as feedback, questioning, or self-assessment, its formative potential is often lost. The essence of FA lies not in the presence of any one practice, but in the intentional linkage between eliciting evidence of learning, interpreting it in relation to a clear learning target, and enacting responsive instructional moves. Without this coherence, even well-executed feedback may serve only an evaluative or motivational function rather than a truly formative one.

Empirical studies that isolate components or strategies such as feedback or peer assessment, while valuable for understanding specific mechanisms, risk misrepresenting FA's complexity and thereby generating inconsistent or inconclusive evidence about its overall effectiveness. Indeed, the mixed findings in the literature, e.g., strong peer feedback effects in higher education (Huisman et al., 2019) versus null effects in K-12 formative assessment (Boström & Palm, 2023), can be partly attributed to this fragmentation. When studies treat FA as a discrete act rather than a dynamic, context-sensitive cycle, they may capture only partial or superficial implementations that lack the necessary alignment among goals, evidence, and action. As Lee et al. (2020) and Maskos et al. (2025) observe, effectiveness hinges on how components interact within a coherent practice, not on their mere presence. Thus, viewing FA holistically, centered on purposeful responsiveness to student learning, helps clarify why some studies report significant gains while others do not, and reinforces the need for research designs that capture FA as an integrated, iterative process rather than a checklist of strategies.
That being said, we do concur with Wiliam (2019) that “In educational research, ‘what works’ is usually the wrong question because almost anything works somewhere, and nothing works everywhere. A better question is, ‘under what circumstances does this work’ (p. 137). In the following section, we highlight two issues for discussion: 1) the importance for teachers to have a clear operational understanding of FA during implementation, and 2) what we believe worked in this study, and what should go into an effective CPD program for teacher assessment literacy in FA. 4.1. What should FA look like inside the classroom? This is arguably the question most teachers would ask when they consider implementing FA, and it is a critical issue researchers must address before identifying FA during classroom observations. Over the years, several frameworks have been proposed to define and operationalize FA in practice. Wiliam and Thompson's (2008) five strategies represent a widely adopted model for classroom implementation, influencing numerous studies and practical applications (Andersson et al., 2017; Boström and Palm, 2023; Oo et al., 2023). Similarly, Ruiz-Primo and Furtak (2006, 2007) introduced the concept of “assessment conversations” through their ESRU framework (teacher Elicits, Student responds, teacher Recognizes, teacher Uses), emphasizing dialogue as a central mechanism for eliciting and interpreting student understanding. More recently, Gulikers et al. (2021) and Veugen et al. (2021) proposed a five-step model that outlines a structured yet flexible approach to FA. In our study, we drew upon Gu's (2021) framework, which closely aligns with the five steps outlined by Gulikers et al. (2021), but we emphasized the centrality of the learning target as the focal point around which all other components revolve in a spiraling cycle. This conceptualization helped clarify FA for participating teachers at the outset of the CPD program. 
However, as teachers began planning their FA activities in class, several important questions emerged: Does the appearance of one component represent FA? Do all components have to be present before a classroom assessment practice can be considered FA? Is FA always necessary? Are more FA events always preferable than fewer ones? Through collaborative engagement during the CPD program, particularly in the preparation stage, we arrived at nuanced answers to these questions. We agreed that any single component, e.g., feedback, no matter how significant, is not sufficient on its own to indicate the presence of FA. For an assessment to be truly formative, it must elicit information about student learning or understanding and interpret this against the learning target, even if done implicitly. While explicit feedback may or may not occur immediately, what distinguishes FA is the follow-up action taken to close the identified learning gap. Classroom assessments frequently occur, both formally and informally, but many end when teachers determine that students have met the intended target or that the learning gap does not require further intervention. Teachers constantly make spontaneous judgments about the level of feedback needed and whether follow-up action is warranted. This means that if defining features of FA must include both informing and forming functions, many classroom assessments that stop before or after feedback without follow-up action will not be counted as FA; and that completing a cycle of FA may not be needed in most of these circumstances. Therefore, classroom-based FA is not merely a matter of 4.2. What goes into an effective teacher CPD program for FA? The research reported in this paper is part of a larger study that looked at how a 12-week CPD program could help improve teachers' assessment literacy in making use of FA. 
Elsewhere, we have reported encouraging evidence for improved teacher assessment literacy (Li & Gu, 2023) and evidence suggesting improvement in the students' SRL (Li & Gu, 2024). This report has shown evidence of significantly enhanced student learning outcomes. Taken together, the conclusion we draw from these findings is that a carefully designed teacher CPD program not only enhances teacher assessment literacy in FA, but also helps improve learning outcomes and leads to enhanced student SRL ability. In addition to the clear operational definition of FA discussed above, which provided a shared and consistent understanding, we believe a few other key factors, such as contextual support, stakeholder agency, and implementation strategies, all contributed to the success of this study. Central to the effectiveness of the CPD program was the close collaboration between researchers and teachers, which fostered a strong community of learning and practice. This partnership was supported by open and unconditional backing from the city's Education Bureau Teaching and Research Commissioner and the school management, who facilitated onsite teacher development and granted access to their classes and student test scores. Also integral to the process were structured opportunities for teachers to plan and reflect as a group before and after each phase of the intervention, which allowed for continuous improvement and deeper engagement with FA practices. The enthusiasm and dedication of the participating teachers, alongside active student collaboration, further enriched the implementation. The implementation strategy emphasized goal clarification and setting, maintaining a sharp focus on teaching, learning, and assessment priorities. Another strategy that worked was the focus on one planned FA activity for the target lesson, and video recording the lesson to allow for later reflection and discussion. In addition, particular attention was given to feedback processes, which played a central role in guiding learning.

Ultimately, these combined efforts helped foster reflective practice among teachers and activate self- and co-regulated learning among students, reinforcing the transformative potential of FA when supported by thoughtful design, collaborative inquiry, and institutional support. Our findings align with prior research identifying key implementation factors of FA that support student learning (Heitink et al., 2016), though the influence of each factor may vary across contexts. For instance, while teacher agency emerged as a critical determinant of success in Andersson and Palm's (2018) study, teacher motivation and enthusiasm in our context were largely taken as given: our participants had already been proactively engaged in teacher-led development groups organized by the city's Education Bureau Teaching and Research Commissioner.

The CPD program reflected key principles of effective professional development (Darling-Hammond et al., 2017). First, the CPD was strongly content-focused, aligning with the secondary school EFL curriculum, national English standards, and institutional priorities around FA. This helped ensure relevance and coherence for teachers. Second, active learning was central to the design, moving beyond passive lectures to engage teachers in contextualized, practice-based inquiry through two cycles of action research. This made it possible for teachers to engage in the process and transfer their learning to regular teaching. Third, collaboration was embedded throughout the program, which took the form of collaborative action research (Burns, 2010). The five participating teachers and the first author formed a professional learning group and engaged in shared planning, discussion, and reflection. Fourth, the CPD spanned 12 weeks, with over 24 h dedicated to group sessions and an additional 15 h of individualized support. This sustained duration well exceeded the recommended 20-h threshold (Desimone, 2009) for meaningful engagement and change. Fifth, a clear model of FA practice (Gu, 2021) guided classroom implementation, enabling teachers to set learning targets, collect and interpret evidence, and provide actionable feedback. Sixth, expert support from the researcher, drawing on her academic and practical experience, provided tailored guidance. Finally, feedback and reflection were embedded through self-reflection journals and peer discussions. This promoted teacher growth and co-regulated learning (Kubanyiova, 2012; Sansom, 2017).

In addition to the seven core features discussed above, another crucial feature found in this study was the provision of opportunities for teacher engagement with feedback and reflection. Feedback and reflection informed the teachers as to what needed to be done for their professional development. The formative nature of the CPD program allowed the teachers to act on the feedback and reflection to improve their assessment practice inside the classroom and progress towards assessment literacy. Overall, the CPD combined content focus, active learning, collaboration, sustained duration, expert support, continuous reflection, and opportunities for improvement to enhance teacher assessment literacy and student outcomes.

Drawing upon the encouraging findings from this study, we outline the following pathway for an effective teacher CPD program in assessment literacy for FA (Fig. 4). We contend that an effective teacher CPD program should first meet the eight core features, which have the potential to enhance teacher assessment literacy and student learning. The process our teachers underwent aligns powerfully with the IMTP framework (Sims et al., 2025), which explains its effectiveness. Initially, teachers built Insight (I) into FA through reading and learning from credible sources. The collaborative, sustained nature of the program simultaneously built Motivation (M) to change. The critical transformation occurred during the action research cycles, which operationalized the framework's final stages. Here, teachers' declarative knowledge was proceduralized, moving from insight to developing practical Technique (T). This was achieved through mechanisms like practical social support and the rehearsal of new strategies. The repeated cycles of planning, implementation, and reflection provided the context-specific repetition and feedback necessary to embed these techniques into sustained Practice (P), making them a habitual part of the teachers' repertoire. This structured journey from knowledge to embedded practice explains the tangible outcomes. The noticeable and persistent student learning progress observed during and after the program is a direct result of teachers not merely learning about FA, but having their development supported through the complete IMTP sequence, leading to profound and lasting changes in their classroom practice.

4.3. Research limitations

We are keenly aware of a few research limitations, which may suggest avenues for future research. First, data were collected from two schools within a third-tier city in China. This sampling approach captures only a narrow range of teaching contexts and may not represent the diversity of teaching contexts more broadly. To address this limitation, future work should conduct analyses across various regions and countries to gain a more holistic understanding of the effect of FA on student achievement in similar contexts. Second, the decision to use the schools' existing exams was made to minimize instructional disruption and preserve ecological validity.
However, as these curriculum-aligned tests differed in their specific structure (e.g., item types, weightings, and length) between the two schools, direct comparison of raw student scores across schools is not feasible. This limitation inevitably constrains the external validity and cross-site comparability of the study, which is why our primary analysis and interpretation focus on performance changes within each school context. Wherever possible, future studies should attempt to use equivalent measures to allow for comparability of learning progress.

Fig. 4. Pathway for teacher CPD of assessment literacy for FA.

A further limitation stems from the voluntary basis of teacher participation in the CPD intervention, which may introduce self-selection bias. Although baseline student achievement was statistically controlled, teachers who volunteer could differ in unmeasured characteristics, such as motivation or openness to innovation, which might influence outcomes. Future research would benefit from random assignment or the pre-measurement of such teacher attributes to better isolate the specific effects of professional development. Another limitation concerns the interpretation of the observed student performance gains. While the assessments were curriculum-aligned, the study design cannot definitively isolate the precise mechanism behind these gains. The effects could be attributed not only to a direct alignment between specific FA techniques and test items but also to broader positive outcomes of the CPD, such as enhanced student engagement, motivation, or classroom dynamics. However, insights from our parallel research (Li & Gu, 2024) suggest that FA implementation fosters SRL processes like goal-setting and progress monitoring, which are conducive to academic improvement.
Thus, while we acknowledge the complexity of disentangling these interrelated factors, the evidence points to FA's role in promoting SRL as a plausible primary pathway for the gains observed. Future studies incorporating direct classroom observations and detailed implementation analytics would help further elucidate these causal mechanisms. Last, the long-term sustainability of the enhanced student English achievement and the effect sizes remains unclear in this study. As noted by Wiliam et al. (2004), the actual sizes of the effects may be underestimated, since teachers tend to change their assessment practices gradually before FA becomes an integral part of their teaching. In this study, data collection regarding students' test results concluded approximately one month after the CPD program ended, at the conclusion of the semester. Subsequent class reassignments and teacher rotations hindered longitudinal tracking. While we maintained regular contact with the teachers post-CPD and observed their continued implementation, reflection, and improvement of FA practices, tracking students' learning achievements for a more extended period following the CPD would be recommended to investigate the sustainability of the enhanced achievement and effect sizes.

5. Conclusion

In their classic review of FA, Black and Wiliam (1998a) confidently claimed that "The research reported here shows conclusively that formative assessment does improve learning" (p. 61). The confidence was so contagious that Popham (2010) went as far as saying, "I would like to see this 12-word sentence inscribed in marble atop the entrance to every faculty lounge in the world" (p. 184). The enthusiasm quickly led to FA being adopted into educational policies and teacher education programs around the world (e.g., Crooks, 2011; Klenowski, 2011). Since then, many teachers have tried to put FA into practice, often with mixed results. At the same time, researchers have been working hard to build stronger evidence for how FA can truly enhance learning.

This study shows that with the right support, teachers can successfully use FA to improve student learning. Teachers grew in their understanding and use of FA through close collaboration and clear guidance. They moved beyond theory and began applying FA in real classroom situations. This growth came from a well-designed professional development program that focused on active learning, teamwork, and regular reflection. Teachers were not given answers; they were supported to explore, plan, and reflect on their own practice. Over time, they became more confident and skilled in using FA to support student learning.

A key finding was how the teachers' assessment literacy developed throughout the process. Starting with confusion and uncertainty, they gradually built a clearer understanding of what FA is, and what it is not. They learned to recognize FA moments in their teaching, to give more effective feedback, and to make decisions based on student learning needs. This journey of growth in assessment literacy helped them become more reflective and responsive educators.

This study makes several distinct contributions to the field of teacher professional development in FA. First, it moves beyond examining individual FA techniques in isolation, demonstrating instead the holistic impact of FA as an integrated pedagogical approach. The findings suggest that consequential validity evidence for FA requires a coherent classroom culture where assessment, teaching, and learning function as an interconnected system. While individual elements such as feedback may enhance student learning, they should not be mistaken for evidence of FA itself. Second, by implementing a structured CPD program designed to enhance teacher assessment literacy in FA and by measuring its effects, this study provides empirical evidence linking such professional development to improved student outcomes. This offers valuable support for the potential role of assessment literacy within the causal chain from FA-focused CPD to student learning, a relationship requiring further causal investigation. Finally, it offers a crucial domain-specific focus by validating the usefulness of FA through a CPD program specifically within the context of EFL learning. This addresses a significant gap, demonstrating that FA can be effectively cultivated to enhance English language performance even in examination-oriented educational environments, thus clarifying the circumstances under which such interventions succeed.

The success of this program offers hope and direction for schools and teachers elsewhere. It shows that even in a system dominated by high-stakes exams, FA can take root and make a difference. When school leaders and researchers work together, and when teachers are given time, tools, and trust, real change is possible. This study reminds us that teacher growth in assessment literacy is a journey that requires care, patience, and belief in the power of learning. With these elements in place, both teachers and students can thrive.

CRediT authorship contribution statement

Jiayi Li: Writing – review & editing, Writing – original draft, Methodology, Investigation, Conceptualization. Peter Yongqi Gu: Writing – review & editing, Writing – original draft, Supervision, Conceptualization.

Declaration of competing interest

We, the undersigned, declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.
We understand that the corresponding author is the sole contact for the editorial process. He is responsible for communicating with the other author about progress, submission of revisions and final approval of proofs.

Appendix A. Planned CBFA Task for the First Round of Action Research

Learning target: Students should be able to use a reading strategy of classifying and organizing information to understand a travel journal.

Success criteria:
• Draw a diagram or timeline to organize information according to the time clues in the travel journal.
• Classify information about what the author did in each period of the trip.
• Cover the following information. Before trip: Li Lan camped in the Redwood Forest and visited Napa Valley. In the morning: Li Lan explored the neighbourhood, learned about Mission School art, and ate Mexican-Chinese noodles from a food truck. In the afternoon: Li Lan went to a historical museum. In the evening: Li Lan went to Chinatown for home-style cooking. Tomorrow: Li Lan will go to a jazz bar in the Richmond District.

Elicitation of evidence:
• Students read the travel journal in 5–6 min and complete the timeline of the trip.
• Teachers observe students doing the task.
• Students work in pairs or groups to share and discuss their answers within the group.
• Teachers observe student discussion.
• Students share their answers.

Interpretation, feedback, and action:
• Able to classify and organize information. Feedback: confirmatory feedback; praise. Action: move on to next level/task.
• Learning to classify and organize information. Feedback: corrective feedback; raising awareness of key information; advice for improvement. Action: targeted consolidation.
• Unable to classify and organize information. Feedback: provide possible answers; explain the potential problems. Action: targeted task.

Appendix B.
Planned CBFA Task for the Second Round of Action Research

Learning goals: Students should be able to develop a reading strategy of summarizing the main idea.

Success criteria:
• Find the topic sentence and/or key information for each paragraph. Para 1: People have always wanted to learn more about space. Para 2: Scientists were determined to help humans realise their dream to explore space; in the 20th century. Para 3: risks/accidents/disasters; despite the risks; the desire to explore the universe never died. Para 4: China, progress, the third country in the world. Para 5: The future of space exploration remains bright.
• Summarize the main idea of each paragraph into accurate, clear, concise, and complete sentences.

Elicitation of evidence:
• Students read the passage in 6–7 min, identify the topic sentences and/or key information, and use their own words to summarize the main idea for each paragraph.
• Teachers observe students doing the task.
• Students self-assess and revise their answers.
• Students work in pairs or groups for peer assessment.
• Teachers observe student discussion for common problems among groups and note down problems if necessary.
• Students make improvements based on peer feedback.
• Different levels of students share their answers to the class.

Interpretation, feedback, and action:
• Able to summarize the main idea. Feedback: confirmatory feedback; reinforce; encourage students to use the strategy in future reading. Action: move on to next level/task.
• Learning to summarize the main idea. Feedback: remind key information in the paragraph; explain the potential problems; reinforce student performance; provide advice or supplementary information for improvement; provide suggested answers; analyze the strengths in suggested answers; summarize key points for the strategy. Action: scaffolding (e.g., gap-filling sentence structure: Paragraph 1: People are ___ and scientists ___. Paragraph 2: In the 20th century, ___. Paragraph 3: Humans ___ in spite of ___. Paragraph 4: China's space programme ___. Paragraph 5: The future of space exploration ___.); targeted consolidation (e.g., students improve their summaries).
• Unable to summarize the main idea. Feedback: raise awareness; explain possible source of problem; provide supplementary guidance; work with students on a strategy to proceed. Action: targeted task (read another passage and summarize the main idea).

Data availability

The data that has been used is confidential.

References

Andersson, C., Boström, E., & Palm, T. (2017). Formative assessment in Swedish mathematics classroom practice. Nordic Studies in Mathematics Education, 22(1), 5–20.
Andersson, C., & Palm, T. (2017). The impact of formative assessment on student achievement: A study of the effects of changes to classroom practice after a comprehensive professional development programme. Learning and Instruction, 49, 92–102. https://doi.org/10.1016/j.learninstruc.2016.12.006
Andersson, C., & Palm, T. (2018). Reasons for teachers' successful development of a formative assessment practice through professional development – A motivation perspective. Assessment in Education: Principles, Policy & Practice, 25(6), 576–597. https://doi.org/10.1080/0969594X.2018.1430685
Bennett, R. E. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5–25. https://doi.org/10.1080/0969594X.2010.513678
Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2003). Assessment for learning: Putting it into practice. Oxford University Press.
Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74. https://doi.org/10.1080/0969595980050102
Black, P., & Wiliam, D. (1998b). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2), 139–148. https://doi.org/10.1177/003172171009200119
Black, P., & Wiliam, D. (2009).
Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5–31. https://doi.org/10.1007/s11092-008-9068-5
Black, P., & Wiliam, D. (2012). Developing a theory of formative assessment. In J. Gardner (Ed.), Assessment and learning (2nd ed., pp. 206–230). SAGE Publications Ltd.
Boström, E., & Palm, T. (2023). The effect of a formative assessment practice on student achievement in mathematics. Frontiers in Education, 8, 1–14. https://doi.org/10.3389/feduc.2023.1101192
Broadfoot, P. M., Daugherty, R., Gardner, J., Gipps, C. V., Harlen, W., James, M., et al. (1999). Assessment for learning: Beyond the black box. University of Cambridge.
Burner, T. (2014). The potential formative benefits of portfolio assessment in second and foreign language writing contexts: A review of the literature. Studies in Educational Evaluation, 43, 139–149. https://doi.org/10.1016/j.stueduc.2014.03.002
Burns, A. (2010). Doing action research in English language teaching: A guide for practitioners. Routledge.
Chappuis, J. (2009). Seven strategies of assessment for learning. Pearson.
Coffey, J. E., Hammer, D., Levin, D. M., & Grant, T. (2011). The missing disciplinary substance of formative assessment. Journal of Research in Science Teaching, 48(10), 1109–1136. https://doi.org/10.1002/tea.20440
Cohen, J. (1988). Statistical power analysis for the behavioural sciences. Taylor & Francis Group. https://doi.org/10.4324/9780203771587
Crooks, T. (2011). Assessment for learning in the accountability era: New Zealand. Studies in Educational Evaluation, 37(1), 71–77. https://doi.org/10.1016/j.stueduc.2011.03.002
Darling-Hammond, L., Hyler, M. E., & Gardner, M. (2017). Effective teacher professional development. Learning Policy Institute.
Davison, C., & Leung, C. (2009). Current issues in English language teacher-based assessment. TESOL Quarterly, 43(3), 393–415.
Day, C. (1999).
Professional development and reflective practice: Purposes, processes and partnerships. Pedagogy, Culture & Society, 7(2), 221–233. https://doi.org/10.1080/14681366.1999.11090864
Day, C., & Sachs, J. (2005). Professionalism, performativity and empowerment: Discourses in the politics, policies and purposes of continuing professional development. In C. Day, & J. Sachs (Eds.), International handbook on the continuing professional development of teachers (pp. 3–32). Open University Press.
De Lisle, J. (2015). The promise and reality of formative assessment practice in a continuous assessment scheme: The case of Trinidad and Tobago. Assessment in Education: Principles, Policy & Practice, 22(1), 79–103. https://doi.org/10.1080/0969594X.2014.944086
De Neve, D., Leroy, A., Struyven, K., & Smits, T. (2022). Supporting formative assessment in the second language classroom: An action research study in secondary education. Educational Action Research, 30(5), 828–849. https://doi.org/10.1080/09650792.2020.1828120
DeLuca, C., Klinger, D., Pyper, J., & Woods, J. (2015). Instructional rounds as a professional learning model for systemic implementation of assessment for learning. Assessment in Education: Principles, Policy & Practice, 22(1), 122–139. https://doi.org/10.1080/0969594X.2014.967168
Desimone, L. M. (2009). Improving impact studies of teachers' professional development: Toward better conceptualisations and measures. Educational Researcher, 38(3), 181–199. https://doi.org/10.3102/0013189X08331140
Dignath, C., & Büttner, G. (2008). Components of fostering self-regulated learning among students: A meta-analysis on intervention studies at primary and secondary school level. Metacognition and Learning, 3(3), 231–264. https://doi.org/10.1007/s11409-008-9029-x
Dunn, K. E., & Mulvenon, S. W. (2009). A critical review of research on formative assessments: The limited scientific evidence of the impact of formative assessments in education.
Practical Assessment, Research and Evaluation, 14(7), 1–11.
Earl, L. (2003). Assessment as learning. Corwin.
Gardner, J. (2006). Assessment for learning: A compelling conceptualisation. In J. Gardner (Ed.), Assessment and learning (pp. 197–204). Sage.
Gu, P. Y. (2014). The unbearable lightness of the curriculum: What drives the assessment practices of a teacher of English as a foreign language in a Chinese secondary school? Assessment in Education: Principles, Policy & Practice, 21(3), 286–305. https://doi.org/10.1080/0969594X.2013.836076
Gu, P. Y. (2021). An argument-based framework for validating formative assessment in the classroom. Frontiers in Education, 6, Article 605999. https://doi.org/10.3389/feduc.2021.605999
Gu, P. Y., & Lam, R. (2023). Developing assessment literacy for classroom-based formative assessment. Chinese Journal of Applied Linguistics, 46(2), 155–161. https://doi.org/10.1515/CJAL-2023-0201
Gu, Y., & Li, J. (2020a). Validity in formative assessment [形成性评估的效度]. Foreign Language Education in China (外语教育研究前沿), 3(3), 34–41. 2096-6105(2020)03-0034-08.
Gu, P. Y., & Li, J. (2020b). Validation of formative assessment [形成性评估的效度验证方法]. Foreign Language Education in China (外语教育研究前沿), 3(4), 57–63. 2096-6105(2020)04-0057-07.
Gu, P. Y., & Yu, G. (2020). Researching classroom-based assessment for formative purposes. Chinese Journal of Applied Linguistics, 43(2), 150–168. https://doi.org/10.1515/CJAL-2020-0010
Gulikers, J., Veugen, M., & Baartman, L. (2021). What are we really aiming for? Identifying concrete student behavior in co-regulatory formative assessment processes in the classroom. Frontiers in Education, 6, Article 750281. https://doi.org/10.3389/feduc.2021.750281
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112. https://doi.org/10.3102/003465430298487
Heitink, M. C., Van der Kleij, F. M., Veldkamp, B. P., Schildkamp, K., & Kippers, W. B. (2016).
A systematic review of prerequisites for implementing assessment for learning in classroom practice. Educational Research Review, 17, 50–62. https://doi.org/10.1016/j.edurev.2015.12.002
Heritage, M. (2007). Formative assessment: What do teachers need to know and do? Phi Delta Kappan, 89(2), 140–145. https://doi.org/10.1177/003172170708900210
Huisman, B., Saab, N., van den Broek, P., & van Driel, J. (2019). The impact of formative peer feedback on higher education students' academic writing: A meta-analysis. Assessment & Evaluation in Higher Education, 44(6), 863–880. https://doi.org/10.1080/02602938.2018.1545896
Jiang, Y. (2014). Exploring teacher questioning as a formative assessment strategy. RELC Journal, 45(3), 287–304. https://doi.org/10.1177/0033688214546962
Kingston, N., & Nash, B. (2011). Formative assessment: A meta-analysis and a call for research. Educational Measurement: Issues and Practice, 30(4), 28–37. https://doi.org/10.1111/j.1745-3992.2011.00220.x
Klenowski, V. (2011). Assessment for learning in the accountability era: Queensland, Australia. Studies in Educational Evaluation, 37(1), 78–83. https://doi.org/10.1016/j.stueduc.2011.03.003
Kubanyiova, M. (2012). Teacher development in action. Palgrave Macmillan UK. https://doi.org/10.1057/9780230348424
Lee, H., Chung, H. Q., Zhang, Y., Abedi, J., & Warschauer, M. (2020). The effectiveness and features of formative assessment in US K-12 education: A systematic review. Applied Measurement in Education, 33(2), 124–140. https://doi.org/10.1080/08957347.2020.1732383
Leung, C., & Mohan, B. (2004). Teacher formative assessment and talk in classroom contexts: Assessment as discourse and assessment of discourse. Language Testing, 21(3), 335–359. https://doi.org/10.1191/0265532204lt287oa
Li, J., & Gu, P. Y. (2023). Developing classroom-based formative assessment literacy: An EFL teacher's journey. Chinese Journal of Applied Linguistics, 46(2), 198–218. https://doi.org/10.1515/CJAL-2023-0204
Li, J., & Gu, P. Y. (2024). Formative assessment for self-regulated learning: Evidence from a teacher continuing professional development programme. System, 125, Article 103414. https://doi.org/10.1016/j.system.2024.103414
Marshall, B., & Drummond, M. J. (2006). How teachers engage with Assessment for Learning: Lessons from the classroom. Research Papers in Education, 21(2), 133–149. https://doi.org/10.1080/02671520600615638
Maskos, K., Schulz, A., Oeksuez, S. S., & Rakoczy, K. (2025). Formative assessment in mathematics education: A systematic review. ZDM – Mathematics Education, 57(4), 679–693. https://doi.org/10.1007/s11858-025-01696-x
Ministry of Education of the People's Republic of China (MOE). (2003). General senior high school curriculum standards. People's Education Press.
Ministry of Education of the People's Republic of China (MOE). (2020). General senior high school curriculum standards. People's Education Press.
Oo, C. Z., Alonzo, D., & Davison, C. (2023). Using a needs-based professional development program to enhance pre-service teacher assessment for learning literacy. International Journal of Instruction, 16(1), 781–800. https://doi.org/10.29333/iji.2023.16144a
Pituch, K. A., & Stevens, J. P. (2015). Applied multivariate statistics for the social sciences: Analyses with SAS and IBM's SPSS (6th ed.). Routledge.
Popham, W. J. (2008). Transformative assessment. Association for Supervision and Curriculum Development.
Popham, W. J. (2010). Wanted: A formative assessment starter kit. Assessment Matters, 2, 182–190.
Price, M., Handley, K., Millar, J., & O'Donovan, B. (2010). Feedback: All that effort, but what is the effect? Assessment & Evaluation in Higher Education, 35(3), 277–289. https://doi.org/10.1080/02602930903541007
Ramaprasad, A. (1983). On the definition of feedback. Behavioral Science, 28(1), 4–13. https://doi.org/10.1002/bs.3830280103
Rohrbeck, C. A., Ginsburg-Block, M. D., Fantuzzo, J. W., & Miller, T. R. (2003).
Peer-assisted learning interventions with elementary school students: A meta-analytic review. Journal of Educational Psychology, 95(2), 240–257. https://doi.org/10.1037/0022-0663.95.2.240
Ross, J. A. (2006). The reliability, validity, and utility of self-assessment. Practical Assessment, Research and Evaluation, 11(10), 1–13. https://doi.org/10.7275/9wph-vv65
Ruiz-Primo, M. A., & Furtak, E. M. (2006). Informal formative assessment and scientific inquiry: Exploring teachers' practices and student learning. Educational Assessment, 11(3), 205–235. https://doi.org/10.1080/10627197.2006.9652991
Ruiz-Primo, M. A., & Furtak, E. M. (2007). Exploring teachers' informal formative assessment practices and students' understanding in the context of scientific inquiry. Journal of Research in Science Teaching, 44(1), 57–84. https://doi.org/10.1002/tea.20163
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144. https://doi.org/10.1007/BF00117714
Sansom, D. W. (2017). Reinvention of classroom practice innovations. ELT Journal, 71(4), 423–432. https://doi.org/10.1093/elt/ccw116
Schildkamp, K., van der Kleij, F. M., Heitink, M. C., Kippers, W. B., & Veldkamp, B. P. (2020). Formative assessment: A systematic review of critical teacher prerequisites for classroom practice. International Journal of Educational Research, 103, Article 101602. https://doi.org/10.1016/j.ijer.2020.101602
Schneider, M. C., & Randel, B. (2010). Research on characteristics of effective professional development programs for enhancing educators' skills in formative assessment. In H. L. Andrade, & G. J. Cizek (Eds.), Handbook of formative assessment (pp. 251–276). Routledge.
Shute, V. J. (2008). Focus on formative feedback. Review of Educational Research, 78(1), 153–189. https://doi.org/10.3102/0034654307313795
Sims, S., Fletcher-Wood, H., O'Mara-Eves, A., Cottingham, S., Stansfield, C., Goodrich, J., Van Herwegen, J., & Anders, J. (2025). Effective teacher professional development: New theory and a meta-analytic test. Review of Educational Research, 95(2), 213–254. https://doi.org/10.3102/00346543231217480
Singh, S., & Mahapatra, S. (2022). Online self- and peer-assessment for teacher professional development: A case study of the process and the perceived impact. TESOL Journal, 13(2). https://doi.org/10.1002/tesj.635
Taye, T. (2025). The effect of formative assessment strategies on EFL learners' writing accuracy at the tertiary level. Language Testing in Asia, 15(1). https://doi.org/10.1186/s40468-025-00417-1
Timperley, H., Wilson, A., Barrar, H., & Fung, I. (2007). Teacher professional learning and development: Best evidence synthesis iteration. New Zealand Ministry of Education.
Van Zundert, M., Sluijsmans, D., & Van Merriënboer, J. (2010). Effective peer assessment processes: Research findings and future directions. Learning and Instruction, 20(4), 270–279. https://doi.org/10.1016/j.learninstruc.2009.08.004
Veugen, M. J., Gulikers, J. T. M., & den Brok, P. (2021). We agree on what we see: Teacher and student perceptions of formative assessment practice. Studies in Educational Evaluation, 70, Article 101027. https://doi.org/10.1016/j.stueduc.2021.101027
Wiliam, D. (2007). Changing classroom practice. Educational Leadership, 65(4), 36–42.
Wiliam, D. (2019). Some reflections on the role of evidence in improving education. Educational Research and Evaluation, 25(1–2), 127–139. https://doi.org/10.1080/13803611.2019.1617993
Wiliam, D., Lee, C., Harrison, C., & Black, P. (2004). Teachers developing assessment for learning: Impact on student achievement. Assessment in Education: Principles, Policy & Practice, 11(1), 49–65. https://doi.org/10.1080/0969594042000208994
Wiliam, D., & Thompson, M. (2008). Integrating assessment with instruction: What will it take to make it work? In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 53–82). Routledge. https://doi.org/10.4324/9781315086545
Willis, J. (2010). Assessment for learning as a participative pedagogy. Assessment Matters, 2, 65–84. https://doi.org/10.18296/am.0079
Yan, Z., Li, Z., Panadero, E., Yang, M., Yang, L., & Lao, H. (2021). A systematic review on factors influencing teachers' intentions and implementations regarding formative assessment. Assessment in Education: Principles, Policy & Practice, 28(3), 228–260. https://doi.org/10.1080/0969594X.2021.1884042
Yao, Y., Amos, M., Snider, K., & Brown, T. (2024). The impact of formative assessment on K-12 learning: A meta-analysis. Educational Research and Evaluation, 29(7–8), 452–475. https://doi.org/10.1080/13803611.2024.2363831
Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. (2007). Reviewing the evidence on how teacher professional development affects student achievement (Issues & Answers Report, REL 2007–No. 033). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest. Retrieved on 23 March 2024 from ies.ed.gov/ncee/edlabs/regions/southwest/pdf/rel_2007033_sum.pdf