Brain Research 1851 (2025) 149477

A linking hypothesis for eyetracking and mousetracking in the visual world paradigm

Michael J. Spivey
Department of Cognitive and Information Sciences, University of California, Merced, United States

This article is part of a special issue entitled '30 Years Visual World Paradigm: The State of the Art' published in Brain Research.

Keywords: Psycholinguistics; Eyetracking; Mousetracking; Spoken word recognition; Action–perception cycle; Perception–action cycle; Dynamical systems; Embodied cognition

Abstract: For a linking hypothesis in the visual world paradigm to clearly accommodate existing findings and make unambiguous predictions, it needs to be computationally implemented in a fashion that transparently draws the causal connection between the activations of internal representations and the measured output of saccades and reaching movements. Quantitatively implemented linking hypotheses provide an opportunity not only to demonstrate an existence proof of that causal connection but also to test the fidelity of the measuring methods themselves. When a system of interest is measured one way (e.g., ballistic dichotomous outputs) or another way (e.g., smooth graded outputs), the apparent results can differ substantially. What is needed is one linking hypothesis that can produce both types of outputs. The localist attractor network simulation of spoken word recognition demonstrated here recreates eye and mouse movements that capture key findings in the visual world paradigm, and it relies especially on one particularly powerful theoretical construct: feedback from the action–perception cycle. Visual feedback from the eye position enhancing the cognitive prominence of the fixated object allows the simulation to fit a wider range of findings, and points to predictions for new experiments. When that feedback is absent, the linking hypothesis simulation no longer fits human data as well. Future experiments, and improvements of this network simulation, are discussed.

1. Introduction

For decades, research on language comprehension and research on visual perception progressed independently of one another, with a few exceptions (e.g., Cooper, 1974). Then, thirty years ago, Tanenhaus et al. (1995) introduced what is now called the Visual World Paradigm in psycholinguistics, which demanded that these two research areas join forces. By recording eye movements from participants listening to spoken instructions to move objects around, remarkable new insights were obtained about how efficiently language processing and visual perception interact (Eberhard et al., 1995). Findings from this convergence of frameworks have clearly shown that visual input can serve as a powerful context for real-time language comprehension and linguistic input can serve as a powerful context for real-time visual perception (Anderson et al., 2011). Despite this unprecedented access to real-time evidence for which objects in the display draw eye movements, and when, as a result of the ongoing linguistic input, there are still questions that linger to this day regarding how best to interpret these eye movement data. Researchers using the visual world paradigm have struggled to settle on a precise "linking hypothesis" that links the eye movement data to hypothesized activations of internal mental representations. The reasons for disagreement are manifold, including debates on how to handle the refractory period between saccadic eye movements, how to statistically analyze the pooled data of "fixation curves", etc.
And when computer-mouse tracking was added to the mix (Spivey et al., 2005), interpreting the full panoply of results became even more complicated. Debating over the exact parameters of a linking hypothesis in cognition research is not uncommon. When a theory has some vagaries in how exactly it connects to the data that ostensibly support it (perhaps containing some unmentioned assumptions), Meehl (1990) referred to this as a "loose derivational chain." Meehl suggested that almost every theory in psychology suffers from a loose derivational chain connecting it to the data. As a result, it should not be surprising that mutually exclusive theories will sometimes point to the same dataset as their support. One approach that has helped the visual world paradigm literature take steps toward a more precisely explicated linking hypothesis for any given theory, or what Meehl would call a "tight derivational chain," has been to develop model simulations of the internal goings-on of linguistic/cognitive processes and have those models generate data that can be directly compared to the eye-movement data (Magnuson, 2019; McMurray, 2023; Spivey, 2007).

In discussing linking hypotheses for the visual world paradigm (Tanenhaus & Huettig, this issue), it is crucial that we understand the relationship between eye movements and hand movements. If what the eyes are looking at, or what the hand is reaching toward, is a fair indicator of what is active in the cognitive process of interest, then those eye or hand movements provide a link to that cognitive process. However, the systems underlying those two linking hypotheses are not independent of one another. Decades of multisensory integration research have extensively shown a close and complex relationship between the eyes and the hands (Driver & Spence, 1998; Maravita et al., 2003; see also Emberson et al., 2008). Therefore, the linking hypotheses associated with each of them may need to be combined into one joint linking hypothesis. Consider a hypothetical freeze-frame in a visual world paradigm experiment conducted on a computer screen: When the eyes have just moved from a Competitor object to the Target object but the hand hasn't yet curved the computer mouse toward that Target object, are we to trust one of these as the linking hypothesis (i.e., telling us what the mind is thinking about) and ignore the other? How about a trial where the computer mouse is just about to click on the Target object and the eyes flick briefly over at the Competitor object? Rather than having to choose between one linking hypothesis or the other, perhaps a better solution would be to design one linking hypothesis that combines both eye and hand and accommodates their partial interdependency.

In general, a linking hypothesis is expected to provide a transparent connection from the hypothesized (and invisible) internal dynamics of a system to the observed measurements from that system. In some circumstances, this can be quite complicated and difficult. The transformation from internal dynamics to system output can alter the data significantly, and the measurement process itself may impose further transformations on the data, due to sampling-rate limitations, a limited range of sensitivity, or access to only certain dimensions of the system. One proving ground for a measurement process is computational simulation. Firstly, once implemented in a computational simulation, a linking hypothesis can serve as an existence proof for a proposed relationship between the simulation's internal dynamics and the measurements of its outputs. Secondly, simulated systems that are then measured as though their internal dynamics were invisible can provide an opportunity to test those measurement methods and then compare their results with the actual ground truth of the simulation (Spivey, 2018).
By measuring saccadic eye movements during the course of a language processing task, the introduction of the visual world paradigm (Eberhard et al., 1995; Tanenhaus et al., 1995) made it easier than ever before to move away from the tradition of collecting a single data point at the end of an event and instead collect multiple data points per psycholinguistic event – whether it be the comprehension of a single phoneme (McMurray & Spivey, 2000), or a spoken word (Allopenna et al., 1998), or a sentence (Spivey et al., 2002), or sentence production (Griffin & Bock, 2000), or even a lengthy unscripted dyadic conversation (Brown-Schmidt & Tanenhaus, 2008). Ever since this paradigm was developed, with its new and complicated data sets, the linking hypothesis between the mind and the eyes has been debated (Tanenhaus et al., 2000). Rather than eye movements being guided solely by the acoustic–phonetic properties of the most recently heard word (Allopenna et al., 1998), certain circumstances may cause them to be guided by the semantic properties of that word (Huettig & Altmann, 2005; Yee & Sedivy, 2006), or by an entire clause (Burigo & Knoeferle, 2011), or even by an anticipated word that hasn't been spoken yet (Altmann & Kamide, 2007). Moreover, lexical activations may be changing at a level that is below the threshold to trigger a saccade (Teruya & Kapatsinski, 2019), and they may be changing while the eyes are engaged in a stable fixation whose minimum duration is about 200 ms (Matin, Shao, & Boff, 1993; McMurray, 2023). Accordingly, Magnuson (2019) strongly recommends the development of implemented computational models of the linking hypothesis. Similar debates have begun regarding how to interpret the linking hypothesis between the mind and the hand/computer-mouse (Freeman et al., 2011; Kieslich et al., 2020; Magnuson, 2005; Spivey et al., 2005; van der Wel et al., 2009).

Throughout all this, the potential for an interdependency between the Mind-Eye linking hypothesis and the Mind-Hand linking hypothesis has been somewhat neglected (but cf. Levy, 2014). And exploring the consequences of a feedback loop in that link has gone almost unmentioned. What would it mean for there to be a feedback loop in the Mind-Eye linking hypothesis? In essence, it would mean that the eyes are not an impartial measure of what is going on in the mind. What cognition is pondering influences what the eyes look at, but then, in turn, what the eyes look at subtly influences what cognition then ponders. It seems obvious when put that way, but it necessarily means that the eyes cannot be treated as providing an unbiased report of cognitive processes, because they also quickly influence those ongoing cognitive processes (Spivey & Dale, 2011). For example, Grant and Spivey (2003, Experiment 1) observed that people who were 30 seconds away from solving a diagrammatic insight problem tended to produce a particular type of eye-movement pattern on that diagram. Then, in a second experiment with new participants, they induced that type of eye-movement pattern by very slightly augmenting the display, and twice as many participants solved the insight problem (see also Thomas & Lleras, 2007, 2009). As another example, Pärnamets et al. (2015) tracked people's eye movements while they were making decisions about difficult moral quandaries with a two-alternative forced choice.
Just when the participant's waffling scanpath was displaying a mild bias toward the computer's randomly-chosen preferred response option, the computer display interrupted the participant's decision-making process and demanded an immediate response. The result was that instead of choosing the computer's randomly-chosen preferred response option at chance (50% of the time), participants chose the computer's preferred response option several percentage points more often than that (see also Ghaffari & Fiedler, 2018; Falandays & Spivey, 2020).

Similar effects of eye position influencing cognition have been observed in the visual world paradigm. When Joshua Levy (2014) recorded both eye movements and mouse movements in the same task, he found that eye position predicted the magnitude of mouse trajectory curvature. While participants looked at two response options on the screen for Cohort condition trials (e.g., candy and candle) and for Control condition trials (e.g., candy and fork) in a spoken word recognition experiment, Levy observed that even among the subset of trials in which the eyes never looked at the competitor object, the Cohort condition still elicited greater mouse curvature (toward the distractor object) than the Control condition. Thus, the greater average mouse trajectory curvature in the Cohort condition is not due solely to those trials in which the Cohort object was briefly fixated. This suggests that mouse movements are an informative measure in addition to what eyetracking can provide. However, more important than that, Levy also found that the subset of trials where participants briefly fixated the competitor object (either Cohort or Control), before finally fixating the correct Target object, elicited significantly greater mouse trajectory curvatures, compared to the subset of trials in which participants did not fixate the competitor object (Levy, 2014, Fig. 16). A linking hypothesis for the visual world paradigm that treats eye movements as nothing more than a "read-out" for the experimenter may have trouble explaining that difference. Levy's results are consistent with the presence of visual feedback in the perception–action loop, such that pointing the foveas at an object serves not merely as an indicator to the experimenter that this object has cognitive prominence in the mind of the participant; it also serves as an enhancer of that object's cognitive prominence itself. Results like these suggest that any linking hypothesis for the visual world paradigm needs to take into account the fact that when the eyes move to fixate a particular object in the display, the visual input from that object on the foveas may function as a feedback loop from the action-perception cycle that increases the cognitive prominence of that object in the mind of the observer/listener.

Decades ago, Mike Posner (1980) demonstrated that, with careful instructions, visuospatial attention can be effectively divorced from the fovea. Although abrupt visual and auditory cues in the periphery will typically trigger an eye movement to that location, this semi-reflexive action can be resisted with practice. Thus, the Mind-Eye linking hypothesis cannot be described as an obligatory linking hypothesis. A person can look at one thing and attend to something else, in much the same way that they can reach for something while looking at something else.
Although these unusual behaviors are somewhat rare, they make it so that neither the Mind-Eye linking hypothesis nor the Mind-Hand linking hypothesis is an obligatory "strong" linking hypothesis. If they loop back on themselves and on each other, such that each can alter the other's end result, then they must each individually be "weak" linking hypotheses. (If there exists a "strong" linking hypothesis, it would likely be neural decoding; see McMurray et al., 2022; Rybář and Daly, 2022.) That said, even as "weak" linking hypotheses, the eyes and hand still provide a moderately reliable indicator of what is prominent in cognition, most of the time. By building a simplified model simulation that conjoins those two linking hypotheses, accommodates their interdependency, and integrates a feedback loop from the action-perception cycle, one can perhaps map fleeting eye and hand movements to their associated fleeting mental contents.

2. Simulated linking hypotheses

When one simulates a complex process that is already well understood and then performs different types of measurements on that simulation, one can compare the fidelity of various linking hypotheses, using a system where the ground truth actually is known (Spivey, 2018). A compelling example of this method comes from simulations of black holes (Akiyama et al., 2019). Simulations based on slightly different theories produce noticeably different high-resolution images of simulated black holes. However, even our most powerful telescopes are unable to produce an image that is detailed enough to allow one to adjudicate among the different simulations/theories. By applying an additional simulation of the measurement process itself (with its imperfections included) onto the simulated images, Akiyama et al. were able to produce an image of what the simulation would look like if it were measured by the Event Horizon Telescope from 53 million light years away. That blurry image of the simulation (based on geometric crescent models) is a near-perfect match to the Event Horizon Telescope's actual blurry image of M87*, the supermassive black hole at the center of the Virgo A galaxy (Akiyama et al., 2019). In the case of these black hole simulations, the linking hypothesis is exactly that additional simulation of the transformations (e.g., gravitational lensing, magnetic fields, etc.) that the measurement process will unavoidably impose on the data from the original system.

Similar examples of taking measures that were meant for the system of interest and instead applying them to a simulation of that system of interest can be seen in Rabovsky and McRae's (2014) simulated EEG measures of a Hopfield network processing the semantics of lexical input, and in Magnuson et al.'s (2020) neural decoding measures of their long short-term memory recurrent neural network of speech perception. This measure-the-simulation methodology (Spivey, 2018) not only tests the simulation for its ability to recreate human data but also tests the measurement method itself, because one can compare the measured output to the "actual state of affairs" inside the simulation. And who can forget the infamous statistical measurements of fMRI data from a dead salmon, which "revealed" several voxels in the brain cavity with statistically significant activation (Bennett et al., 2009). Obviously, the measurement and analysis techniques employed there – common to fMRI studies at the time – were not providing a high-fidelity report of the actual state of affairs inside that salmon.

If used properly, however, even severely imperfect measurements of a complex system can still provide some insight into what that system actually looks like. This is especially true with nonlinear time series analysis. In the 1980s, cognitive psychologists examined the statistical distribution of reaction times in a cognitive task, ignoring the specific order in which those trials took place, and determined that the flow of information in cognition sometimes looks continuous in time and sometimes looks discrete in time (see Meyer et al., 1988, for a review). In the end, the debate over discrete vs. continuous processing (e.g., Dietrich & Markman, 2003; Spivey, 2007) was not resolved simply by analyzing a distribution of reaction times. Instead of treating those data like a "Bag-of-RTs," Van Orden et al. (2003) examined the specific order in which those trials took place. What they discovered in their nonlinear time series analysis of thousand-trial strings of reaction times in a cognition experiment was that the variance over time was not white noise. It was pink noise. Pink noise is variance that ebbs and flows in its amplitude, with high power at low frequencies, medium power at medium frequencies, and low power at high frequencies. (White noise has equal power at all frequencies.) Pink noise naturally emerges in continuously flowing complex adaptive systems that exhibit self-organization (Van Orden et al., 2003) and that entail feedback loops spanning across time scales (He et al., 2010) – not in discrete stage-based systems that lack feedback loops (Wagenmakers et al., 2004). Of course, some continued debate about the role of pink noise in cognition and action still persists (Torre & Wagenmakers, 2009). However, the ubiquity of this statistical signature permeating throughout all living things is hard to ignore (e.g., Kello et al., 2008), and it suggests that the human mind just might be a continuously flowing complex adaptive system that exhibits feedback loops and self-organization (e.g., Shin & Kim, 2006; Van Orden et al., 2011). When a system is nonlinear, dynamic, and complex – such as a human mind or an ant colony or a whirlwind – it can be difficult to know what part of it will provide the most informative measure. But nonlinear time series analysis is a powerful technique. Even with a severely imperfect measure of a complex adaptive system, collecting a long enough time series of data from almost anywhere on it can actually provide useful information. This is because the various subcomponents of a complex adaptive system are interdependent with one another. So even if an observer is measuring a suboptimal portion of that system, the more informative portions of the system have a constant continuous influence on the portion that is being measured, thus imbuing that imperfect measure with latent information about the entire system's behavior.

The Lorenz attractor (Lorenz, 1963) is an excellent test bed for demonstrating nonlinear time series analysis methods (Stephen et al., 2009) and for comparing measurement techniques (Spivey, 2018). It is a system of three interdependent equations that describe the change in x, y, and z over time: dx/dt = 10(y − x), dy/dt = x(28 − z) − y, and dz/dt = xy − (8/3)z. Ruelle and Takens (1971) called it a strange attractor because the three values for x, y, and z can start almost anywhere in that space and the three equations will update themselves in a way that always produces a version of the "bent figure-eight" shape in Fig. 1A. Takens (1981) later proved that when a system's components are interdependent like this, even a substantially imperfect time series measure of it can still allow a reconstruction of its data pattern in state space. Imagine an observer did not know what the Lorenz shape in Fig. 1A looked like, or even how many dimensions it has, but they had a reliable dense-sampling measurement probe recording the x-values over time. That time series would look like Fig. 1B, bearing little visual resemblance to the underlying structure of the actual system of interest. (This is a little bit like measuring behavioral output to infer the structure of internal cognition.) If Fig. 1B were all they could look at, this observer would not have a fair understanding of what the Lorenz system is doing. But state space reconstruction (Takens, 1981) provides a method for embedding that one-dimensional time series of x-values into more dimensions. In the reconstructed 3-dimensional space (Fig. 1C), one dimension (x') extracts every third x-value from the original time series, another dimension (x'') extracts the next set of every third x-values, and the third dimension (x''') extracts the remaining set of every third x-values.

Fig. 1. State space reconstruction of an iterated version of the Lorenz attractor (A). Given only a time series from the x-values (B), it can be embedded into three dimensions to produce an exemplary reconstruction (C).

Thus, if the original x time series (Fig. 1B) had the following sequence of values in it, [6.77, 5.28, 3.89, 2.59, 1.41, 0.34, −0.62, −1.47, −2.23], then x' = [6.77, 2.59, −0.62], x'' = [5.28, 1.41, −1.47], and x''' = [3.89, 0.34, −2.23]. When these three extracted time series are treated as a 3-dimensional reconstruction, the first point is plotted at coordinates [6.77, 5.28, 3.89], the next point is plotted at [2.59, 1.41, 0.34], and so on, producing a reconstruction that transforms the data in Fig. 1B into the image in Fig. 1C. Thus, without ever directly measuring y or z, the general underlying structure of the actual system of interest can be reconstructed. The reason this is possible is that the system's components (i.e., the x, y, and z update equations) are interdependent with one another and the measurement process produces a time series of many data points – not just a single data point at the end of an event.

But what if the observer did not have access to a probe that singled out an individual dimension? Attaching a measuring probe to an arbitrary location in the system could result in an untold mixture of data from x, y, and z at any one point in time. If the observer attached three semi-random measurement probes, with the intent of producing a 3-D data visualization, then each probe could record the Euclidean distance of the Lorenz trajectory at each time step (Fig. 2A). For each measurement probe landmark (in its arbitrary x,y,z location), the data would show the time series of these distance values (Fig. 2B). As before, each of those individual time series does not resemble the original system, but when they are plotted together as three dimensions (without the need for any embedding), they produce a structure that is quite similar to the original (Fig. 2C). As long as the three measurement probes are far enough apart from one another, this recovery of the original structure is virtually guaranteed.
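To make the procedure concrete, a minimal sketch follows in Python (the article itself presents no code), iterating the three Lorenz equations given above with simple Euler steps and performing the every-third-sample embedding from the worked example. The step size, step count, and starting point are illustrative assumptions.

```python
import numpy as np

def lorenz_x_series(n_steps=30000, dt=0.005, xyz=(1.0, 1.0, 1.0)):
    """Iterate the Lorenz equations (Euler steps; parameters assumed)
    and return the time series of x-values, the 'measured' dimension."""
    x, y, z = xyz
    xs = []
    for _ in range(n_steps):
        dx = 10.0 * (y - x)
        dy = x * (28.0 - z) - y
        dz = x * y - (8.0 / 3.0) * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        xs.append(x)
    return np.array(xs)

x = lorenz_x_series()

# Embed the one-dimensional x time series into three dimensions by
# pulling out every third sample, offset by 0, 1, and 2 samples, as in
# the worked example above.
x1, x2, x3 = x[0::3], x[1::3], x[2::3]
m = min(len(x1), len(x2), len(x3))
reconstruction = np.column_stack([x1[:m], x2[:m], x3[:m]])
# Each row is one reconstructed 3-D point; for the worked example the
# first row would be [6.77, 5.28, 3.89].
```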
Since this three-landmark measurement process goes from three dimensions in one spatial arrangement to three dimensions in another spatial arrangement, it is not really a reconstruction in this case, but more of an arbitrary coordinate transform.

Fig. 2. With three measurement probe landmarks (LM) placed in arbitrary locations around the iterated Lorenz system (A), individual time series will each fail to resemble the original structure (B). However, when those three landmarks have their distance values plotted together in x,y,z space, the result is a relatively faithful coordinate transform of the original system (C).

Finally, what if the observer had only one relatively arbitrary measurement probe attached to their system of interest and still held the goal of building a linking hypothesis that could allow that probe to provide insight into the underlying structure of that system? By essentially combining the power of state space reconstruction (Fig. 1) with the power of an arbitrary probe's Euclidean distance (Fig. 2), one may still be able to recover the general structure of the hidden system – as long as the arbitrary probe is not in an unlucky location. Each row of Fig. 3 shows a different placement of a single landmark measurement probe (LM) recording distance values from the Lorenz attractor system (leftmost panels). In each case, this one-dimensional time series (middle panels) does not provide a visually satisfying link to what the original system looks like. (Although one can perhaps discern that the lower time series has two rather discriminable oscillation regimes.) However, when this one-dimensional time series is embedded into three dimensions, via state space reconstruction (rightmost panels), it becomes clear that some landmark locations are indeed better than others. While the middle and lower rows show that their landmark placement resulted in reasonable reconstructions of the original Lorenz pattern, the top row involves a landmark placement (buried near the middle of the Lorenz attractor) that makes it difficult to distinguish the distance measures of one orbit lobe from the distance measures of the other orbit lobe. Naturally, if the landmark were to be placed almost equidistantly in between the two centers of the Lorenz loops, then its state space reconstruction will not look at all like the original Lorenz pattern. (This is reminiscent of the "accidental viewpoint" in object recognition research; Tarr et al., 1998.) But as long as the landmark happens to be substantially farther away from one loop's center than it is from the other (see middle and bottom rows), then the reconstruction will indeed resemble the original (see also Duch & Dobosz's, 2011, "fuzzy symbolic dynamics").

Fig. 3. With only one landmark measurement probe (LM) placed in an arbitrary location around the iterated Lorenz attractor system (leftmost column), individual time series will each fail to resemble the original structure (middle column). But as long as the landmark placement has a sizeable difference between its distance from one Lorenz loop and its distance from the other loop (middle and lower rows), then the state space reconstruction can be successful.

By using simulated systems to explore the linking hypotheses between a system and the measurement of that system, one can find where the weaknesses are in one linking hypothesis compared to another (Spivey, 2018).
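The landmark-probe measurements of Figs. 2 and 3 can be sketched the same way. The probe coordinates below are arbitrary placeholders, not the positions used for the figures.

```python
import numpy as np

def lorenz_trajectory(n_steps=30000, dt=0.005, xyz=(1.0, 1.0, 1.0)):
    """Iterate the Lorenz equations and return the full (x, y, z) path."""
    x, y, z = xyz
    pts = []
    for _ in range(n_steps):
        dx = 10.0 * (y - x)
        dy = x * (28.0 - z) - y
        dz = x * y - (8.0 / 3.0) * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        pts.append((x, y, z))
    return np.array(pts)

traj = lorenz_trajectory()

# Three arbitrary landmark probes (as in Fig. 2): each records its
# Euclidean distance to the trajectory at every timestep. Plotting the
# three distance series against one another recovers a
# coordinate-transformed Lorenz shape.
landmarks = np.array([[25.0, 0.0, 10.0], [-20.0, 15.0, 40.0], [0.0, -25.0, 60.0]])
dists = np.linalg.norm(traj[:, None, :] - landmarks[None, :, :], axis=2)

# One arbitrary landmark probe (as in Fig. 3): a single distance series,
# embedded into three dimensions by the every-third-sample method.
d = dists[:, 0]
d1, d2, d3 = d[0::3], d[1::3], d[2::3]
m = min(len(d1), len(d2), len(d3))
single_probe_reconstruction = np.column_stack([d1[:m], d2[:m], d3[:m]])
```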
One of the key lessons from this exploration of the Lorenz attractor is that different measurement techniques require different methods of analysis for building the linking hypothesis between the measured data and the inferred structure of the underlying system. Another key lesson is that, whatever your linking hypothesis, collecting multiple samples over time and paying attention to the specific order of those samples provides far richer information about the system in question than simply collecting a single data point at the end of each event. The visual world paradigm has taught us the same lesson.

3. The visual world paradigm

Imagine you have a friend touring the U.S. and they call you on the phone to tell you that they left Memphis, Tennessee in the morning and they drove for six hours. That latency may tell you something about how tired they are, but it doesn't tell you where they went. They could be in Atlanta, Georgia or Tulsa, Oklahoma, or Louisville, Kentucky or Bloomington, Indiana. Or maybe they just drove to Nashville, Tennessee and got lost on the way, taking a curved path. Tracking their actual location every hour would have been much more informative.

By recording the sequence of saccadic eye movements to objects (or locations) that are referred to (or implied) in the incoming speech, the visual world paradigm (Tanenhaus et al., 1995) took an important step away from the psycholinguistic tradition of simply collecting a single reaction time at the end of each trial. The paradigm has been used to study a wide variety of real-time language phenomena (e.g., Tanenhaus & Huettig, this issue), but the focus here will be on spoken word recognition. Tracking the eye movements of participants listening to temporarily ambiguous words – such as "candle" and "candy" (Spivey-Knowlton, 1996) or "beaker" and "beetle" (Allopenna et al., 1998) – provided especially informative confirmation of a mainstream account of spoken word recognition in which multiple lexical representations become partially active to varying degrees simultaneously as the first few phonemes (or sets of acoustic–phonetic features) are delivered to the auditory system (McClelland & Elman, 1986; see also Marslen-Wilson, 1987). On this timescale of hundreds of milliseconds, as more of the word is heard, most of those lexical items gradually decline in activation until only one remains and becomes the word that is recognized. However, a "cohort" word that shares several initial phonemes with the spoken word will maintain activation for longer than most other alternative words and thus may trigger a brief eye fixation of a competitor object with that name. This parallel distributed processing account first used the TRACE network model of spoken word recognition (McClelland & Elman, 1986) as the model of lexical activations, followed by the Luce choice rule as the linking function that converts those smoothly-changing lexical activation curves into simulated smoothly-changing eye-fixation curves (Allopenna et al., 1998).
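For readers who want the linking function itself, a minimal sketch of an Allopenna et al. (1998)-style Luce choice rule follows; one common formulation exponentiates the activations before normalizing, and the scaling constant k below is an illustrative assumption, not a value taken from this article.

```python
import numpy as np

def fixation_probs(activations, k=7.0):
    """Map a vector of lexical activations at one timestep onto
    predicted fixation proportions over the displayed objects,
    via an exponentially scaled Luce choice rule."""
    strength = np.exp(k * np.asarray(activations, dtype=float))
    return strength / strength.sum()

# Example: target, cohort, rhyme, and unrelated activations mid-word.
print(fixation_probs([0.8, 0.6, 0.3, 0.1]))
```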
However, since saccadic eye movements are ballistic, they cannot help but produce a bimodal distribution of some trials where a cohort competitor object was fixated and some trials where it was not fixated. This opens the door to an alternative interpretation in which, on those particular trials where a Cohort competitor was fixated, a sequentialized cognitive process initially considered only the competitor (thus triggering an eye movement to that object) and then revised that interpretation to consider only the correct Target (thus triggering a corrective saccade). One potential solution for adjudicating between this alternative account and the parallel distributed processing account is to record a less ballistic motor movement. Computer-mouse movements are usually much less ballistic and thus can curve toward a competitor object a lot, a moderate amount, just a little, or not at all. Spivey et al. (2005) found that when a Cohort object (e.g., a candy) was present in the display while hearing a Target word (e.g., "candle"), the mouse trajectory was reliably more curved than when an irrelevant Control object (e.g., a fork) was present instead of the Cohort object. Moreover, they found that – unlike saccadic eye movements – the ranges of curvatures in the Cohort condition did not separate into a bimodal distribution. With unimodal distributions, most of the Cohort condition trials exhibited a moderate amount of curvature and most of the Control condition trials exhibited a small amount of curvature.

However, even with mouse-tracking, there exists another possible alternative interpretation, in which a sequentialized cognitive process initially delays commitment to an interpretation and simply drives mouse movement straight upward for a short amount of time or a long amount of time, before finally committing to the correct interpretation and curving the mouse toward the Target object (van der Wel et al., 2009). In such a scenario, longer periods of uncertainty (e.g., due to the presence of a Cohort competitor object) will result in greater curvature. But, in this account, there is no reason for there ever to be trajectories that first venture into the competitor's half of the screen before then curving toward the correct Target object (Spivey et al., 2010). Interestingly, that fraction of trials in which the mouse trajectory does lean first into the wrong half of the display before finally curving toward the correct Target object has been argued as evidence for cognitive events in which there was a "discrete change of mind" (Kieslich et al., 2020).

Given this back-and-forth of alternative accounts, it seems clear that some more explicit treatment of a linking hypothesis is needed in both eyetracking and mousetracking. Eye movements are generally triggered earlier than mouse movements, thus making them more sensitive to the onset of a change in cognitive activation levels. However, since those eye movements tend to be ballistic saccades in these display conditions, they are imperfect at revealing the gradations in those changing activation levels. The smooth continuous movement of a computer mouse, while slower than eye movements, may be better at revealing those gradations. For resolving debates regarding sequentialized stage-based accounts of cognition versus parallel distributed processing accounts (Spivey, 2023), it seems clear that converging evidence from eye- and mouse-tracking at the same time may be crucial (Levy, 2014; Magnuson, 2005). The following simplified simulation is intended to assist the pursuit of more experiments that combine both eye- and mouse-tracking at the same time.
The simulation poses as a conjoined linking hypothesis for how eye and hand can reveal cognitive processes, and it takes into account the stochastic aspects of saccade generation (Leach & Carpenter, 2001), the minimum fixation duration (McMurray, 2023), the interactions between eye and hand (Levy, 2014), and the feedback loop of motor output influencing sensory input (Spivey, 2023). This simplified simulation demonstrates how smoothly changing cognitive activation levels can generate abrupt eye movements, and how averages of those eye movement patterns can then resemble the original smooth activation curves. It also shows how those smoothly changing cognitive activations can cause smoothly changing mouse movements. And, importantly, it provides an existence proof for how smoothly changing parallel cognitive activations can generate mouse trajectories that lean first into the wrong half of the display before finally curving toward the correct Target object – without the need to postulate "discrete changes of mind" to account for those trajectories.

4. A localist attractor network

A small localist attractor network (using the Normalized Recurrence competition algorithm; McRae et al., 1998; Spivey & Tanenhaus, 1998) is described here that takes both visual and auditory inputs and produces simulated eye movements and simulated computer-mouse movements. This simplified simulation is, of course, not a model of spoken word recognition. Nor is it a general model of eye movements and mouse movements. It is simply an existence proof of a possible linking hypothesis between lexical processes and eye- and mouse-movements that allows one to explore how parallel lexical activations might generate the varieties of vacillatory eye movements and curved mouse movements that are observed in the visual world paradigm. Similar to the Lorenz system explorations in Figs. 1–3, this simulation of a linking hypothesis allows one to probe into the hypothesized activations over time. In this case, it is during a spoken word recognition task that records both mouse movements and eye movements. Not only can it reveal how cognitive processes may produce various motor outputs, but it can also provide insight into how ongoing motor output might influence in-progress cognitive processes. For example, if the "eyes" in the model happen to semi-randomly fixate a competitor object in the display while it is hearing a temporarily ambiguous spoken word, the foveation of that object boosts the activation of that competitor in the model's internal cognitive processing via feedback signals.

Fig. 4 shows a schematic diagram of this simple localist attractor network. Although individual lexical items in the mind are surely not represented by individual biological units in the brain, they may often behave in a "functionally unitized" fashion (Stone & Van Orden, 1989), and simulating them as individual nodes allows for easy tracking of their activation values in a simplified implementation of a linking hypothesis.

Fig. 4. Localist attractor network simulation of the linking hypothesis between lexical activations and eye- and mouse-movements. All connections are bidirectional. (Activation values come from timestep 5 of the Cohort condition trial in Fig. 10.)

At the beginning of a simulated trial with this network, the Visual vector has two nodes activated at 1.0 for the two objects that are present and 0.0 for the other nodes. (The normalization process in this competition algorithm instantly converts those two 1.0 values to 0.5 each.) On each of timesteps 2 through 6, a phoneme is delivered to the lexical layer, where a given word node receives 1.0 external input activation if that phoneme at that time is consistent with that word, and 0.0 external input activation if not. Thus, for the Lexical vector in Fig. 4, inputting the word "candle" at timestep 2 involves external additive input of [1 1 0 0] for the phoneme /k/, because that phoneme is consistent only with the candle and candy lexical nodes. The next timestep involves external additive input of [1 1 1 0] for the phoneme /æ/, then [1 1 1 0] for /n/, then [1 1 1 0] for /d/, because those phonemes are consistent with the candle, candy, and handle lexical nodes. Finally, on the sixth timestep, external additive input to the Lexical vector is [1 0 1 0] for the phoneme /l/, because that phoneme is consistent only with the candle and handle lexical nodes. As the word "candle" has five phonemes and typically lasts about 300 ms, the model treats each time step as lasting 60 ms.

At the beginning of each timestep, before any passing of activation takes place, the Lexical and Visual vectors are normalized by dividing each of their nodes by the sum of that vector, thus forcing the vector to sum to 1.0, as follows:

Lexical_{n,t} = Lexical_{n,t} / Σ_n Lexical_{n,t}
Visual_{n,t} = Visual_{n,t} / Σ_n Visual_{n,t}

For the next phase within a given timestep, the Lexical and Visual vectors perform a pointwise sum to generate the activation of the Integration layer. Notably, this Integration layer does not accumulate this input from Lexical and Visual on top of its previous activation values. The activation pattern is overwritten by this pointwise sum from the Lexical and Visual vectors, and then normalized, as follows:

Integration_{n,t} = Lexical_{n,t} + Visual_{n,t}
Integration_{n,t} = Integration_{n,t} / Σ_n Integration_{n,t}

The Visual vector serves as a record of where visual attention is being distributed at any point in time, and its activation pattern is sent to two different motor output processes, one for eye movements and one for computer-mouse reaching movements (not unlike Munakata's, 1998, network model of the A-not-B error). The Visual vector simply copies its activation to the Mouse vector, from which outputs are generated. Mouse movement is generated each timestep by changing the X mouse position by 50 pixels negatively (leftward) and positively (rightward) as a weighted function of the activations of the Target and Competitor nodes, respectively. The Y mouse position is simply increased by 50 pixels every time step. Thus, if the Target and Competitor nodes in the Visual and Mouse vectors have exactly equal activation, then the mouse moves straight up by 50 Y pixels with zero change in X pixels. However, as the Target and Competitor object nodes in the Mouse vector gradually become asymmetric in their activation values, the change in X position becomes nonzero and the mouse trajectory begins to curve. For a similar but more biologically realistic simulation of mouse movements that generates appropriate velocity profiles and landing positions, see Spivey et al. (2010).
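Assembling the phases described so far, a hedged sketch of one timestep (phonemic input, normalization, the overwritten Integration sum, and the 50-pixel mouse update) might look like the following; the function boundaries and the ordering of input relative to normalization are assumptions where the text leaves them open.

```python
import numpy as np

WORDS = ["candle", "candy", "handle", "fork"]

# Simplified phonemic input for "candle": on timesteps 2-6, a word node
# gets 1.0 if the current phoneme is consistent with that word, else 0.0.
PHONEME_INPUT = {
    2: np.array([1, 1, 0, 0], float),  # /k/  -> candle, candy
    3: np.array([1, 1, 1, 0], float),  # /ae/ -> candle, candy, handle
    4: np.array([1, 1, 1, 0], float),  # /n/
    5: np.array([1, 1, 1, 0], float),  # /d/
    6: np.array([1, 0, 1, 0], float),  # /l/  -> candle, handle
}

def normalize(v):
    """Divide each node by the vector's sum so it sums to 1.0."""
    return v / v.sum()

def integrate(lexical, visual, t):
    """Add any phonemic input, normalize both vectors, and overwrite
    the Integration layer with their normalized pointwise sum."""
    lexical = normalize(lexical + PHONEME_INPUT.get(t, 0.0))
    visual = normalize(visual)
    integration = normalize(lexical + visual)
    return lexical, visual, integration

def move_mouse(mouse_xy, mouse_vector, target_idx=0, competitor_idx=1):
    """Shift X by 50 px leftward/rightward weighted by the Target and
    Competitor activations; Y always climbs by 50 px."""
    x, y = mouse_xy
    dx = 50 * (mouse_vector[competitor_idx] - mouse_vector[target_idx])
    return (x + dx, y + 50)
```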
Eye movements are generated stochastically from the Visual vector (which is always normalized, so that it can function as a probability distribution). For simplicity, the initial fixation is always triggered on the fourth timestep. In the Cohort condition, with only a candle and a candy present in the display (Fig. 4), this means that the first fixation is always 50:50 on the candle or the candy, because those first four phonemes are equally consistent with both words, resulting in equal activation for the Target and Competitor nodes in the Visual vector at that point in time. In the Rhyme condition, with a candle and a handle in the display, the candle object node has reached 0.56 activation by that time and the handle object node is at 0.44 activation (due to feedback from the Integration vector). In the Control condition at timestep four, with a candle and a fork in the display, the Target object node in the Visual vector is at 0.57 activation and the Competitor object node is at 0.43 activation. The next fixation can be triggered no earlier than timestep 7. As long as there has not been a change in eye position in the last three timesteps (180 ms), a new fixation is eligible. (A saccadic refractory period of 180 ms was chosen because, with each timestep being 60 ms, it is the closest one can get to the approximate 200 ms refractory period; McMurray, 2023.) On every timestep that a new fixation is eligible, the saccade generation process uses the normalized activation of the Visual vector as a probability distribution from which to randomly sample a new fixation (e.g., Friston, Adams, Perrinet, & Breakspear, 2012). Thus, it is entirely possible for that "new fixation" to actually involve simply maintaining fixation on the currently fixated object. As activation of the correct Target object (i.e., candle) naturally rises over time, the likelihood of fixating it, and of staying fixated on it, increases. Importantly, while an object is being fixated, that object's node in the Eyes vector is clamped at 0.55 activation and the non-fixated object's node is clamped at 0.45 activation. This foveal prominence of 55:45 for the fixated object over the non-fixated object reflects how higher resolution in the fovea will boost the visual salience of that fixated object compared to the other object – particularly when the Eyes vector in this network sends its activation back to the Visual vector. (The parameter values 0.55 and 0.45 were chosen after it was observed that values 0.60 and 0.40 occasionally caused the model to select the wrong object.)

The final phase within a given timestep for the normalized recurrence competition algorithm involves sending feedback from the Integration vector to the Lexical and Visual vectors. In this network (Fig. 4), the Eyes and Mouse vectors also send feedback to the Visual vector. Feedback involves adding a pointwise multiplication of the two vectors to the existing activation values in the vector receiving the feedback, as follows:

Lexical_{n,t+1} = Lexical_{n,t} + Integration_{n,t} × Lexical_{n,t}
Visual_{n,t+1} = Visual_{n,t} + Integration_{n,t} × Visual_{n,t} + Mouse_{n,t} × Visual_{n,t} + Eye_{n,t} × Visual_{n,t}

After receiving this feedback, the next timestep begins with these vectors normalizing themselves again, to sum to 1.0, thus inducing a kind of lateral inhibition within each vector (but without the exponential term used by softmax or the Luce choice rule). Moreover, since this feedback is multiplicative, it allows the zeros in a vector to prevent the spread of information that has been categorically ruled out by that vector. For example, when the Visual vector receives feedback from the Integration vector supporting activation for an object that is not present in the display, multiplying that feedback by the zero in the Visual vector prevents that spread.
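The remaining phases – the refractory-gated stochastic saccade sampling, the 0.55/0.45 foveal-prominence clamp, and the multiplicative feedback equations just given – can be sketched as follows (again assuming, rather than reproducing, the original code).

```python
import numpy as np

rng = np.random.default_rng()

def normalize(v):
    return v / v.sum()

def maybe_saccade(visual, fixated, last_move_t, t, refractory=3):
    """If the eye position has not changed in the last `refractory`
    timesteps (3 steps = 180 ms here), sample a (possibly unchanged)
    fixation from the normalized Visual vector, treated as a
    probability distribution."""
    if t - last_move_t >= refractory:
        new_fix = rng.choice(len(visual), p=normalize(visual))
        if new_fix != fixated:
            return new_fix, t
    return fixated, last_move_t

def eyes_vector(n_nodes, fixated, foveal=0.55):
    """Clamp the fixated object's node at `foveal` and split the
    remainder among the non-fixated objects (0.45 for a two-object
    display; roughly 0.22 each for four objects with foveal=0.35)."""
    eyes = np.full(n_nodes, (1.0 - foveal) / (n_nodes - 1))
    eyes[fixated] = foveal
    return eyes

def feedback(lexical, visual, integration, mouse, eyes):
    """Additive, multiplicative feedback: zeros in a vector block
    support for alternatives that vector has categorically ruled out."""
    lexical = lexical + integration * lexical
    visual = visual + integration * visual + mouse * visual + eyes * visual
    return lexical, visual
```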
This feedback also causes the Visual vector to receive a mild bias toward whichever object the Eyes vector is fixating, as that Eyes vector (after timestep three) is always set at 0.55 activation for the fixated object and 0.45 activation for the non-fixated object.

5. Eyetracking-only simulation

As a first "reality check" on the model, to test it out with more typical visual world paradigm conditions, it was run with the Mouse-movement vector disconnected, and with four objects in the display, i.e., candle (Target), candy (Cohort), handle (Rhyme), and fork (Control). (In order to get interpretable movement trajectories, mouse-tracking experiments usually have only two response objects to choose among.) All model parameters listed above were the same in this simulation, except that the Eyes vector used 0.35 for the fixated object and 0.22 for each of the three non-fixated objects. This mild enhancement of the fixated object feeds back to the Visual vector and subtly biases it toward the object that is already being fixated. By converting the 0.35 activation values in the Eyes vector into 1.0 fixations of the corresponding object, Fig. 5 shows the average of 100 simulated trials, producing a pattern of early fixations of the Cohort object (candy) and later fixations of the Rhyme object (handle), generally similar to that seen in human data (Allopenna et al., 1998). Notably, at timestep 4 in this model, a first fixation is always forcibly triggered. At this early point in the simplified phonemic input, the Lexical vector has exerted only relatively minimal influence on the Visual vector. Therefore, early fixations of the different objects are somewhat exaggerated, compared to human data.

Fig. 5. Average of 100 Eyetracking-Only trials, with all four objects in the display. With the spoken word "candle," simulated fixations of the Target object candle (diamonds) rise quickly, but simulated fixations of the Cohort object candy (squares) also rise early on. Slightly later, simulated fixations of the Rhyme object handle (plus signs) rise briefly, as seen with human data. Fixations of the filler Control object fork (dashed line) do occur, but they are less common.

Fig. 6 shows the results for when feedback from the Eyes vector to the Visual vector is turned off in this eyetracking-only simulation. Relatively similar results are generated, but the rises in fixations of the Cohort and of the Rhyme are slightly less pronounced than in Fig. 5. In general, visual feedback from the action-perception cycle appears not to be especially crucial for simulated data generated by this eyetracking-only version of the model. It will, however, prove crucial for fitting human data when the model is designed to generate both simulated eye movements and simulated computer-mouse movements.

Fig. 6. Average of 100 Eyetracking-Only trials with feedback from the Eyes vector turned off. Results are generally similar to those in Fig. 5.

6. Eye-and-mouse simulations

When the Mouse vector is connected to the model, to generate simulated mouse movements, only two objects are present in the display (upper left and upper right) at any one time, as was the case in Levy's (2014) eye-and-mouse-tracking experiment. Thus, when eye movements are measured from this version of the model (Fig. 7), there are only two objects to look at. Therefore, when the proportion of simulated fixations is graphed, it produces a substantially different plot.
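As a brief aside on how such averaged fixation curves can be computed from simulated trials: the fixation record is binarized at each timestep (1.0 for the fixated object, 0.0 elsewhere) and then averaged across trials. The run_trial placeholder below is hypothetical, standing in for the full network loop sketched above.

```python
import numpy as np

def fixation_curves(run_trial, n_trials=100, n_steps=15, n_objects=4):
    """Average binarized fixations across simulated trials, producing
    one fixation-proportion curve per displayed object."""
    counts = np.zeros((n_steps, n_objects))
    for _ in range(n_trials):
        # run_trial() is assumed to return the fixated object's index
        # at each timestep (None before the first saccade at step 4).
        for t, fixated in enumerate(run_trial()):
            if fixated is not None:
                counts[t, fixated] += 1.0
    return counts / n_trials  # proportion of trials fixating each object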
Fig. 7 shows averaged fixations from 100 simulated trials in the Cohort (candle and candy) condition, 100 simulated trials in the Rhyme (candle and handle) condition, and also 100 simulated trials in the Control (candle and fork) condition. The Cohort condition shows a late separation (around timestep 10) between averaged curves for fixating the Target object (asterisks) and fixating its competitor Cohort object. The Control condition shows an early separation (around timestep 7) between averaged curves for fixating the Target object (x's) and fixating its competitor Control object. And the Rhyme condition shows an intermediate separation (around timestep 9) between averaged curves for fixating the Target object (circles) and fixating its competitor Rhyme object.

Fig. 7. Averaged Target fixations from 100 simulated trials in each of the Control (x's), Rhyme (circles), and Cohort (asterisks) conditions. Averaged fixations of the Competitor objects in those conditions are solid lines without symbols.

As noted in the discussion of Fig. 4, the phonemic input patterns for this model are highly simplified, merely inputting a 1.0 if the incoming phoneme matches the phoneme in that position for a given word node in the Lexical vector, and a 0.0 otherwise. Normalization smooths those values out slightly, and the multiplicative feedback from the Integration vector does as well. Importantly, that feedback from the Integration layer is influenced by the activation pattern in the Visual vector (since the Integration vector is just a normalized sum of Lexical and Visual). Thus, while word biases flow from the Lexical vector to the Integration vector, and then feedback to the Visual vector drives visual attention toward the object that has been named, biases in the Visual vector will also influence the activation patterns in the Integration vector and then (via feedback) in the Lexical vector, causing the lexical representations of visually attended objects to rise in activation slightly faster. (This mutual feedback between Visual and Lexical vectors, via the Integration vector, has also been used to simulate reaction times in linguistic visual search tasks; Reali et al., 2006.)

Fig. 8 shows the average Lexical activation over time for the Target word node, "candle," in the Control condition (where the Control distractor occasionally drew the eyes away), compared to the Cohort and Rhyme conditions (where the distractor object frequently drew the eyes away). Thus, a concrete prediction from this model is that the activation of lexical representations themselves is slightly influenced by the name of the object that is currently being attended and/or fixated. This claim is consistent with the finding that, even in a silent visual search task, objects with longer spoken names elicit longer gaze durations (Zelinsky and Murphy, 2000).

Fig. 8. Averaged activation over time of the "candle" lexical node, from 100 simulated trials, in each of the Control (+'s), Cohort (asterisks), and Rhyme (circles) conditions.

These promising simulated eye-movement results (e.g., Figs. 5–7) come from a highly simplified implementation of a linking hypothesis for the visual world paradigm that includes: a) parallel processing of continuously changing graded activations, b) delays in saccades due to the minimum duration of fixations, c) feedback from the perceptual results of oculomotor behavior, and d) computer-mouse movements as well. As a transparent linking hypothesis, this model allows one to inspect what the lexical activation curves themselves might look like (e.g., Fig. 8) when the simulated eye-movement data are being generated.
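Before turning to the single-trial inspections, the pieces sketched above can be strung together into one compressed Cohort-condition trial loop. This is an illustration of the architecture under stated simplifications (only two lexical nodes, illustrative trial length), not the article's actual code.

```python
import numpy as np

rng = np.random.default_rng()
WORDS = ["candle", "candy"]
PHONEME_INPUT = {2: [1, 1], 3: [1, 1], 4: [1, 1], 5: [1, 1], 6: [1, 0]}

def normalize(v):
    return v / v.sum()

def run_cohort_trial(n_steps=15):
    """One simulated trial: returns the mouse path and the winning word."""
    lexical = np.array([0.5, 0.5])
    visual = np.array([0.5, 0.5])      # both objects present (normalized)
    eyes = np.array([0.5, 0.5])        # no foveal clamp before step 4
    fixated, last_move = None, -10
    x, y, path = 0.0, 0.0, []
    for t in range(1, n_steps + 1):
        lexical = normalize(lexical + np.array(PHONEME_INPUT.get(t, [0, 0]), float))
        visual = normalize(visual)
        integration = normalize(lexical + visual)
        # First fixation forced at step 4; thereafter gated by the
        # 3-step (180 ms) refractory period on changes in eye position.
        if t == 4 or (fixated is not None and t - last_move >= 3):
            new_fix = rng.choice(2, p=visual)
            if new_fix != fixated:
                fixated, last_move = new_fix, t
        if fixated is not None:
            eyes = np.where(np.arange(2) == fixated, 0.55, 0.45)
        mouse = visual.copy()              # Visual copies to Mouse
        x += 50 * (mouse[1] - mouse[0])    # node 0 = Target (left side)
        y += 50
        path.append((x, y))
        # Feedback phase (Integration, Mouse, and Eyes onto Visual).
        lexical = lexical + integration * lexical
        visual = visual + integration * visual + mouse * visual + eyes * visual
    return path, WORDS[int(np.argmax(lexical))]

path, winner = run_cohort_trial()
```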
When the simulated computer-mouse trajectories are examined next, it becomes clear that the feedback from the Eyes vector to the Visual vector is responsible for generating a variety of mouse trajectory shapes. Here, we take advantage of this model's transparency to examine some activation curves inside it while it produces simulated mouse trajectories. Fig. 9 shows a single simulated Cohort condition trial in which the stochastic triggering of a saccade resulted in an initial fixation of the Target object, i.e., the candle. Follow the asterisks in Fig. 9C. In the Eyes vector, fixation of an object entails foveal prominence activation of 0.55, while the non-fixated object in peripheral vision has 0.45 activation. During timesteps 9 through 11 in Fig. 9C, there is a brief fixation of the competitor Cohort object, candy, followed by a return to the candle (asterisks). By that time, the Visual vector (Fig. 9B) and the mouse trajectory (Fig. 9D) have developed substantial momentum in favor of the Target object, so the late brief fixation of the competitor Cohort object has minimal effect on them.

Fig. 9. In this simulated Cohort condition trial, panel A shows activation of lexical items, with candle (asterisks) and candy initially rising together. At timestep six, candy is ruled out by the /l/ phoneme and drops in activation, while the handle lexical node rises some more. Eventually, feedback from the Integration vector causes the candle node to win the competition. Panels B and C show the Visual and Eyes vectors, while D shows the simulated mouse trajectory.

Compare Fig. 9 to Fig. 10, which shows a simulated trial in which an early fixation of the Cohort competitor (Fig. 10C, solid line) resulted in a computer-mouse trajectory that forayed initially into the Cohort's half of the screen before finally curving toward the correct Target object (Fig. 10D). This results from feedback that the Visual vector is receiving from the Eyes vector (which is implementing foveal prominence in favor of the fixated competitor Cohort object from timestep 4 to 10), causing the competitor Cohort object node in the Visual vector (solid line in Fig. 10B) to be briefly higher in activation than the Target object node (asterisks in Fig. 10B). The Visual vector copies its activation pattern to the Mouse vector, and those values are used as weights for averaging a leftward mouse movement (toward the Target) and a rightward mouse movement (toward the competitor Cohort).

Fig. 10. In this simulated Cohort condition trial, the stochastic triggering of a saccade resulted in an initial fixation of the cohort competitor (the candy, solid line in Panel C). Feedback from the Eyes vector to the Visual vector results in a temporary bias toward the Cohort object (solid line in panel B). This mild bias generates a slight curve of the mouse trajectory briefly into the competitor Cohort's half of the display screen before turning toward the Target object.

Fig. 10 is a concrete manifestation of how Levy's (2014) eye-and-mouse-tracking results may have emerged. Recall that he found that the subset of trials in which participants looked at a competitor object elicited greater mouse movement curvature than the subset of trials in which participants did not look at a competitor object. In Fig. 10C, the initial fixation of the candy (timesteps 4–10) not only sends feedback that causes greater activation of the candy node in the Visual vector (Fig. 10B), and causes indirect feedback to the Lexical vector to result in a longer lingering partial activation of candy's word node (recall Fig. 8), but it also causes the mouse to move partly into the candy's half of the display screen (Fig. 10D). Since the Cohort competitor object happened to be fixated early and long in this trial, the mouse movement was exceptionally curved, much as was observed in Levy's (2014) eye-and-mouse-tracking experiment. It is worth noting that a computer-mouse trajectory with a shape like that in Fig. 10D would be categorized as a "discrete change of mind" by Kieslich et al. (2020).
However, one can see from examining the activation curves in the Lexical and Visual vectors (Fig. 10A and 10B) that – in this simulation – it did not actually arise from an early discrete decision that was later overturned. Rather, both the Lexical and Visual vectors clearly show early uncertainty that is gradually (and non-monotonically) resolved over time. This non-monotonicity results from the one and only place where stochasticity has been introduced into this simplified model: the triggering of eye movements. At timestep 4, a stochastic triggering of an eye movement (Fig. 10C) produced an initial fixation of the competitor Cohort object (the candy), where the Eyes vector temporarily has a foveal prominence of 0.55 activation (solid line) for the candy and 0.45 activation (asterisk line) for the candle. Feedback from this Eyes vector to the Visual vector (Fig. 10B) temporarily causes slightly greater activation of the visual candy node (solid line) compared to the visual candle node (asterisk line). The activation patterns from the Visual vector continuously flow into the Mouse vector to generate initial movement somewhat toward the competitor Cohort object (Fig. 10D). Several timesteps later, the stochastic triggering of a new eye movement produces a fixation of the candle (at timestep 11 in Fig. 10C), and feedback now guides the Visual vector, and thus the Mouse vector, toward the correct Target object. This simulation stands as an existence proof indicating that a mouse trajectory that curves initially into a competitor object's half of the screen and then curves toward the correct Target object can emerge from a continuous cascaded flow of parallel activation patterns that temporarily produce a graded prominence in activation of the competitor (due to an initial brief eye fixation of the competitor object). As can be seen from the smooth graded activation patterns over time in Fig. 10A and 10B, an exceptionally-curved trajectory like this does not necessarily entail that there was a discrete change of mind.

7. Simulated mouse trajectories

Fig. 11 shows a set of 22 different types of trajectories that are produced in the Cohort condition. Among the roughly half of trials in which the first fixation was on the Target object, there is a small variety of about seven types of trials, in which the Competitor cohort was never fixated, or briefly fixated, or fixated for a lengthy period of time, or the eyes flitted back and forth a few times.
Fig. 11 shows a set of 22 different types of trajectories that are produced in the Cohort condition. Among the roughly half of trials in which the first fixation was on the Target object, there is a small variety of about seven types of trials in which the Competitor cohort was never fixated, or briefly fixated, or fixated for a lengthy period of time, or the eyes flitted back and forth a few times. The resulting feedback from the Eyes vector on those Target-fixated-first trials produces a tightly clustered group of very similar trajectories that all have substantial curvature (see the cluster of seven trajectory types in Fig. 11 that do not venture into the right half of the display screen). By contrast, among the half of trials in which the first fixation was on the competitor Cohort object, there is a wider variety of types of trials, all with exceptionally high mouse curvature. Since the competition process takes longer in these Cohort-fixated-first trials, there is a greater timespan within which to have subtle variations of fixating the Cohort for three to eight timesteps before finally fixating the Target, or fixating the Cohort, then the Target, then the Cohort again, then finally the Target, etc. These different types of eye movement sequences send feedback to the Visual vector, which then guides the formation of the computer-mouse trajectory in a fashion that produces about 15 different types of exceptionally curved trajectories – all of which venture initially into the right (competitor Cohort) half of the display screen.

Fig. 11. Simulated computer-mouse trajectories from the Cohort condition (candle and candy). The exceptionally curved trajectories come from trials where the first fixation was on the candy.

In the Rhyme and Control conditions of this simulated linking hypothesis, fixations of the competitor object are briefer and less common, and they never result in a mouse trajectory that veers into the competitor object’s half of the display screen. The Rhyme condition produces computer-mouse trajectories (Fig. 12) that are similar in curvature to the subset of Cohort trials in which the Target object was fixated first (the less curved cluster in Fig. 11). The slightly less-curved set of trajectories in Fig. 12 come from Rhyme trials in which the Target object was fixated first, and the slightly more-curved set come from Rhyme trials in which the Rhyme object was fixated first.

Fig. 12. Simulated computer-mouse trajectories from the Rhyme condition (candle and handle).

In the Control condition (Fig. 13), the model triggers a first fixation of the competitor Control object (the fork) almost half the time, but it is brief and the fork is rarely fixated after the Target object has been fixated. Since the activation of the fork node in the Lexical vector is always at zero (because feedback is multiplicative), the Visual vector’s fork node drops in activation precipitously despite any briefly supportive feedback from the Eyes vector. On about half of the trials, the fork is initially fixated for 3, 4, or 5 timesteps and then the candle is finally fixated. On the other half, only the candle is ever fixated. The Target-fixated-first trials produce computer-mouse trajectories with minimal curvature, and the competitor-fixated-first trials produce trajectories with only slightly more curvature.

Fig. 13. Simulated computer-mouse trajectories from the Control condition (candle and fork).

It is noteworthy that the mouse trajectories in Fig. 13 end up somewhat lower in the display than in Figs. 11 and 12. Instead of applying polynomial equations with the Mouse activation patterns that generate realistic velocity profiles toward a pre-specified target location (Spivey et al., 2010), this simple model just adds 50 y-pixels to the mouse cursor on each time step. Thus, with the resolution of uncertainty happening quickly in the Control condition, a smaller number of time steps results in a smaller number of y-pixels traversed. Interestingly, in this Control condition, feedback from the Integration vector to the Lexical vector causes some subtle early separation of the Lexical activation values for candle and candy beginning as early as timestep 3, compared to the separation at timestep 6 in the Cohort condition. Essentially, the presence of a candle, and the absence of a candy, allows the lexical recognition process to get a slightly early start inside the Lexical vector. In this simplified simulation of a linking hypothesis, the temporary Lexical uncertainty (between “candle” and “candy”) is being partially resolved by the Visual certainty that there is no candy in the display.
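The fork’s fate in the Control condition can be illustrated with a toy version of that multiplicative feedback. In the sketch below, the update form and the two gains are assumptions; the point is only that a Visual node with zero Lexical support decays even while the Eyes vector briefly supports it.

import numpy as np

# Toy illustration of the multiplicative feedback noted above: a Visual
# node whose Lexical counterpart sits at zero decays even while the Eyes
# vector briefly supports it. The update form and gains are assumptions.

def visual_step(visual, lexical, eyes, lex_gain=0.5, eyes_gain=0.1):
    visual = visual * (1.0 + lex_gain * lexical)   # multiplicative support
    visual = visual + eyes_gain * eyes             # additive Eyes feedback
    return visual / visual.sum()

visual  = np.array([0.5, 0.5])    # [candle, fork]
lexical = np.array([1.0, 0.0])    # "fork" is never heard: zero lexical support
eyes    = np.array([0.45, 0.55])  # the fork is briefly fixated
for t in range(8):
    visual = visual_step(visual, lexical, eyes)
print(visual)   # the fork's visual activation has dropped precipitously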
Although feedback may have subtle effects in the Lexical vector of this network, the feedback from the Eyes vector to the Visual vector is quite powerful. The most-curved mouse trajectories are due to trials in which the competitor Cohort object was fixated (sometimes twice) and the foveal prominence feedback from the Eyes vector caused the Visual vector to drive the Mouse vector toward that competitor Cohort object. When the feedback from the Eyes vector is removed from the network, the computer-mouse trajectories change dramatically. In an alternative architecture for this model (as done in Fig. 6 with eyetracking-only), a version was simulated with the feedback from the Eyes vector to the Visual vector turned off (e.g., removing the final addition term in the Visual feedback equation). In this mode, the Visual vector does not receive feedback from the Eyes vector that subtly boosts activation of whatever the fovea is pointed at. In this version of the model, the averaged eye fixations (much like Fig. 7) still show early separation for the Control condition, late separation for the Cohort condition, and intermediate separation for the Rhyme condition. However, unlike the human data, this no-feedback-from-Eyes version of the model never produces computer-mouse trajectories that venture into the competitor Cohort object’s half of the display screen. In fact, across a variety of trials with different fixation patterns, this no-feedback-from-Eyes version of the model produces the exact same substantially-curved mouse movement every time in the Cohort condition, the exact same moderately-curved mouse movement every time in the Rhyme condition, and the exact same slightly-curved mouse movement every time in the Control condition (Fig. 14). This makes sense because the stochastic triggering of eye movements is the only random variation in this model, and that randomness is cordoned off from the rest of the network when its feedback is turned off. In this no-feedback-from-Eyes version of the model, trials with fixations of the Competitor object produce the same trajectory curvature as trials without fixations of the Competitor object – very unlike the data reported by Levy (2014).

Fig. 14. Mouse trajectories from the version of the model with feedback from the Eyes vector turned off. In each panel, 100 trials are overlaid on top of one another. Although eye-movement patterns varied on each trial, mouse trajectories did not. Results are quite different from those in Figs. 11–13, where the Eyes vector was allowed to send feedback to the Visual vector.
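A sketch of that lesion, under the same assumed update form as the earlier sketches: the final addition term is simply gated by a flag. The function name and gains remain hypothetical.

import numpy as np

# "Lesioned" architecture: the same Visual update with the final addition
# term (feedback from the Eyes vector) removable via a flag.

def visual_step(visual, lexical, eyes, eyes_feedback=True,
                lex_gain=0.5, eyes_gain=0.1):
    visual = visual * (1.0 + lex_gain * lexical)
    if eyes_feedback:
        visual = visual + eyes_gain * eyes   # intact model
    # With eyes_feedback=False, the only stochastic component of the model
    # (saccade triggering) is cordoned off from the rest of the network.
    return visual / visual.sum()

With the flag off, every trial with the same linguistic input produces the same Visual, and therefore the same Mouse, time course no matter where the eyes happen to land, which is why the 100 overlaid trajectories in each panel of Fig. 14 collapse onto a single curve.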
Thus, by exploring modifications of the architecture of this simulation, it becomes clear that – at least, in the context of this particular network architecture – those mouse trajectories that curve first into the competitor Cohort object’s half of the display and then curve back to the Target are indeed due specifically to trials in which the first fixation was on the Cohort object, thus subtly boosting that object’s activation in cognitive processes, followed by a corrective saccade to the correct Target object. Feedback from the Eyes vector to the Visual vector is necessary for the simulation’s output to resemble Levy’s (2014) eye-and-mouse-tracking data. These simulations suggest that immediate feedback from the perceptual results of oculomotor behavior may be an important component of the linking hypothesis for the visual world paradigm. Not only do ongoing cognitive processes quickly influence where the eyes go, but where the eyes go also quickly influences those ongoing cognitive processes (e.g., Spivey & Dale, 2011).

8. Discussion

Both eyetracking and mousetracking take advantage of the fact that a given cognitive process can still be in progress while those motor outputs are being executed, thus allowing them to provide a time series of data from the temporal course of that cognitive process. Although a common stage-based assumption in cognitive psychology for decades held that motor output was not initiated until its concomitant cognitive process was completed, evidence for motor output being executed before a cognitive process was complete had been occasionally reported in a variety of areas in the cognitive and neural sciences. For example, Coles et al. (1985) demonstrated that a partially activated (but not executed) response option can still result in below-threshold muscle activity for that non-selected response hand. Abrams and Balota (1991) showed that continuously-valued activation of a response option can affect not only the reaction time of that response but also the force with which the motor movement is carried out. Gold and Shadlen (2000) found, with microstimulation of neurons in the frontal eye fields, that below-threshold activation of an alternative oculomotor command can sway the direction of the actually-executed eye movement. And Cisek and Kalaska (2010) review a wide range of electrophysiological evidence in motor cortex for two simultaneously competing skeletomotor commands early on during movement planning and execution. According to most linking hypotheses for the visual world paradigm, it is precisely because these eye and hand outputs are occasionally triggered before a response selection process has been fully completed that these two measures can provide evidence for parallel partial activation of multiple representations during an ambiguous or uncertain period of time in the cognitive process of interest. An important consequence of the eyes moving before a response selection process is complete is that new visual input is thus delivered to this response selection process while it is still in progress. Therefore, while changes in ongoing cognitive processes naturally produce changes in motor output, it also appears to be the case that changes in the perceptual results of motor output cause changes in those same ongoing cognitive processes (see also Lepora & Pezzulo, 2015; Nakayama et al., 2023; Spivey & Dale, 2011).

Due to this action-perception cycle allowing ongoing motor output to influence ongoing cognitive processes, both eyetracking (Grant & Spivey, 2003) and mousetracking (Lepora & Pezzulo, 2015) are particularly well suited to reveal the many ways in which cognition is grounded not just inside the brain but also in the body and its actions (e.g., Chemero, 2009; Cook & Tanenhaus, 2009; Gibbs, 2005; Pezzulo et al., 2011; Shapiro, 2019; Spivey, 2020). This feedback loop from ongoing motor output to ongoing sensory input, thereby influencing ongoing cognitive processing, means one should expect that the visual input from the object first looked at in a visual world experiment (especially during the ambiguous portion of the linguistic input) will, itself, influence the ongoing internal cognitive processes. One can easily imagine a variety of new visual world paradigm experiments (with or without mousetracking) that introduce real-time manipulations of the task or of the objects in the field of view (perhaps with a saccade-contingent display change) at just the right time to surreptitiously alter the eye movement pattern and thus influence the lexical activations over time. Subtle manipulations in the display can gently bias how the eyes peruse the response options and thereby influence the decision (Grant & Spivey, 2003; Pärnamets et al., 2015). In fact, even a slightly biased mouse-cursor plotting algorithm can smoothly curve an uncertain decision-making process toward a different option (Falandays, Spevack, Pärnamets, & Spivey, 2021b).
Exploring the parameter space for the feedback from the Eyes vector to the Visual vector, and its consequences on mouse movements, suggests that the foveal prominence parameter, set at 55:45 in this linking hypothesis simulation, plays an important role in influencing the results. This simulated linking hypothesis is rather sensitive to the particular settings of that crucial parameter. When that parameter was tested instead at 60:40, the simulation produced occasional errors (selecting the competitor object in the end, which is rare in most visual world paradigm experiments). This may indicate that retinal size and visual contrast of the response options, as well as viewing angle of the entire display, may turn out to be very important methodological specifications in any mousetracking experiment (see also Kieslich et al., 2020, for an excellent list of methodological recommendations). If the non-fixated response option is too far in the periphery of the visual field or is simply of low contrast, then the relative activation of its internal representation may suffer substantially, and the act of semi-stochastically fixating one of the response options first may occasionally result in the participant choosing that first-fixated response option, just like the model does – even if it is transparently the wrong object. This exploration of the foveal prominence parameter in this linking hypothesis predicts that small retinal size (or low visual contrast) of the response options, as well as wide retinal angles between the two response options, could cause an increase in errors and in trajectory curvatures.
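A toy harness like the one below is one way to explore that sensitivity. It is not the author’s simulation: the competition dynamics, gains, and ramping lexical support are all simplified assumptions, and the toy exaggerates the error rate. Still, it reproduces the qualitative pattern reported above: at a foveal prominence of 0.55 the Target reliably wins, whereas at 0.60 an early fixation of the competitor can lock in and produce error trials.

import numpy as np

rng = np.random.default_rng(0)

def run_trial(foveal, steps=40, eyes_gain=0.3):
    # Lexical support for the Target ramps up slowly (capped small so the
    # toy competition stays close); the fixated object gets foveal-
    # prominence feedback; saccades are re-triggered stochastically,
    # respecting a three-timestep minimum fixation duration.
    visual = np.array([0.5, 0.5])              # [target, competitor]
    fixated = int(rng.integers(2))             # first fixation is random
    last_saccade = 0
    for t in range(steps):
        lex_mult = 1.0 + min(0.01 * t, 0.1)    # ramping Target support
        eyes = np.where(np.arange(2) == fixated, foveal, 1.0 - foveal)
        visual = visual * np.array([lex_mult, 1.0]) + eyes_gain * eyes
        visual = visual / visual.sum()
        if t - last_saccade >= 3 and rng.random() < 0.5:
            fixated, last_saccade = int(np.argmax(visual)), t
    return int(np.argmax(visual))              # 1 = competitor chosen (error)

for foveal in (0.55, 0.60):
    errors = sum(run_trial(foveal) for _ in range(1000))
    print(f"foveal prominence {foveal:.2f}: {errors} errors in 1000 trials")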
These linking hypothesis simulations may also provide some insight toward understanding the occasional presence of bimodal curvature distributions in some mousetracking experiments. Fig. 11 shows one cluster of mouse trajectories from Target-fixated-first trials that is somewhat separate from another cluster of mouse trajectories from Cohort-fixated-first trials. When bimodality does occasionally show up in the distribution of mouse trajectory curvatures (e.g., Freeman & Dale, 2013; Kieslich et al., 2020), this simplified simulation of a linking hypothesis suggests that it may be due simply to one group of trials in which the non-chosen response option was initially fixated and another group of trials in which the chosen response option was initially fixated. By tracking eye movements and mouse movements at the same time, and examining these distributions, a deeper understanding of those sometimes-bimodal distributions of mouse curvatures might be achieved. Perhaps they are the result of a feedback loop from eye movements to cognition.
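One way to examine that prediction in real or simulated data is to compute a standard curvature index per trial, such as the maximum perpendicular deviation from the straight start-to-end line, and split the resulting distribution by which object was fixated first. The sketch below uses two placeholder trajectories.

import numpy as np

# A common curvature index for sorting trajectories into the two clusters
# discussed above: each trajectory's maximum perpendicular deviation from
# the straight line connecting its start and end points. The two toy
# trajectories are placeholders for real (or simulated) trials.

def max_deviation(traj):
    traj = np.asarray(traj, dtype=float)
    start, end = traj[0], traj[-1]
    line = end - start
    rel = traj - start
    # Perpendicular distance of every sample from the start-to-end line
    dists = np.abs(rel[:, 0] * line[1] - rel[:, 1] * line[0])
    return dists.max() / np.linalg.norm(line)

target_first = [(0, 0), (-5, 50), (-12, 100), (-20, 150)]
cohort_first = [(0, 0), (40, 50), (25, 100), (-20, 150)]
print(max_deviation(target_first), max_deviation(cohort_first))
# Pooling such indices across trials, then splitting them by which object
# was fixated first, would show whether a bimodal curvature distribution
# reflects those two groups of trials.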
9. Conclusion

Long ago, in a spicy debate with Anne Treisman, Alexander van der Heijden (1996, p. 360) presaged this feedback loop that results from perceptually-driven eye movements filtering how visual perception receives its next inputs: “Treisman is right in doubting whether my formulation ‘perception is for selection and selection is for action’ is the complete story. In my view it is not the complete story, but only a part of it. What is omitted is the role played by eye movements and fixations. … Because eye movements are clearly actions and because these actions determine in large part the visual world that we perceive, it is also easy to see in what direction the complement of my story has to be found: Perception is for selection, selection is for action, and action is for perception.”

Since its introduction (Tanenhaus et al., 1995), tracking eye movements in the visual world paradigm has generated controversy. For example, concerns have been raised about how long the display is previewed and pre-processed before linguistic input is initiated (Andersson et al., 2011), but recent studies have indicated that even subvocally pre-generating the names of the objects is not responsible for the results (Apfelbaum et al., 2021).
And, of course, like all psycholinguistic experiments, it is crucial for the visual world paradigm to include a substantial number of “filler” stimuli to ensure that participants do not become accustomed to specific patterns in the critical stimuli (Ferreira & Ferreira, 2024). These concerns are relatively easy to deal with methodologically. For instance, in Spivey et al. (2002), only 18 of the 108 sentences that each participant heard were critical stimuli; the other 90 were “filler” sentences.

Given that multiple eye movements, and a curved reaching movement, often precede the completion of a trial in the visual world paradigm, it stands to reason that some of those movements are indeed being triggered before the cognitive process is fully completed. However, these complex time series datasets comprising multiple eye movements and meandering computer-mouse movements can sometimes prove difficult to analyze statistically. This state of affairs has led to the adaptation of a variety of improved statistical methods for the visual world paradigm, including growth curve analysis of eye movement data (Mirman et al., 2008), multilevel logistic regression analysis of eye movement data (Barr, 2008), and the index-based approach that fits curve parameters to fixation data (McMurray et al., this issue); for a recent review, see Ito and Knoeferle (2023). In mousetracking, innovative analysis methods include decision landscape visualizations based on trajectories (Zgonnikov et al., 2017), cluster analysis of trajectories (Wulff et al., 2019), and state-space dynamic modeling of trajectories (Calcagnì et al., 2019). That said, none of those statistical methods quite function as processing models that can pose as a linking hypothesis between activations of internal representations and the measured eye- and hand-movements.

What has proven more difficult in the visual world paradigm literature is settling on the right linking hypothesis (e.g., Allopenna et al., 1998; Farmer et al., 2007; Huettig et al., 2011; Kukona & Tabor, 2011; Magnuson, 2019, this issue; Mayberry et al., 2009; McMurray, 2023; Tanenhaus et al., 2000; Teruya & Kapatsinski, 2019). Although the internal dynamics of lexical activation are often assumed to be relatively smooth and continuous over time, saccadic eye movements in the visual world paradigm are sequential and vacillatory (Tanenhaus et al., 1995). At the same time, computer-mouse movements in the visual world paradigm are smoothly-curving and sometimes non-monotonic (Spivey et al., 2005). These differing types of motor outputs pose a challenge for a linking hypothesis that is designed to map one set of parallel partially-activated lexical alternatives to those two diverse measurements. The simplified simulations herein provide a basis to address that challenge.

A good linking hypothesis is crucial for mapping observed data onto hypothesized processes in the internal mechanisms of a system of interest. When those internal mechanisms are difficult to measure directly, recording their indirect output is often all one has. Explorations with measuring a simulation can provide insight into what are good ways to measure and what are bad ways. The Lorenz attractor demonstrations (Figs. 1–3) showcase how a time series of a data stream taken from the system of interest can often be quite informative for reconstructing what the internals of the system look like. (For similar demonstrations with Conway’s Game of Life, see Spivey, 2018). What’s more, the visual world paradigm (Tanenhaus et al., 1995) has been showcasing that same basic insight for three decades now. Collecting just a single data point at the end of an event, such as a reaction time, turns out not to be a very good way to measure a system. Collecting a time series of data throughout the course of the event, such as naturally-occurring saccadic movements of the eyes and smooth reaching movements of the hand, is a better way. When a linking hypothesis has access to a time series (instead of a single data point from each unconnected event), it has a much better chance at reconstructing the processes carried out by those hidden internal mechanisms inside the system of interest.

However, different linking hypotheses for the same data will make different predictions about those internal processes. Therefore, any proposed linking hypothesis must be thoroughly examined and tested. For example, given the early visual world paradigm findings of Eberhard et al. (1995), showing that different sets of objects in the display influence the comprehension of spoken words and sentences, it would be imperfect for a linking hypothesis in the visual world paradigm to predict that context would simply not influence spoken word recognition at all (e.g., Forster’s, 1976, autonomous search model). And given the findings of Allopenna et al. (1998), it would be imperfect for a linking hypothesis to predict that competitor Cohort objects would draw errant fixations but competitor Rhyme objects would not (e.g., Marslen-Wilson’s, 1987, cohort model). Moreover, it would be imperfect for the linking hypothesis to ignore the fact that eye fixations have a minimum duration of about 200 ms (McMurray, 2023), and that subthreshold activation levels of lexical items may be changing without triggering an eye movement (Teruya & Kapatsinski, 2019). It might also be imperfect for a linking hypothesis in the visual world paradigm to account only for eye movements (Tanenhaus et al., 1995) and not for computer-mouse movements as well (Spivey et al., 2005). Finally, and perhaps most importantly, it would be imperfect for the linking hypothesis to treat saccadic eye movements as if they were nothing more than an output system, when it is clear that they are in consummate control of a crucial (and narrow) input system: the fovea.
A good linking hypothesis needs to be ready to handle the fact that where the eyes are pointed subtly alters the input for that visual context. To include all these constraints responsibly, Magnuson (2019) suggested that it is necessary for a linking hypothesis in the visual world paradigm to be computationally implemented in an explicit fashion to produce concrete simulations of data. Thus, in order to form a more perfect linking hypothesis, the present simulations employed a simplified localist attractor network (Fig. 4) that uses the normalized recurrence competition algorithm (Spivey, 2007) and avoids all of the imperfections listed above.
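For readers unfamiliar with that algorithm, the sketch below gives a minimal normalized-recurrence-style update in the spirit of Spivey (2007): each constraint vector is normalized, the normalized constraints are averaged into an Integration vector, and integration activation is fed back multiplicatively to each constraint. The weights and starting values are illustrative assumptions, not the parameter settings of the reported simulations.

import numpy as np

# Minimal normalized-recurrence-style cycle: normalize each constraint,
# integrate, then feed integration activation back multiplicatively.

def normalized_recurrence_step(constraints, weights):
    normed = [c / c.sum() for c in constraints]                  # normalize
    integration = sum(w * c for w, c in zip(weights, normed))    # integrate
    return [c + integration * w * c                              # feed back
            for w, c in zip(weights, normed)], integration

lexical = np.array([0.7, 0.3])   # speech input favors the target word
visual  = np.array([0.5, 0.5])   # both objects are present in the display
constraints, weights = [lexical, visual], [0.5, 0.5]
for _ in range(10):
    constraints, integration = normalized_recurrence_step(constraints, weights)
print(integration / integration.sum())   # settles in favor of the target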
The conjoined eye-and-mouse linking hypothesis simulated herein can be summarized as follows. Eye position and hand/mouse position are each approximate indicators of what is going on in the mind, but they have distinctive limitations in how they provide those data. Since fixations tend to have a minimum duration of about 200 ms, there will be intermittent periods of time during which they are unable to provide an immediate update on what is going on in the mind. Since hand/mouse movements are much slower than saccadic eye movements, they will always be somewhat behind in providing their report of what is going on in the mind (but they will not have those intermittent periods of time during which they provide no update at all). The simulations with this linking hypothesis indicate that continuous graded parallel activations of lexical representations and of visual representations are capable of simultaneously producing both the ballistic dichotomous output of saccades and the smooth graded (and sometimes non-monotonic) reaching trajectories of mouse movements. Most importantly, the conjoined linking hypothesis presented here, with its feedback from the Eyes vector to the Visual vector, suggests that these eye movements are not unidirectional indicators that merely allow an experimenter to get a peek into those cognitive processes. In order to fit the human data, their information must flow bidirectionally. Cognitive processes determine where the eyes go, and then the object that the eyes are pointed at immediately influences those same cognitive processes (Pärnamets et al., 2015; Spivey & Dale, 2011).
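The dual readout just summarized can be caricatured in a dozen lines: one graded activation time course simultaneously drives a dichotomous, vacillatory fixation sequence (with a three-timestep minimum fixation duration) and a smooth, graded mouse trajectory. The activation time course and the 0.5 trigger probability below are assumptions.

import numpy as np

rng = np.random.default_rng(1)

# One graded activation time course, two readouts: discrete fixations and
# a continuous mouse trajectory.

target_act = np.linspace(0.5, 1.0, 15)      # graded resolution over 15 steps

fixations, mouse = [], [(0.0, 0.0)]
fixated, last_saccade = 0, -3
for t, a in enumerate(target_act):
    if t - last_saccade >= 3 and rng.random() < 0.5:
        # Ballistic, dichotomous output: saccade to one object or the
        # other, with probability given by the current activations.
        fixated, last_saccade = int(rng.choice(2, p=[a, 1.0 - a])), t
    fixations.append(fixated)
    x, y = mouse[-1]
    mouse.append((x - 50 * a + 50 * (1.0 - a), y + 50))  # smooth, graded output
print(fixations)      # runs of 0s and 1s: sequential and vacillatory
print(mouse[-1])      # endpoint of one continuous, curving trajectory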
Given the simplified nature of this localist attractor network and its minimal free parameters, it is encouraging that it fits human data from the seminal visual world paradigm studies of spoken word recognition (Allopenna et al., 1998), also mousetracking of spoken word recognition (Spivey et al., 2005), and the simultaneous combination of the two (Levy, 2014). It allows investigation of hypothesized internal graded activation values that sometimes trigger new eye fixations and sometimes do not (Figs. 8–10). It recreates the effects on eye movements that are exerted by the presence of a competitor Cohort object and by a competitor Rhyme object (Figs. 5 & 7). It recreates the effects on computer-mouse trajectories that are exerted by the presence of those competitor objects as well (Figs. 11–13). And it even recreates Levy’s (2014) finding that trials in which a competitor is fixated tend to be the trials with the greatest mouse trajectory curvatures (see discussion of Figs. 10–11). Importantly, this simulated linking hypothesis includes a feedback loop that allows oculomotor output to subtly change the visual input, i.e., where the eyes are pointed will boost salience of the object on the fovea. When the simulated computer-mouse trajectories are examined (Fig. 11), the inclusion of this action-perception cycle provides a potential explanation for those computer-mouse trajectories that curve first into a competitor Cohort object’s half of the screen before turning toward the Target object. Those exceptionally-curved trials all involved an initial fixation of the competitor Cohort object before a new fixation then went to the Target object. Rather than assuming these trials indicate some form of discrete internal commitment to the competitor Cohort object, followed by a discrete internal correction toward the Target object (Kieslich et al., 2020), this simulated linking hypothesis provides an existence proof for how those non-monotonic computer-mouse trajectories could emerge from continuously-valued representations that are both partially active in parallel, along with a stochastic triggering of an eye movement to the competitor Cohort object that causes its internal activation to temporarily slightly exceed that of the Target object.

Of course, this simplified simulation of a linking hypothesis is not without its own lingering imperfections that will need to be corrected in future iterations. A set of simplifications were employed in the present simulation of a linking hypothesis to keep the number of free parameters to a minimum. The extremely simple equations for the normalized recurrence competition algorithm itself do not have to carry any free parameters of their own. However, a few of the other components in the model carry some free parameters that can be adjusted. Since the word “candle” has an average spoken duration of about 300 ms, and it carries roughly five phonemes, delivery of one phoneme per timestep resulted in treating each timestep as 60 ms. With each timestep equivalent to 60 ms, the minimum duration of a fixation was set to three timesteps (180 ms), and the initial fixation was always triggered at timestep four. The foveal prominence values of 0.55 and 0.45 in the Eyes vector were chosen because values of 0.6 and 0.4 occasionally resulted in the final selection of the competitor Cohort object (e.g., error trials, which are rare in the human data). The parameters for the computer-mouse output were simply fixed at +50 y-pixels per timestep and a range of −50 to +50 x-pixels per timestep (weighted by the activation values in the Mouse vector to determine left/right direction). Future expansion of this model could include a variety of improvements on the speech input parameters, such as: a) more timesteps for the vowels, which tend to have longer durations than consonants, b) a larger set of word nodes, and thus more nodes in all vectors, and c) feeding the Lexical vector with activation values that come from the output of the TRACE model (e.g., Allopenna et al., 1998) or perhaps a noisy-channel Bayesian model of spoken word recognition (e.g., Ryskin & Fang, 2021).

Improvements on the mouse output parameters could be implemented by using Henis and Flash’s (1995) polynomial equations for reaching movements with realistic velocity profiles to program two targeted movements to the two objects and execute a single weighted average of the two of them based on the continuously changing activation values in the Mouse vector (see Spivey et al., 2010). As one of those activations/weights gradually ramps up from 0.5 to 1.0, and the other gradually ramps down from 0.5 to 0, the resulting weighted average of those two simultaneous movement commands is a single continuous trajectory that starts out moving toward the midpoint of the two alternative objects, leans initially toward one or the other, and then eventually smoothly curves toward the correct Target object.

Improvements on eye movement parameters can be included as well. It is worth noting that irrespective of how the lexical activation curves look in the Lexical vector (e.g., smoother, more of them, etc.), the Visual vector in this network (which drives the Eyes and Mouse vectors) will only have nonzero activations for objects that are actually present in the visual display – and then those activations will get normalized. Thus, if there are two objects, they will always start with activations of 0.5 and 0.5 (Fig. 7). If there are four objects, they will always start with activations of 0.25 each (Fig. 5). This means that a stochastic triggering of the first eye movement always at the same (fourth) timestep will inevitably result in averaged fixation curves showing an initial sudden jump up to that 0.5 (or 0.25) level. Therefore, to obtain smoother initial rising of averaged fixation curves, such as that observed by Spivey-Knowlton (1996) and Allopenna et al. (1998), future versions of this simulation will need to introduce some variance to the timing of that first eye movement.

In addition to fine-tuning the free parameters of the present simulations from this model, future work with this simulated linking hypothesis will also need to account for a wider range of findings in spoken word recognition using the visual world paradigm. For example, the network will need to be expanded to include similarity among objects in the display based not only on overlap in acoustic–phonetic properties with other words in that language (Fig. 7) but also with words in the participant’s second language (Marian & Spivey, 2003), and also overlap in semantic features (Yee & Sedivy, 2006), overlap in visual features (Huettig & Altmann, 2007), and proximity in state space based on corpus statistics (Huettig, Quinlan, McDonald, & Altmann, 2006). Once the network is handling a sufficient range of findings in spoken word recognition, it will need to be expanded to explore extra-lexical linguistic phenomena as well, such as sentence processing (e.g., Farmer et al., 2007; Kukona & Tabor, 2011; Mayberry et al., 2009). This simulated linking hypothesis could even add a form of anticipation of upcoming linguistic elements (e.g., Altmann & Mirković, 2009) by including an expanded language model that relies on bidirectional pattern completion (see Falandays, Nguyen, & Spivey, 2021a). Finally, with that expanded language model, one could add speech production to the network, allowing this linking hypothesis to eventually simulate eye- and hand-movements in dyadic unscripted conversation in natural joint tasks (e.g., Brown-Schmidt & Tanenhaus, 2008; Dideriksen et al., 2023; Ryskin et al., 2023).

Finally, determining whether the localist nature of this simplified network is sufficient to reproduce that variety of results, or whether a more distributed representational scheme may be necessary (e.g., Cree & McRae, 2003; Mirman & Magnuson, 2009), may allow this simplified linking hypothesis to make some advances in our understanding of how words are represented in the human mind and how those representations interact with one another (Spivey, 2023). For example, what exactly does it mean for a lexical representation to be “functionally unitized” (Stone & Van Orden, 1989)? While a localist node set at 0.5 activation may be seen as relatively equivalent to its corresponding distributed representation with exactly half of its features (or microfeatures) activated, what are the differential consequences for processing throughout the network and its action-perception cycle with the environment? The upcoming feedback loop of new experiments and new simulations may help determine whether or not the benefit of increased transparency provided by localist attractor networks (e.g., McClelland & Elman, 1986) is worth losing the increased biological realism provided by fully distributed networks (e.g., Magnuson et al., 2020). However, rather than assuming it is better to use one or the other, perhaps they can both provide mutually supportive and converging insights into the cognitive processes associated with language processing in a naturalistic context – particularly when the oculomotor output forms a feedback loop that alters how the visual input influences ongoing cognitive and linguistic processes.

CRediT authorship contribution statement

Michael J. Spivey: Writing – review & editing, Writing – original draft, Visualization, Software, Project administration, Methodology, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This manuscript benefitted from comments from Bob McMurray, Jim Magnuson, and Chris Kello.

Data availability

No data was used for the research described in the article.

References

Abrams, R.A., Balota, D.A., 1991. Mental chronometry: beyond reaction time. Psychol. Sci. 2 (3), 153–157.
Akiyama, K., Alberdi, A., Alef, W., Asada, K., Azulay, R., Baczko, A.K., Rao, R., 2019. First M87 event horizon telescope results. VI. The shadow and mass of the central black hole. Astrophys. J. Lett. 875 (1), L6.
Allopenna, P.D., Magnuson, J.S., Tanenhaus, M.K., 1998.
Tracking the time course of spoken word recognition using eye movements: evidence for continuous mapping models. J. Mem. Lang. 38 (4), 419–439.
Altmann, G.T., Kamide, Y., 2007. The real-time mediation of visual attention by language and world knowledge: linking anticipatory (and other) eye movements to linguistic processing. J. Mem. Lang. 57 (4), 502–518.
Altmann, G.T., Mirković, J., 2009. Incrementality and prediction in human sentence processing. Cognit. Sci. 33 (4), 583–609.
Anderson, S.E., Chiu, E., Huette, S., Spivey, M.J., 2011. On the temporal dynamics of language-mediated vision and vision-mediated language. Acta Psychol. 137 (2), 181–189.
Andersson, R., Ferreira, F., Henderson, J.M., 2011. I see what you’re saying: The integration of complex speech and scenes during language comprehension. Acta Psychol. 137 (2), 208–216.
Apfelbaum, K.S., Klein-Packard, J., McMurray, B., 2021. The pictures who shall not be named: Empirical support for benefits of preview in the Visual World Paradigm. J. Mem. Lang. 121, 104279.
Barr, D.J., 2008. Analyzing ‘visual world’ eyetracking data using multilevel logistic regression. J. Mem. Lang. 59 (4), 457–474.
Bennett, C.M., Baird, A.A., Miller, M.B., Wolford, G.L., 2009. Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: an argument for multiple comparisons correction. Neuroimage 47 (Suppl. 1), S125.
Brown-Schmidt, S., Tanenhaus, M.K., 2008. Real-time investigation of referential domains in unscripted conversation: a targeted language game approach. Cognit. Sci. 32 (4), 643–684.
Burigo, M., Knoeferle, P., 2011. Visual attention during spatial language comprehension: Is a referential linking hypothesis enough? In: Proceedings of the 33rd Annual Conference of the Cognitive Science Society, pp. 919–924.
Calcagnì, A., Lombardi, L., D’Alessandro, M., Freuli, F., 2019. A state space approach to dynamic modeling of mouse-tracking data. Front. Psychol. 10, 2716.
Chemero, A., 2009. Radical Embodied Cognitive Science. MIT Press.
Cisek, P., Kalaska, J.F., 2010. Neural mechanisms for interacting with a world full of action choices. Annu. Rev. Neurosci. 33 (1), 269–298.
Coles, M.G., Gratton, G., Bashore, T.R., Eriksen, C.W., Donchin, E., 1985. A psychophysiological investigation of the continuous flow model of human information processing. J. Exp. Psychol. Hum. Percept. Perform. 11 (5), 529–553.
Cook, S.W., Tanenhaus, M.K., 2009. Embodied communication: Speakers’ gestures affect listeners’ actions. Cognition 113 (1), 98–104.
Cooper, R.M., 1974. The control of eye fixation by the meaning of spoken language: a new methodology for the real-time investigation of speech perception, memory, and language processing. Cogn. Psychol. 6, 84–107.
Cree, G.S., McRae, K., 2003. Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns). J. Exp. Psychol. Gen. 132 (2), 163–201.
Dideriksen, C., Christiansen, M.H., Tylén, K., Dingemanse, M., Fusaroli, R., 2023. Quantifying the interplay of conversational devices in building mutual understanding. J. Exp. Psychol. Gen. 152 (3), 864–889.
Dietrich, E., Markman, A.B., 2003. Discrete thoughts: Why cognition must use discrete representations. Mind Lang. 18 (1), 95–119.
Driver, J., Spence, C., 1998. Attention and the crossmodal construction of space. Trends Cogn. Sci. 2 (7), 254–262.
Duch, W., Dobosz, K., 2011. Visualization for understanding of neurodynamical systems. Cogn. Neurodyn. 5, 145–160.
Eberhard, K.M., Spivey-Knowlton, M.J., Sedivy, J.C., Tanenhaus, M.K., 1995. Eye movements as a window into real-time spoken language comprehension in natural contexts. J. Psycholinguist. Res. 24, 409–436.
Emberson, L.L., Weiss, R.J., Barbosa, A., Vatikiotis-Bateson, E., Spivey, M.J., 2008. Crossed hands curve saccades: Multisensory dynamics in saccade trajectories. In: Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 369–374.
Falandays, J.B., Nguyen, B., Spivey, M.J., 2021a. Is prediction nothing more than multiscale pattern completion of the future? Brain Res. 1768, 147578.
Falandays, J.B., Spevack, S., Pärnamets, P., Spivey, M., 2021b. Decision-making in the human-machine interface. Front. Psychol. 12, 624111.
Falandays, J.B., Spivey, M.J., 2020. Biasing moral decisions using eye movements: Replication and simulation. In: Proceedings of the 42nd Annual Conference of the Cognitive Science Society, pp. 2553–2558.
Farmer, T.A., Anderson, S.E., Spivey, M.J., 2007. Gradiency and visual context in syntactic garden-paths. J. Mem. Lang. 57 (4), 570–595.
Ferreira, F., Ferreira, V.S., 2024. Psycholinguistics. In: Frank, M.C., Majid, A. (Eds.), Open Encyclopedia of Cognitive Science. MIT Press.
Forster, K.I., 1976. Accessing the mental lexicon. In: Wales, R.J., Walker, E.C.T. (Eds.), New Approaches to Language Mechanisms. North-Holland, pp. 257–287.
Freeman, J.B., Dale, R., 2013. Assessing bimodality to detect the presence of a dual cognitive process. Behav. Res. Methods 45, 83–97.
Freeman, J.B., Dale, R., Farmer, T.A., 2011. Hand in motion reveals mind in motion. Front. Psychol. 2, 59.
Friston, K., Adams, R.A., Perrinet, L., Breakspear, M., 2012. Perceptions as hypotheses: saccades as experiments. Front. Psychol. 3, 151.
Ghaffari, M., Fiedler, S., 2018. The power of attention: Using eye gaze to predict other-regarding and moral choices. Psychol. Sci. 29 (11), 1878–1889.
Gibbs Jr., R.W., 2005. Embodiment and Cognitive Science. Cambridge University Press.
Gold, J.I., Shadlen, M.N., 2000. Representation of a perceptual decision in developing oculomotor commands. Nature 404 (6776), 390–394.
Grant, E.R., Spivey, M.J., 2003. Eye movements and problem solving: Guiding attention guides thought. Psychol. Sci. 14 (5), 462–466.
Griffin, Z.M., Bock, K., 2000. What the eyes say about speaking. Psychol. Sci. 11 (4), 274–279.
He, B.J., Zempel, J.M., Snyder, A.Z., Raichle, M.E., 2010. The temporal structures and functional significance of scale-free brain activity. Neuron 66 (3), 353–369.
Henis, E.A., Flash, T., 1995. Mechanisms underlying the generation of averaged modified trajectories. Biol. Cybern. 72 (5), 407–419.
Huettig, F., Altmann, G.T., 2005.
Word meaning and the control of eye fixation: Semantic competitor effects and the visual world paradigm. Cognition 96 (1), B23–B32.
Huettig, F., Altmann, G.T., 2007. Visual-shape competition during language-mediated attention is based on lexical input and not modulated by contextual appropriateness. Vis. Cogn. 15 (8), 985–1018.
Huettig, F., Quinlan, P.T., McDonald, S.A., Altmann, G.T., 2006. Models of high-dimensional semantic space predict language-mediated eye movements in the visual world. Acta Psychol. 121 (1), 65–80.
Huettig, F., Rommers, J., Meyer, A.S., 2011. Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychol. 137 (2), 151–171.
Ito, A., Knoeferle, P., 2023. Analysing data from the psycholinguistic visual-world paradigm: Comparison of different analysis methods. Behav. Res. Methods 55 (7), 3461–3493.
Kello, C.T., Anderson, G.G., Holden, J.G., Van Orden, G.C., 2008. The pervasiveness of 1/f scaling in speech reflects the metastable basis of cognition. Cognit. Sci. 32 (7), 1217–1231.
Kello, C.T., Bhat, H., Turner, M.A., Alviar, C. (under revision). Hierarchical temporal structure and nested process composition.
Kieslich, P.J., Schoemann, M., Grage, T., Hepp, J., Scherbaum, S., 2020. Design factors in mouse-tracking: What makes a difference? Behav. Res. Methods 52, 317–341.
Kukona, A., Tabor, W., 2011. Impulse processing: A dynamical systems model of incremental eye movements in the visual world paradigm. Cognit. Sci. 35 (6), 1009–1051.
Leach, J.C.D., Carpenter, R.H.S., 2001. Saccadic choice with asynchronous targets: evidence for independent randomisation. Vision Res. 41 (25–26), 3437–3445.
Lepora, N.F., Pezzulo, G., 2015. Embodied choice: how action influences perceptual decision making. PLoS Comput. Biol. 11 (4), e1004110.
Levy, J., 2014. Examining the tools used to infer models of lexical activation: Eye-tracking, mouse-tracking, and reaction time. Master’s Thesis. University of Massachusetts, Amherst. https://scholarworks.umass.edu/masters_theses_2/95/.
Lorenz, E.N., 1963. Deterministic nonperiodic flow. J. Atmos. Sci. 20 (2), 130–141.
Magnuson, J.S., 2005. Moving hand reveals dynamics of thought. Proc. Natl. Acad. Sci. 102 (29), 9995–9996.
Magnuson, J.S., 2019. Fixations in the visual world paradigm: where, when, why? J. Cultural Cognitive Sci. 3 (2), 113–139.
Magnuson, J.S. (this issue). TRACE-ing fixations in the visual world paradigm: Extending linking hypotheses and addressing individual differences by simulating trial-level behavior. Brain Research.
Magnuson, J.S., You, H., Luthra, S., Li, M., Nam, H., Escabi, M., Rueckl, J.G., 2020. EARSHOT: A minimal neural network model of incremental human speech recognition. Cognit. Sci. 44 (4), e12823.
Maravita, A., Spence, C., Driver, J., 2003. Multisensory integration and the body schema: close to hand and within reach. Curr. Biol. 13 (13), R531–R539.
Marian, V., Spivey, M., 2003. Competing activation in bilingual language processing: Within- and between-language competition. Biling. Lang. Cogn. 6 (2), 97–115.
Marslen-Wilson, W.D., 1987. Functional parallelism in spoken word-recognition. Cognition 25 (1–2), 71–102.
Matin, E., Shao, K.C., Boff, K.R., 1993. Saccadic overhead: Information-processing time with and without saccades. Percept. Psychophys. 53, 372–380.
Mayberry, M.R., Crocker, M.W., Knoeferle, P., 2009. Learning to attend: A connectionist model of situated language comprehension. Cognit. Sci. 33 (3), 449–496.
McClelland, J.L., Elman, J.L., 1986.
The TRACE model of speech perception. Cogn. Psychol. 18 (1), 1–86.
McMurray, B., 2023. I’m not sure that curve means what you think it means: Toward a [more] realistic understanding of the role of eye-movement generation in the Visual World Paradigm. Psychon. Bull. Rev. 30 (1), 102–146.
McMurray, B., et al. (this issue). From real-time measures to real world differences: New [and old] statistical approaches to individual differences in real-time language processing. Brain Research.
McMurray, B., Sarrett, M.E., Chiu, S., Black, A.K., Wang, A., Canale, R., Aslin, R.N., 2022. Decoding the temporal dynamics of spoken word and nonword processing from EEG. Neuroimage 260, 119457.
McMurray, B., Spivey, M., 2000. The categorical perception of consonants: The interaction of learning and processing. In: Proceedings of the Chicago Linguistics Society, Vol. 34, No. 2, pp. 205–220.
McRae, K., Spivey-Knowlton, M.J., Tanenhaus, M.K., 1998. Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. J. Mem. Lang. 38 (3), 283–312.
Meehl, P.E., 1990. Why summaries of research on psychological theories are often uninterpretable. Psychol. Rep. 66 (1), 195–244.
Meyer, D.E., Osman, A.M., Irwin, D.E., Yantis, S., 1988. Modern mental chronometry. Biol. Psychol. 26 (1–3), 3–67.
Mirman, D., Magnuson, J.S., 2009. Dynamics of activation of semantically similar concepts during spoken word recognition. Mem. Cogn. 37 (7), 1026–1039.
Mirman, D., Dixon, J.A., Magnuson, J.S., 2008. Statistical and computational models of the visual world paradigm: Growth curves and individual differences. J. Mem. Lang. 59 (4), 475–494.
Munakata, Y., 1998. Infant perseveration and implications for object permanence theories: A PDP model of the AB task. Dev. Sci. 1 (2), 161–184.
Nakayama, K., Moher, J., Song, J.H., 2023. Rethinking vision and action. Annu. Rev. Psychol. 74 (1), 59–86.
Pärnamets, P., Johansson, P., Hall, L., Balkenius, C., Spivey, M.J., Richardson, D.C., 2015. Biasing moral decisions by exploiting the dynamics of eye gaze. Proc. Natl. Acad. Sci. 112 (13), 4170–4175.
Pezzulo, G., Barsalou, L.W., Cangelosi, A., Fischer, M.H., McRae, K., Spivey, M.J., 2011. The mechanics of embodiment: A dialog on embodiment and computational modeling. Front. Psychol. 2, 5.
Posner, M.I., 1980. Orienting of attention. Q. J. Exp. Psychol. 32 (1), 3–25.
Rabovsky, M., McRae, K., 2014. Simulating the N400 ERP component as semantic network error: Insights from a feature-based connectionist attractor model of word meaning. Cognition 132 (1), 68–89.
Reali, F., Spivey, M.J., Tyler, M.J., Terranova, J., 2006. Inefficient conjunction search made efficient by concurrent spoken delivery of target identity. Percept. Psychophys. 68, 959–974.
Ruelle, D., Takens, F., 1971. On the nature of turbulence. Les Rencontres Physiciens-Mathématiciens de Strasbourg-RCP25 12, 1–44.
Rybář, M., Daly, I., 2022. Neural decoding of semantic concepts: A systematic literature review. J. Neural Eng. 19 (2), 021002.
Ryskin, R.A., Fang, X., 2021. The many timescales of context in language processing. Psychol. Learn. Motiv. 75, 201–243.
Ryskin, R.A., Spivey, M.J., 2023. Toward sophisticated models of naturalistic language behavior: Comment on “Beyond simple laboratory studies” by A. Maselli et al. Phys. Life Rev. 47, 191–194.
Shapiro, L., 2019. Embodied Cognition. Routledge.
Shin, C.W., Kim, S., 2006. Self-organized criticality and scale-free properties in emergent functional neural networks. Phys. Rev. E 74 (4), 045101.
Spivey-Knowlton, M.J., 1996.
Integration of visual and linguistic information: Human data and model simulations. PhD dissertation. University of Rochester. https://www.proquest.com/openview/ec504b24f9f2b3608946ef1d7cca0fe6.
Spivey, M., 2007. The Continuity of Mind. Oxford University Press.
Spivey, M.J., 2018. Discovery in complex adaptive systems. Cogn. Syst. Res. 51, 40–55.
Spivey, M.J., 2020. Who You Are: The Science of Connectedness. MIT Press.
Spivey, M.J., 2023. Cognitive science progresses toward interactive frameworks. Top. Cogn. Sci. 15 (2), 219–254.
Spivey, M.J., Dale, R., 2011. Eye movements both reveal and influence problem solving. In: Liversedge, S. (Ed.), The Oxford Handbook of Eye Movements. Oxford University Press, Oxford, pp. 551–562.
Spivey, M.J., Dale, R., Knoblich, G., Grosjean, M., 2010. Do curved reaching movements emerge from competing perceptions? J. Exp. Psychol. Hum. Percept. Perform. 36, 251–254.
Spivey, M.J., Grosjean, M., Knoblich, G., 2005. Continuous attraction toward phonological competitors. Proc. Natl. Acad. Sci. 102 (29), 10393–10398.
Spivey, M.J., Tanenhaus, M.K., 1998. Syntactic ambiguity resolution in discourse: modeling the effects of referential context and lexical frequency. J. Exp. Psychol. Learn. Mem. Cogn. 24 (6), 1521–1543.
Spivey, M.J., Tanenhaus, M.K., Eberhard, K.M., Sedivy, J.C., 2002. Eye movements and spoken language comprehension: Effects of visual context on syntactic ambiguity resolution. Cogn. Psychol. 45 (4), 447–481.
Stephen, D.G., Boncoddo, R.A., Magnuson, J.S., Dixon, J.A., 2009. The dynamics of insight: Mathematical discovery as a phase transition. Mem. Cogn. 37, 1132–1149.
Stone, G.O., Van Orden, G.C., 1989. Are words represented by nodes? Mem. Cogn. 17, 511–524.
Takens, F., 1981. Detecting strange attractors in fluid turbulence. In: Rand, D.A., Young, L.S. (Eds.), Symposium on Dynamical Systems and Turbulence, Lecture Notes in Mathematics. Springer, Berlin, pp. 366–381.
Tanenhaus, M.K., Huettig, F. (this issue). 30 years visual world paradigm: The state of the art. Brain Research.
Tanenhaus, M.K., Magnuson, J.S., Dahan, D., Chambers, C., 2000. Eye movements and lexical access in spoken-language comprehension: Evaluating a linking hypothesis between fixations and linguistic processing. J. Psycholinguist. Res. 29, 557–580.
Tanenhaus, M.K., Spivey-Knowlton, M.J., Eberhard, K.M., Sedivy, J.C., 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268 (5217), 1632–1634.
Tarr, M.J., Williams, P., Hayward, W.G., Gauthier, I., 1998. Three-dimensional object recognition is viewpoint dependent. Nat. Neurosci. 1 (4), 275–277.
Teruya, H., Kapatsinski, V., 2019. Deciding to look: Revisiting the linking hypothesis for spoken word recognition in the visual world. Lang. Cogn. Neurosci. 34 (7), 861–880.
Thomas, L.E., Lleras, A., 2007. Moving eyes and moving thought: On the spatial compatibility between eye movements and cognition. Psychon. Bull. Rev. 14, 663–668.
Thomas, L.E., Lleras, A., 2009. Covert shifts of attention function as an implicit aid to insight. Cognition 111 (2), 168–174.
Torre, K., Wagenmakers, E.J., 2009. Theories and models for 1/fβ noise in human movement science. Hum. Mov. Sci. 28 (3), 297–318.
van der Heijden, A.H.C., 1996. Two stages in visual information processing and visual perception? Vis. Cogn. 3 (4), 325–361.
van der Wel, R.P.R.D., Eder, J., Mitchel, A., Walsh, M., Rosenbaum, D., 2009.
Trajectories emerging from discrete versus continuous processing models in phonological competitor tasks: A commentary on Spivey, Grosjean, and Knoblich (2005). J. Exp. Psychol. Hum. Percept. Perform. 35, 588–594.
Van Orden, G.C., Holden, J.G., Turvey, M.T., 2003. Self-organization of cognitive performance. J. Exp. Psychol. Gen. 132 (3), 331–350.
Van Orden, G.C., Kloos, H., Wallot, S., 2011. Living in the pink: Intentionality, wellbeing, and complexity. In: Philosophy of Complex Systems. North-Holland, pp. 629–672.
Wagenmakers, E.J., Farrell, S., Ratcliff, R., 2004. Estimation and interpretation of 1/fα noise in human cognition. Psychon. Bull. Rev. 11 (4), 579–615.
Wulff, D.U., Haslbeck, J.M., Kieslich, P.J., Henninger, F., Schulte-Mecklenbeck, M., 2019. Mouse-tracking: Detecting types in movement trajectories. In: A Handbook of Process Tracing Methods. Routledge, pp. 131–145.
Yee, E., Sedivy, J.C., 2006. Eye movements to pictures reveal transient semantic activation during spoken word recognition. J. Exp. Psychol. Learn. Mem. Cogn. 32 (1), 1–14.
Zelinsky, G.J., Murphy, G.L., 2000. Synchronizing visual and language processing: An effect of object name length on eye movements. Psychol. Sci. 11 (2), 125–131.
Zgonnikov, A., Aleni, A., Piiroinen, P.T., O’Hora, D., di Bernardo, M., 2017. Decision landscapes: visualizing mouse-tracking data. R. Soc. Open Sci. 4 (11), 170482.