Review

Scaling laws in cognitive sciences

Christopher T. Kello (1), Gordon D.A. Brown (2), Ramon Ferrer-i-Cancho (3), John G. Holden (4), Klaus Linkenkaer-Hansen (5), Theo Rhodes (1) and Guy C. Van Orden (4)

(1) Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Rd., Merced, CA 95343, USA
(2) Department of Psychology, University of Warwick, Coventry CV4 7AL, United Kingdom
(3) Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Campus Nord, Edifici Omega, Jordi Girona Salgado 1-3, 08034 Barcelona, Catalonia, Spain
(4) Center for Perception, Action and Cognition, Department of Psychology, University of Cincinnati, PO Box 210376, Cincinnati, OH 45221-0376, USA
(5) Department of Integrative Neurophysiology, VU University Amsterdam, De Boelelaan 1085, 1081 HV Amsterdam, the Netherlands

Corresponding author: Kello, C.T. ([email protected])

Trends in Cognitive Sciences, Vol. 14, No. 5. 1364-6613/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2010.02.005. Available online 1 April 2010.

Scaling laws are ubiquitous in nature, and they pervade neural, behavioral and linguistic activities. A scaling law suggests the existence of processes or patterns that are repeated across scales of analysis. Although the variables that express a scaling law can vary from one type of activity to the next, the recurrence of scaling laws across so many different systems has prompted a search for unifying principles. In biological systems, scaling laws can reflect adaptive processes of various types and are often linked to complex systems poised near critical points. The same is true for perception, memory, language and other cognitive phenomena. Findings of scaling laws in cognitive science are indicative of scale invariance in cognitive mechanisms and of multiplicative interactions among interdependent components of cognition.

The scaling law debate

In the past, the ubiquity of the normal curve was observed throughout nature, but not satisfactorily explained. Then developments such as the central limit theorem showed how random, independent effects combine to produce the normal curve, thereby explaining its ubiquity. Today the normal curve is sometimes taken for granted, although still appreciated for the beauty and power with which it brings order to randomness.

The normal curve fails, however, to describe crucial facts about living systems and other complex systems, because such systems are more than collections of random, independent effects. Their complexity is defined by intricate regularities and dependencies that span multiple temporal and spatial scales of analysis. For instance, synchronization errors in a finger-tapping experiment follow the normal distribution, yet the temporal sequence of errors is highly non-random [1]. In other words, measurements of living systems often obey scaling laws rather than linear relations or Gaussian statistics. Bringing order to such regularities, which are inherent in nature's complexities, including the complexities of cognition, has proven to be as difficult as bringing order to randomness.

Most generally, scaling laws express one variable as a nonlinear function of another raised to a power, f(x) ∝ x^a, with a ≠ 0. Scaling laws are observed throughout the sciences, notwithstanding difficulties in determining whether measurements actually conform to scaling laws. Other functions (such as exponentials) can also provide good fits to data, and skeptics sometimes contend that scaling laws do not always provide the best fit [2,3]. However, improved statistical tests have provided strong evidence of pervasive scaling laws [4-6] over a substantial (although limited) range of scales [7-10].

Even among scientists who acknowledge the existence of scaling laws, some still see them as largely uninformative because there are many ways to produce scaling laws, and some of those ways are idiosyncratic or artifactual [11]. Thus, their unifying order could be more illusory than enlightening. However, as the extent of unexplained coincidence grows with each reported power law, coincidence becomes increasingly difficult to accept. We could instead seriously consider the hypothesis that scaling laws describe a fundamental order in living and complex systems. This working perspective motivates principles and theories to explain scaling laws in terms that can cross or integrate disciplines.

Although these debates over scaling laws have a long history in nearly every scientific discipline and domain, they have emerged only recently in cognitive science. Indeed, many cognitive scientists are as yet unfamiliar with the debate, or with the pervasiveness and meaning of scaling laws in other sciences. Here, we review evidence of scaling laws in cognitive science, at neural, behavioral and linguistic levels of description. The evidence indicates that cognitive phenomena occurring at relatively small temporal and spatial scales are intimately linked to those occurring at relatively large scales. This linkage can be explained by rooting cognitive functions in principles of statistical physics.

Scaling laws in perception, action and memory

Although the demonstration of the pervasiveness of scaling laws could be new to cognitive science, a few classic examples are well known in the field. Studies of psychophysics and motor control, in particular, have produced some of the most lawful phenomena of human behavior, including some allometric scaling laws (see Glossary). Stevens' law is one psychophysical example, for which the physical magnitude of a stimulus (S) is proportional to its perceived intensity (I) raised to a power a, S ∝ I^a [12].
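The scale invariance behind such power laws can be made concrete in a few lines of code. The sketch below is ours, not the article's: it checks numerically that rescaling the input of a power law changes the output by the same constant factor at every scale, which an exponential fails to do. The exponents are arbitrary illustrative choices.

```python
import numpy as np

# A power law f(x) = x**a is scale invariant: rescaling x by any factor c
# changes f only by the constant factor c**a, regardless of x.  An
# exponential has no such property, one informal way to tell the two apart.
def power_law(x, a=0.6):
    return x ** a

def exponential(x, b=0.1):
    return np.exp(-b * x)

x = np.array([1.0, 10.0, 100.0])   # three widely separated scales
c = 2.0                            # rescaling factor

ratio_power = power_law(c * x) / power_law(x)    # c**0.6 at every x
ratio_exp = exponential(c * x) / exponential(x)  # changes with x

print(np.allclose(ratio_power, c ** 0.6))  # True: the same ratio everywhere
```

The constant ratio is exactly the self-similarity that lets a single power-law relation span very small and very large magnitudes.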
Glossary

Allometric scaling laws: traditionally refer to relationships between different measures of anatomy and/or physiology that hold true across species or organs of different sizes. A classic example is Kleiber's law, which relates organism mass (m) to metabolic rate (r) as r ∝ m^0.75 and holds true across species ranging from bacteria to whales [75]. The mass of gray matter versus white matter in brains also obeys an allometric scaling law across species [76].

Criticality: refers to the state of dynamical systems poised near phase transitions, and it is characterized by scale invariance, for example power-law temporal fluctuations (1/f scaling) and power-law distributions with critical exponents. The Ising model of ferromagnetism is a classic example of a system with a critical point between ordered and disordered phases [77]. Self-organized criticality refers to models, such as the "sandpile" model, for which the critical point is an attractor [78].

Heavy-tailed distributions: have tails that decay more slowly than exponentially. All power-law distributions are heavy-tailed, but not all heavy-tailed distributions are power laws (e.g. the lognormal distribution is heavy-tailed but is not a power-law distribution).

Lévy flights: are random walks (i.e. flights) for which each step is drawn from a power-law distribution (the direction of each step is typically random but can instead be determined by some rule or algorithm). Points visited by steps in Lévy flights tend to be clustered in space, where clusters are separated by very large steps occasionally drawn from the heavy tail of the distribution.

Lognormal distributions: are heavy-tailed and have probability density functions that are normally distributed under a logarithmic transformation: P(x) = (1 / (x σ √(2π))) exp(-(ln x - μ)² / (2σ²)) for x > 0. For certain parameter values, lognormal distributions can be difficult to distinguish from power-law distributions [79].

Metastability: is a delicate type of stability that is a property of systems poised near their critical points. It stems from the fact that small (microscopic) perturbations to near-critical systems can result in system-wide (macroscopic) changes in their states. Thus, states are only tenuously stable, resulting in many nearly equivalent potential states near critical points.

Pareto distributions: are one type of power-law distribution used to model phenomena from a diversity of fields, including economics, physics, anthropology, computer science, geology and biology. The probability density function is P(x) = a x_min^a / x^(a+1) for x > x_min, where x_min expresses the lower bounds that often exist on physical quantities (e.g. volumes and masses of particles must be >0).

Power-law distributions: have heavy-tailed probability functions of the form P(x) ∝ x^-a, where typically 0 < a < 3. These distributions have properties of self-similarity and scale invariance.

"Rich get richer": refers to a growth process (i.e. a Yule process) whereby the probability of incrementing some quantity associated with a given unit (e.g. the population of a city, the frequency of a word) is proportional to its current value. Such quantities grow to be power-law distributed; for example, networks that grow by preferential attachment have power-law distributed links (i.e. they are scale-free [80]).

Scale-free networks: are those with power-law distributions of links per node (i.e. node degree). The heavy tail means that some nodes act as hubs because they are linked to a substantial proportion of all nodes in the network.

1/f scaling: (also known as 1/f noise, pink noise or flicker noise) occurs in time series with long-range temporal correlations. A time series can be correlated with itself (i.e. autocorrelated) at varying temporal lags k, and autocorrelations C(k) are long-range if they decay slowly, as an inverse power of lag, C(k) ∝ k^-a. Expressing this power law in the frequency domain yields S(f) ∝ f^-a, where f is frequency, S(f) is spectral power and a ≈ 1 for 1/f scaling.

Self-similarity and scale invariance: both refer to objects or mathematical functions that exhibit similar shapes or relations among variables at different scales. A self-similar object is such that each portion can be considered a reduced-scale image of the whole. Mathematical fractals such as the Koch snowflake are examples of ideal self-similarity because their contours are recursively and identically repeated across all scales. A power-law distribution is scale invariant because multiplying x by a constant c only rescales the function: P(cx) = c^-a P(x), where P(x) = x^-a. Scale invariance in nature tends to be approximate and statistical, as illustrated by the coastline of Britain: it is not that a particular contour is repeated exactly across scales of the coastline. Instead, the statistical relation between the measured length of the coastline and the size of the measuring stick is invariant across different zoom levels [81].

Zipf's law: refers to a power-law distribution traditionally expressed in terms of frequencies of occurrence of a certain variable (e.g. word rank or population size of cities). When word frequencies are said to follow Zipf's law, their rank r (the most frequent word has rank 1, the second most frequent word has rank 2, and so on) is related to frequency as f(r) ∝ r^-a.

With regard to motor control, Lacquaniti et al. [13] discovered a two-thirds power law in which the angular velocity (A) of drawing movements is proportional to their curvature (C), A ∝ C^(2/3).

Traditionally, such laws have been investigated independently of each other, especially when they are found in different research domains (e.g. perception versus action for the examples above). It is commonly assumed in cognitive science that different domains entail different mechanisms. However, common principles can underlie seemingly disparate mechanisms, and similar scaling laws are suggestive of such principles. Most broadly, the property of scale invariance inherent to the Stevens' and two-thirds laws (and all scaling laws) implies a property or principle that is adaptive at all scales (i.e. scaling laws can exist because natural selection or other mechanisms of adaptation select and repeat a pattern or process across scales). Consistent with this implication, Copelli et al. [14-16] hypothesized that Stevens' law reflects maximization of sensitivity and dynamic range in sensory systems, and Harris and Wolpert [17] hypothesized that the two-thirds law reflects minimization of movement errors caused by noise in motor systems.

A compelling fact about the Stevens' and two-thirds laws (which is also true of many other scaling law observations) is that data closely follow their power-law functions over more than three orders of magnitude. For instance, different muscles and muscle groupings are employed and coordinated for movements of very small versus very large curvature, yet all obey the two-thirds law. Scaling over multiple orders of magnitude is compelling because it ties together ostensibly different mechanisms at disparate scales. Given evidence for several other scaling laws in perception and action [18,19], one is led to principles that generally tie together perceptual and motor mechanisms across scales.

The purview of scaling laws broadens as we further consider their occurrence in other domains of cognitive function. Memory is a natural domain to consider after perception and action, and indeed scale invariance has been found in memory retrieval (Figure 1). Maylor et al. [20] instructed participants to recall what they did (or will do) in the previous (or next) day, week or year. The rate of item recall was generally invariant across target recall periods: on average, participants recalled five items/min regardless of the span over which recall was bounded. This dynamic scaling of memory retrieval is consistent with a scale-invariant temporal ratio model of memory [21,22], in which the discriminability of memories depends on ratios of temporal intervals between encoding and retrieval events. For example, two different memory traces encoded 8 versus 10 min in the past (temporal ratio = 8/10) will be as confusable with each other as two traces encoded 8 versus 10 h in the past. This model explicitly ties scales together because the same memory and retrieval processes are hypothesized to operate across all timescales, and no qualitative distinction exists between short-term and long-term memory processes. Such an approach appears necessary to explain the many scale-similar effects in human memory and learning (for analogous effects in animal learning, see Ref. [23]).

Figure 1. Scaling in retrospective and prospective memory. Recall data are plotted showing scaling in the retrieval of retrospective (a) and prospective (b) memories from periods varying from a day to a year [20]. Participants were given 4 min to recall "...jobs, appointments, and things you have done yesterday/in the last week/in the last year" (retrospective) or "...jobs, appointments, and things you intend to do tomorrow/in the next week/in the next year" (prospective) as one-word summaries. The figure shows the cumulative number of items recalled (y-axis) by the end of each of eight 30-s recall periods (x-axis). These did not vary as a function of the time interval from which recall was permitted (day/week/year); the recall rate was timescale invariant.
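The ratio invariance in the 8 min/10 min versus 8 h/10 h example can be sketched numerically. The discriminability function below is a deliberately simplified stand-in of ours, not the published temporal ratio model [21,22]; the point is only that any function of the ratio of trace ages is unchanged when both ages are rescaled.

```python
# Toy sketch of ratio invariance in temporal ratio models of memory.
# The specific discriminability function is an illustrative assumption.
def discriminability(age1, age2):
    ratio = min(age1, age2) / max(age1, age2)  # 8/10 for the example above
    return 1.0 - ratio   # 0 = indistinguishable, near 1 = easily separated

minutes = discriminability(8.0, 10.0)          # traces 8 vs 10 min old
hours = discriminability(8 * 60.0, 10 * 60.0)  # traces 8 vs 10 h old
print(minutes == hours)  # True: only the ratio of ages matters
```

Because only the ratio enters, the model predicts the same confusability at every timescale, which is the sense in which it is scale invariant.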
Another type of scaling law in memory comes from a classic free recall paradigm, yet it was only recently discovered by drawing an analogy to studies of animal foraging behaviors [24]. Birds, monkeys, fish and numerous other species have been reported to search for food in Lévy flight patterns [25], which have been hypothesized to be effective search strategies because they cover more territory than, for example, a random walk with normally distributed steps [26]. Searching for items or events in memory is like foraging, particularly in tasks such as free recall of members of a given semantic category (e.g. animals) in a given time period [27]. Rhodes and Turvey [24] analyzed inter-response time intervals (IRIs) from this classic memory task, which are analogous to steps from one recalled item to the next. The authors found IRIs to be power-law distributed, with exponents very similar to those found in animal foraging (Figure 2). These comparable results suggest that Lévy flights are generally adaptive across a variety of search ecologies. These results also illustrate how scaling laws can lurk unnoticed in data for decades, in the absence of the theories and analytic techniques necessary to recognize them.

Scaling laws in reaction times and word frequencies

Another "lurking" scaling law was recently discovered in the distributions of word-naming latencies of individual readers [28]. Cognitive psychologists have known for decades that reaction time (RT) distributions tend to be positively skewed, but usually this skew has been treated as mere deviation from normality; indeed, very long RTs are typically considered outliers and hence are removed or truncated. Extreme values are expected, however, if RTs are drawn from heavy-tailed distributions rather than Gaussian distributions. Lognormal and power-law distributions are heavy-tailed, and naming latencies (as well as other response times) appear to be best modeled as mixtures of lognormal and power-law distributions (Figure 3). These heavy-tailed distributions nevertheless remain contentious, because they are difficult to reconcile with traditional theories of RTs based on additive interactions among component processes.

This debate has only just begun for RTs, but for a different power-law distribution of linguistic behavior it has been ongoing for over 50 years. In his pioneering work, G. K. Zipf [29] studied the inverse power law of word usage that bears his name (Figure 3). Zipf's law as originally formulated states that the frequency of a word (f) in a given corpus is proportional to the inverse of its frequency rank (r), f ∝ 1/r. Zipf's law is apparently a universal property of human language, yet its origins remain controversial. Power laws such as Zipf's law are found not just in word usage but in many aspects of language, such as syntactic dependency networks [30] and letter sequences in lexicons [31].

Zipf originally explained his law in terms of a principle of least effort, which states that language structure and language use minimize both speakers' and listeners' efforts. Speakers prefer high-frequency words for ease of memory recall, and listeners prefer low-frequency words with unambiguous meanings. Zipf hypothesized that his law reflects a compromise between these competing constraints on communication. The same basic principle can also be applied at other linguistic scales, which would explain Zipf's law as an adaptive property of communication.

Some researchers, however, claim that Zipf's law is inevitable (and therefore uninteresting) because randomly generated letter sequences can also exhibit scaling [32]. Numerous recent analyses have refuted this claim by showing that random letter sequences do not mimic closely enough the actual shape of Zipf's law in real languages; Zipf's law is universally and nontrivially descriptive of language use [33-36].
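The bookkeeping behind a Zipf rank-frequency analysis is short enough to sketch. The toy corpus below is a placeholder of ours; real analyses [33-36] use book-length texts and careful statistics, but the procedure is the same: count, sort, rank.

```python
from collections import Counter

# Rough sketch of a Zipf rank-frequency check: count word frequencies,
# rank them, and inspect how frequency falls with rank.  Under Zipf's law,
# frequency decays roughly as 1/rank over a large vocabulary.
def rank_frequency(text):
    counts = Counter(text.lower().split())
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))   # (rank, frequency) pairs

corpus = "the cat sat on the mat and the dog saw the cat"
pairs = rank_frequency(corpus)
print(pairs[:3])  # [(1, 4), (2, 2), (3, 1)]
```

Even this tiny sample happens to fall off roughly as 1/rank; a serious test would fit the slope of log-frequency against log-rank over thousands of word types.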
Figure 2. Lévy flights in animal and memory foraging. An artificially generated Lévy flight path is shown in two dimensions in (a) (note the clusters, and the clusters within clusters). In (b), estimated Lévy flight power-law exponents [9,83] are graphed as straight lines in log-log coordinates for four different species: (i) spider monkey; (ii) bigeye tuna; (iii) leatherback turtle; and (iv) Magellanic penguin. Analogous histograms are shown in (c) for four representative participants in a category-member free recall task [24]. The histograms are of inter-response intervals (IRIs) between successive category-member recalls.

Although this important law remains unaddressed in most linguistic theories, it has been hypothesized to underlie two fundamental properties of human language: syntax and symbolic reference [37]. Once a communication system organizes the patterning of word frequency according to Zipf's law, a rudimentary form of language emerges for free as a side effect of the competing constraints of communication [38].

It has been argued that the statistical laws of language might be interconnected [39], and these interconnections appear to include scaling laws. For instance, the power law of word connectivity in syntactic dependency networks could be a natural consequence of Zipf's law for word frequencies [30,37]. Traditional research on the typology of linguistic universals has focused on sentence-level phenomena such as word order [40]. In contrast, scaling laws offer a new source of linguistic universals at the level of the large-scale organization of language, and they also offer the possibility of integrating linguistics and cognitive science.

Scaling laws and criticality

Widely reported evidence of scaling laws calls for cognitive and linguistic theories that explain their ubiquity [41,42]. As a starting point for explaining the ubiquitous presence of scaling laws, a key alternative to additive summations of components is multiplicative interactions, which produce heavy-tailed distributions [28,43]. Multiplicative interactions in cognition can be expressed when the operation of one component depends on the state of another, which is often expressed empirically as interaction effects.

Figure 3. Power-law distributions of word naming latencies and word frequencies. Distributions of speeded word naming latencies (in milliseconds, ms) for three representative readers are shown in the left column, in log-log coordinates (reproduced with permission from [28]). Heavy blue lines are observed distributions, and yellow lines are mixtures of ideal lognormal and inverse power-law distributions falling within the 90% confidence intervals of the observed distributions. Some readers exhibited heavy tails and hence greater power-law proportions (a), others were more balanced (b), and still others were predominantly lognormal (c). Plots on the right show inverse power-law distributions of word ranks (reproduced from Ref. [33]). Words were counted and ranked by frequency count in four different texts (black lines): Alice's Adventures in Wonderland (d); Hamlet (e); David Crockett (f); and The Origin of Species (g). Rank distributions (blue lines) are compared with those generated by a random text model in which letters and spaces were sequentially sampled according to their probabilities in real texts (red dashed lines). The random text model does not match observations of Zipf's law because (i) observations fall outside 3 standard deviations of the random text model; (ii) random texts have relative humps in the higher frequencies and wider plateaus in the lower frequencies; and (iii) the rank histogram of random texts extends well beyond that of real texts.
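The additive-versus-multiplicative contrast is easy to reproduce numerically. In the sketch below, the uniform "effects", their number, and the skewness statistic are illustrative choices of ours, not values from the literature:

```python
import numpy as np

# Summing many independent effects gives a roughly normal distribution
# (central limit theorem); multiplying the same effects gives a
# heavy-tailed lognormal, because the logarithms add.
rng = np.random.default_rng(0)
effects = rng.uniform(0.5, 1.5, size=(100_000, 50))  # 50 effects per "individual"

additive = effects.sum(axis=1)         # approximately normal
multiplicative = effects.prod(axis=1)  # approximately lognormal

def skewness(x):
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

print(abs(skewness(additive)) < 0.1)   # True: symmetric, normal-like
print(skewness(multiplicative) > 1.0)  # True: pronounced right tail
```

The same ingredients, combined by addition versus multiplication, land in two different universality classes, which is why heavy tails are often read as evidence for multiplicative processes.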
The preponderance of such effects in cognitive phenomena suggests a system in which multiplicative interaction is the rule [41], and the exception is linear combinations of component effects amenable to linear decomposition (e.g. additive and subtractive logic). Systems dominated by multiplicative interactions are known to produce heavy-tailed distributions (Box 1).

As a rule, multiplicative interactions also create interdependencies among component activities over time. These interdependencies can lead to long-range correlations when component effects travel through feedback loops across scales of component interactions, thereby changing the dynamics of interactions [44]. Interdependence has been shown in model systems to generate self-similar structures and fluctuations [45], and thereby to generate spatial and temporal long-range correlations as well as power-law distributions. Box 2 uses the Ising model as an illustrative example, one that stands as a pillar of statistical physics and a good starting point for cognitive scientists interested in investigating scaling laws as emergent from multiplicative, interdependent components of cognition.

In statistical physics, scaling laws have been studied for decades in the context of phase transitions [46,47]. When systems are poised near order-disorder phase transitions (i.e. critical points), microscopic changes can propagate through spatial correlations across many scales to become macroscopic effects that evolve on many timescales. Thus, criticality yields multiple-scale dynamics expressed as spatial and temporal long-range correlations [48]. Evidence for criticality has been investigated in a wide variety of physical, biological, computational and social systems [49].

Box 1. Additive versus multiplicative effects

When measurements are independent and measured values are essentially sums of independent effects, the central limit theorem leads one to expect a normal distribution of values (i.e. a Gaussian probability function; Figure I, blue). Illustrative examples are distributions of organism size in a population, such as height or weight, and distributions of scores on various tests of cognitive ability, such as the IQ test. Each observation of size or IQ is independent of other observations, and although the factors affecting these measures are myriad and poorly understood, they are assumed to make largely independent and additive contributions to each individual's size or IQ.

Normal distributions are not expected when measured values reflect multiplicative combinations of effects. An illustrative example is the distribution of city population sizes. Cities appear to grow multiplicatively (i.e. bigger cities are more likely to have larger growth rates than smaller cities [82]). The consequence is that city populations appear to be power-law distributed over a wide range of sizes [10,80]. Multiplicative effects can also lead to lognormal distributions (Figure I, red), and simple multiplicative models have been shown to generate either lognormal or power-law distributions depending on small parametric changes [79]. Lognormal and power-law distributions are both heavy-tailed, and hence heavy-tailed distributions are often interpreted as evidence for multiplicative processes. An important difference between heavy-tailed and normal distributions is that the moments of the former (e.g. mean and variance) poorly characterize the distribution (in fact, they are undefined for certain power-law distributions).

Figure I. Idealized normal (blue), lognormal (red) and power-law (green) probability functions are plotted in raw (left), semi-log (middle) and log-log (right) coordinates.

The possible role of criticality in cognitive science can be illustrated through neural networks [50-53]. A fundamental requirement of any neural network is to transmit and process information via the activities of its neuronal components. On timescales of milliseconds to seconds and even minutes, information is transmitted in neural networks via action potentials (i.e. spikes). Regardless of how information is coded in spikes, neurons must be able to affect each other's spiking dynamics in order to transmit and process information. Thus, if neurons are too independent of each other, information cannot be transmitted. But if neurons are too interdependent, their spiking dynamics will be slaved to each other, and hence too uniform and unchangeable to code information. Criticality strikes a balance between independence and interdependence among component activities, and when applied to neural spiking dynamics, it could support information transmission and processing. More generally, evolution can favor critical states because their associated metastability (i.e. the delicate stability that characterizes systems poised near their critical points) strikes an optimal compromise between the change (flexibility and adaptation) and stability (memory and continuity) necessary for information transmission and computation [47,53-58].

A connection between criticality, metastability and computation was first proposed for cellular automata [59,60] and has since been demonstrated in the dynamics of neural networks [61]. Beggs and Plenz [62] found that cortical slice preparations exhibit critical branching dynamics (i.e. "neural avalanches"), and probabilistic spiking models were shown to optimize information transmission near their critical points. Neurons can also be modeled by thresholding sums of incoming weights to be above or below zero (+1, -1); networks of such threshold neurons have similarly been shown to optimize memory and representational capacity near their critical points [56], and psychophysical models have linked Stevens' law with criticality in neural network dynamics [14-16]. At a different scale, Zipf's law was recently derived from a critical point between speaker and listener efforts quantified in information-theoretic terms [63-65]. Taken together, these models realize testable connections between criticality and cognition as expressed in neural and behavioral activity.

Evidence for criticality in cognitive science has also come in the form of temporal long-range correlations (i.e. 1/f scaling), which can be seen in fluctuation time series as undulations at many timescales. 1/f scaling has been observed in many aspects of neural and behavioral activity [6]. For instance, 1/f scaling has been observed in acoustic energy fluctuations across word repetitions and in fluctuations of the amplitude envelope of ongoing neuronal oscillations in healthy subjects (Figure 4).

Box 2. Short-range versus long-range correlations

In physical systems, events occurring nearby in time or space are often similar to each other, and such similarities typically fall off as distance increases. Physicists use the correlation function to express the effect of distance on similarity, and the observed shape of this function constitutes evidence about the type of system being observed.

To illustrate, we use a characterization of the Ising model [77]. Imagine a 2D grid of lights of varying brightness (from off to maximum), where brightness is a function of two variables. One is a random noise factor (individual to each light), and the other is a neighbor conformity factor whereby each light tends towards the brightness of its four nearest neighbors on the grid. These two variables are weighted together to determine the brightness of each light. In this illustration, the correlation function measures the degree to which lights have equal brightness levels as a function of their distance apart on the grid. If noise is heavily weighted, then brightness levels are independent across lights, and the correlation function will be near zero for all distances >0. If instead neighbor conformity is heavily weighted, then brightness levels will be interdependent and approach uniformity, with a correlation function near one across a wide range of distances.

Neither extreme is typical of physical systems. Instead, component interactions are somewhere between independent and interdependent. Weak interactions can result in short-range correlations (Figure I, green) that decay exponentially with distance. Stronger interactions can result in long-range correlations that decay more slowly (Figure I, pink), that is, as an inverse power of distance. The correlation function can also be defined for distances in time, with an analogous comparison between weak (short-range) and strong (long-range) interactions. No interactions result in uncorrelated noise (Figure I, grey), and integrating over uncorrelated noise results in a random walk (Figure I, brown).

Figure I. Four example time series are plotted in the left-hand panel: random samples from a normal distribution with zero mean and unit variance (i.e. white noise, in grey); a running sum of white noise (i.e. brown noise, also known as a random walk, in brown); 1/f noise (i.e. pink noise, in pink); and an autoregressive moving average (ARMA, in green), where each sampled value is a weighted sum of a noise sample, plus the previous noise value, plus the previous sampled value. Idealized autocorrelation functions are shown in the middle panel for each of the time series, where k is distance in time. Note that white noise (i.e. pure independence) has no correlations, ARMA has short-range correlations that decay exponentially with k, 1/f noise has long-range correlations that decay as an inverse power of k, and brown noise has correlations that decrease linearly with k. Idealized spectral density functions (where f is frequency and S(f) is spectral power) are shown in the right-hand panel in log-log coordinates. White, pink and brown noises correspond to straight lines with slopes of 0, -1 and -2, whereas ARMA plateaus in the lower frequencies.
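The spectral picture in Box 2's Figure I can be reproduced with a short simulation. The sketch below is our construction (spectral synthesis is one standard method among several, and the parameters are illustrative): it generates noise whose power spectrum follows S(f) ~ 1/f^beta and then recovers beta as the negative slope of the log-log spectrum.

```python
import numpy as np

# beta = 0, 1, 2 give white, pink (1/f) and brown noise respectively.
rng = np.random.default_rng(1)

def make_noise(beta, n=2 ** 14):
    f = np.fft.rfftfreq(n)[1:]     # positive frequency bins
    amp = f ** (-beta / 2)         # power ~ amp**2 ~ f**-beta
    phase = rng.uniform(0.0, 2 * np.pi, size=f.size)
    phase[-1] = 0.0                # the Nyquist coefficient must be real
    spectrum = np.concatenate(([0.0], amp * np.exp(1j * phase)))
    return np.fft.irfft(spectrum, n)

def spectral_slope(x):
    f = np.fft.rfftfreq(x.size)[1:]
    power = np.abs(np.fft.rfft(x)[1:]) ** 2
    return np.polyfit(np.log10(f), np.log10(power), 1)[0]

for beta in (0.0, 1.0, 2.0):
    print(abs(spectral_slope(make_noise(beta)) + beta) < 0.1)  # True each time
```

Estimating the same slope from empirical fluctuation series (rather than synthesized ones) is noisier, which is why the literature leans on careful spectral and detrended-fluctuation methods [5,6].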
Interestingly, the temporal scaling of amplitude fluctuations in ongoing oscillations has recently been associated with cognitive impairments such as depression [66] and dementia [67]. Also, criticality has been supported by multifractal patterns in the same data previously supporting 1/f scaling [44]. Multifractal patterns occur when scaling relations (i.e. their exponents) vary over time or space, thereby adding a further dimension of complexity to data. 1/f scaling characterizes the central tendency of multifractal human performance, and thus the intrinsic fluctuations in neural and behavioral activity, be they from ion channels or brain images or text sequences [55]. 1/f scaling suggests that criticality underlies cognitive function at multiple scales and levels of analysis.

Although observations of 1/f scaling in isolation do not constitute conclusive evidence for criticality (for other explanations, see Refs [3,68–70]), multifractal 1/f scaling greatly strengthens the case [44]. Additionally, criticality predicts power-law distributions and pervasive temporal and spatial long-range correlations in collective measures of component activities. These predictions are supported by the evidence reviewed here for neural avalanches [62], power-law distributions in word frequencies [33] and reaction times [28], and analyses showing pervasive 1/f scaling in neural [54] and behavioral [5] activity fluctuations. Adding multifractality to the mounting evidence means that metastability near critical points is the only candidate hypothesis that could explain the existing data.

Concluding remarks
In this brief review, a variety of scaling laws in cognitive science were discussed that plausibly express adaptive properties of perception, action, memory, language and computation. The working hypothesis of criticality can provide a general framework for understanding scaling laws and has motivated the application of new analytical tools to understand variability in cognitive systems. Much work lies ahead, however, to further test these new hypotheses and also to bring more scientists into the debate (Box 3).

Figure 4. 1/f scaling in neural and behavioral activity. A single channel (0.1–100 Hz) of magnetoencephalography (MEG) recording is shown in (a) at two time scales, band-pass filtered through a Morlet wavelet with a passband of 6.7–13.3 Hz (thin blue lines) (reproduced, with permission, from [84]). The log–log power spectrum of the resulting amplitude envelope of the oscillations (a, thick red lines) is shown in (b). Evidence for 1/f scaling is seen in the negatively sloped line for MEG data (open red circles), and evidence against an artifactual explanation is seen in the contrasting flat line for reference channel control data (filled black circles). 1/f scaling indicates that ongoing neural oscillations carry a long-range memory of their own dynamics across hundreds or even thousands of cycles. The same type of memory is also found in acoustic power intensity fluctuations in spoken word repetitions, shown in (c) for one speaker's 1024 repetitions of the word "bucket" (reproduced, with permission, from [5]). Intensity fluctuations are shown separately for each acoustic syllable, at three different passbands (center frequencies of 150 Hz, 6 kHz and 13 kHz). In total, 90 fluctuation series were observed for each of 10 speakers, and the 1/fα exponent was estimated for each series. The resulting distribution (d) was centered near α = 1.

Some of this work will need to address difficulties in distinguishing alternative accounts of data (e.g. long-range versus short-range correlations, and exponential versus lognormal versus power-law distributions). Recent advances in model identification methods have strengthened conclusions, but evidence is still more compelling when scaling laws are observed to span many orders of magnitude. Such observations require large amounts of data to be collected, which can be prohibitive, but technological advances are making large datasets more viable (e.g. in brain imaging and electronic corpora).

Other work will need to advance models of cognitive processes, because most of them are currently not designed to account for scaling laws, yet scaling laws appear widespread in cognitive science. Research is needed to determine whether current models could explain scaling laws within their purview, perhaps with small modifications or extensions, or whether new models and theories are needed to explain them. It is likely that common principles will be needed to fully explain some observations of scaling laws. Preferential attachment and self-organized criticality are two examples proposed to explain a wide range of scaling law observations throughout nature, including cognitive science [5,71]. Although it is unlikely that all observations of scaling laws in cognitive science have a common explanation, such principles can deepen our understanding of scaling laws and their meaning for cognitive function (e.g. if some are logical or mathematical consequences of other more fundamental laws of nature).

Box 3. Outstanding questions
- How can scaling laws be robustly and reliably detected in cognitive science data?
- How can models of cognitive processes explain observations of scaling laws?
- How many scaling laws in cognitive science can be explained by fundamental principles such as those found in statistical physics?
- How can variability in scaling law exponents be explained, as it is found across different individuals, different measures of cognitive performance and different measurement conditions?
- How are scaling law exponents empirically and theoretically related across different measures and types of scaling laws in cognitive science?
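The exponent estimates behind such distributions rest on the kind of spectral analysis that these model identification issues concern. As a hedged sketch of the simplest version, log–log regression on the periodogram (Python with NumPy assumed; an illustration, not the procedure used in the studies cited above), one can synthesize a 1/fα series and check that the target exponent is recovered:

```python
import numpy as np

def synth_noise(n, alpha, rng):
    """Synthesize 1/f**alpha noise by shaping white noise in the frequency
    domain: scale random spectral amplitudes by f**(-alpha/2)."""
    freqs = np.fft.rfftfreq(n)
    spectrum = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-alpha / 2.0)  # leave the DC component at zero
    return np.fft.irfft(spectrum * scale, n)

def spectral_slope(x):
    """Estimate alpha by regressing log power on log frequency:
    S(f) ~ 1/f**alpha appears as a line of slope -alpha in log-log axes."""
    freqs = np.fft.rfftfreq(len(x))[1:]          # drop f = 0
    power = np.abs(np.fft.rfft(x))[1:] ** 2      # raw periodogram
    slope, _ = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return -slope

rng = np.random.default_rng(1)
pink = synth_noise(2 ** 16, 1.0, rng)  # target exponent alpha = 1
alpha_hat = spectral_slope(pink)       # should land near 1
```

White noise run through the same estimator should give a slope near zero, which is one simple check that an apparent 1/f exponent is not an artifact of the estimator itself; distinguishing 1/f scaling from short-range alternatives in real data requires the more careful model comparison methods discussed in the text.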
Any fundamental approach to scaling laws in cognitive science will need to explain variability observed in scaling law parameters estimated from data. Scaling laws are parameterized by exponents, and exponents are observed to vary across individuals [28,72], across tasks [6] and across time [44]. It should not be surprising that cognitive science data are the most complex in this regard throughout nature, because exponents of scaling laws in other empirical domains are often observed to be relatively constant. It is an open question how variability in scaling laws reflects the flexibility and contextuality of cognition [73,74].

Acknowledgments
This article originated from a symposium entitled "Scaling Laws in Cognitive Science" that the authors presented at the 31st Annual Meeting of the Cognitive Science Society in Amsterdam, the Netherlands. The authors would like to thank the reviewers and A. Corral for helpful comments and suggestions. GDAB was supported by grant RES 062 23 0545 from the Economic and Social Research Council (UK). RFiC was supported by the SESAAME-BAR [Secuencias Simbólicas: Análisis, Aprendizaje, Minería y Evolución-Barcelona (in English "Symbolic Sequences: Analysis, Learning, Mining and Evolution-Barcelona")] project of the Spanish Ministry of Science and Innovation (TIN2008-06582-C03-01). KL-H was supported by the Innovative Research Incentive Schemes of the Netherlands Organization for Scientific Research. JGH was supported by NSF BCS-0446813, and JGH and GVO were supported by BCS-0642718. CTK was supported by NSF BCS-0842784 and the Keck Futures Initiative.

References
1 Chen, Y. et al. (1997) Long memory processes (1/fα type) in human coordination. Phys. Rev. Lett. 79, 4501
2 Wagenmakers, E-J. et al. (2004) Estimation and interpretation of 1/fα noise in human cognition. Psychon. Bull. Rev. 11, 579–615
3 Farrell, S. et al. (2006) 1/f noise in human cognition: is it ubiquitous, and what does it mean? Psychon. Bull. Rev. 13, 737–741
4 Gisiger, T. (2001) Scale invariance in biology: coincidence or footprint of a universal mechanism? Biol. Rev. 76, 161–209
5 Kello, C.T. et al. (2008) The pervasiveness of 1/f scaling in speech reflects the metastable basis of cognition. Cogn. Sci. 32, 1217–1231
6 Kello, C.T. et al. (2007) The emergent coordination of cognitive function. J. Exp. Psychol. Gen. 136, 551–568
7 Thornton, T.L. and Gilden, D.L. (2005) Provenance of correlations in psychophysical data. Psychon. Bull. Rev. 12, 409–441
8 Gilden, D. (2009) Global model analysis of cognitive variability. Cogn. Sci. 33, 1441–1467
9 Sims, D.W. et al. (2008) Scaling laws of marine predator search behaviour. Nature 451, 1098–1102
10 Clauset, A. et al. (2009) Power-law distributions in empirical data. SIAM Rev. 51, 661–703
11 Rapoport, A. (1982) Zipf's law revisited. In Studies on Zipf's Law (Guiter, H. and Arapov, M.V., eds), pp. 1–28, Studienverlag Brockmeyer
12 Stevens, S.S. (1957) On the psychophysical law. Psychol. Rev. 64, 153–181
13 Lacquaniti, F. et al. (1983) The law relating the kinematic and figural aspects of drawing movements. Acta Psychol. 54, 115–130
14 Copelli, M. and Campos, P.R.A. (2007) Excitable scale free networks. Eur. Phys. J. B Cond. Matter Complex Syst. 56, 273–278
15 Copelli, M. et al. (2005) Signal compression in the sensory periphery. Neurocomputing 65–66, 691–696
16 Kinouchi, O. and Copelli, M. (2006) Optimal dynamical range of excitable networks at criticality. Nat. Phys. 2, 348–351
17 Harris, C.M. and Wolpert, D.M. (1998) Signal-dependent noise determines motor planning. Nature 394, 780–784
18 Chater, N. and Brown, G.D.A. (1999) Scale-invariance as a unifying psychological principle. Cognition 69, 17–24
19 Chater, N. and Brown, G.D.A. (2008) From universal laws of cognition to specific cognitive models. Cogn. Sci. 32, 36–67
20 Maylor, E.A. et al. (2001) Scale invariance in the retrieval of retrospective and prospective memories. Psychon. Bull. Rev. 8, 162–167
21 Brown, G.D.A. et al. (2007) A temporal ratio model of memory. Psychol. Rev. 114, 539–576
22 Brown, G.D.A. et al. (2008) Serial and free recall: common effects and common mechanisms? A reply to Murdock (2008). Psychol. Rev. 115, 781–785
23 Gallistel, C.R. and Gibbon, J. (2000) Time, rate, and conditioning. Psychol. Rev. 107, 289–344
24 Rhodes, T. and Turvey, M.T. (2007) Human memory retrieval as Lévy foraging. Physica A 385, 255–260
25 Reynolds, A.M. and Rhodes, C.J. (2009) The Lévy flight paradigm: random search patterns and mechanisms. Ecology 90, 877–887
26 Lomholt, M.A. et al. (2008) Lévy strategies in intermittent search processes are advantageous. Proc. Natl. Acad. Sci. U. S. A. 105, 11055–11059
27 Bousfield, W.A. and Sedgewick, C.H.W. (1944) An analysis of sequences of restricted associative responses. J. Gen. Psychol. 30, 149–165
28 Holden, J.G. et al. (2009) Dispersion of response times reveals cognitive dynamics. Psychol. Rev. 116, 318–342
29 Zipf, G.K. (1949) Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology, Addison-Wesley
30 Ferrer i Cancho, R. et al. (2004) Patterns in syntactic dependency networks. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 69, 051915
31 Kello, C.T. and Beltz, B.C. (2009) Scale-free networks in phonological and orthographic wordform lexicons. In Approaches to Phonological Complexity (Chitoran, I. et al., eds), Mouton de Gruyter
32 Miller, G.A. and Chomsky, N. (1963) Finitary models of language users. In Handbook of Mathematical Psychology (Luce, R.D. et al., eds), pp. 419–491, Wiley
33 Ferrer-i-Cancho, R. and Elvevåg, B. (2010) Random texts do not exhibit the real Zipf's law-like rank distribution. PLoS ONE 5 (3), e9411
34 Ferrer-i-Cancho, R. and Gavaldà, R. (2009) The frequency spectrum of finite samples from the intermittent silence process. J. Am. Soc. Inf. Sci. Technol. 60, 837–843
35 Ferrer i Cancho, R. and Solé, R.V. (2002) Zipf's law and random texts. Adv. Complex Syst. 5, 1–6
36 Cohen, A. et al. (1997) Numerical analysis of word frequencies in artificial and natural language texts. Fractals 5, 95–104
37 Ferrer-i-Cancho, R. et al. (2005) The consequences of Zipf's law for syntax and symbolic reference. Proc. R. Soc. Lond. B Biol. Sci. 272, 561–565
38 Ferrer-i-Cancho, R. (2006) When language breaks into pieces. A conflict between communication through isolated signals and language. Biosystems 84, 242–253
39 Köhler, R. (1986) Zur linguistischen Synergetik: Struktur und Dynamik der Lexik, Brockmeyer
40 Haspelmath, M. et al. (2005) World Atlas of Language Structures, Oxford University Press
41 Van Orden, G.C. et al. (2003) Self-organization of cognitive performance. J. Exp. Psychol. Gen. 132, 331–350
42 Van Orden, G.C. et al. (2005) Human cognition and 1/f scaling. J. Exp. Psychol. Gen. 134, 117–123
43 Van Orden, G.C. et al. (2009) Living in the pink: intentionality, wellness, and complexity. In Philosophy of Complex Systems: Handbook of the Philosophy of Science (Hooker, C., ed.), Elsevier
44 Ihlen, E.A.F. and Vereijken, B. Beyond 1/f fluctuations in cognitive performance. J. Exp. Psychol. Gen. (in press)
45 Turcotte, D.L. and Rundle, J.B. (2002) Self-organized complexity in the physical, biological, and social sciences. Proc. Natl. Acad. Sci. U. S. A. 99, 2463–2465
46 Christensen, K. and Moloney, N.R. (2005) Complexity and Criticality, Imperial College Press
47 Chialvo, D.R. (2008) Emergent complexity: what uphill analysis or downhill invention cannot do. New Ideas Psychol. 26, 158–173
48 Bak, P. and Paczuski, M. (1995) Complexity, contingency, and criticality. Proc. Natl. Acad. Sci. U. S. A. 92, 6689–6696
49 Sornette, D. (2004) Critical Phenomena in Natural Sciences: Chaos, Fractals, Self-organization, and Disorder: Concepts and Tools (2nd edn), Springer
50 Levina, A. et al. (2007) Dynamical synapses causing self-organized criticality in neural networks. Nat. Phys. 3, 857–860
51 Chialvo, D.R. et al. (2008) The brain: What is critical about it? AIP Conf. Proc. 1028, 28–45
52 de Arcangelis, L. et al. (2006) Self-organized criticality model for brain plasticity. Phys. Rev. Lett. 96, 028107
53 Poil, S-S. et al. (2008) Avalanche dynamics of human brain oscillations: relation to critical branching processes and temporal correlations. Hum. Brain Mapp. 29, 770–777
54 Linkenkaer-Hansen, K. et al. (2001) Long-range temporal correlations and scaling behavior in human brain oscillations. J. Neurosci. 21, 1370–1377
55 Kello, C.T. and Van Orden, G.C. (2009) Soft-assembly of sensorimotor function. Nonlinear Dynamics Psychol. Life Sci. 13, 57–78
56 Bertschinger, N. and Natschläger, T. (2004) Real-time computation at the edge of chaos in recurrent neural networks. Neural Comput. 16, 1413–1436
57 Tognoli, E. and Kelso, J.A.S. (2009) Brain coordination dynamics: true and false faces of phase synchrony and metastability. Prog. Neurobiol. 87, 31–40
58 Kelso, J. and Tognoli, E. (2009) Toward a complementary neuroscience: metastable coordination dynamics of the brain. In Downward Causation and the Neurobiology of Free Will, pp. 103–124, Springer
59 Packard, N. (1988) Adaptation towards the edge of chaos. In Dynamic Patterns in Complex Systems (Kelso, J.A.S. et al., eds), pp. 293–301, World Scientific
60 Langton, C.G. (1990) Computation at the edge of chaos – phase transitions and emergent computation. Physica D 42, 12–37
61 Beggs, J.M. (2008) The criticality hypothesis: how local cortical networks might optimize information processing. Philos. Transact. A Math. Phys. Eng. Sci. 366, 329–344
62 Beggs, J.M. and Plenz, D. (2003) Neuronal avalanches in neocortical circuits. J. Neurosci. 23, 11167–11177
63 Ferrer i Cancho, R. and Solé, R.V. (2003) Least effort and the origins of scaling in human language. Proc. Natl. Acad. Sci. U. S. A. 100, 788–791
64 Ferrer-i-Cancho, R. (2005) Zipf's law from a communicative phase transition. Eur. Phys. J. B 47, 449–457
65 Ferrer-i-Cancho, R. and Díaz-Guilera, A. (2007) The global minima of the communicative energy of natural communication systems. J. Stat. Mech. P06009
66 Linkenkaer-Hansen, K. et al. (2005) Breakdown of long-range temporal correlations in theta oscillations in patients with major depressive disorder. J. Neurosci. 25, 10131–10137
67 Montez, T. et al. (2009) Altered temporal correlations in parietal alpha and prefrontal theta oscillations in early-stage Alzheimer disease. Proc. Natl. Acad. Sci. U. S. A. 106, 1614–1619
68 Grigolini, P. et al. (2009) A theory of 1/f noise in human cognition. Phys. A Stat. Mech. Appl. 388, 4192–4204
69 Medina, J.M. (2009) 1/fα noise in reaction times: a proposed model based on Piéron's law and information processing. Phys. Rev. E 79, 011902
70 Torre, K. and Wagenmakers, E-J. (2009) Theories and models for 1/fβ noise in human movement science. Hum. Mov. Sci. 28, 297–318
71 Steyvers, M. and Tenenbaum, J.B. (2005) The large-scale structure of semantic networks: statistical analyses and a model of semantic growth. Cogn. Sci. 29, 41–78
72 Gilden, D.L. and Hancock, H. (2007) Response variability in attention-deficit disorders. Psychol. Sci. 18, 796–802
73 Riley, M.A. and Turvey, M. (2002) Variability and determinism in motor behavior. J. Mot. Behav. 34, 99–125
74 Zbilut, J.P. (2004) Unstable Singularities and Randomness: Their Importance in the Complexity of Physical, Biological and Social Sciences, Elsevier
75 Spence, A.J. (2009) Scaling in biology. Curr. Biol. 19, R57–R61
76 Zhang, K. and Sejnowski, T.J. (2000) A universal scaling law between gray matter and white matter of cerebral cortex. Proc. Natl. Acad. Sci. U. S. A. 97, 5621–5626
77 Onsager, L. (1944) A two-dimensional model with an order-disorder transition. Phys. Rev. 65, 117–149
78 Bak, P. et al. (1988) Self-organized criticality. Phys. Rev. A 38, 364
79 Mitzenmacher, M. (2004) A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 226–251
80 Newman, M.E.J. (2005) Power laws, Pareto distributions and Zipf's law. Contemp. Phys. 46, 323–351
81 Mandelbrot, B. (1967) How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science 156, 636–638
82 Gabaix, X. (1999) Zipf's law for cities: an explanation. Q. J. Econ. 114, 739–767
83 Ramos, F. et al. (2004) Lévy walk patterns in the foraging movements of spider monkeys (Ateles geoffroyi). Behav. Ecol. Sociobiol. 55, 223–230
84 Linkenkaer-Hansen, K. et al. (2004) Stimulus-induced change in long-range temporal correlations and scaling behaviour of sensorimotor oscillations. Eur. J. Neurosci. 19, 203–211