A Dynamic Field Theory of visual working memory

The main objective of this chapter is to introduce concepts of dynamic field theory (DFT), a continuous attractor neural network, and its implementation of visual working memory (VWM). In DFT, WM is an attractor state where representations are self-sustained through strong interactions between self-excitation and lateral inhibition. We discuss a VWM model with fields represented by stabilised attractor states. Using this model, we demonstrate how encoding, consolidation, maintenance and comparison occur in correct and incorrect, same and different trials in a change detection task. Further, the model captures accuracy and capacity limitations when VWM load is manipulated. Critically, we review work from our research group by demonstrating how the model captures behavioural performance and makes hemodynamic predictions in early childhood, young adulthood and older adulthood. Using the model, we posit that developmental changes in VWM processing occur as a result of the modulation of strength and width of excitation and inhibition. Finally, we describe how the DFT account compares with current views on a domain-general account and distributed nature of WM processing.


Definition of working memory
In dynamic field theory (DFT), WM is an attractor state where representations are self-sustained through strong recurrent interactions between excitation and inhibition.

Describe and explain the methods you use, and their strengths and limitations
Behavioural measures are used to tune model performance. Neuroimaging methods explain activation in cortical networks across the lifespan and across cognitive functions to understand the processes that underlie behaviour. Computational modelling is used to investigate causality and the link between behaviour and brain.

Unitary vs. nonunitary nature of working memory
The implementation of WM in DFT is a domain-general system that interacts with long-term memory. It can be engaged within any cognitive function, because, in DFT, WM is an attractor state that emerges from the interactions between excitation and inhibition. That said, DFT is strongly committed to the grounding of cognition in sensorimotor systems. This requires understanding the domain-specific nature of the task and the sensorimotor systems engaged.

The role of attention and control
There is no central homunculus/executive controller. Instead, its role emerges from the interactions between perceptual, WM and decision-making fields. However, larger DF architectures have feature/spatial attention fields that build activation when fixation moves to a particular item/location.

Storage, maintenance and loss of information in WM
Encoding occurs when a peak of activation responsive to a stimulus enters a stabilized attractor state. Maintenance occurs when this peak of activation enters a self-sustaining state as a result of the interaction between excitation and inhibition. Loss of information occurs as a result of competition between multiple peaks of activation (representations): under some conditions, a peak can be de-stabilized (forgotten).

The role of LTM knowledge in WM storage and processing
Peaks of activation in WM can form sub-threshold memory traces that are likened to LTM traces. Conversely, these memory traces can speed up encoding and/or strengthen WM maintenance.

Is there evidence that is not consistent with your theoretical framework
The key challenging topics for DFT are: (1) Incorporating differences in strategies (e.g., verbal strategies employed by older adults during change detection). (2) Capturing differences in LTM processing (declarative versus procedural memory). (3) The representation of temperament, affect, gender and motivation, purported to influence WM processing.

WM Development
Changes in working memory across development are explained by changes in the strength and width of excitation and inhibition within and/or between fields in the DF model.

Individual differences and limits in WM capacity
Limits in WM capacity occur due to an increase in inhibition from multiple competing peaks of activation. Manipulating self-excitation, lateral inhibition, neural noise and/or resting level in the fields of the DF model can potentially capture individual differences.

Neural correlates of WM
Local field potentials (LFPs) can be computed from trial-by-trial estimates of each field of the DF model. These LFPs can be convolved with canonical impulse response functions to create regressors for each experimental condition to be used in a general linear model, synonymous with the statistical analyses used in fMRI.
Working memory is a key cognitive system that is responsible for maintaining and manipulating information and detecting changes when they occur (Luck & Vogel, 2013).
Examples from everyday functions include maintaining and manipulating numbers during a complex set of calculations, storing the words before a conjunction until the rest of the sentence is read to comprehend the meaning or even, continually maintaining and updating information from a dynamic visual scene while driving.
In the four decades since the proposition of the influential model of working memory by Baddeley and Hitch, behavioural and neuroimaging findings have paved the way for the development of multiple theoretical accounts of working memory. Specifically, the advent of micro- and macro-scale neuroimaging techniques has afforded the possibility of 'peeking' into brains while humans and animals engage in experimental tasks. It is now possible to associate encoding, maintenance, and retrieval processes with levels of activation in specialized cortical networks, map these functions across age, and examine the extent of spatial overlap with other cognitive functions. Further, it is possible to spatially overlay regions of activation from working memory tasks, inhibitory control tasks and cognitive flexibility tasks to examine similar and unique mechanisms.
Our research group employs behavioural and neuroimaging methods to investigate how a specific type of working memory, visual working memory (VWM), emerges in infancy (Wijeakumar, Kumar, Reyes, Tiwari, & Spencer, 2019) and early childhood (Buss, Fox, Boas, & Spencer, 2014), and develops across the lifespan (Wijeakumar, Magnotta, & Spencer, 2017a). Research across the lifespan emphasizes the importance of creating theories that can explain age-related changes in VWM processing. To this end, we have used experimental tasks such as change detection (Colbert & Bo, 2017; S. J. Luck & Vogel, 1997; Phillips, 1974; Read, Rogers, & Wilson, 2016) and preferential looking (Ross-Sheehy, Oakes, & Luck, 2003) that can be scaled across the lifespan by manipulating load (1-6 items) and trial types (same/different). We refer the reader to the chapter by Hakim, Awh, & Vogel (2020) in this book, which describes some of the research conducted using variants of the change detection task. To record brain activation across the lifespan, we rely on functional near-infrared spectroscopy (fNIRS), a neuroimaging technique where near-infrared light is shone through the scalp and differentially absorbed by oxy- and deoxy-hemoglobin. This technique can be used in studies with infants, children and older adults because it doesn't require that participants remain still. Further, it is less susceptible to motion artifacts. More recently, portable systems have also become available, affording the possibility of conducting home assessments. Recent innovations in methodology also allow for the spatial overlap of regions of activation to compare neural correlates across the lifespan (Wijeakumar, Huppert, Magnotta, Buss, & Spencer, 2017). Concretely, it is possible to spatially overlay brain activation from a 6-month-old infant engaging in a preferential looking VWM task with that from an 80-year-old adult engaging in a change detection VWM task. In addition to behaviour and brain function, our research employs computational models to pursue a mechanistic and causative approach to understanding VWM processing. Specifically, we aim to develop models that can capture behaviour and brain activation, are constrained by manipulations to task parameters, and can make specific predictions.
In this chapter, we introduce the computational framework that we use in our research: Dynamic Field Theory. We begin by providing an overview of this framework and the neurophysiological basis for its foundation. Then, we present a step-by-step guide to the development of a three-field DF model of VWM. In doing so, we demonstrate how this computational framework instantiates a neural basis for VWM via a self-sustained state of activation in a cortical field. Further, we explain how this model can explain encoding, consolidation, maintenance and comparison of items while being implemented within a real-time neural system. Finally, we demonstrate how the three-field DF model can be used to capture age-related changes in VWM processing across the lifespan.

Overview of dynamic field theory:
Dynamic field theory (DFT) refers to a class of continuous attractor neural network models used to capture the dynamics involved in neural and behavioural processes of perception, cognition, and action (Schöner, Spencer, & The DFT Research Group, 2016). Its emergence is heavily rooted in work by Esther Thelen and Linda Smith (Thelen & Smith, 1994, 2007). They adopted a dynamical systems perspective to studying early development, and in doing so, highlighted two valuable points: (1) development occurs as a result of micro-scale, moment-to-moment changes (e.g., moving the hand in response to a painful stimulus), but also as a result of macro-scale changes (e.g., chronic exposure to life stress) across years, and (2) these changes unfold as a consequence of the complex, mutual, and continuous interactions between the brain, body and environment. Within this complexity, changes will occur across multiple spatial scales, from cellular to cultural. The principle of 'change' is captured through the use of differential equations in DFT (Spencer & Schöner, 2003). Moreover, DF models capture interactions between systems by integrating perceptual, memory, and decision-making systems. Finally, DF models have been implemented using autonomous robotic systems, showing how the concepts of DFT span across the body, brain and the environment (Lipinski, Schneegans, Sandamirskaya, Spencer, & Schöner, 2012; Milde et al., 2017).
In DFT, behaviour is 'softly assembled' via the integration and dis-integration of multiple processes, instead of being centrally controlled by a single entity. In dynamical systems, local neural populations that constitute these processes move in and out of attractor states. Neural systems can exist in at least three attractor states: a 'resting' attractor state where the system can be activated following input, a 'stabilized' state where the system is anchored to external input, and a 'self-sustaining' state wherein interactions within the system and interactions with other systems can sustain activation even after input is removed. This self-sustaining state creates a form of working memory, which, as we discuss in later sections, is a product of the balance between local excitation and surround inhibition in populations of neurons. Note that in our DF models, we label particular fields as 'WM' fields. This reflects that these fields have neural interactions that are 'tuned' to operate in the self-sustaining state with strong local excitation and strong surround inhibition. Within our DFT models, the WM field is modelled separately to imply functional differences from perceptual fields and decision nodes.
More generally, we adopt the view that the WM field represents the instantiation of a property that might exist in all of the cortex.
More broadly within the DFT framework, moving in and out of these attractor states is conceptualized as the formation of thoughts. The refinement of these patterns is conceptualized as learning. Finally, cementing these patterns through repeated engagement over time is conceptualized as development. In general, every attractor system can be represented by an activation variable moving in and out of these states. However, this view does not take into consideration differences in representational states. Consider this example: you reach out to grab a cup that contains coffee, placed on the right side of a table. A perception activation variable should activate to a stable 'on' state, a long-term memory activation variable should activate to recognise the presence of coffee, and a motor preparation and movement variable should activate before the arm actually starts to move. Now, consider differences in the same example: reaching out to a red cup on the table versus reaching out to the red cup amongst blue cups on the right side of the table versus reaching out to a red cup placed on the left side of the table. Capturing these different representational states requires (1) splitting single activation variables into multiple fields (color, orientation, location etc.) and further, (2) allowing for continuous metric representation of dimensions (left to right for location, blue to yellow for color, etc.). Such metric dimension fields can be coupled together to provide potential architectures for experimental manipulations.
The properties of DF architectures are anchored to properties of neural populations (see Jancke et al., 1999; Schöner et al., 2015). However, these models do not represent single neuronal synapses or capture neuron-to-neuron signalling; instead, their properties reflect activation from populations of neurons. These properties are synonymous with observations from an electrophysiological analysis technique called distribution of population activation (DPA). In this technique, the firing rate of each neuron to a particular stimulus feature is plotted on a continuous metric scale by centering a Gaussian function of fixed width and shape over the center of the cell's receptive field (point of maximum spike rate). When plotted in the XY plane, the x-axis corresponds to the parameters of the dimension and the y-axis refers to the spike rate of each neuron, thus resulting in a 'tuning curve'. Then, the tuning curves for the pool of neurons are weighted by each individual neuron's spike rate, summed and normalized to obtain the distributed population activation of a population of neurons. This approach increases stability by creating a decision based not on single neurons, but on pools of neurons. Further, a single neuron is likely to respond to parameters across more than one dimension; therefore, computing a population distribution based on multiple neurons reduces ambiguity. Thus, a Gaussian peak of activation in a color metric field (as an example) of a DF architecture is based on DPA estimated from pools of neurons responding to the red colour of a cup.
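The DPA construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published analysis pipeline: the colour axis, the preferred values, the spike rates and the tuning width are all made-up numbers, and the function name `population_activation` is hypothetical. The final normalisation (dividing by the maximum) is one of several possible conventions.

```python
import numpy as np

def population_activation(preferred, rates, axis, width=10.0):
    """Distribution of population activation over a metric feature axis."""
    preferred = np.asarray(preferred, dtype=float)
    rates = np.asarray(rates, dtype=float)
    # One Gaussian tuning curve of fixed width per neuron,
    # centred on that neuron's preferred feature value.
    curves = np.exp(-0.5 * ((axis[None, :] - preferred[:, None]) / width) ** 2)
    # Weight each curve by the neuron's spike rate and sum across the pool.
    summed = (rates[:, None] * curves).sum(axis=0)
    # Normalise (here: to a maximum of 1; an assumed convention).
    return summed / summed.max()

hue = np.linspace(0.0, 180.0, 181)                       # hypothetical colour axis
preferred = np.array([20.0, 40.0, 60.0, 80.0, 100.0])    # hypothetical preferred hues
rates = np.array([2.0, 10.0, 30.0, 12.0, 3.0])           # made-up spike rates (Hz)

dpa = population_activation(preferred, rates, hue)
peak_hue = hue[np.argmax(dpa)]
print("population peak at hue", peak_hue)
```

The population peak sits close to the preferred value of the most strongly driven neuron, but it is pulled slightly by its neighbours, which is the sense in which the estimate rests on the pool rather than on any single cell.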
Neural systems are not stand-alone: they are constantly triggered by external inputs, are inherently noisy, and are densely interconnected with neighbouring systems through excitatory and inhibitory interactions. Any activation in a neural system is a result of the summation of all these processes. Indeed, DFT fields encapsulate these processes. Thus, in a DFT field, the rate of change of activation $\dot{u}(x,t)$ with time constant $\tau$ is formally specified with the following equation:

\[ \tau \dot{u}(x,t) = -u(x,t) + h + s(x,t) + \int k(x - x')\, g(u(x',t))\, dx' + \xi(x,t) \]

Here, $-u(x,t)$ represents activation within the neural field and $x$ is a point along the metric field. $h$ is the resting level of the field, which represents a systematic restoring force that is capable of pulling increased or decreased activation back to a 'starting point'. This starting point is the constant value specified by $h$. External input provided to the system is specified by $s(x,t)$.
The neural system is subject to random perturbations that cause activation to fluctuate in small quantities. These random perturbations are modelled as Gaussian white noise $\xi(x,t)$. Interactions between fields can be excitatory or inhibitory. Generally, interactions are implemented as the convolution between the field output $g(u(x,t))$ and a connectivity kernel $k(x - x')$ within or between the fields. The field output $g(u(x,t))$ is calculated using a sigmoid function

\[ g(u) = \frac{1}{1 + \exp[-\beta u]} \]

with threshold set to zero and steepness parameter $\beta$. $g(u)$ has a value close to zero for low activation. As activation approaches a soft threshold, $g(u)$ starts to rise and reaches saturation (a value of 1) when activation is well above the 0 threshold. The connectivity kernel $k(x - x')$ is defined as a Gaussian function

\[ k(x - x') = c \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right] \]

where $c$ represents the amplitude of the Gaussian function and $\sigma^2$ represents the width of the Gaussian function.
An external input to this field stabilizes a peak of activation at a location in the neural field. Local excitation within the field prevents the peak from decaying and pushes it into a self-sustaining state even after input is removed. However, under just these circumstances, the boundary of the peak will continue expanding and grow out of control. To prevent the peak from growing out of control, this 'excitatory' field is reciprocally coupled to an 'inhibitory' field. This field receives excitation from the excitatory field and projects inhibition back to the excitatory field. These inhibitory projections create surround inhibition around the peak in the excitatory field and prevent it from growing out of control. This two-layer model is analogous
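The difference between a stabilized and a self-sustaining peak can be illustrated with a small simulation. The sketch below is a deliberate simplification and not the published model: instead of a separate inhibitory field, inhibition is folded into the same field as a term proportional to the total field output, noise is left out, and all parameter values (`c_exc`, `c_inh`, `h`, `tau`, the input strength and the field size) are assumptions chosen only so that the qualitative behaviour is visible.

```python
import numpy as np

def sigmoid(u, beta=4.0):
    """Soft threshold at zero; beta sets the steepness."""
    return 1.0 / (1.0 + np.exp(-beta * u))

def simulate_field(c_exc, c_inh, n=101, steps_on=200, steps_off=300,
                   tau=20.0, h=-2.0, sigma=3.0, dt=1.0):
    """Euler-integrate a 1D field; return activation at input offset and at the end."""
    x = np.arange(n)
    offsets = np.arange(-15, 16)
    k_exc = c_exc * np.exp(-offsets**2 / (2.0 * sigma**2))        # local excitation kernel
    stimulus = 6.0 * np.exp(-(x - n // 2)**2 / (2.0 * 3.0**2))    # Gaussian input at the centre
    u = np.full(n, h)                                             # start in the resting state
    u_at_offset = None
    for step in range(steps_on + steps_off):
        s = stimulus if step < steps_on else 0.0                  # input is removed after steps_on
        g = sigmoid(u)
        # Local excitation minus inhibition that grows with total field output
        # (a stand-in for the separate inhibitory field described in the text).
        interaction = np.convolve(g, k_exc, mode="same") - c_inh * g.sum()
        u = u + (dt / tau) * (-u + h + s + interaction)
        if step == steps_on - 1:
            u_at_offset = u.copy()
    return u_at_offset, u

strong_on, strong_end = simulate_field(c_exc=1.5, c_inh=0.3)    # 'WM-like' tuning
weak_on, weak_end = simulate_field(c_exc=0.3, c_inh=0.06)       # 'perception-like' tuning

print("strong interactions: peak after input removed =", round(float(strong_end.max()), 2))
print("weak interactions:   peak after input removed =", round(float(weak_end.max()), 2))
```

With both settings a peak forms while the input is on. With the stronger interactions the peak stays above threshold after the input is removed, whereas with the weaker interactions the field relaxes back towards its resting level, which mirrors the distinction between a field tuned for working memory and one that only tracks input.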

Building a DF model of visual working memory
In previous sections, we have discussed that WM is an attractor state where peaks of activation can be maintained even after stimulus input has been removed. Peaks are maintained through the balance achieved between local excitatory and inhibitory interactions in the field. For this chapter, we focus on the change detection (CD) task because it has been used to measure VWM processing in children (Simmering, 2012), young adults (Ambrose, Wijeakumar, Buss, & Spencer, 2016; Todd & Marois, 2004; Wijeakumar, Magnotta, & Spencer, 2017b) and older adults (Wijeakumar, Magnotta, et al., 2017b). We review previous b.
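different trials), Misses (incorrect different trials), correct rejections (CR, correct same trials) and false alarms (FA, incorrect same trials). We use an updated version of Grier's formula by Aaronson and Watts (Aaronson & Watts, 1987) to calculate Accuracy (A'):

\[ A' = \begin{cases} \dfrac{1}{2} + \dfrac{(H - FA)(1 + H - FA)}{4H(1 - FA)} & \text{if } H \geq FA \\[2ex] \dfrac{1}{2} - \dfrac{(FA - H)(1 + FA - H)}{4FA(1 - H)} & \text{if } H < FA \end{cases} \]

where $H$ is the hit rate (the proportion of different trials answered correctly) and $FA$ is the false alarm rate (the proportion of same trials answered incorrectly). According to this formula, an A' of 1 indicates perfect performance and a score of 0.5 indicates chance performance. Work from our research group and others has shown that in young adults, A' decreases as VWM load increases from 1 to 6 items.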
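As a worked illustration of the accuracy measure just described, and of Pashler's capacity estimate K that the chapter turns to next, the following Python sketch computes both from trial counts. The function names and the example counts are hypothetical; the formulas are the standard published forms of Grier's A' with the Aaronson and Watts extension and of Pashler's K.

```python
def rates_from_counts(hits, misses, correct_rejections, false_alarms):
    """Convert trial counts into a hit rate and a false alarm rate."""
    hit_rate = hits / (hits + misses)                              # different trials
    fa_rate = false_alarms / (false_alarms + correct_rejections)   # same trials
    return hit_rate, fa_rate

def a_prime(hit_rate, fa_rate):
    """Grier's A' with the Aaronson and Watts branch for below-chance scores."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5                                                 # chance performance
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

def pashler_k(hit_rate, fa_rate, set_size):
    """Pashler's capacity estimate; undefined when the false alarm rate is 1."""
    return set_size * (hit_rate - fa_rate) / (1 - fa_rate)

# Made-up counts for one simulated participant at a load of 4 items.
h, f = rates_from_counts(hits=45, misses=5, correct_rejections=45, false_alarms=5)
print("A' =", round(a_prime(h, f), 3))
print("K  =", round(pashler_k(h, f, set_size=4), 3))
```

Because K is scaled by set size, it can keep rising with load even while A' falls, which is why the two measures are reported side by side.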
We estimate capacity (K) at each load using Pashler's formula given by:

\[ K = N \times \frac{H - FA}{1 - FA} \]

where $N$ is the number of items in the memory array (the VWM load), $H$ is the hit rate and $FA$ is the false alarm rate. In young adults, capacity increases as VWM load increases. However, the maximum capacity (Max K) estimates tend to vary between 3 and 6 items.
et al., 2014; J. S. Johnson, Spencer, Luck, & Schöner, 2009; Simmering, 2016). The CON or contrast field is an excitatory field that mainly receives afferent external input and serves two roles. First, it acts as a perceptual field that encodes visual stimuli and second, it is also responsible for comparison by providing a 'contrast' between stored items and novel items.
The WM or working memory field is also an excitatory field that allows peaks of activation to stabilize and become self-sustaining to capture how items are stored in working memory. The Inh or inhibitory field assumes the function of inhibitory interneurons and is responsible for projecting inhibition to the WM and CON fields. The Gate node gates activation from the WM and CON fields to the decision-making Same and Different nodes respectively. WM, CON, Same, Different, and Gate nodes have local excitatory interactions that are depicted by circular green arrows in Figure 2. Longer-range excitatory and inhibitory connections are shown in green and red in Figure 2, respectively. The CON field is the site of afferent external input and this mechanism is likened to perceptual mechanisms that encode stimuli.
Projections from CON field: The CON field projects excitation to the WM field and this process is likened to the transition from encoding to working memory processes.It also projects excitation to the Inh field and the Different node and inhibits the Same node.
Projections from WM field: The WM field projects excitation to the Inh field and the Same and Gate nodes and inhibits the Different node.
Projections from nodes: The Same and Different nodes are reciprocally coupled and inhibit one another. The Same node passes excitation to the WM field and inhibition to the CON field. On the other hand, the Different node passes excitation to the CON field and inhibits the WM field. In the model, the strength of the connection from the CON field to the Different node is set to be much higher than the strength of the connection between the WM field and the Same node. This difference in connection strengths causes the model to overcome the same bias that occurs because the WM field will always have greater activation at comparison (due to the presence of multiple peaks at higher loads) than the CON field. Interestingly, evidence for this bias is also evident in behavioural data, where accuracy for same trials is greater than accuracy for different trials.
{Insert Figure 2}
Below, we present exemplar equations for the WM field and Same node. The full set of equations (CON field, Inh field, Different node and Gate node) is available elsewhere (Johnson et al., 2014).
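Before turning to the equations, the long-range projections listed above can be collected into a small signed lookup table, which makes the architecture easy to query. This is only a bookkeeping sketch: the dictionary name `PROJECTIONS` and the helper functions are hypothetical, connection strengths are reduced to a sign, and the local self-excitation of each field and node as well as the gating role of the Gate node are left out.

```python
# +1 marks an excitatory projection, -1 an inhibitory one, keyed by (source, target).
PROJECTIONS = {
    ("CON", "WM"): +1, ("CON", "Inh"): +1, ("CON", "Different"): +1, ("CON", "Same"): -1,
    ("WM", "Inh"): +1, ("WM", "Same"): +1, ("WM", "Gate"): +1, ("WM", "Different"): -1,
    ("Inh", "WM"): -1, ("Inh", "CON"): -1,
    ("Same", "Different"): -1, ("Different", "Same"): -1,
    ("Same", "WM"): +1, ("Same", "CON"): -1,
    ("Different", "CON"): +1, ("Different", "WM"): -1,
}

def targets(source, sign):
    """Names of all components that `source` excites (+1) or inhibits (-1)."""
    return sorted(t for (s, t), w in PROJECTIONS.items() if s == source and w == sign)

def sources(target, sign):
    """Names of all components that excite (+1) or inhibit (-1) `target`."""
    return sorted(s for (s, t), w in PROJECTIONS.items() if t == target and w == sign)

print("CON excites:      ", targets("CON", +1))
print("WM is inhibited by:", sources("WM", -1))
```

Querying the table this way shows, for example, that each decision node supports the field that drives it and suppresses the other field, which is what lets a decision feed back onto the representations.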
The activation in the WM field is given by: Here, the left-hand side specifies the rate of change of activation along space x as a function of time, t, and h represents the resting level. In addition, each field contains spatially correlated noise created by convolving a field of independent noise sources with a Gaussian kernel: External input to the field is projected as a Gaussian with strength c. These inputs are turned on for a time interval specified by the pulse function χ(t).
Interactions within and between the nodes have a similar representation: the product of a sigmoidal threshold function of the sending node's activation and the node-to-node connection strength.
The connection strength between both nodes and the Gate node is weighted by the sigmoidal threshold of the Gate node (the thresholded Gate activation multiplies the thresholded field output, which is weighted by the connection strength and integrated over the field). Long-range interactions from a field to a node are represented by the summed excitation or inhibition projected to the node (e.g., the projection from the WM field to the Same node:
Spencer, Johnson, Simmering, Buss and colleagues have previously used this three-layer model to demonstrate how Hits, Correct rejections, False alarms and Misses occur (Costello & Buss, 2018; J. S. Johnson et al., 2014; Simmering, 2016). They have also used this model to explore the origin of capacity limitations. We review their work in the following sections. Further, we present findings from new quantitative simulations to bring together work in early childhood with young and older adulthood. To run quantitative simulations, we used the same parameters as those used by Costello and Buss to simulate behavioural data from a Color CD task from young adults (Costello & Buss, 2018). We simulated 50 same and 50 different trials across 20 runs (equivalent to 20 participants) for each of the VWM loads (1 to 6 items). We compared model fits to the behavioural data from another study from our lab (Ambrose et al., 2016). In later sections, we modify the base 'young adult' model parameters to capture behavioural data from early childhood (using behavioural data from Simmering, 2012b, and McKay et al. [in preparation]) and late adulthood (Wijeakumar, Magnotta, et al., 2017b).
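The competition between the Same and Different nodes can be sketched with two mutually inhibiting units. This is a toy illustration under strong assumptions, not the published node equations: the Gate node is omitted, the summed field output is replaced by a simple count of peaks in the WM and CON fields, and every parameter value (`w_same`, `w_diff`, `c_self`, `c_mutual`, `h`, `tau`) is made up, with the weights chosen for a load of about 3 items. The one feature taken from the text is that the CON-to-Different weight is set much higher than the WM-to-Same weight.

```python
import math

def sigmoid(r, beta=4.0):
    """Soft threshold at zero for node activation r."""
    return 1.0 / (1.0 + math.exp(-beta * r))

def decide(n_wm_peaks, n_con_peaks, w_same=2.5, w_diff=15.0,
           c_self=2.0, c_mutual=10.0, h=-5.0, tau=20.0, steps=400):
    """Euler-integrate two mutually inhibiting nodes and report the winner."""
    same = diff = h                                   # both nodes start at rest
    for _ in range(steps):
        g_same, g_diff = sigmoid(same), sigmoid(diff)
        # Each node: decay to rest, self-excitation, inhibition from the
        # other node, and summed input from the field that drives it.
        d_same = -same + h + c_self * g_same - c_mutual * g_diff + w_same * n_wm_peaks
        d_diff = -diff + h + c_self * g_diff - c_mutual * g_same + w_diff * n_con_peaks
        same += d_same / tau
        diff += d_diff / tau
    return "same" if same > diff else "different"

print("3 WM peaks, no CON peak :", decide(3, 0))   # no change detected
print("3 WM peaks, 1 CON peak  :", decide(3, 1))   # novel item builds a CON peak
print("2 WM peaks, 1 CON peak  :", decide(2, 1))   # a WM peak was lost, old item looks new
```

In this sketch the Different node wins whenever a peak forms in the CON field, even though more peaks feed the Same node, because its input weight is larger and it crosses threshold first. A 'same' response therefore arises both when nothing changed and when a changed item fails to build a CON peak.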

How does the 3-layer DF model capture VWM behaviours?
To demonstrate the working of the three-layer model, we begin by showing how the model explains Correct rejections and Hits for a load of 3 items.
this process is encoding in the model (Figure 3a). Weak external input is also projected to the WM field. This weak input combined with the excitatory projections from the CON field causes peaks of activation to form in the WM field; this process is consolidation in the model (Figure 3a).
WM and CON fields project excitation to the Inh field. Peaks of activation in the Inh field project strong inhibition to the WM and CON fields. During the delay phase in the absence of external input, low levels of local excitation and strong inhibition from the Inh field diminish peaks in the CON field and build troughs at those locations. On the other hand, higher local excitation and inhibition from the Inh field allow peaks in WM to become self-sustained; this process is maintenance in the model (Figure 3b). Essentially, at this state, memory representations are being actively maintained in working memory. In a same trial, the items in the test array match the memory array. Thus, at the onset of the test array, external input is projected to the CON field, but activation is unable to pass threshold because of the strong inhibition from the Inh field at the same locations (Figure 3c). After the presentation of the test array, the Gate node receives the necessary activation from the WM field and the external input from the presentation of the test array, to allow the WM/CON fields to communicate with the decision-making nodes. In this instance, the WM field has greater activation than the CON field and projects excitation to the Same node, resulting in the system correctly identifying a same trial; this process is comparison in the model (Figure 3d).
{Insert Figure 3}
correct rejection trial during the presentation of the memory array (Figure 4a) and the delay phase (Figure 4b). During the presentation of the test array, the novel item generates a peak in an uninhibited part of the CON field (shown by a red '*' in Figure 4c). The CON field projects excitation to the Different node and the system correctly identifies that it was a different trial (Figure 4d). Note that the connection strength between the WM field and Same node is set lower than the connection strength between the CON field and Different node to overcome a greater same bias from having more peaks of activation in the WM field.
{Insert Figure 4}
The model can also explain how errors occur. During Misses, items in the sample array build peaks in the CON field, and eventually in the WM field (Figure 5a). During the delay phase, the CON field receives inhibition from the Inh layer and develops troughs at the color locations (Figure 5b). During the presentation of the test array, if the novel item is close to the color of one of the other items in the array and falls within the inhibited region, it can fail to build a peak in the CON layer (see red * in Figure 5c). Activation from the WM field is greater than activation in the CON field, and in this case, the model will incorrectly decide that it was a same trial (Figure 5d).
{Insert Figure 5}
During False alarm trials, peaks of activation build in the CON and WM fields following the presentation of the memory array. However, competition between peaks in the WM field could prevent a peak from consolidating (i.e., forming a peak) or could destabilize one of the peaks during the delay phase (see red * in Figure 6a). At the onset of the test array presentation, a new peak builds in the CON field at the location of the old item since this area is no longer inhibited. The CON field projects excitation to the Different node and the model incorrectly decides that it was a different trial (Figure 6d). Thus, in the DFT model, false alarms occur as a result of the loss of peaks due to competition with other peaks.
{Insert Figure 6}
On the other hand, Misses are mostly a result of the inability to form peaks in the CON field in locations that receive strong inhibition. Our quantitative simulations revealed that the three-

The origin of capacity limitations in the DF model
In the three-layer model of VWM, capacity limitations occur as a result of two contributions: (1) neighbourhood effects, and (2) an increase in global inhibition. First, local excitation in the WM field helps build peaks of activation. As the number of peaks increases, WM projects stronger excitation to the Inh field, which in turn projects strong inhibition back to the same locations in the WM field. These projections result in stronger surround inhibition around each peak, resulting in the sharpening of the peak and a reduction in overall amplitude. As a result, the WM field projects weaker activation to the Inh layer and so on. Thus, the nature of the interactions between excitation and inhibition keeps the peaks in the WM field active. As the number of peaks starts to increase, the area between peaks is heavily inhibited because of overlap from two sources of surround inhibition. Greater inhibition prevents new peaks from forming, limiting the storage of items in the WM field. The second contribution comes from global inhibition; each new peak adds some global inhibition. As the number of peaks builds, this global inhibition reaches a point at which it is not possible to build another peak without destabilizing an old peak. Metric similarity of stimuli can also influence capacity limitations. If two peaks of activation are very closely separated, then they might fuse together, or one peak might 'kill' the other peak (as we discuss in the previous section on Miss trials). Quantitative simulations showed that A' in the model decreased with increasing VWM load from 1 to 6 items (shown in black in Figure 8a). These results agree with findings from Ambrose et al. In an elegant review, Johnson and colleagues explored how this DF model addresses the 'slots' versus 'resource' theories on capacity and resolution of items (J. S. Johnson et al., 2014). Briefly, the 'slots' theory posits that capacity is fixed and items stored have high-resolution representations. On the other hand, the 'resources' theory posits that VWM capacity is unlimited; however, as the number of representations increases, their resolution decreases.

(shown in grey in
Johnson and colleagues argue that the DF model offers a unique account that captures properties of both theories. Concretely, in the DF model, peak formation is a discrete, all-or-none phenomenon that occurs only if the peak reaches threshold. If the peak doesn't pass the threshold, it does not become self-sustained. In this respect the DF model is similar to the 'slots' account. However, the number of peaks that can be maintained (capacity) is not fixed and is variable depending upon the excitatory and inhibitory interactions and noise within the field. Variability is also introduced by the possibility of a peak merging with another peak or the destabilization of a neighbouring peak due to competition with other peaks. Finally, the resolution of the representations is also variable in the DF model depending on the interactions within the field: more peaks in the WM field result in more inhibition from the Inh field, thus sharpening peaks and reducing overall amplitude.

VWM processing across the lifespan using the DF model
In this section, we investigate how the DF model can be used to understand VWM processing across the lifespan. Capturing changing dynamics across the lifespan offers a different perspective from manipulating the conditions of a task: it allows VWM processing to be understood through the lens of age-related changes, such as individual differences in early childhood and deficits in older adults. We pooled data from three studies conducted by our research group to test model performance across the lifespan (Simmering et al. 2012, Ambrose et al. 2012 and Wijeakumar et al. 2017). In early childhood, the Max K estimate is around 1 to 2 items, and this estimate increases to 4 to 6 items in young adulthood. In late adulthood, capacity drops back to about 2 items. {Insert Figure 9}
The balance between excitation and inhibition in neurons is important for maintaining stability in the neocortex (Yizhar et al., 2011). Mouse models have shown that disruption of the ratio of excitation to inhibition (referred to as the E/I ratio) is associated with the development of neurodevelopmental disorders such as autism (E. Lee, Lee, & Kim, 2017). At the other end of the lifespan, senescence is associated with a reduction in the number of dopamine D2 receptors from the age of 20 (D. F. Wong et al., 1984; Dean F. Wong, Young, Wilson, Meltzer, & Gjedde, 1997). In general, the role of the dopamine system is conceptualized as regulating neurons' sensitivity by regulating the signal-to-noise ratio. In a neurobiological model, Li et al. reduced the G parameter of the sigmoidal activation function (likened to reducing the unit's average responsivity to excitatory and inhibitory signals) and found that it reduced the distinctiveness of internal representations (Li, Lindenberger, & Sikstrom, 2001). Specifically, internal representations were less distinctive in the 'old' adult model than in the 'young' adult model. The age-related loss of distinctiveness in representations reported from neurobiological models is also evident in literature employing macro-scale neuroimaging methods, where it is termed de-differentiation (Fandakova, Sander, Werkle-Bergner, & Shing, 2014). Other accounts of ageing include the compensation-related utilization hypothesis (CRUNCH) and the hemispheric asymmetry reduction (HAROLD) hypothesis (Cabeza, 2002; Schneider-Garces et al., 2010). Both these accounts specify the recruitment of 'extra' neural resources to fulfil the less complex demands of a cognitive task in ageing. Indeed, it is still possible that recruitment of compensatory resources is a product of the loss of precision in maintained representations.
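The effect of lowering the gain of a sigmoidal activation function, as in the neurobiological modelling work cited above, can be illustrated with a minimal sketch. The function names, the three input values and the two gain settings are illustrative assumptions and are not taken from that model; the sketch only shows that a lower gain compresses a unit's responses to different inputs, which is one way to read 'reduced distinctiveness'.

```python
import math


def sigmoid(x, gain):
    """Logistic activation; `gain` stands in for the G parameter."""
    return 1.0 / (1.0 + math.exp(-gain * x))


def distinctiveness(inputs, gain):
    """Spread (max minus min) of one unit's responses to a set of inputs."""
    outputs = [sigmoid(x, gain) for x in inputs]
    return max(outputs) - min(outputs)


# Hypothetical net inputs to a single unit for three different stimuli.
inputs = [-1.0, 0.0, 1.0]

young = distinctiveness(inputs, gain=1.0)  # assumed 'young' gain
old = distinctiveness(inputs, gain=0.3)    # assumed reduced 'old' gain

print(f"response spread, young: {young:.3f}, old: {old:.3f}")
```

With a lower gain the three responses sit closer together, so the unit separates the stimuli less clearly.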

VWM load
In DFT, the excitatory-inhibitory interactions discussed above can be represented, for the purpose of modelling developmental changes, by altering the properties of the projections between the WM and CON fields and the Inh field. It is important to recall, however, that the mapping between neuromodulation and population dynamics, as employed by DFT, is somewhat speculative.
Previous work from our research group has shown that, in DFT, early developmental changes occur as a result of neural interactions strengthening over time, an account referred to as the Spatial Precision Hypothesis. Schutte, Spencer and colleagues first tested this hypothesis by exploring developmental changes in spatial cognition (Schutte & Spencer, 2009; Simmering, Schutte, & Spencer, 2008). Next, Simmering, Spencer and colleagues showed that increasing the strength of excitatory and inhibitory connectivity resulted in sharper, more stable peaks that can explain improvements in performance (accuracy and capacity) with increasing age in VWM tasks (Simmering, 2016). More recently, Costello and Buss used the three-layer DF model to investigate which parameters would need to be modulated to capture changes in behavioural performance in older adults (Costello & Buss, 2018). They tested 10 different models, manipulating the strength and width of excitation and inhibition, and the strength and width of noise. Most of their models showed comparable performance. However, they emphasize that increasing the width of both excitation and inhibition best captured behavioural performance in older adults.
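The claim that stronger interactions yield more stable peaks can be illustrated with a minimal sketch of a single neural field. This is a simplification, not the three-layer model: the Inh field is collapsed into a global inhibition term, and every parameter value (field size, time constant, resting level, kernel strengths and widths, stimulus) is an illustrative assumption rather than a value from the chapter's parameter table. The function name `simulate_peak` and the scaling factor `scale` are hypothetical.

```python
import math


def simulate_peak(scale, n=61, steps_input=100, steps_delay=300):
    """Euler-integrate a one-layer field and return its final maximum.

    `scale` multiplies the strength of local excitation and of global
    inhibition. A stimulus is shown for `steps_input` steps and then
    removed for `steps_delay` steps. All values are assumptions.
    """
    tau, h, beta = 10.0, -5.0, 5.0
    c_exc, sigma_exc = 2.0 * scale, 3.0
    c_glob = 0.3 * scale
    centre = n // 2
    # Local excitatory kernel (Gaussian over distance between units).
    kernel = [[c_exc * math.exp(-((i - j) ** 2) / (2 * sigma_exc ** 2))
               for j in range(n)] for i in range(n)]
    # Gaussian stimulus centred on the middle of the field.
    stimulus = [8.0 * math.exp(-((i - centre) ** 2) / (2 * 3.0 ** 2))
                for i in range(n)]
    u = [h] * n
    for step in range(steps_input + steps_delay):
        out = [1.0 / (1.0 + math.exp(-beta * ui)) for ui in u]
        total = sum(out)
        stimulus_on = step < steps_input
        new_u = []
        for i in range(n):
            exc = sum(k * o for k, o in zip(kernel[i], out))
            inp = stimulus[i] if stimulus_on else 0.0
            du = (-u[i] + h + inp + exc - c_glob * total) / tau
            new_u.append(u[i] + du)
        u = new_u
    return max(u)


strong = simulate_peak(scale=1.0)  # full-strength interactions
weak = simulate_peak(scale=0.5)    # interactions scaled down

print(f"final maximum activation, strong: {strong:.2f}, weak: {weak:.2f}")
```

In this sketch the full-strength field keeps a peak above threshold after the stimulus is removed, whereas the weakened field relaxes back towards its resting level, which is the qualitative pattern that the Spatial Precision Hypothesis relies on.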
Here, we use the same three-layer model to capture behavioural performance from Ambrose et al. 2015 (described above) and to demonstrate again how manipulating excitation and inhibition can capture behaviour across the lifespan. To do this, we collated data from the studies shown in Figure 9 to create three age groups: early childhood (3-, 4- and 5-year-olds from Simmering et al. 2012), young adulthood (18- to 30-year-olds from Ambrose et al. 2015) and older adulthood (> 60 years from Wijeakumar et al. 2017). Figure 10 shows Hits and Correct rejections for these three age categories. Children demonstrated a lower percentage of Hits and Correct rejections than young adults. However, older adults showed a lower percentage than young adults only in Hit trials. Interestingly, older adults 'behaved' like young adults during 'same' trials, whereas during 'different' trials their behaviour was similar to that of children. We used this critical difference between the three groups to 'age' our model. It is possible to randomly vary model parameters to test model performance, but we adopted a systematic, theory-driven approach by anchoring our model to the neurobiological evidence discussed at the start of this section. We varied two key parameters of the model: (1) the strength of local excitation in CON and WM and of inhibition from the Inh field to WM and CON (the Spatial Precision Hypothesis reported by Simmering and colleagues) and (2) the width of local excitation in CON and WM and of inhibition from the Inh field to WM and CON (reported by Costello and Buss). Note that Simmering and colleagues varied only the strength, and not the width, of connectivity, so we tested whether the latter could also capture changes in early development.
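The two manipulations, scaling strength versus scaling width, can be sketched on a simple interaction kernel. The sketch assumes Gaussian profiles and folds the Inh-field projection into a single lateral-inhibition term, which simplifies the three-layer architecture. The baseline values in `young`, and the helper names `scale_strength` and `scale_width`, are hypothetical; the scaling factors of 0.65 and 1.12 are chosen to echo the 60% to 70% strength reduction and the 12% width increase reported in the text.

```python
import math


def interaction_kernel(distance, c_exc, sigma_exc, c_inh, sigma_inh):
    """Local excitation minus broader lateral inhibition at a distance."""
    exc = c_exc * math.exp(-distance ** 2 / (2 * sigma_exc ** 2))
    inh = c_inh * math.exp(-distance ** 2 / (2 * sigma_inh ** 2))
    return exc - inh


# Hypothetical 'young adult' baseline (not the chapter's tabulated values).
young = dict(c_exc=2.0, sigma_exc=3.0, c_inh=1.0, sigma_inh=9.0)


def scale_strength(params, factor):
    """Manipulation (1): scale the strengths, leave the widths unchanged."""
    return {**params,
            "c_exc": params["c_exc"] * factor,
            "c_inh": params["c_inh"] * factor}


def scale_width(params, factor):
    """Manipulation (2): scale the widths, leave the strengths unchanged."""
    return {**params,
            "sigma_exc": params["sigma_exc"] * factor,
            "sigma_inh": params["sigma_inh"] * factor}


child = scale_strength(young, 0.65)     # weaker interactions
older_width = scale_width(young, 1.12)  # broader interactions

for label, params in [("young", young), ("child", child),
                      ("older (width)", older_width)]:
    profile = [round(interaction_kernel(d, **params), 3) for d in (0, 3, 6, 12)]
    print(label, profile)
```

Scaling strength changes the height of the kernel everywhere by the same factor, whereas scaling width stretches the same profile over a larger range of feature values.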
The parameters that were used to capture behaviour from children are shown in blue, those for young adults in black, and those for older adults in red (Width model) and orange (Strength model), in Table 1. In line with findings from Simmering et al., we varied the strength of local excitation in CON and WM, and of the inhibition between Inh and CON and between Inh and WM, from 10% to 100% of the estimates used for young adults. Model performance approached behavioural performance when the strength of excitation and inhibition was reduced to 60% to 70% of the estimates used for young adults. However, the model did not perform as poorly as children on different trials, because the excitation from the CON field to the Different node remained strong. When the strength of excitation from the CON field to the Different node was also reduced (to 60% of the estimate used for young adults), the model captured behavioural performance in early childhood (RMSE = 0.09). Decreasing the strength of excitation and inhibition reduced the robustness of the peaks in the CON and WM fields, allowing them to become de-stabilized relatively quickly. Losing stability increased the possibility of errors across both same and different trials. These findings are in agreement with those from Simmering and colleagues (Simmering, 2012).
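The fitting logic described above, sweeping a scaling factor and scoring each setting by RMSE against behaviour, can be sketched as follows. Everything numerical here is hypothetical: `run_model` is a stand-in linear function that replaces the actual DF simulations, and the `behaviour` values are invented placeholders rather than data from the pooled studies. Which measures enter the RMSE is also an assumption (here, proportions of Hits and Correct rejections).

```python
import math


def rmse(model, data):
    """Root mean-squared error between two equal-length sequences."""
    if len(model) != len(data):
        raise ValueError("model and data must have the same length")
    return math.sqrt(sum((m - d) ** 2 for m, d in zip(model, data)) / len(data))


def run_model(strength_scale):
    """Hypothetical stand-in for a batch of DF simulations.

    Returns proportions of (Hits, Correct rejections). A real run would
    simulate many trials of the three-layer model at this scaling.
    """
    return (0.5 + 0.45 * strength_scale, 0.6 + 0.38 * strength_scale)


# Hypothetical behavioural proportions (Hits, Correct rejections).
behaviour = (0.80, 0.85)

# Sweep the scaling from 10% to 100% of the young-adult estimates.
scales = [s / 10 for s in range(1, 11)]
fits = {scale: rmse(run_model(scale), behaviour) for scale in scales}
best = min(fits, key=fits.get)

print(f"best scaling: {best:.1f}, RMSE: {fits[best]:.3f}")
```

The same loop could be repeated over a second parameter, such as the strength of the projection from CON to the Different node, when a single scaling factor does not capture every trial type.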
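{Insert Table 2 about here}
On the other hand, two versions of the DF model captured behavioural data from the older adults. In the Width model, the width of local excitation and lateral inhibition in the CON and WM fields was increased by 12% of the estimates used in young adults. In the Strength model, the strength of local excitation and lateral inhibition was increased by 90% of the estimates used in young adults. Both the Width model (RMSE = 0.1) and the Strength model (RMSE = 0.11) were able to capture the trend in behavioural performance following quantitative simulations. Figure 11a shows the working of both models during a load 3 different trial. In the Strength model, increased strength of local excitation in the WM field creates large, robust peaks of activation. Strong excitation is projected to the Inh field, which in turn projects strong and broad inhibition to the CON field to form small peaks in broadly inhibited regions. In the Width model, similar but less pronounced effects are observed in the CON field. Figure 11b shows summed output activation from the WM and CON fields throughout a trial during load 3 Hits and Misses. As expected, across both models, summed activation is higher for Hits than for Misses in the CON field. The converse is true for the WM field. One major functional difference is evident between the two models. Both models begin encoding in the CON field around the same time; however, it takes longer for these peaks to decay (or be suppressed by the Inh field) in the Width model. Therefore, in the Width model, activation in the WM field is kept suppressed throughout the trial. Thus, the Strength model elicits greater activation than the Width model in the WM field during Hit trials. Both variants of the DF model could be informative in exploring individual differences. The Strength model might represent brain mechanisms in older adults who recruit verbal strategies, resulting in quicker consolidation of items in the WM field. However, in general, such reliance is still indicative of age-related decline because it relies on extra resources to complete the task (Forsberg, Johnson, & Logie, 2019; Wijeakumar, Magnotta, et al., 2017a). On the other hand, the Width model might represent a different form of age-related decline, where deficits in perceptual processes result in longer decay times for peaks in the CON field. Such model variations might also be used to understand age-related changes across other types of working memory (Brockmole & Logie, 2013; Johnson, Logie, & Brockmole, 2010; Maylor & Logie, 2010). {Insert Figure 11}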
Figure 12 shows the percentage of Hits and Correct rejections, and A', for the behavioural data and the quantitative simulations for children (blue), young adults (black) and older adults (red).
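For readers who want to reproduce the summary measures, the sketch below computes A' and a capacity estimate from hit and false-alarm rates. It assumes the commonly used non-parametric A' formula and Pashler's capacity formula, with Max K taken as the largest K across loads; the chapter does not spell out its exact formulas here, so these choices are assumptions. The false-alarm rate is taken as 1 minus the Correct rejection rate, and the rates in `rates_by_load` are hypothetical.

```python
def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity A' (0.5 = chance, 1.0 = perfect)."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


def pashler_k(set_size, hit_rate, fa_rate):
    """Capacity estimate K = N * (H - FA) / (1 - FA)."""
    if fa_rate >= 1.0:
        raise ValueError("false-alarm rate must be below 1")
    return set_size * (hit_rate - fa_rate) / (1 - fa_rate)


def max_k(rates_by_load):
    """Largest K across loads; `rates_by_load` maps load -> (H, FA)."""
    return max(pashler_k(n, h, f) for n, (h, f) in rates_by_load.items())


# Hypothetical (hit rate, false-alarm rate) pairs for loads 1 to 6.
rates_by_load = {
    1: (0.98, 0.02), 2: (0.95, 0.04), 3: (0.90, 0.06),
    4: (0.82, 0.08), 5: (0.75, 0.10), 6: (0.68, 0.12),
}

for load, (h, f) in rates_by_load.items():
    print(load, round(a_prime(h, f), 3), round(pashler_k(load, h, f), 2))
print("Max K:", round(max_k(rates_by_load), 2))
```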
Using the DF model to make hemodynamic predictions of brain activation
In recent work, our research group adapted a model-based fMRI approach from Deco and colleagues (Deco, Rolls, & Horwitz, 2004) to make hemodynamic predictions in an inhibitory control Go/Nogo task (Buss, Wifall, Hazeltine, & Spencer, 2013; Wijeakumar et al., 2017). We used the same approach with the three-layer VWM model to create an average hemodynamic response function (HRF) for each load, trial type and component of the model. This process was repeated for the early childhood, young adulthood and older adulthood models. We compare these HRFs to findings from two studies that used functional near-infrared spectroscopy to record brain activation (Wijeakumar, Magnotta, et al., 2017b; McKay et al. [in prep]). We averaged activation across channels from the functional near-infrared spectroscopy system to obtain an exemplar HRF from the left frontal, right frontal, left parietal and right parietal cortices, for each condition and age group. Note that we report only on the fronto-parietal network for both studies because we did not record from the occipito-temporal cortices in McKay and colleagues. Figure 13 shows HRFs for load 3 (averaged across same and different trials) from recorded brain activation (top row) and predicted hemodynamic activation from the WM field (bottom row) for early childhood, young adulthood and older adulthood. We highlight two observations. First, predictions from the WM field of the three-layer model match trends in brain activation across age groups. Specifically, the model predicted that children would recruit greater activation than young or older adults at load 3. Second, this neural signature is pronounced in the left frontal cortex and bilateral parietal cortex, suggesting that the WM field might be distributed across these cortical areas, in accordance with a distributed or integrated account of VWM processing (we discuss our stance on this in an upcoming section). {Insert Figure 13}

The association between WM and long-term memory (LTM) processes:
A central issue in the memory literature is whether WM is essentially re-activated LTM (Atkinson & Shiffrin, 1971; Cameron, Haarmann, Grafman, & Ruchkin, 2005; Cowan, 1988; Lewis-Peacock & Postle, 2008; Oberauer, 2002; Öztekin, Davachi, & McElree, 2010). Recent studies have reported that neural activation recruited by LTM processes is also observed in WM processing. A notable example is work from Lewis-Peacock and Postle, who first trained a classifier to identify brain activity from LTM processes involved in making judgements about pictures of famous people, locations and objects (Lewis-Peacock & Postle, 2008). Then, outside the scanner, participants learnt associations between stimuli (e.g., a person with a location). Finally, participants engaged in a delayed paired-association recognition task, where they were required to indicate whether the first and second stimulus presentations were associated. The LTM-trained classifier was able to decode delay-period activity from a network of regions for stimuli that were associated through the learning phase. This work showed that retention of information during WM processing could be supported by LTM representations. In reviewing this evidence, Norris makes a valid observation: both WM tasks and LTM tasks are likely to recruit both processes, so merely observing LTM-related activation in a WM task does not imply the absence of WM processing (Norris, 2017). Lifespan studies might add value to this discussion, which has thus far been fuelled by findings from young adults and lesion patients. At one end of the lifespan, in infancy, VWM processing can be measured as early as four months of age using the preferential-looking task (Ross-Sheehy et al., 2003). This demonstration questions the veracity of the account that WM is essentially re-activated LTM processing. At the other end, older adults reportedly employ verbal strategies to successfully detect changes in VWM tasks. Specifically, work from our group found that older adults assigned labels, based on familiarity, to abstract shapes in the memory array of a shape CD task and relied on rehearsal to compare shapes across the memory and test arrays (Wijeakumar, Magnotta, et al., 2017a). Our findings are supported by recent work from Logie and colleagues, who suggest that age-related changes in VWM might reflect the difficulty in actively engaging the cognitive systems required to employ strategies to successfully perform the task (Forsberg, Johnson, & Logie, 2019; Logie, 2018; Logie, Belletier, & Doherty, 2020). These findings in older adults support the dependence of WM processes on LTM. Our experience with implementing LTM in DF models allows us to step away from the discussion of whether WM is essentially activated LTM, and instead posit that both processes are inextricably linked and share bi-directional properties: in infancy, LTM processes are built through sufficiently maintained WM traces, and in late adulthood, WM processing is reliant on LTM processes. Concretely, in DFT, LTM is implemented through the creation and strengthening of 'sub-threshold' memory traces in Hebbian fields within the DF architecture. These Hebbian fields are coupled to their respective main neural fields (e.g., WM and CON). Simmering, Perone and colleagues modified the standard three-layer DFT architecture to capture looking behaviour in infants and children in a preferential-looking task (Perone, Simmering, & Spencer, 2011; Simmering, 2016). They added a fixation system that would stochastically look at two side-by-side flashing displays of colored squares, where one side contained a square that constantly changed its color. Here, peaks of activation were encoded in the CON field and consolidated in the WM field as the model passively viewed each display. Those peaks that were successfully encoded and maintained also built memory traces in the associated Hebbian fields. When the model stochastically returned to previously attended displays, these memory traces were strengthened if the items were successfully encoded and maintained. Conversely, strengthened memory traces also sped up encoding in the CON field and strengthened maintenance of the peaks of activation in the WM field. This work highlights how LTM traces can be formed from mounting WM peaks and, conversely, how suitably strengthened LTM traces can influence the encoding and maintenance of future WM peaks.
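The bi-directional link between WM peaks and LTM traces can be sketched with a simple memory-trace update. The sketch assumes a common form of trace dynamics in which the trace builds at field sites that hold a peak and decays slowly elsewhere while a peak is present; the time constants, the coupling strength `c_mem` and the five-site field are illustrative assumptions, and `update_trace` is a hypothetical helper, not the published implementation.

```python
import math


def sigmoid(u, beta=5.0):
    """Sigmoidal output function of a field site."""
    return 1.0 / (1.0 + math.exp(-beta * u))


def update_trace(trace, field, tau_build=100.0, tau_decay=1000.0):
    """One Euler step of a Hebbian-style memory trace.

    Sites under a peak (activation above 0) build towards the field
    output; other sites decay slowly, and only while a peak exists.
    """
    peak_present = any(u > 0.0 for u in field)
    updated = []
    for m, u in zip(trace, field):
        if u > 0.0:
            m += (sigmoid(u) - m) / tau_build
        elif peak_present:
            m += -m / tau_decay
        updated.append(m)
    return updated


# Hypothetical WM field holding one peak at the middle site.
field_with_peak = [-5.0, -5.0, 3.0, -5.0, -5.0]
trace = [0.0] * len(field_with_peak)
for _ in range(200):
    trace = update_trace(trace, field_with_peak)

# The trace feeds back as sub-threshold input to the field.
c_mem = 2.0      # assumed coupling strength
resting = -5.0   # assumed resting level
boosted = [resting + c_mem * m for m in trace]

print([round(m, 3) for m in trace])
print([round(b, 2) for b in boosted])
```

The site that held the peak ends up closer to threshold than its neighbours but still below it, so a later input at that feature value would build a peak more quickly, which is the sense in which the trace is 'sub-threshold'.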

A distributed account of VWM processing:
WM processing has been shown to recruit a distributed network of areas across the visual, parietal, temporal and frontal cortices. The significance of the term 'distribution' within the context of WM processing is multi-fold. First, conventional load-dependent effects have been observed across the intraparietal sulcus, inferior parietal lobule, superior parietal lobule, dorsolateral prefrontal cortex and frontal eye fields (Druzgal & D'Esposito, 2003; Linden et al., 2003b; Ma, Husain, & Bays, 2014; Pessoa & Ungerleider, 2004; Postle, 2015; Rypma, Prabhakaran, Desmond, Glover, & Gabrieli, 1999; Todd & Marois, 2004). Second, different stimulus dimensions have been shown to recruit a wide distribution of cortical regions during the delay period. For instance, lower-level features such as color and orientation have been decoded from the visual cortex (Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009), whereas complex shapes have been decoded from the frontal eye fields (Ester, Sprague, & Serences, 2015; Wijeakumar, Magnotta, et al., 2017b). Further, more complex stimuli such as natural objects (S.-H. Lee, Kravitz, & Baker, 2013) have been decoded from the prefrontal cortex. Third, the WM sub-processes of encoding, maintenance and retrieval recruit some common, but mostly unique, regions within frontal, parietal and motor areas (Linden et al., 2003). This evidence has inspired theoretical accounts that subscribe to the view that WM processes are distributed and integrative by nature.
Christophel and colleagues argue for moving away from a modular perspective of WM processing, where areas perform specialized functions, towards a view in which distributed cortical networks work together to transform sensory information into behaviour (Christophel, Klink, Spitzer, Roelfsema, & Haynes, 2017). Their account acknowledges the likelihood of a posterior-to-anterior gradient of information processing in the brain, where the posterior cortex is involved in processing sensory information and the anterior cortex is involved in processing information that is abstract or categorical. This view is supported by Postle and colleagues, who advocate that WM emerges through the recruitment and coordination of cortical areas that have evolved to process sensory and action-related functions (Postle, 2006; Postle, 2020). This account further proposes that a representation held in WM will recruit as many resources as necessary. An example that anchors both accounts to our work is the engagement of a distributed network of areas in the posterior and anterior cortices in older adults as they employ strategies to successfully attend to complex shapes in a shape CD task (Wijeakumar, Magnotta, et al., 2017a). Here, older adults reported assigning labels to the abstract shapes to complete the VWM task. We posit that employing these strategies might have resulted in the recruitment of as many resources as possible, and in the classical posterior-to-anterior shift in activation observed in older adults.
Neuro-computational modelling accounts from Hazy and colleagues define WM as an emergent process that stems from the interactions between a posterior cortex system, a hippocampal system and a prefrontal cortex/basal ganglia system (Hazy, Frank, & O'Reilly, 2006, 2007). In their model, the posterior cortex system is responsible for sensory and motor processing, the hippocampal system is responsible for rapid learning, and the prefrontal cortex/basal ganglia system is necessary for active maintenance of internal contextual information. Concretely, they argue that their model extends existing mechanisms in motor control, where the basal ganglia modulate frontal motor representations in inhibitory control, to WM processing, where the basal ganglia modulate the maintenance of more abstract frontal representations.
In accordance with these accounts, we also posit that the instantiation of WM in DFT, anchored to encoding and decision-making, is distributed across cortical networks in the brain.
In one sense, the implementation of WM in DFT is domain-general: the activation that results from scaling the interactions within and between fields (and, in some cases, other parameters such as noise and resting level) determines the nature and type of distribution across cortical networks. Recall, however, that in its core principles and origin, DFT grounds cognition in sensorimotor processing; in this sense, our model cannot be strictly domain-general. Our account appears to deviate from, but not necessarily disagree with, the popular notion of a central executive or homunculus as described in the Baddeley and Hitch model (Baddeley, Hitch, & Allen, 2020; Baddeley, 1996). Indeed, in the DFT account and in our interpretation of the accounts described above, the role of the homunculus is subsumed by the properties of the systems, instead of occupying the highest, and somewhat mysterious, position in a hierarchical model.

Conclusion
In DFT, WM is a self-sustaining attractor state attained through strengthened interactions between self-excitation and lateral inhibition. This state is represented in a WM field and coupled with fields that have stabilized attractor states representing encoding and comparison, and decision-making processes, to implement performance in a change detection task. Capacity limitations are demonstrated through competition between neighbouring peaks and through greater global inhibition, which prevents the formation of new peaks without destabilizing an existing peak. Under this framework, improvements in VWM processing from early childhood to young adulthood are a result of strengthening interactions between excitation and inhibition, referred to as the Spatial Precision Hypothesis. Strengthened interactions create more distinct and precise peaks of activation and, by extension, more precise representations of the stimuli. On the other hand, the model captures performance in older adults by increasing the width or the strength of excitatory and inhibitory projections in the fields, creating less distinct representations and/or interfering in the comparison process. Lastly, our model predictions are in line with hemodynamic activation recorded from the fronto-parietal network across the lifespan, advocating for a distributed perspective of VWM processing in the brain.
Figure 1a. Distribution of population activation (DPA) measured from cat primary visual cortex when two squares of light were presented simultaneously at varying distances (left to right). Figure 1b. Superposition of DPAs from separate presentations of the two squares of light, at varying distances (left to right). Adapted from Jancke et al. 1999.
work and use an integrative DF model to capture performance in a Color CD task and shed light on current debates surrounding capacity limitations, production of errors, development of VWM processing across the lifespan, and the association between brain and behavior. Each trial in a Color CD task begins with the presentation of a memory array of colored objects followed by a short delay and finally, a test array of colored objects. At the end of the test array, participants have to indicate if the colors of the objects in the memory and test arrays were 'same' or 'different'. Under the 'different' condition, the color of one of the objects differs between the memory and test arrays. VWM load is typically varied between 1 and 6 items. Based on participants' responses, four trial types can be estimated - Hits (correct a.
Our VWM DF model has three continuous metric fields (CON, WM and Inh), two decision-making nodes (Same and Different nodes), and one Gating node [Figure 3] (Johnson

Figure 2: Three-field DFT model of visual working memory. Green arrows represent excitatory connections and red arrows represent inhibitory connections.
Figure 3 shows snapshots of the state of the CON, Inh and WM fields at the end of the memory array, delay and test array for a Correct rejection trial. It also shows the activation of the Same and Different nodes throughout the duration of the trial. During the presentation of the memory array, afferent input is projected to the CON field at three different locations causing peaks of activation to form;

Figure 3. Working of the model during a Correct rejection trial. Dotted line across the snapshots indicates the baseline (value of 0). Figure 3(a) Activation in the three fields at the end of the memory array. Figure 3(b) Activation in the three fields at the end of the delay phase. Figure 3(c) Activation in the three fields at the end of the test array. Figure 3(d) Activation in the Same (dashed line) and Different (solid line) nodes throughout the trial. Vertical red lines indicate the onset of the memory array (M), delay (D) and test array (T).

Figure 4. Working of the model during a Hit trial. Dotted line across the snapshots indicates the baseline (value of 0). Figure 4(a) Activation in the three fields at the end of the memory array. Figure 4(b) Activation in the three fields at the end of the delay phase. Figure 4(c) Activation in the three fields at the end of the test array. The peak of activation for the novel item built in the CON layer during the presentation of the test array is shown by a red '*'. Figure 4(d) Activation in the Same (dashed line) and Different (solid line) nodes throughout the trial. Vertical red lines indicate the onset of the memory array (M), delay (D) and test array (T).

Figure 5. Working of the model during a Miss trial. Dotted line across the snapshots indicates the baseline (value of 0). Figure 5(a) Activation in the three fields at the end of the memory array. Figure 5(b) Activation in the three fields at the end of the delay phase. Figure 5(c) Activation in the three fields at the end of the test array. The peak that failed to build in the CON field following the presentation of the test array is shown by a red '*'. Figure 5(d) Activation in the Same (dashed line) and Different (solid line) nodes throughout the trial. Vertical red lines indicate the onset of the memory array (M), delay (D) and test array (T).

Figure 6. Working of the model during a False alarm trial. Dotted line across the snapshots indicates the baseline (value of 0). Figure 6(a) Activation in the three fields at the end of the memory array. Figure 6(b) Activation in the three fields at the end of the delay phase. Figure 6(c) Activation in the three fields at the end of the test array. The disappearance of a peak from the WM field due to competition between peaks is shown by a red '*'. Figure 6(d) Activation in the Same (dashed line) and Different (solid line) nodes throughout the trial. Vertical red lines indicate the onset of the memory array (M), delay (D) and test array (T).

Figure 7(a). Percentage of Hits across loads of 1-6 items. Figure 7(b). Percentage of Correct rejections across loads of 1-6 items. Behavioural data is adapted from Ambrose et al. 2015 (shown as grey bars) and modelled using the three-layer VWM DF model (shown as black bars).
Figure 8a). Further, Max K estimates were similar across the model and data from Ambrose et al. 2015 [see Figure 8b]. The root mean-squared error (RMSE) between this model performance and the behavioural data was 0.05. {Insert Figure 8}

Figure 8(a). Accuracy (A') across loads of 1-6 items. Figure 8(b). Max K across all loads. Note that behavioural data is adapted from Ambrose et al. 2015 (shown as grey bars) and modelled using the three-layer VWM DF model (shown as black bars).
Figure 9. Accuracy estimates from studies using a color CD task.Data points from childhood are shown in hues of blue, data points from young adults are shown in grey and black, and data points from older adults are shown in hues of red.
et al. reduced the G parameter of the sigmoidal activation function (likened to reducing the unit's average responsivity to excitatory and inhibitory signals) and

Figure 10. Hits and Correct rejections for early childhood, young and late adulthood created by pooling data from the studies shown in Figure 9.
On the other hand, two versions of the DF model captured behavioural data from the older adults. In the Width model, the width of local excitation and lateral inhibition in the CON and WM fields was increased by 12% of the estimates used in young adults. In the Strength model, the strength of local excitation and lateral inhibition was increased by 90% of the estimates used in young adults. Both the Width model (RMSE = 0.1) and the Strength model (RMSE = 0.11) were able to capture the trend in behavioural performance following quantitative simulations. Figure 11a shows the working of both models during a load 3 different trial. In the Strength model, increased strength of local excitation in the WM field creates large, robust peaks of activation.
model. Therefore, in the Width model, activation in the WM field is kept suppressed throughout the trial. Thus, the Strength model elicits greater activation than the Width model in the WM field during Hit trials. Both variants of the DF model could be informative in exploring individual differences. The Strength model might represent brain mechanisms in older adults who recruit verbal strategies, resulting in quicker consolidation of items in the WM field. However, in general, such reliance is still indicative of age-related decline because it relies on extra resources to complete the task

Figure 11(a). Working of the Width model and Strength model on a load 3 different trial. Blue line represents activation at the end of the sample array. Cyan line represents activation at the end of the delay. Black line represents activation at the end of the test array. Figure 11(b). Summed activation in the WM and CON fields for the Width model and the Strength model for Hit and Miss trials at load 3. First dashed line indicates the end of the sample array and second dashed line indicates the end of the delay period.

Figure 12. Model performance capturing Hits, CRs and Accuracy in early childhood (blue), young adulthood (black) and late adulthood (red). Note that two potential 'older adulthood' models have been discussed. The parameters for the Width model are shown in red in Table 1 and the parameters for

First, we created a local field potential (LFP) from every field and node by summing the absolute value of all the terms in this equation that contribute to the rate of change of activation, with the exception of the stability terms $-u(x, t)$ and $h$. Below, we show the equation for calculating LFPs from the WM field:

$$\mathrm{LFP}_{WM}(t) = \frac{1}{n}\left|\int c_{WM,WM}(x - x')\, g\big(u_{WM}(x', t)\big)\, dx'\right| + \frac{1}{n}\left|\int c_{WM,Inh}(x - x')\, g\big(u_{Inh}(x', t)\big)\, dx'\right| + \frac{1}{n}\left|\int c_{WM,CON}(x - x')\, g\big(u_{CON}(x', t)\big)\, dx'\right| + \left|a\, g\big(r(t)\big)\right| + \left|(\,\cdot * \cdot\,)\right| - \left|(\,\cdot * \cdot\,)\right|$$

where $n$ is the number of units in each field. Since each field in the model has different contributions, the LFP measures are unique. LFPs are generated for each model component, condition, trial (100 trials) and run (20 runs in total). Then, they are averaged across trials and runs to create a predicted LFP per condition (e.g., load 3 same trial). Next, to obtain a predicted hemodynamic response, we convolve the average LFPs with a canonical impulse response function whose parameters are set to 1.3 and n = 4. Note that this process is similar to what is described in the fMRI literature.
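The final step, turning an averaged LFP into a predicted hemodynamic response, can be sketched as a convolution with a gamma-shaped impulse response. The sketch makes two assumptions that should be checked against the original method: that the impulse response has the gamma form commonly used in the fMRI literature, and that the value 1.3 is its time constant (named `tau` here) with `n = 4` as its shape parameter. The boxcar `lfp` is a hypothetical stand-in for an averaged model LFP, and the time step is arbitrary.

```python
import math


def gamma_irf(t, tau=1.3, n=4):
    """Gamma impulse response; reading 1.3 as the time constant is an assumption."""
    if t < 0:
        return 0.0
    return (t / tau) ** (n - 1) * math.exp(-t / tau) / (tau * math.factorial(n - 1))


def convolve(signal, kernel, dt):
    """Causal discrete convolution, truncated to the length of the signal."""
    result = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(kernel))):
            acc += signal[i - j] * kernel[j]
        result.append(acc * dt)
    return result


dt = 0.1  # seconds per sample (assumed)

# Hypothetical averaged LFP: activity between 1 s and 3 s of a 30 s window.
lfp = [1.0 if 1.0 <= k * dt < 3.0 else 0.0 for k in range(300)]

# Impulse response sampled over 20 s.
irf = [gamma_irf(k * dt) for k in range(200)]

hrf = convolve(lfp, irf, dt)
peak_time = hrf.index(max(hrf)) * dt

print(f"predicted response peaks about {peak_time:.1f} s after trial onset")
```

Under these assumptions the impulse response peaks at (n - 1) times the time constant, so the predicted response lags the underlying model activity by a few seconds, as expected for a hemodynamic signal.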

Figure 13. Recorded brain activation (top) and predicted hemodynamic activation from the WM field of the model (bottom) from children (shown in blue), young adults (shown in black) and older adults (shown in red).

Table 2. Parameters used for the three-layer VWM model. Parameters shown in black were used to capture performance from Ambrose et al. 2015 for VWM loads of 1 to 6 items, and for the 'young adulthood' category. The parameters that were scaled to capture performance in early childhood are shown in blue. The parameters that were scaled to capture performance in older adulthood are shown in red (Width model) and orange (Strength model). Note that the parameters shown only in black were unchanged across the lifespan. The slope of the activation function, β, was 5.0 for all terms.
Table 1 and the parameters for the Strength model are shown in orange in Table