- Sep 2020
-
-
executed one after another without considering feedback from the environment during the sequence
The hierarchical account states that we do not take the second-stage context into account and simply pick what we already decided before we even saw it.
-
action control in stage two should not depend on stage one
This is key to the whole setup of the two-step Markov decision task, right: once we have arrived in the second stage, we do select based on the known context, regardless of MB or MF RL. The effect shows up in the switching on the next trial.
-
Here we show that first stage habitual actions, explained by the model-free evaluation in previous work, can also be explained by assuming that first stage actions chunk with second stage actions
So this does not actually account for the model-based behaviour, which we hope we can build on
-
may best be viewed as action sequence
So a habit - something learned according to a model-free scheme - can trigger an action sequence
-
-
-
participant could use this information to select the action that has a relatively high expected value on common transitions. Thus, a rare transition would lead to a state with lower expected value, yielding a negative RPE
This seems very likely to me! Is there some natural way to correct for this? Use the current estimate of expected value as a covariate, regress it out, or something?
-
analysing frequency of stay on the first step choice should reveal an interaction effect between transition type and second choice feedback outcome in the preceding trial on the frequency of repeating the first step choice
Participants have selected one spaceship because they like its major planet -> move to minor planet -> obtain reward -> select alternative spaceship (win-shift)
Move to minor planet -> obtain no reward -> select the same old spaceship again (you still want to go to the major planet) = lose-stay
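The predicted interaction can be checked with a quick stay-probability tabulation. A minimal Python sketch (function and variable names are my own, not from the paper): tabulate P(stay) per reward-by-transition cell and look for the cross-over pattern described above.

```python
import numpy as np

def stay_probabilities(rewarded, common, stayed):
    """Tabulate P(stay) for each reward x transition cell.

    rewarded, common, stayed: boolean sequences over trials, where
    stayed[t] means the next trial's first-stage choice repeated
    the trial-t choice.
    """
    rewarded, common, stayed = map(np.asarray, (rewarded, common, stayed))
    table = {}
    for r in (True, False):
        for c in (True, False):
            mask = (rewarded == r) & (common == c)
            table[(r, c)] = stayed[mask].mean() if mask.any() else np.nan
    return table

# A caricature of the pattern sketched in the note: stay after
# rewarded-common and after unrewarded-rare trials.
rewarded = [True, True, False, False]
common   = [True, False, True, False]
stayed   = [True, False, False, True]
print(stay_probabilities(rewarded, common, stayed))
```

A pure-MF agent would instead show a main effect of reward with no dependence on transition type.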
-
These findings implicate the aMCC in the processing of the SPEs
Fronto-medial theta has before already been 'localized' to the aMCC (likely)
-
Predicted Response-Outcome (PRO)
But this one model did say it should! (no data)
-
Gläscher et al. (2010) conducted an fMRI experiment using a paradigm that featured common and uncommon transitions and found that the intraparietal sulcus and lateral PFC are sensitive to SPEs
So empirical results show: aMCC does not do this
-
transition model
e.g. a successor representation (but not limited to this)
-
-
-
ACC appears to encode hierarchical structure with distributed representations, which makes the parcellation problem even harder
It is not immediately obvious how hierarchy can be encoded in continuous time distributed neural representations. This is a different beast from a symbolic AI algorithm that updates a and then draws b etc.
-
Individual ACC neurons seem capable of responding to most task events, with particular mixtures of sensitivities within and across neurons continually reallocated according to changing task conditions [72]
Very cool target for a modeling study?
-
phasic bursts of norepinephrine, which may serve as a neural-interrupt signal [67], can reset network activity in ACC [68] and thus allow for module re-binding
So the LC-NE system can 'reprogram' the communication through the ACC?
-
Specifically, when task demands are high (e.g., after an error), ACC would send a synchronizing signal to lower-order modules, with consequent synchronization and thus improved communication between those lower-order modules
ACC is now also the great orchestrator of communication everywhere - What is left for the dlPFC?
-
ACC motivates sticking to a plan
Framing the ACC for extended control of sequences thus states that it keeps track of how much of this cost of planning would likely still be worth it. This is basically the same idea as the 'expected value of control' theory, although the function of ACC is expanded upon much by HMB-HRL theory.
-
At face value, such a self-regulating control mechanism is both computationally [48] and evolutionarily [49] maladaptive
No! A self-regulating control system should sometimes turn itself off! This is the whole reason we have cost added into the mix.
-
feedback-based control mechanisms constitute the bread-and-butter of control theory in engineering (Box 1), but these always concern the regulation of subordinate systems, never self-regulation
So a theory in which the ACC adapts its own control by detecting conflict is not 'natural' from an engineering standpoint - it should modify subordinate systems?
-
For example, a prominent computational model of ACC contains units that exhaustively predict all possible states of the task environment, generating prediction errors to unexpected transitions; though not explicitly used in the model for this purpose, in principle the prediction errors can provide learning signals for MB-RL [47]
ACC-dlPFC theory as a super-learner for all unexpected events? Sounds a bit predictive-processing-ish
-
ACC could use such models to plan over temporally extended action sequences
So it would take on a planning function, which much of the literature associates with the HPC? How the two interact seems like a very relevant question!
-
-
-
can be useful in the context of multitask learning to extract useful, reusable policies
The DR is some sort of generalized representation of shared structure of a task (family)?
-
how it is learned
There is no clear proposal yet as to how the DR would be learned, it could be very similar to SR?
-
default policy plays the role of prior over policy space and rewards play the role of the likelihood function
Options also function as some sort of prior over action selection potentially?
-
empirically underconstrained theoretical flexibility in specifying how a task's state space should be. (bioRxiv preprint, doi: 10.1101/856849)
Exactly a problem of PFC research - empirically underconstrained in what should be represented
-
finite decision problem
So linear RL cannot be a general theory of open-ended lifelong learning
-
distinguishing between terminal states (representing goals), and nonterminal states (those that may be traversed on the way to goals)
How exactly is this implemented, and what does this mean for our RNN architecture, which has a goal-representation space separately?
-
“control cost,” KL(π || π₀), which is increasing in the dissimilarity (KL divergence) between the chosen distribution π and some default distribution, π₀.
Control cost inherent in the model! Can be linked to expected-value of control model very naturally
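The cost term is just the KL divergence between the chosen and default policies, so it can be computed in a couple of lines. A minimal numpy sketch (function and variable names are mine):

```python
import numpy as np

def control_cost(pi, pi_default):
    """KL(pi || pi_default): the linear-RL control cost, in nats.

    Zero when the chosen policy equals the default, and growing in
    the divergence as the agent deviates from its default tendencies.
    """
    pi, pi_default = np.asarray(pi, float), np.asarray(pi_default, float)
    mask = pi > 0  # terms with pi(a) = 0 contribute nothing
    return float(np.sum(pi[mask] * np.log(pi[mask] / pi_default[mask])))

uniform = [0.5, 0.5]
print(control_cost(uniform, uniform))      # 0.0: no deviation, no cost
print(control_cost([0.9, 0.1], uniform))   # > 0: a "control-demanding" choice
```

This is the quantity that, per the note above, could scale with dACC activity as choices deviate further from the default policy.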
-
assume a one-to-one, deterministic correspondence between actions and successor states
The next state is directly and exclusively dependent on the action we take
-
The only way to find the latter using equation (2) is by iteratively re-solving the equation to repeatedly update π and S until they eventually converge to π*
Policy iteration (solving through search)
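The iterate-until-convergence idea is ordinary policy iteration: alternate policy evaluation and greedy improvement until the policy stops changing. A toy sketch on an invented two-state deterministic MDP (all rewards and transitions are made up for illustration):

```python
# Toy deterministic MDP: states 0, 1 nonterminal; 2 terminal.
# actions: 0 = "advance", 1 = "stay put" (small per-step penalty).
next_state = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}
reward = {(0, 0): -1, (0, 1): -2, (1, 0): 10, (1, 1): -2}

def policy_iteration(gamma=0.9):
    policy = {0: 1, 1: 1}            # start with a deliberately bad policy
    V = {0: 0.0, 1: 0.0, 2: 0.0}
    while True:
        for _ in range(200):         # policy evaluation: sweep to convergence
            for s in (0, 1):
                a = policy[s]
                V[s] = reward[(s, a)] + gamma * V[next_state[(s, a)]]
        # policy improvement: act greedily w.r.t. the current V
        new_policy = {
            s: max((0, 1), key=lambda a: reward[(s, a)] + gamma * V[next_state[(s, a)]])
            for s in (0, 1)
        }
        if new_policy == policy:     # converged to pi*
            return policy, V
        policy = new_policy

policy, V = policy_iteration()
print(policy)  # the optimal policy advances in both states: {0: 0, 1: 0}
```

The same alternation, with S in place of V, is what the quoted passage describes for linear RL.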
-
assuming that all choices are made following policy π
Since in practice the state transition function depends on the chosen actions, it is wise to write it as S(π): the long-run state visits depend on action selection according to this fixed policy
-
The default policy and cost term introduced to make linear RL tractable offers a natural explanation for these tendencies, quantifies in units of common-currency reward how costly it is to overcome them in different circumstances, and relatedly offers a novel rationale and explanation for a classic problem in cognitive control: the source of the apparent costs of “control-demanding” actions
Deviations from the default policy are what constitute 'control demanding' actions? => So the more we deviate from this, the more dACC activity we can expect, something like this??
-
SR theory predicts grid fields must continually change to reflect updated successor state predictions as the animal’s choice policy evolves, which is inconsistent with evidence
Entorhinal grid cells have this 'fourier-domain map of task space' but do not continuously change their representations to fit with different goals - as would be necessary under vanilla SR theory
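Under a fixed policy the SR has a closed form, M = (I − γT_π)⁻¹, which makes the problem the note raises concrete: any change in policy changes T_π and hence every entry of M, so grid fields tied to M would have to keep remapping. A minimal numpy sketch (toy transition matrix invented here):

```python
import numpy as np

def successor_representation(T_pi, gamma=0.95):
    """M[s, s'] = expected discounted future visits to s' starting from s,
    under the fixed policy whose state-transition matrix is T_pi.
    Closed form: M = (I - gamma * T_pi)^-1."""
    n = T_pi.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T_pi)

# Toy 3-state ring; a different policy would change T_pi and hence all of M.
T_ring = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
M = successor_representation(T_ring, gamma=0.5)
print(M.round(3))
```

For this ring, M[0, 0] = 1/(1 − 0.5³) = 8/7: the discounted count of returns to the start state.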
-
Fourier-domain map of task space
Figure out what this means exactly...
-
stable and useful even under changes in the current goalsand the decision policy they imply
How exactly would they achieve this difference from the SR? Long-run state expectancies seem to be both the definition and the problem of the SR
-
For instance, a change in goals implies a new optimal policy that visits a different set of states, and a different SR is then required to compute it.
This is exactly what an option would look like!
-
However, it simply assumes away the key interdependent optimization problem by evaluating actions under a fixed choice policy(implied by the stored state expectancies)for future steps.
Assumes fixed, constant, probabilities of future state visits
-
-
-
The present study confirms that the aMCC’s distributed code for temporal information is not sufficiently consistent across blocks and sequence types to be detectable using the ROI classification approach, revealing only a weak effect size in the generalization analysis. The two approaches therefore appear to provide complementary information
Re-examine the methodology of the RSA in the aMCC study - is it very tailored?
-
domain-general role to pars orbitalis in learning the relationship between environmental events and transition probabilities between various environmental states
Is this successor representation learning??
-
By contrast, inconsistent with the past literature, in the ROI analysis, we did not find evidence for involvement of the aMCC and hippocampus
So basically our theory is already under heavy scrutiny? Exactly the opposite of what we want to see!
-
whether the stir action was performed in a tea or a coffee sequence, irrespective of the sequence position of the stir action
Only context, no temporal information
-
discriminate between the first and second instance that the stir action was performed
Temporal information -> progression through sequence information (regardless of coffee vs tea task)
-
first defining functional or anatomical ROIs based on subject-specific data
So it is very theoretically based, not like a random cluster search across all voxels
-
serial rank order
This is somewhat different from generic information maintenance in WM?
-
contextual information
So this is basically the same as Working Memory
-
-
-
At the same time, we observed that dlPFC reinstatement of CTD positively scaled with the hippocampal pattern similarity between the two overlapping contexts
So in dlPFC the CTD did have similarity over the two contexts, even though they were distinct in the HPC?
-
hippocampal differentiation effect
More similar task demands should yield increasingly DIFFERENT HPC representations? Because we separate them?
-
congruency: match/mismatch between the CTD and the actual task demands on the trial)
Some trials in context 1/2 will have task demand associated with 3/4. This is 'incongruent'
-
Task-sets include additional instructions on “how”
Task sets are conceptually different from the process that can identify the task set based on cueing - that is an associative / semantic process?
-
proactively retrieves probabilistically likely task-sets
The cueing of task sets - don't conceptualize this as being even higher in the hierarchy?
-
-
-
Black dots indicate stable fixed points
You can see in the DMC the RNN has created 3 stable states it can occupy - not only the fixation at the start. Also two for the sample->delay moment, dependent on the category of the sample stimulus!
-
time (ms)
You can see accurate decoding along a wider spectrum of time - stable maintenance of information!
-
stable states associated with each category at the end of the sample period in the DMC task
This really is maintenance of classificatory information after the first stimulus!
-
test stimulus
The second stimulus is the 'test stimulus'
-
sample stimulus
The first stimulus is called the 'sample stimulus'
-
It is likely that this phenomenon is mediated by interactions among different brain regions involved in the OIC and DMC tasks. Indeed, LIP is connected with the dorsolateral prefrontal cortex (DLPFC)
Cognitive Control area through dlPFC might be responsible for 'reprogramming' what goes on in LIP? Flexible readouts, different effect of recurrent encoding, etc?
-
greater compression of activity among directions within categories in the DMC task
Exactly what you would expect right, as direction does not matter for encoding category itself? We are talking about matching.
-
compressing variability among directions within a category
So it was mainly the response directions that were still encoded in the OIC task in LIP. In DMC this disappeared in favour of more population-level category coding.
-
we evaluated the temporal stability of category decoding using SVM decoders that were trained at one time point and then tested at all other time points in the shared sample period
Check the maintenance of information at time-point 1 by training a decoder on it and applying it to future time-points!
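The train-at-one-time, test-at-all-times scheme can be sketched with any classifier. A numpy sketch using a nearest-centroid stand-in for the SVM, on synthetic data (all names and numbers are mine; for brevity it trains and tests on the same trials, where a real analysis would cross-validate):

```python
import numpy as np

def temporal_generalization(X, y):
    """X: trials x neurons x timepoints; y: binary label per trial.

    Returns a time x time accuracy matrix for a nearest-centroid
    decoder trained at t_train and tested at every t_test.
    NOTE: no cross-validation here, purely illustrative.
    """
    n_trials, _, n_time = X.shape
    acc = np.zeros((n_time, n_time))
    for t_train in range(n_time):
        c0 = X[y == 0, :, t_train].mean(axis=0)
        c1 = X[y == 1, :, t_train].mean(axis=0)
        for t_test in range(n_time):
            d0 = np.linalg.norm(X[:, :, t_test] - c0, axis=1)
            d1 = np.linalg.norm(X[:, :, t_test] - c1, axis=1)
            acc[t_train, t_test] = np.mean((d1 < d0) == (y == 1))
    return acc

# Synthetic "stable code": the category signal persists across time,
# so off-diagonal (cross-time) decoding succeeds as well.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
signal = np.where(y == 1, 1.0, -1.0)[:, None, None]  # trials x 1 x 1
X = signal + 0.1 * rng.standard_normal((40, 1, 5))
acc = temporal_generalization(X, y)
print(acc.min())  # high everywhere: information is stably maintained
```

A dynamic code would instead show high accuracy only near the diagonal of `acc`.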
-
category computations are supported by a common subpopulation of neurons in the early sample period and different subpopulations of neurons or different readout mechanisms in the late sample period
Some perceptual mechanism is task-independent, simply information providing and encoding. Later readout can be flexible according to task demands!
-
attractor dynamics appears to compress category-related information to a simpler, binary format by collapsing all directions within a category towards a single population state
Working-Memory component induced in RNN as a function of task demand
-
graded neural activity in the OIC
Direct classification - more room for stimulus idiosyncratic representation?
-
categorical encoding was more abstract with binary-like neural activityin the DMC
Working-memory component
-
-
-
contingency-degradation
getting reward also when you pick random actions or don't pick anything - non-contingent rewards
-
devaluation
Sudden loss of reward or becoming aversive - MB-RL should instantly stop approaching, while MF-RL gradually
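The devaluation signature can be seen in a few lines of simulation: an MB agent re-plans through its model and drops the action at once, while a cached MF value only decays as the devalued outcome is re-experienced (toy numbers, my own sketch):

```python
# Toy devaluation test: one state, one action leading to one outcome.
# All numbers are invented for illustration.
outcome_value = 1.0   # the agent's model: this action's outcome is worth 1
q_mf = 1.0            # cached model-free action value after training
alpha = 0.1           # MF learning rate

outcome_value = 0.0   # devaluation: the outcome is now worthless

q_mb = outcome_value  # MB: re-evaluates through the model -> drops instantly
print(q_mb)           # 0.0

# MF: the cached value only falls as the devalued outcome is re-experienced.
for _ in range(5):
    q_mf += alpha * (outcome_value - q_mf)
print(round(q_mf, 3))  # ~0.59: still mostly approaching after 5 trials
```

The persistent `q_mf` is the habitual responding that devaluation experiments probe for.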
-
assume a predefined state and action space
It is really the representational structure that makes a HUGE difference in the effectiveness of any learning strategy deployed over it
-
apply MF RL updates on retrospectively inferred latent states
Have a model at the ready but don't do anticipatory planning - only retrospective evaluation according to MF-RL learning rules
-
necessarily pressed into the singular axis of MF–MB
So keep an open mind about such things - seems specifically aimed at HEURISTICS - form of wrongful MB-RL?
-
Simple strategies that rely only on working memory,
Looks exactly like MF
-
For MB control to materialize, the agent must identify its goal, search its model for a path leading to that goal and then act on its plan
Hard to model and understand the exact scope of the MB controller, so default to MF evidence if not accurate?
-
features of trajectories in the environment
Successor representations
-
contextual information is used to segregate circumstances in which similar stimuli require different actions
Again, a working-memory procedure?
-
compound representations
Essentially leveraging working memory to create apparently more 'flexible' behaviour, while in reality MF-RL is the only real 'learning' mechanism
-
DAergic signals support both instrumental (action–value) and non-instrumental (state–value) learning in the striatum.
The correct error signals are provided to facilitate any form of RL based learning. In Striatum?
-
Computational RL theory built on the principles that animal behaviourists had distilled through experimentation, to develop the method of temporal difference (TD) learning (a MF algorithm)
Origins of RL are purely associative learning - delta rule style
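That delta-rule lineage is visible in TD(0) itself: the TD error generalizes the Rescorla-Wagner update to sequential prediction. A minimal sketch (my own toy example):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update of the state-value table V.

    The TD error delta = r + gamma*V[s'] - V[s] generalizes the
    Rescorla-Wagner delta rule from associative learning to
    sequential prediction.
    """
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# A cue that reliably precedes food comes to predict its value.
V = {"cue": 0.0, "food": 1.0}
delta = td0_update(V, "cue", 0.0, "food")
print(delta, round(V["cue"], 2))  # 0.9 0.09: the cue inherits predicted value
```

Repeating this update propagates value backwards along the cue-outcome chain, which is the MF mechanism the quote refers to.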
-
derive value estimates for the different states or actions available
Practical difference between computing desired states and inferring best actions VS directly computing desired actions without explicit state values
-
dimensionality of learning — the axes of variance that describe how individuals learn and make choices — is well beyond two
It is not only the speed / accuracy that is being traded off - which is what MB/MF and all other two-systems seem to boil down to.
-
System1/System2
Many learning theories seem to boil down to this, and MF-RL / MB-RL is exactly this!
-
- Aug 2020
-
-
rostral ACC activity will predict the probability of switching strategies, whereas caudal ACC activity will predict the probability of staying within a strategy
More univariate activity in rostral ACC (higher up the hierarchy) means switching? Not perhaps more top-down control to STAY in the current strategy?
-
important for hierarchical systems that integrateinformation over extended time periods
In fact it might be the one fundamental reason such hierarchy is useful
-
ill-equipped to simulate control processes that are inherently dynamic, such as the response delays introduced by switching between tasks
Yes, or maybe also the continuous tasks as proposed by Hayden group!
-
by extending previous work that integrated goals into RNNs [38]
The goal-circuit model is an abstraction of the HRL-RNN model?
-
[12,144,181]
Relevant publications explicitly drawing from HRL-RNN theory for ACC
-
cells in isolation, or univariate indicators of ACC function that average across the activity of entire cell populations
Distributed patterns will not be picked up - the guiding function of ACC cannot be detected. Or perhaps, only the 'energizing' part of it
-
caudal ACC and rostral ACC apply control signals that attenuate costs associated with the production of low-level actions
Or is this in some way similar to a 'gating' mechanism, allowing stable representations for control to be 'updated' or amended to current needs, detected by a 'higher' system
-
tonic dopamine levels in ACC stabilize the task in working memory
Something like a hidden state / task set active encoding
-
ACC damage does not interfere strongly with many of the putative functions that have been attributed to it
Crucial aspect of the HRL theory: Everything can still happen, the ACC does not EXCLUSIVELY execute all of these functions, it simply strengthens / guides
-
do large and sudden changes in the state space explain the conflict-like signals that are commonly observed in ACC?
So basically it is not necessarily an explicit encoding of error, rather the updating of the current context?
-
distributed manner
What exactly is the meaning and the point of this?
-
an arsenal of mathematical tools from dynamical systems analysis
Reference 58 contains examples of nonlinear dynamical systems analysis for neural networks?! :o
-
bidirectional connectivity between DLPFC and ACC
Explicit recurrence - Learning to reinforcement learn in activity dynamics?
-
the model will predict how hierarchical action sequences are represented at different levels of abstraction along the frontal midline
increasingly abstract goals will be represented along a gradient of the spatial organization of the network - this might have something to do with connectivity between layers? cf. convolutional neural nets
-
individual regions of ACC process both typesof information
Cognitive and emotional functions of the ACC can be found in individual regions all throughout the area
-
-
-
multiple regression problem, in which the different coding models were treated as predictor variables to the observed similarity matrix, the analysis enabled joint estimation of multiple coding models
Super relevant: We can estimate the DEGREE to which different features are encoded on a trial-by-trial basis??
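Jointly estimating several coding models against one observed similarity matrix is just multiple regression over the vectorized (dis)similarity matrices. A numpy sketch with synthetic matrices (all names mine):

```python
import numpy as np

def rsa_regression(observed_rdm, model_rdms):
    """Regress the observed dissimilarity matrix onto model RDMs.

    Returns one beta per coding model (intercept last), estimated
    jointly from the vectorized lower triangles.
    """
    tri = np.tril_indices_from(observed_rdm, k=-1)
    y = observed_rdm[tri]
    X = np.column_stack([m[tri] for m in model_rdms] + [np.ones(len(y))])
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    return betas

# Two toy model RDMs over 4 conditions; observed = 2*A + 1*B exactly,
# so the joint fit should recover those weights.
rng = np.random.default_rng(1)
A = rng.random((4, 4)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
B = rng.random((4, 4)); B = (B + B.T) / 2; np.fill_diagonal(B, 0)
observed = 2 * A + 1 * B
print(rsa_regression(observed, [A, B]).round(3))  # betas ~ [2, 1, 0]
```

With one observed matrix per trial (or time bin), the same regression yields the trial-by-trial degree-of-encoding estimates the note asks about.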
-
task-switching study using MVPA
So the sharpening can also be seen using fMRI - single neuron recordings not necessary!
-
increase in the gain (i.e., sharpening) of frontoparietal task-set coding
This is exactly what is seen in the intracranial recording study Ebitz et al!
-
by using a between-subjects RSA approach, the analysis was not optimized to capture finer-grained representational structure that could be subject-specific
Of course between people the geometry can take on a distinct form, but even if the encoded information is the same? Shouldn't RSA abstract over that?
-
No task-general control representations were detected
So per task it was very uniquely constructed - not one level for WM, inhibition, etc.
-
separately, across different sub-domains of negative affect
Orthogonal but consistent representation!
-
This exclusive focus on behavioral measures may be suboptimal for construct validation, as brain activity measures can provide more proximal, higher-dimensional readouts of the neural mechanisms of interest
Is it expected that even though we cannot find common patterns in behaviour, we can still find common patterns in the brain? I thought even within the same task over different sessions, the control representation might differ...
-
A construct validation approach is often used to address this issue.
This is basically how IQ and the sub-measures of it are defined / measured
-
Also relevant is the insight that RSA can be conducted in a time-dependent manner within fMRI, such that trials form the dimensions of the similarity matrices
How about sub-trials (e.g. only the picking of sugar)?
-
strength of conjunctive coding was robustly related to trial-by-trial response time
So again - a scalar value strength type signal?
-
For example, interference occurs when a goal-relevant task set and an irrelevant yet prepotent set are simultaneously active.
Is this really a more elaborate model? Wouldn't the 'interference' simply move any RSA model closer to the midline between the color- and word-naming task sets? We won't have very clear access to neural firing strengths or anything, if that would be relevant.
-
one-dimensional structure of the model
The model only investigated the representational strength along the face-house attentional dimension (not an interesting representational geometry?)
-
specifying and comparing representational models is more flexible within RSA
So the benefit is that we get more insight into the geometry of the representation, not just its presence / decodability? Compare different computational models.
-
classification-based decoding, which we simply refer to here as “classification”, and RSA
Distributed patterns can be subdivided like this: decoding and encoding models
-
type and form of information encoded in LPFC and associated regions of the FPN and CO
So there is relevant information encoded in task set control reps: but there will probably still be a relevant 'intensity' summarizable in a scalar value?
-
independent of particular stimuli, responses, or other task information
Abstract control related factors such as 'congruency' abstract control signals away from directly related stimulus signals
-
highly abstracted, one-dimensional factors
This is the main issue, right - setting up experiments to identify one 'factor' of cognitive control, e.g. 'congruency' in Stroop tasks. It is much more complex and multidimensional than that.
-
-
-
inferred in the current experiment because the future value of a patch is, by design, different from its past value
dACC continuously learning a variable - the slope specifically? Or is it 'simply' trying to predict the prediction errors of lower layers. It seems like the latter would generalize less!
-
The opposing time-linked signals observed do not suggest that dACC and the other regions integrate rewards to a simple mean estimate (as RL-simple would), but instead point towards a comparison of recent and past reward rates necessary for the computation of reward trends.
But this was based on full-region regression weights over different time bins, computed from a particular choice moment. How can you determine separate representations from whole-area betas, with recent reward increasing and past reward decreasing the activity of the whole region simultaneously?
-
which was updated on every time step using a higher-order PE, PE* (that is, the difference of observed PE and expected PE)
So there is some explicit hierarchy in the prediction errors - but is that really what goes on, or is there an explicit estimation of a trend, not necessarily based on the prediction errors themselves? It is mathematically identical!
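The equivalence the note points at can be shown in a few lines: tracking the PE itself with a second delta rule yields a running "expected PE" that behaves like a trend estimate of the reward rate (toy reward stream and learning rate are my own choices):

```python
# Track reward with a delta rule, and track the PE itself with a second
# delta rule: the running "expected PE" then acts as a trend estimate.
alpha = 0.3
v = 0.0            # first-order reward estimate
expected_pe = 0.0  # higher-order estimate: the typical recent PE

for r in [1, 2, 3, 4, 5, 6]:      # a steadily improving patch
    pe = r - v                    # ordinary prediction error
    pe_star = pe - expected_pe    # higher-order PE (PE*)
    expected_pe += alpha * pe_star
    v += alpha * pe

print(expected_pe > 0)  # True: a positive reward trend is detected
```

A flat or declining reward stream would drive `expected_pe` toward zero or below, which is the leave-the-patch signal in this framing.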
-
it is also possible that PEs may be used as a decision variable to guide decisions
The ability of PEs to directly influence decision making, and not just learning, goes above and beyond simple-RL
-
-
-
this relationship is heterogeneous; of these 58 neurons, 31.03% (n = 18/58) showed a positive slope and 18.97% (n = 11/58) showed a negative slope
Distance to prey is an important variable, and it is the actual code for time of impending reward - but it is not encoded by an overall rise in activity (a typical fMRI analysis assumption!). It is encoded in a distributed way over the neurons (perhaps detectable with RSA, though that could still mask it if the code is very single-neuron heavy?)
-
The encoding of the control-relevant variable becomes higher when the expected reward for controlling becomes higher!
-
- Jul 2020
-
-
Finally, some research in deep RL proposes to tackle exploration by sampling randomly in the space of hierarchical behaviors (Machado et al., 2017; Jinnai et al., 2020; Hansen et al., 2020). This induces a form of directed, temporally extended, random exploration reminiscent of some animal foraging models (Viswanathan et al., 1999).
Sampling from hierarchical behaviours?
-
An example is prediction learning, in which the agent is trained to predict, on the basis of its current situation, what it will observe at future time steps (Wayne et al., 2018; Gelada et al., 2019).
This might be what can get confused for successor representation signal from dACC?
-
Song et al. (2017) trained a recurrent deep RL model on a series of reward-based decision making tasks that have been studied in the neuroscience literature, reporting close correspondences between the activation patterns observed in the network’s internal units and neurons in dorsolateral prefrontal, orbitofrontal, and parietal cortices (Figure 2C).
Relevant stuff!
-
-
-
An important future goal is to create multiscale neural computational models that better predict more complex real world behaviors
Is this something that we induce automatically with recurrence?
-
participants made decisions based on the instantaneous reward rate and the reward rate trend
Trend is captured at a higher level representation?
-
conflict between short-term (safe options) and long-term (risky options) was mediated by the dorsal anterior cingulate cortex (dACC)
dACC signals their conflict - does it do so by strengthening the representations of the one it favours?
-
superimposition of computations at the shorter time scale (a trial) and the longer time scale (a block of trials)
now, participants have to continuously evaluate which task they should be engaged in (which computations to perform?)
-
human memory foraging
Memory retrieval / search = foraging?
-
one simple decision: when to leave a patch
The key inference to make in the Marginal Value Theorem
-
One research area that examined decisions of multiple time scales is foraging theory
Foraging == The study of the tradeoff between exploiting the current local habit / inertia VS finding a different niche?
-
miss the crucial resources that could be available if the animal maintains larger-scale computations about the wider environment
This is basically an explore-exploit tradeoff, but now not over one uniform action space but explicitly emphasizing local//global environment?
-
summarized in the theories of hierarchical reinforcement learning
!! Oh yeah baby
-
shifts of anterior to posterior brain areas
General organizational principle: The more something is habit-formed, the more posterior it shifts? The more control it requires, the more frontal it is?
-
habit formation occurs when repetitive computations are streamlined
Habits automatize local goals so agent can allocate processing to more complex global goals?
-
simplest form of multiscale processing, but it is ubiquitous.
Simple bias in favour of repeating previously rewarded action = simple operant conditioning?
-
stable or slowly changing environments
Requires less to no flexibility
-
effectively modulate between local tasks while also considering multiple global goals and contextual factors
Comes together nicely with a view on RL as central controller for 'homeostasis', or something like that. Would require highly hierarchical system of goals and representations.
-
It has been historically assumed that the information processing in each trial is independent from information processing from other trials, and that once one trial completes, all the information processing is reset
No inter-trial dependencies - but of course there are processing benefits / interferences and maybe trial-by-trial updates of response caution known as speed-accuracy tradeoff
-
many experimental paradigms focus on short spatial or temporal scales.
Also a problem for 'real' HRL or meta-learning?
-
area-restricted search
Foraging strategy: Limit your attention to locations that were previously rewarding (?)
-
An overarching analogy is foraging
Foraging requires dynamic allocation and weighting of attention and evidence between multiple sources
-
the degree to which working memory is considered
A Collins & MJ Frank paper: how much RL is actually WM?
-
The requirement to integrate information over spatial and temporal scales in a wide variety of environments would seem to be a common feature underlying intelligent systems, and one whose performance has a profound impact on behavior [16–21]
Exactly: Generalization over representations and over temporal grain of behaviour
-
much of the progress made in the latest “AI spring” are, as we describe below, achievements of multiscale processing.
Generalization and broader tasks are the hallmark of the current success of AI
-
- Jun 2020
-
-
Cells that report choice independently of task should lie on the diagonal (i.e., an angle of π/4). Instead, the distribution of angles was significantly bimodal across all cells
So the MFC has different cells specialized for different tasks (familiarity vs categorization)? Disappointing - I had hoped for generalization / remapping.
-
Choice decoding in the MFC was strongest shortly after stimulus onset, well before the response was made
So it is not the actual execution of an action that is the information being picked up - it is the direction the contextual state-space representation heads towards?
-
Decoding accuracy for choices was highest in the MFC
MFC is more task-relevant for response selection
-
In contrast, in the MFC (Fig. 3E, right), the relative positions of the four conditions were not preserved.
MFC seems to rely on different representations. Familiarity-category pairs are not preserved over tasks, hence not really fundamentally represented. In the HPC this seems to be the determining factor (for Dim 1).
-
In the HA, the ability to decode category was not significantly different between the two tasks
HPC / Amyg encode stimulus category in memory & re-represent, regardless of task context. Part of stimulus representation in general.
-
In the MFC, decoding accuracy for image category was significantly higher in the memory task
MFC encodes task variable = stimulus category only during relevant context / task-set
-
MS cell responses reflected a memory process: they strengthened over blocks as memories became stronger
Memory-selective cells fire more and more strongly with repeated presentation of a stimulus, and can distinguish true negatives from false negatives!
-
We first trained a decoder to discriminate task type on trials where the subject was instructed to reply with a button press, and then we tested the performance of this decoder on trials where the subject was instructed to use saccades
The decoder should generalize task classification across response modalities - and it does so in the MFC (dACC and SMA)
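A hedged sketch of this cross-modality generalization test, with simulated firing rates and a simple nearest-centroid decoder standing in for whatever classifier the paper used; the modality-invariant task axis is built into the simulation by assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical firing rates (trials x neurons) with a task label per
# trial. If MFC codes task type abstractly, a decoder fit on
# button-press trials should transfer to saccade trials.
n_neurons = 30
axis = rng.normal(size=n_neurons)          # assumed task-coding direction

def simulate(n_trials):
    labels = rng.integers(0, 2, n_trials)  # 0 = memory, 1 = categorization
    noise = rng.normal(size=(n_trials, n_neurons))
    return noise + np.outer(2.0 * labels - 1.0, axis), labels

X_button, y_button = simulate(200)         # "button press" trials
X_saccade, y_saccade = simulate(200)       # "saccade" trials

# Nearest-centroid decoder trained on one response modality...
c0 = X_button[y_button == 0].mean(axis=0)
c1 = X_button[y_button == 1].mean(axis=0)

# ...tested on the other modality.
d0 = np.linalg.norm(X_saccade - c0, axis=1)
d1 = np.linalg.norm(X_saccade - c1, axis=1)
pred = (d1 < d0).astype(int)
acc = (pred == y_saccade).mean()
print(acc)  # well above chance if task coding is modality-invariant
```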
-
Cells showed significant modulation of their firing rate during the baseline period as a function of task type
So already from instruction there is a reconfiguration - very complicated mapping from linguistic input to representations for decision making...
-
Subjects indicated choices using either saccades (leftward or rightward eye movement) or button press while maintaining fixation at the center of the screen
Different response modalities allow disambiguating whether coding is tied to a very specific motor execution - if the encoding is similar across modalities, it's really more cognitive/central!
-
We found that neuronal populations within the MFC formed two separate decision axes
So movement through state space in a unique but intra-task-consistent direction?
-
MFC [dorsal anterior cingulate cortex (dACC
MFC = dACC = MCC
-
insensitive to response modality
So its really about the abstract task demands, not the concrete action output
-
The strength and geometry of representations of familiarity were task-insensitive in the HA but not in the MFC
This is what creates the 'shadowing' pattern?
-
whether an image was novel or familiar, or whether an image belonged to a given visual category
Recognition memory vs categorization: yes/no responses in both cases, and both on 'pictures', so stimuli can be the same across tasks
-
phase-locking of MFC activity to oscillations in the HA
HPC memory representations and dACC task set representations?
-
-
-
pattern of conflict modulation during one correct response is orthogonal to the pattern during another correct response
i.e. it is not a 'general boosting' effect -> only on average the activity of neurons can still increase, but it is all about upregulating the relevant neurons for this correct response
-
higher when Ericksen conflict was present (Figure 2A)
Yeah, in single neurons you can show the detection of general conflict this way, and it was not partitionable into different responses...
-
representational geometry
nice wording similar to RSA
-
with Ericksen conflict than it was for trials without Ericksen
What about Simon?
This does mean: conflict increases the representation, shifting the response toward the correct action!
-
AUC
This axis has more predictive power when there is conflict than when there is no conflict (task is already so easy that the information is not needed, or at least a lot less?)
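A sketch of what "more predictive power under conflict" could look like: projections of pseudotrials onto a response-coding axis, with AUC computed separately per condition. The larger class separation on conflict trials is assumed here, purely to illustrate the measure.

```python
import numpy as np

rng = np.random.default_rng(2)

def auc(scores_pos, scores_neg):
    # Probability that a random positive-class projection exceeds a
    # random negative-class one (equivalent to ROC AUC).
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Projections of pseudotrials onto a response-coding axis. Under the
# amplification hypothesis, class separation along this axis is
# assumed to be larger on conflict trials.
def project(gain):
    return rng.normal(gain, 1.0, 300), rng.normal(-gain, 1.0, 300)

auc_no_conflict = auc(*project(0.5))
auc_conflict = auc(*project(1.5))
print(auc_no_conflict, auc_conflict)
```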
-
G)
Very clear effect! Suspicious? How exactly did they select the pseudo-populations? It's not exactly clear to me from the methods.
-
amplification hypothesis, conversely, does not predict a unified conflict detection axis in the population. Instead, it makes a prediction that is exactly contrary to the epiphenomenal view: that conflict should shift population activity along task-variable coding dimensions, but in the opposite direction. That is, conflict is predicted to amplify task-relevant neural responses
Conflict means more control will be exerted: heavier representation of whatever information dACC encodes that 'pushes' for the correct action. This function of dACC would be in line with the context layer!?
-
At the population level, then, the epiphenomenon hypothesis predicts that conflict should decrease the amount of information about the correct response and shift neuronal population activity down along the axis in firing rate space that encodes this response
Because a smaller proportion of the neurons 'fighting' for the correct response is active, at least in total.
-
Neurons that were tuned for a specific correct response were often tuned to prefer the same Simon/Ericksen distractor response
DLPFC is tuned to action-outcomes? -> in single neurons!
-
In fact, the majority of conflict-sensitive dACC neurons were not selective for either correct response or distractor responses (66.7%
So the conflict is represented separately, not having much to do with action-outcomes.
-
did still signal either Ericksen or Simon conflict
Simply the C-term in the ANOVA, i.e. a binary regressor for the general presence of conflict? It would also have more trials where its parameter is influential - does that affect estimation?
-
neurons did not encode the distractor response
So on trials with a unique distractor response, that action-outcome was not represented at all? It's interesting but then where does the actual conflict take place?
-
significant proportion of neurons were selective for the correct response
So desired action-outcome is represented. I think that was already known about dACC.
-
separate pools of neurons corresponding to the two conflicting actions, and that conflict increases activity because it uniquely activates both pools
more neurons activate for the different possible action outcomes = more activity overall --> conflict signal. Makes sense.
-
Furthermore, the population of cells whose responses were significantly affected by Eriksen conflict was almost entirely non-overlapping with the population significantly affected by Simon conflict (specifically, only one cell was significantly modulated by both)
Really separate representations for different aspects of the current task-set?
-
additive model was a better fit to the data than other, more flexible models
So separate statistical significance testing shows effect for Eriksen, not for Simon, but regression model shows through model comparison that it's best to ascribe to them the same effect...
-
(n=15/145) neurons had significantly different firing rates between Simon and no-conflict trials
No significant main effect, but more single cells had a significant effect...? -> Also the directionality is not uniformly positive: some positive, some negative.
-
A small number of individual neurons also had different activity levels on Eriksen conflict and no conflict trials (8.2%, n=12/145 neurons, within-cell t-test)
Note the difference between 'averaged over all neurons' (first report) or 'within one specific neuron' (this report)
-
activity was higher on Ericksen conflict trials than on no conflict trials
for Eriksen flankers there is a main effect of conflict (vs no-conflict). Simon was not statistically significant. Was it mainly a power issue?
-
Ericksen
So is it Ericksen or Eriksen??
-
12 task conditions
Here they acknowledge 12 task conditions, not 9.
-
Within each task condition (combination of correct response and distractor response), firing rates from separately recorded neurons were randomly drawn with replacement to create a pseudotrial firing rate vector for that task condition, with each entry corresponding to the activity of one neuron in that condition
Definition of pseudotrial
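The pseudotrial construction might be sketched like this (hypothetical data: one condition, separately recorded neurons with unequal trial counts):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: each neuron was recorded in a separate session,
# with a variable number of trials in this task condition. A pseudotrial
# stitches one randomly drawn (with replacement) firing rate per neuron
# into a single population vector for that condition.
rates_by_neuron = [rng.poisson(lam=5 + i % 3, size=rng.integers(20, 40))
                   for i in range(50)]          # one condition, 50 neurons

def make_pseudotrial(rates_by_neuron, rng):
    return np.array([rng.choice(r) for r in rates_by_neuron])

pseudotrials = np.stack([make_pseudotrial(rates_by_neuron, rng)
                         for _ in range(100)])  # 100 pseudotrials x 50 neurons
print(pseudotrials.shape)
```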
-
pseudotrial vector x
One synthetic trial spanning all the different neurons in the current pseudopopulation matrix?
-
The separating hyperplane for each choice i is the vector (a) that satisfies: βi · a = 0. Meaning that βi is a vector orthogonal to the separating hyperplane in neuron-dimensional space, along which position is proportional to the log odds of that correct response: this is the coding dimension for that correct response
Makes sense: if βi is proportional to the log-odds of a correct response, a is the hyperplane that provides the best cutoff, which must be orthogonal. The dot product of two orthogonal vectors is 0.
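A small numpy check of this geometry, under an assumed logistic readout log-odds = βi · x + b (all values made up): β is orthogonal to any direction lying within the hyperplane, and moving along β changes the log-odds proportionally.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed logistic-regression readout in a 5-neuron space:
# log-odds(correct response i) = beta . x + b
beta = rng.normal(size=5)
b = 0.7

# Two points on the separating hyperplane (log-odds = 0): start from
# arbitrary points and remove their component along beta.
def project_to_hyperplane(x):
    return x - (beta @ x + b) / (beta @ beta) * beta

p1 = project_to_hyperplane(rng.normal(size=5))
p2 = project_to_hyperplane(rng.normal(size=5))

# beta is orthogonal to any direction within the hyperplane...
print(np.isclose(beta @ (p2 - p1), 0.0))

# ...and moving along beta changes the log-odds proportionally.
x = rng.normal(size=5)
step = 0.5
delta_logodds = (beta @ (x + step * beta) + b) - (beta @ x + b)
print(np.isclose(delta_logodds, step * (beta @ beta)))
```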
-
X is the trials by neurons pseudopopulation matrix of firing rates
So these pseudopopulations were random agglomerates of single neurons that were recorded, so many fits for random groups, and the best were kept?
-
re-representing high-dimensional neural activity in a small number of dimensions that correspond to variables of interest in the data
Essentially this is kind of like constructing dissimilarity matrices over large groups of voxels?
-
4917.0 (1) 5826.5 (1)*
Additive model is the winner in single cell firing rates -> coding simply for the notion of conflict? cf. the population coding from dimensionality reduction!
-
Subtracting this expectation from the observed pattern of activity left the residual activity that could not be explained by the linear co-activation of task and distractor conditions
So this is what to analyze: If this still covaries with conflict in some way it means we go beyond epiphenomenal?
-
Within each neuron, we calculated the expected firing rate for each task condition, marginalizing over distractors, and for each distractor, marginalizing over tasks.
Distractor = specific stimulus / location (e.g. '1' or 'left')?
Task = conflict condition (e.g. Simon or Ericksen)?
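The marginalization can be sketched on a made-up 3x3 table of condition-averaged rates: the additive expectation is row marginal + column marginal minus the grand mean, and the residual is whatever linear co-activation cannot explain.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical condition-averaged firing rates for one neuron:
# rows = task conditions (correct responses), cols = distractor conditions.
observed = rng.normal(10.0, 2.0, size=(3, 3))

# Additive expectation: row marginal + column marginal - grand mean.
row_means = observed.mean(axis=1, keepdims=True)
col_means = observed.mean(axis=0, keepdims=True)
grand = observed.mean()
expected = row_means + col_means - grand

# Residual = what linear co-activation of task and distractor cannot
# explain; any remaining covariation with conflict would be evidence
# beyond the epiphenomenal (summation) account.
residual = observed - expected
print(np.round(residual, 2))
```

By construction the residual has zero marginals, so anything left in it is a genuine interaction.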
-
condition-averaged within neurons (9 data points per neuron, reflecting all combinations of the 3 correct response, 3 Ericksen distractors, and 3 Simon distractors)
How do all combinations of 3 responses lead to only 9 data points per neuron? 3x2x2 = 12.
-