Multi-Step Decision-Making Task

Technical Manual

Script Author: Katja Borchert, Ph.D. (katjab@millisecond.com), Millisecond

Created: January 30, 2021

Last Modified: January 09, 2025 by K. Borchert (katjab@millisecond.com), Millisecond

Script Copyright © Millisecond Software, LLC

Background

The Multi-Step Decision-Making (MSDM) task (e.g., Huys et al., 2012) investigates people's decision making in multi-step problems within a reinforcement learning paradigm.

This script implements Millisecond's version of the MSDM task. The implemented Inquisit procedure is based on Faulkner et al (2021) and represents Millisecond's best-guess reconstruction of the trainings and tests used there.

References

Faulkner P, Huys QJM, Renz D, Eshel N, Pilling S, Dayan P, Roiser JP (2021). A comparison of 'pruning' during multi-step planning in depressed and healthy individuals. Psychological Medicine, 1-9. doi: 10.1017/S0033291721000799. Epub ahead of print. PMID: 33706833.

Huys QJM, Eshel N, O'Nions E, Sheridan L, Dayan P, Roiser JP (2012). Bonsai trees in your head: How the Pavlovian system sculpts goal-directed choices by pruning decision trees. PLoS Computational Biology, 8(3), 1-13.

Duration

45 minutes

Description

The MSDM task is a sequential decision-making task that requires participants to employ a 'plan ahead' strategy.

Specifically, participants are given a transition matrix with 6 states in which to move around. Some of the moves result in gains and some in losses. The 6 states are interconnected in the following way:
- each state can be reached from exactly 2 other states (example: state1 can be reached from state5 and state6 only)
- once on a state, you can move to exactly 2 other states using the response keys [U] and [I] (example: from state1 you can reach state2 using [U] and state4 using [I])
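To make the default structure concrete, here is a minimal Python sketch of the transition matrix (the actual Inquisit script encodes these transitions via the state*Move* parameters listed under Parameters below; the dictionary and helper function are illustrative only):

# Default transition structure (see the state*Move* parameters under Parameters below):
# from each state, key U leads to one other state and key I to a second one.
TRANSITIONS = {
    1: {"U": 2, "I": 4},
    2: {"U": 3, "I": 5},
    3: {"U": 6, "I": 4},
    4: {"U": 2, "I": 5},
    5: {"U": 1, "I": 6},
    6: {"U": 3, "I": 1},
}

def move(state: int, key: str) -> int:
    """Return the state reached by pressing key ('U' or 'I') in the given state."""
    return TRANSITIONS[state][key]

For example, move(1, "U") returns 2 and move(1, "I") returns 4, matching the example above; state1 is reachable only via move(5, "U") and move(6, "I").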

Participants work through 2 training sessions and a final test:

Training I: learn how to navigate the transition matrix using the keys [U] and [I]

Training II: learn how rewards and losses are connected to each transition

Test: plan sequences of 2-8 moves around the transition matrix to maximize gains

Procedure

(I) Training Session I:

A. Introduction of the environment and the 6 states
-> test of state knowledge ('which state is this?')

B. Introduction of the transition matrix with explicit images of the interconnected states
and pathways (see Faulkner et al, 2021).
The implemented trainings and tests are not originals; they are Millisecond's best-guess efforts.
-> implemented Training: move around the transition matrix using keys [U] and [I].
-> implemented Test1: given a start state and a particular key (U or I), which state do you end up on?
-> implemented Test2: given a start state and a particular end state, how can you get there
(enter an appropriate key-response sequence)? Any valid sequence can be entered,
not just the shortest one (see the sketch after this list).
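One way such key sequences can be found programmatically is a breadth-first search over the transition structure. The following sketch reuses the TRANSITIONS mapping from the sketch above; the function name is hypothetical and not part of the Inquisit script:

from collections import deque

def key_sequence(start: int, goal: int) -> str:
    """Return one shortest U/I key sequence leading from start to goal.
    Participants may enter any valid sequence; this just finds a shortest one."""
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        state, keys = queue.popleft()
        for key in ("U", "I"):
            nxt = TRANSITIONS[state][key]  # transition mapping from the sketch above
            if nxt == goal:
                return keys + key
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, keys + key))
    return ""  # not reached: every state is reachable in this environment

With the default transitions, key_sequence(1, 6) returns "UUU" (state1 -> state2 -> state3 -> state6).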

(II) Training Session II:

A. Introduction of the reward structure associated with moving from state to state.
The implemented trainings and tests are not originals; they are Millisecond's best-guess efforts.
-> implemented Training: as in Faulkner et al (2021), participants are given hints in the form of (+) and (-)
under each state but have to learn by trial and error which move (the U or the I move)
is associated with which reward (or loss).
-> implemented Test: given a start state and a particular key (U or I), enter the associated gain (or loss).

(III) Test: 48 trials total
For the test, participants are told that the goal is to maximize the gains given
a particular start state AND a particular number of moves (2, 4, 6, or 8) that need to be made.
The Test runs 2 different types of test trials:
a. 'Immediate': participants receive gain/loss feedback after each U/I response.
b. 'Delayed' (plan-ahead): participants enter the complete planned movement sequence (e.g. 'UUII').
Once the movement sequence is entered, the program shows the selected transitions and provides
the associated gain/loss feedback (see the scoring sketch after this list).
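To illustrate how a completed movement sequence translates into gains and visited states, here is a Python sketch pairing the default reward structure (the rewardStructure* parameters below) with the TRANSITIONS mapping from the earlier sketch; all names are illustrative, not taken from the Inquisit script:

# Default reward structure: the gain/loss attached to each (state, key) transition.
REWARDS = {
    (1, "U"): 140,  (1, "I"): 20,
    (2, "U"): -20,  (2, "I"): -140,
    (3, "U"): -140, (3, "I"): -20,
    (4, "U"): 20,   (4, "I"): -20,
    (5, "U"): -140, (5, "I"): -20,
    (6, "U"): 20,   (6, "I"): -20,
}

def play_sequence(start: int, keys: str) -> tuple[int, list[int]]:
    """Apply a complete movement sequence (e.g. 'UUII') as in a 'delayed' trial:
    return the total gain and the visited state sequence (including the start state)."""
    state, total, visited = start, 0, [start]
    for key in keys:
        total += REWARDS[(state, key)]
        state = TRANSITIONS[state][key]
        visited.append(state)
    return total, visited

For example, play_sequence(1, "UU") returns (120, [1, 2, 3]), corresponding to the raw-data fields selectedGain and selectedStateSequence.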

A. 24 trials (12 'immediate' and 12 'delayed' trials, order randomized)
- immediate and delayed trials each run 3 repetitions of each number of moves (2, 4, 6, 8)
- start states are selected randomly without replacement for each trial
- this first block of 24 trials is still considered training by Faulkner et al (2021)

B. 24 trials (12 'immediate' and 12 'delayed' trials, order randomized)
- immediate and delayed trials each run 3 repetitions of each number of moves (2, 4, 6, 8)
- start states are selected randomly without replacement for each trial


Faulkner et al (2021) report that they tested sequences of 2-8 moves.
This script uses 4 possible numbers of moves (2, 4, 6, 8), as this divides evenly into
12 (trials per delayed/immediate condition), 24 (trials per block), and
48 (total trials run): each condition runs 12 / 4 = 3 repetitions of each move count per block.



The script determines for each trial (see the enumeration sketch after these lists):
- the maximum gain possible
- all possible best move sequences that would lead to this maximum gain
- the possible best move sequences WITHOUT any extreme loss paths (--)
- the number of possible best move sequences
- the number of possible best move sequences that contain at least one extreme loss path (--)
- the number of possible best move sequences that do NOT contain any extreme loss path (--)

For each test move sequence made, the script determines:
- the obtained gain
- whether the selected moves contained at least one extreme loss path (--)
- whether the selected moves were optimal
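As a sketch of how these per-trial quantities can be derived by exhaustively enumerating all 2^n move sequences (assuming the TRANSITIONS and REWARDS mappings from the sketches above; the Inquisit script's internal bookkeeping may differ):

from itertools import product

MAX_LOSS = -140  # the extreme loss (--); see the maxLoss parameter below

def best_sequences(start: int, n_moves: int):
    """Enumerate all 2**n_moves U/I sequences from the start state and return
    the maximum gain, all sequences achieving it, and the subset of those
    optimal sequences that avoid every extreme loss (--) transition."""
    results = {}
    for keys in product("UI", repeat=n_moves):
        state, total, has_extreme_loss = start, 0, False
        for key in keys:
            reward = REWARDS[(state, key)]
            total += reward
            if reward == MAX_LOSS:
                has_extreme_loss = True
            state = TRANSITIONS[state][key]
        results["".join(keys)] = (total, has_extreme_loss)
    max_gain = max(total for total, _ in results.values())
    best = [seq for seq, (total, _) in results.items() if total == max_gain]
    best_without_loss = [seq for seq in best if not results[seq][1]]
    return max_gain, best, best_without_loss

With the default parameters, best_sequences(1, 2) returns (120, ['UU'], ['UU']): the single optimal 2-move sequence from state1 avoids the extreme loss transitions.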

Stimuli

provided by Millisecond

Instructions

provided by Millisecond

Summary Data

File Name: msdmSummary*.iqdat

Data Fields

Name Description
inquisit.version Inquisit version number
computer.platform Device platform: win | mac | ios | android
computer.touch 0 = device has no touchscreen capabilities; 1 = device has touchscreen capabilities
computer.hasKeyboard 0 = no external keyboard detected; 1 = external keyboard detected
startDate Date the session was run
startTime Time the session was run
subjectId Participant ID
groupId Group number
sessionId Session number
elapsedTime Session duration in ms
completed 0 = Test was not completed
1 = Test was completed
accStatesTest Proportion correct responses during the states test
accTraining1Test1 Proportion correct during training1 (test1)
accTraining1Test2 Proportion correct during training1 (test2)
accTraining2Test Proportion correct during training2 (test)

Raw Data

File Name: msdmRaw*.iqdat

Data Fields

Name Description
build Inquisit version number
computer.platform Device platform: win | mac | ios | android
computer.touch 0 = device has no touchscreen capabilities; 1 = device has touchscreen capabilities
computer.hasKeyboard 0 = no external keyboard detected; 1 = external keyboard detected
date Date the session was run
time Time the session was run
subject Participant ID
group Group number
session Session number
blockcode The name of the current block (built-in Inquisit variable)
blocknum The number of the current block (built-in Inquisit variable)
trialcode The name of the currently recorded trial (built-in Inquisit variable)
trialnum The number of the currently recorded trial (built-in Inquisit variable);
trialnum counts all trials run, even those that do not store data to the data file.
runCounter Only for training sessions: keeps track of repeated training (trainingTest) sessions
testCounter Only for final test session: keeps track of number of test blocks run
trialCounter Counts the number of trials in a given block
startState The start state (1-6), from which to start moving around (used during all trainings and tests)
nextState The next state that a move results into (if applicable)
endState The end state that should be reached (used in 'training1Test2' only)
Test Only
numberMoves The number of moves that need to be made during the test
maxGain The maximum gain that could be collected given the start state and the number of moves
selectedGain The gains collected through the sequence of selected moves
bestMoves A variable that stores the sequence of I and U moves associated with maxGain
selectedMoves Stores the sequence of I and U responses made by participant
bestStateSequences A variable that stores the sequential orders of the visited states associated with maxGain given the current start state (contains startState!)
selectedStateSequence Stores the sequence of states that participants moved through (contains startState!)
bestStateSequenceSelected 1 (true): the selectedStateSequence is one of the bestStateSequences
0 (false): the selectedStateSequence is not one of the optimal ones
response Built-in response variable: the participant's response (scancode of response buttons)
responseText Built-in response variable: the letter of the pressed response key
resp Custom response variable
correctResponse The established correct response for a given trial (if applicable)
correct Accuracy of response: 1 = correct response; 0 = otherwise (if applicable)
latency The response latency (in ms); measured from: onset of trial
win Reward structure training: shows the gain/loss given a particular move during training2
key Used during training1 (transition training): stores the key that participant is asked to press

Parameters

The procedure can be adjusted by setting the following parameters.

Name Description Default
Transitions With Key U/I
state1MoveU Move from state1 with key U goes to state 2. Default: 2
state1MoveI Move from state1 with key I goes to state 4. Default: 4
state2MoveU Move from state2 with key U goes to state 3. Default: 3
state2MoveI Move from state2 with key I goes to state 5. Default: 5
state3MoveU Move from state3 with key U goes to state 6. Default: 6
state3MoveI Move from state3 with key I goes to state 4. Default: 4
state4MoveU Move from state4 with key U goes to state 2. Default: 2
state4MoveI Move from state4 with key I goes to state 5. Default: 5
state5MoveU Move from state5 with key U goes to state 1. Default: 1
state5MoveI Move from state5 with key I goes to state 6. Default: 6
state6MoveU Move from state6 with key U goes to state 3. Default: 3
state6MoveI Move from state6 with key I goes to state 1. Default: 1
Reward Structure Associated With The Moves U/I
rewardStructure1U Pressing left on state1 (currently: goes to state2) gains/loses this amount. Default: 140
rewardStructure1I Pressing right on state1 (currently: goes to state4) gains/loses this amount. Default: 20
rewardStructure2U Pressing left on state2 (currently: goes to state3) gains/loses this amount. Default: -20
rewardStructure2I Pressing right on state2 (currently: goes to state5) gains/loses this amount. Default: -140
rewardStructure3U Pressing left on state3 (currently: goes to state6) gains/loses this amount. Default: -140
rewardStructure3I Pressing right on state3 (currently: goes to state4) gains/loses this amount. Default: -20
rewardStructure4U Pressing left on state4 (currently: goes to state2) gains/loses this amount. Default: 20
rewardStructure4I Pressing right on state4 (currently: goes to state5) gains/loses this amount. Default: -20
rewardStructure5U Pressing left on state5 (currently: goes to state1) gains/loses this amount. Default: -140
rewardStructure5I Pressing right on state5 (currently: goes to state6) gains/loses this amount. Default: -20
rewardStructure6U Pressing left on state6 (currently: goes to state3) gains/loses this amount. Default: 20
rewardStructure6I Pressing right on state6 (currently: goes to state1) gains/loses this amount. Default: -20
maxLoss The max/extreme loss (--) that is associated with any of the transitions. Default: -140
statesTestTrials The number of states test trials to run. Default: 12
minAccStatesTest The minimum proportion correct that is considered successful to pass the states test (block.msdmTrainingStatesTest). Default: 0.83
maxStatesTestRuns The maximum number of times block.msdmTrainingStatesTest is run (if there is no success after the last run, the script is terminated prematurely). Default: 2
maxTrainingRuns The maximum number of times that the trainings are repeated; if the participant still fails the tests after 2 training runs, the script ends prematurely. Default: 2
Training1: Learning The Transition Matrix
numberTraining1Trials The number of training1 trials run by block.msdmTraining1. Default: 50
numberTraining1Test1Trials The number of trials run by block.msdmTraining1Test1. Default: 24
minAccTraining1Test1 The minimum proportion correct that is considered successful to pass msdmTraining1Test1. Default: 0.83
maxTraining1Test1Runs The maximum number of times the training1Test1 block is repeated before checking whether training1 needs to be repeated. Default: 5
numberTraining1Test2Trials The number of trials run by block.msdmTraining1Test2. Default: 24
minAccTraining1Test2 The minimum proportion correct that is considered successful to pass msdmTraining1Test2. Default: 0.83
maxTraining1Test2Runs The maximum number of times the training1Test2 block is repeated before checking whether training1 needs to be repeated. Default: 5
Training2: Learning The Reward Structure Of The Transition Matrix
numberTraining2Trials The number of training2 trials run by block.msdmTraining2. Default: 24
numberTraining2TestTrials The number of trials run by block.msdmTraining2Test. Default: 24
minAccTraining2Test The minimum proportion correct that is considered successful to pass msdmTraining2Test. Default: 0.83
maxTraining2Test2Runs The maximum number of times the training2Test block is repeated before checking whether training2 should be repeated. Default: 5
Test
numberTestBlocks The number of test blocks run (each running 24 trials: 12 immediate-moves trials, 12 plan-ahead trials);
the first test block run was not analyzed by Faulkner et al (2021). Default: 2
firstTestBlockCountsAsTraining true: the first test block run counts as training
(the first test block run was not analyzed by Faulkner et al, 2021);
false: all test blocks count as actual test blocks. Default: true
Response Keys: If Changed, All Images Have To Be Updated
leftKey The left response key. Default: "U"
rightKey The right response key. Default: "I"