On the Relationship between Working Memory and Musical Performance Under Delayed Auditory Feedback

by Daniel Xu, Gauri Ramsoekh, Oskar Kruse, & Yadav Permalloo
3185 words

LSC217: Systematic Musicology


Abstract

Delayed auditory feedback (DAF) is a manipulation in which a short time delay is introduced between a person’s actions and the sound of that action reaching their ears. DAF impairs musical performance by providing incorrect feedback that disrupts sensorimotor integration. Previous research suggests that working memory facilitates sensorimotor integration in speech production by enhancing the perception of feedback errors whilst inhibiting compensatory vocal behaviour. This study investigates the relationship between working memory and sensorimotor integration in musical performance using a DAF paradigm. We hypothesise that greater working memory capacity allows better compensation for the negative effects of DAF on musical performance. Participants’ working memory and the effect of DAF on their piano performance were measured separately. N-back accuracy correlated moderately and negatively with the DAF effect, indicating that the musical performance of participants with better working memory tended to be less affected by DAF. Our results are in keeping with previous literature suggesting a top-down influence of working memory on sensorimotor integration.

            Keywords: delayed auditory feedback, sensorimotor integration, N-back task, working memory, musical performance, systematic musicology

            Delayed auditory feedback (DAF) is a manipulation in which a short time delay is introduced between a person’s actions and the sound of that action reaching their ears. Many studies have shown that DAF disrupts human speech (e.g. Lee, 1950; Zimmerman et al., 1988) and musical performance (e.g. Finney, 1997; Gates & Bradshaw, 1974). With regard to speech production, the impairments observed under DAF include decreased speech rate, increased loudness, repetitions, prolongations, and other dysfluencies. The impairments to musical performance under DAF are qualitatively similar, including decreased rate, note repetitions, prolongations, and insertions. For instance, Gates and Bradshaw (1974) asked participants to learn a piece of music on an electronic organ and then to perform it as quickly as possible. Performance was significantly slower under 180 ms DAF than under normal feedback. A similar impairment was also found when the feedback consisted of a completely different musical piece from the one being performed.

            Impairment under DAF (and other altered auditory feedback paradigms) has been interpreted as evidence for a sensory guidance hypothesis, i.e., that the impairment reflects disrupted sensorimotor integration (e.g. Finney & Warren, 2002; Havlicek, 1968; Lee, 1950). On most instruments, musical performance is a complex sensorimotor task involving the sequenced execution of finger movements, which in turn produces an intended sequence of sounds. The motor plan and the sound output are coordinated, with auditory feedback serving as a continuous guide to the ongoing performance (Zatorre et al., 2007). Put simply, the predicted consequence of a motor command is compared with the actual sensory feedback, and mismatches between prediction and feedback lead to motor commands that attempt to correct them. Impairment under DAF can thus be interpreted as the result of corrective motor commands issued in response to a prediction-feedback mismatch; under DAF, however, it is the feedback itself that is incorrect.

            The present study investigates the executive functions that might modulate sensorimotor integration in musical performance, and in particular whether working memory plays a role in musical performance under DAF. Working memory is a limited-capacity brain system responsible for the temporary storage and manipulation of information for ongoing tasks (Baddeley & Hitch, 1974). A body of literature already suggests that working memory exerts a top-down influence on sensorimotor integration during speech production. Li et al. (2015) trained the working memory of healthy participants and measured their cortical event-related potential (ERP) responses to pitch-shifted auditory feedback; ERP components known as the N1-P2 complex were modulated by prediction-feedback mismatches. Their results suggest that participants who received working memory training detected and corrected feedback errors better. In a later study, the same group replicated these findings and further found that engaging working memory resources decreased participants’ ability to detect and correct for altered feedback (Guo et al., 2017). The authors suggest that working memory exerts top-down modulation on sensorimotor integration for speech control by facilitating the processing of feedback errors, including by inhibiting compensatory adjustments to incorrect auditory feedback.

            To the best of the authors’ knowledge, such a relationship between working memory and sensorimotor integration of musical performance has yet to be established. Nevertheless, a link between working memory and musical performance has been demonstrated previously. For instance, Maes et al. (2015) found that engagement of working memory resources impaired the rhythmicity of cellists’ bow strokes. Meinz and Hambrick (2010) found that working memory capacity was a better predictor of piano sight-reading ability than years of experience or weekly hours of practice. However, these studies only indicate a link between working memory and musical performance in general, and not sensorimotor integration during musical performance specifically.

            Perhaps more pertinently, the neural architecture engaged in tonal working memory overlaps with that engaged during musical performance under altered auditory feedback. In participants undertaking pitch memory tasks, functional imaging has found increased activation in the supramarginal gyrus, the superior and inferior parietal lobules (SPL, IPL), area Spt (Sylvian-parietal-temporal), the insula, superior temporal gyrus (STG), inferior frontal gyrus (IFG), premotor cortex, supplementary motor area (SMA), and Broca’s area (Gaab et al., 2003; Zatorre et al., 1994). Pfordresher et al. (2014) used fMRI to measure brain activity in pianists playing under normal and altered (DAF and pitch-shifted) auditory feedback. Altered auditory feedback was associated with increased activity across similar frontal (IFG, SMA, Broca’s area), posterior parietal (IPL, Spt), and superior temporal (STG) regions, as well as the insula and cerebellum. In other words, tonal working memory and musical performance under altered auditory feedback, including DAF, appear to draw upon overlapping neural networks.

            The present study seeks to test directly for a connection between working memory and sensorimotor integration in musical performance. We separately measured the working memory capacity and the piano performance under DAF of healthy participants. Based on the existing literature, we hypothesised that participants with greater working memory capacity would be better able to compensate for the deleterious effect of DAF on musical performance.

Method

Participants

            Twenty-three participants (9 male, 14 female) were recruited from the student community of Erasmus University College. Mean age was 21.87 years (SD = 4.07). Sixteen participants reported playing an instrument; 12 reported piano as their primary instrument (mean years of instruction = 6.07; SD = 3.28). Twenty participants were right-handed, and none reported any hearing impairment.


Materials

Musical Performance Task

            This part of the experiment involved multiple performances of an excerpt of music composed by one of the authors (DX), shown in Figure 1. The excerpt contained 22 notes and was designed to be easy to learn and reproduce, even for participants without prior musical instruction. It was a single-line melody with no dynamic or expressive markings, played with the right hand only and requiring no changes in hand position. Fingering was indicated beneath the notation, and the keyboard keys were marked correspondingly.

Figure 1
Musical Excerpt

            Participants performed on an M-Audio Oxygen 25 keyboard. Auditory output from the keyboard was delayed using a Black Arts delay device and amplified with an Onyx Black-Jack USB recording interface. Participants heard auditory feedback through Bose QC-15 noise-cancelling headphones at a listening level they considered comfortable. Keypress responses and auditory output were recorded on a 13-inch MacBook Pro (2018) laptop using the Ableton digital audio workstation software.

Working Memory Task

            Working memory capacity was measured using an N-back (2-back) task available at psytoolkit.org. The N-back task is popular amongst researchers, easy to administer, and has strong face validity (Owen et al., 2005). On each trial, participants decide whether the currently presented stimulus is the same as the stimulus presented N (in our case, 2) items previously. The stimulus set consisted of 15 letters; each stimulus was presented for a maximum of 760 ms, with an intertrial interval of 2000 ms.
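            The 2-back matching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not PsyToolkit’s implementation: the function name `nback_targets` and the example letters are hypothetical, and the sketch assumes the stimuli are single letters compared by identity.

```python
from typing import List, Sequence


def nback_targets(stimuli: Sequence[str], n: int = 2) -> List[bool]:
    """Flag each position whose stimulus matches the one presented n items earlier.

    The first n positions can never be targets, because nothing was
    presented n items before them.
    """
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


if __name__ == "__main__":
    letters = ["B", "K", "B", "T", "B", "T", "R"]  # hypothetical letter sequence
    print(nback_targets(letters))  # [False, False, True, False, True, True, False]
```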

Figure 2
N-back task example in which the currently presented stimulus is the same as the stimulus presented N (= 2) items previously.

Note. Created with PsyToolkit. http://psytoolkit.org/experiment-library/experiment_nback2.html

            Participants performed the N-back task on a 13-inch MacBook Pro (2018) laptop and indicated a positive response (a match) with a touchpad tap. Auditory feedback (‘good’ or ‘bad’) was provided through headphones after each tap. Participant instructions and a practice block were built into the PsyToolkit programme.


Procedure

            The order of the two tasks (musical performance or working memory first) was randomised across participants, with a 5-minute rest between tasks to minimise fatigue effects.

Musical Performance Task

            To familiarise themselves with the music, participants practised the excerpt under normal feedback conditions until they could play it comfortably and accurately, i.e., until they felt comfortable playing it at a uniform tempo and without error.

            For the experimental trials, participants were instructed to play at a uniform tempo without expressive variation, to keep going if they made a mistake, and to speed up if they slowed down. For each participant, a metronome tempo approaching their maximum comfortable playing speed was chosen; this tempo was used for all of that participant’s trials to provide a consistent reference tempo. Before each trial, the metronome played eight beats and then stopped; participants were instructed to start playing on the following beat whilst maintaining this tempo. A metronome playing throughout the trial was considered inappropriate because of its disorientating effect in the delay conditions.

            To obtain a baseline performance for comparison, each participant first played one trial under normal (no delay) feedback. Participants then performed one trial in each of three delayed feedback conditions: 150 ms, 250 ms, and 350 ms. The order of the three delay conditions was randomised for each participant.
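            The session structure described above (random task order, a no-delay baseline first, then the three delay conditions in random order) can be sketched as follows. The text does not say how randomisation was implemented, so the use of Python’s `random` module, the per-participant seed, and the helper name `session_plan` are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional

DELAYS_MS = [150, 250, 350]  # delayed-feedback conditions used in the study


def session_plan(seed: Optional[int] = None) -> Dict[str, List]:
    """Hypothetical helper: draw one participant's task and trial order."""
    rng = random.Random(seed)  # seeded generator so a plan can be reproduced

    # Which task comes first is randomised (a 5-minute rest separates them).
    task_order = ["musical performance", "working memory"]
    rng.shuffle(task_order)

    # The no-delay baseline always comes first; the three delays are shuffled.
    delays = DELAYS_MS.copy()
    rng.shuffle(delays)
    return {"task_order": task_order, "music_trials_ms": [0] + delays}


if __name__ == "__main__":
    for participant in range(3):
        print(participant, session_plan(seed=participant))
```

            Seeding one generator per participant keeps each plan reproducible while still varying the order across participants.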

Working Memory Task

            The N-back task, including participant instructions, was provided by PsyToolkit (psytoolkit.org). Briefly, participants were told that they would see a sequence of letters and should respond with a mouse click (here, a touchpad tap) whenever the current letter was the same as the letter shown two trials earlier. A practice block of 25 trials familiarised participants with the task and was followed by two experimental blocks of 25 trials each. The full procedure is described on the PsyToolkit website.

Data Analysis

            Data were managed in Excel (Microsoft) and analysed using Python and SPSS (IBM). Working memory capacity was quantified using two measures from the N-back task: mean reaction time (ms) and accuracy (percentage correct). A trial was scored as an error when a participant failed to respond to a match (a miss) or responded to a non-match (a false alarm). Faster reaction times and higher accuracy are both indicative of greater working memory capacity (Owen et al., 2005).
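            A minimal sketch of the two N-back measures, assuming trial-level data with one entry per trial: whether it was a match, whether the participant responded, and the reaction time if they did. The function name `score_nback` is hypothetical, and so is the choice to average reaction time over correct responses to matches only; the text does not specify which trials entered the reaction time mean.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def score_nback(
    is_target: Sequence[bool],
    responded: Sequence[bool],
    rt_ms: Sequence[Optional[float]],
) -> Tuple[float, float]:
    """Return (accuracy in %, mean reaction time in ms) for one participant.

    A trial is correct when the response matches the trial type: respond to a
    match, withhold on a non-match. Misses and false alarms count as errors.
    Assumption: mean RT is averaged over correct responses to matches (hits).
    """
    if not (len(is_target) == len(responded) == len(rt_ms)) or not is_target:
        raise ValueError("need one entry per trial in every input, and at least one trial")
    correct = [t == r for t, r in zip(is_target, responded)]
    accuracy = 100 * sum(correct) / len(correct)
    hit_rts = [rt for t, r, rt in zip(is_target, responded, rt_ms) if t and r and rt is not None]
    mean_rt = mean(hit_rts) if hit_rts else float("nan")
    return accuracy, mean_rt


if __name__ == "__main__":
    # Five hypothetical trials: one false alarm (trial 2) and one miss (trial 5).
    targets = [False, False, True, False, True]
    responses = [False, True, True, False, False]
    rts = [None, 540.0, 650.0, None, None]
    print(score_nback(targets, responses, rts))  # (60.0, 650.0)
```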

            We introduce a novel method of quantifying musical performance. Previous studies of musical performance under DAF have tended to quantify performance either by counting discrete error events, e.g. the number of note deletions, additions, or substitutions (Finney, 1997), or by measuring the total time taken to play the excerpt (e.g. Gates & Bradshaw, 1974). The former method does not account for participants slowing down, which would reduce the number of note errors, and the latter does not account for fast but erroneous playing. In the present study, we quantified musical performance by generating a note-time map from the MIDI file of each performance and comparing it against a note-time map of a perfect performance at the participant’s reference tempo. Musical performance was quantified as the percentage of time during which the two maps did not agree, or, more simply, the percentage time of erroneous playing (%TEP). In contrast to previously used methods, %TEP yields a single continuous measure of musical performance that captures all error types. Table 1 demonstrates this procedure using simple fictional values.

Table 1
Quantification of Musical Performance

Perfect note-time map at participant reference tempo            Participant note-time map            Difference (ms)
Note            Duration (ms)            Note            Duration (ms)
Percentage time of erroneous playing (%TEP) = 1500 / 4500 = 33%
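            The %TEP procedure described above and illustrated in Table 1 can be sketched as follows. This is not the authors’ analysis code: the note-time map representation (consecutive note/duration segments, with `None` for silence), the function names, and the choice of the longer of the two maps as the denominator are all assumptions, and extracting the segments from a MIDI file is left out. Splitting time at every note onset or offset in either map means that, between two consecutive boundaries, each map is sounding a single note (or silence), so the mismatching time can be summed exactly without sampling.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation of a note-time map: consecutive (note, duration in ms)
# segments, with None marking silence. Assumes one note sounds at a time.
NoteMap = Sequence[Tuple[Optional[str], float]]


def _timeline(note_map: NoteMap) -> Tuple[List[float], List[Optional[str]], float]:
    """Onset times, notes, and total duration of a note-time map."""
    notes = [note for note, _ in note_map]
    ends = list(accumulate(duration for _, duration in note_map))
    starts = [0.0] + ends[:-1]
    return starts, notes, (ends[-1] if ends else 0.0)


def _note_at(starts: List[float], notes: List[Optional[str]], end: float, t: float) -> Optional[str]:
    """Note sounding at time t, or None if silent or past the end of the map."""
    if t >= end:
        return None
    return notes[bisect_right(starts, t) - 1]


def percent_time_erroneous(reference: NoteMap, performance: NoteMap) -> float:
    """%TEP: percentage of time during which the two note-time maps disagree.

    Assumption: the comparison runs until the longer of the two maps ends, so
    playing past the end of the reference also counts as erroneous time.
    """
    ref, perf = _timeline(reference), _timeline(performance)
    total = max(ref[2], perf[2])
    if total == 0:
        return 0.0
    # Between consecutive onsets/offsets in either map, neither note changes.
    boundaries = sorted({0.0, total, ref[2], perf[2], *ref[0], *perf[0]})
    mismatch = 0.0
    for lo, hi in zip(boundaries, boundaries[1:]):
        if _note_at(*ref, lo) != _note_at(*perf, lo):
            mismatch += hi - lo
    return 100 * mismatch / total


if __name__ == "__main__":
    reference = [("C4", 500), ("D4", 500), ("E4", 500), ("F4", 500)]
    wrong_note = [("C4", 500), ("E4", 500), ("E4", 500), ("F4", 500)]
    slowed = [("C4", 500), ("D4", 700), ("E4", 500), ("F4", 500)]
    print(percent_time_erroneous(reference, wrong_note))  # 25.0
    print(round(percent_time_erroneous(reference, slowed), 1))  # 27.3
```

            In this hypothetical example, a wrong note lasting 500 ms out of 2000 ms yields 25 %TEP, while a single note held 200 ms too long shifts every later note and is penalised for 600 of the 2200 ms compared; this is the property that lets %TEP capture both slowing down and wrong notes in one number.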


Results

            Table 2 presents a summary of the musical performance and working memory measures for all participants.

Table 2
Summary of Musical Performance and Working Memory Measures
            M            SD
Musical performance (%TEP)
Normal feedback        13.40            7.19
150 ms DAF        17.34            9.84
250 ms DAF        22.14            9.29
350 ms DAF        19.25            8.09
Working memory
Reaction time (ms)        700.32            22.96
Accuracy (% correct)            88                7

Musical Performance

            %TEP was lowest under normal auditory feedback, increased to a peak at 250 ms delay, and fell again at 350 ms. Given that our novel method yields a relationship between musical performance and delay interval similar to that reported in Gates et al.’s (1974) seminal paper, we suggest that %TEP has at least strong face validity as a measure of musical performance. Independent samples t-tests found no significant effect of sex (ps > .25), dominant hand (ps > .5), or ability to read music (ps > .25) on musical performance under any feedback condition, though, as expected, musical performance was significantly better for pianists than for non-pianists under all feedback conditions (ps < .05). However, the pianists’ advantage was consistent across feedback conditions (their %TEP was between 7.52 and 8.16 percentage points lower in every condition); in other words, piano playing did not appear to interact with feedback condition (see Figure 3).

Figure 3
Musical Performance for each Feedback Condition

DAF Effect on Musical Performance

            Our interest is the effect of DAF on musical performance, rather than musical performance under DAF per se. We therefore quantified each participant’s DAF effect as their %TEP under normal feedback subtracted from their %TEP at 250 ms DAF, since 250 ms was the delay at which impairment was maximal.
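            Written as a formula (the notation is ours), with i indexing participants, the DAF effect and its sample mean are as follows. Because the mean of the differences equals the difference of the condition means, the sample mean can be checked against Table 2; it agrees with the mean DAF effect of 8.74 %TEP reported below.

```latex
\begin{aligned}
\mathrm{DAF}_i &= \%\mathrm{TEP}_{i,\,250\,\mathrm{ms}} - \%\mathrm{TEP}_{i,\,\mathrm{normal}} \\
\overline{\mathrm{DAF}} &= \frac{1}{N}\sum_{i=1}^{N} \mathrm{DAF}_i
  = \overline{\%\mathrm{TEP}}_{250\,\mathrm{ms}} - \overline{\%\mathrm{TEP}}_{\mathrm{normal}}
  = 22.14 - 13.40 = 8.74
\end{aligned}
```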

            The mean DAF effect was 8.74 %TEP (SD = 7.09). There was no significant effect of sex (men: M = 9.02; women: M = 8.56; t(21) = -0.15, p = .884), ability to read music (readers: M = 9.02; non-readers: M = 8.53; t(21) = -0.16, p = .875), piano playing (pianists: M = 8.74; non-pianists: M = 8.75; t(21) = 0.01, p = .996), or dominant hand (right: M = 8.19; left: M = 12.40; t(2.11) = 0.51, p = .657). Moreover, there was no significant correlation between DAF effect and years of musical instruction (r(21) = -.12, p = .596) or reference tempo (r(21) = -.15, p = .507).

Working Memory

            Mean N-back reaction time across all participants was 700.32 ms (SD = 22.96) and mean accuracy was 88% (SD = 7). Interestingly, women responded significantly faster than men (women: M = 691.92 ms; men: M = 713.39 ms; t(17.96) = -2.87, p = .010), whereas men were more accurate, though not significantly so (women: M = 86.29%; men: M = 91.33%; t(21) = -1.86, p = .077). Neither reaction time nor accuracy was significantly affected by playing piano, ability to read music, or dominant hand (ps > .1). Moreover, neither measure correlated significantly with years of musical instruction (|r|s < .15, ps > .5).
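            The fractional degrees of freedom reported for dominant hand (t(2.11)) and for reaction time by sex (t(17.96)) are characteristic of Welch’s t-test, which does not assume equal group variances and corresponds to SPSS’s “Equal variances not assumed” row; this is our reading of the reported statistics, not something the text states. The sketch below computes Welch’s t and its Welch-Satterthwaite degrees of freedom using only the standard library. The function name and the data are hypothetical, and a p value would additionally require the t distribution (for example via SciPy’s `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Welch's test does not assume equal group variances, which is why its
    degrees of freedom are generally not whole numbers.
    """
    se2_a = variance(a) / len(a)  # squared standard error of the mean, group a
    se2_b = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical scores for a small and a larger group with unequal spread.
    small_group = [1.0, 2.0, 3.0, 4.0]
    large_group = [2.0, 4.0, 6.0, 8.0, 10.0]
    t, df = welch_t(small_group, large_group)
    print(f"t({df:.2f}) = {t:.2f}")  # t(5.52) = -2.25
```

            With equal group sizes and equal variances the Welch-Satterthwaite df reduces to the familiar n1 + n2 - 2; very unequal groups, such as 20 right-handed versus 3 left-handed participants, pull it down sharply.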

Working Memory and DAF Effect

            The main concern of this study is the relationship between working memory and the effect of DAF on musical performance; we hypothesised that participants with better working memory would be less affected by DAF. Figures 4 and 5 present scatterplots of DAF effect against N-back accuracy and reaction time, respectively. We observed a moderate negative correlation between DAF effect and N-back accuracy (r(21) = -.36, p = .044) and a weak positive correlation between DAF effect and N-back reaction time (r(21) = .14, p = .260); both p values are one-tailed. Although the correlation with reaction time was not significant, both correlations were in the predicted direction, with higher accuracy and faster responses associated with a smaller DAF effect. Taken as a whole, these results suggest that better working memory is associated with a greater ability to overcome the effect of DAF.
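            A correlation is tested against zero by converting it to t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below recomputes t from the rounded coefficients reported above as a consistency check, not a reanalysis; the function names are hypothetical. With n = 23, r = -.36 gives |t| ≈ 1.77, which exceeds the one-tailed critical value of 1.721 (α = .05, df = 21), whereas r = .14 gives t ≈ 0.65, in line with the reported p values. An exact one-tailed p value would need the t distribution, e.g. `scipy.stats.t.sf(abs(t), df)`.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


if __name__ == "__main__":
    # Recompute t from the rounded coefficients reported above (n = 23, df = 21).
    print(round(t_from_r(-0.36, 23), 2))  # -1.77 (N-back accuracy)
    print(round(t_from_r(0.14, 23), 2))   # 0.65 (N-back reaction time)
```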

Figure 4
Scatterplot of DAF Effect against N-back Accuracy

Figure 5
Scatterplot of DAF Effect against N-back Reaction Time

            Finally, we performed a multiple linear regression to determine whether working memory or any of the other variables predicted DAF effect. Sex, ability to read music, piano playing, years of musical instruction, reference tempo, N-back accuracy, and N-back reaction time were entered into the model. Although the model did not reach significance (F(7, 15) = 1.29, p = .33), it explained a sizeable proportion of variance (R² = .37). Interestingly, N-back accuracy was the only predictor to reach significance (p = .029), and it also had the largest standardised coefficient (β = -.57; see Table 3). No other predictor was close to significance (ps > .15).
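            As a worked check (using the standard formulas and the rounded R², so the figures are approximate), the F statistic can be recovered from R² with k = 7 predictors and N = 23 participants. The small gap from the reported F = 1.29 is consistent with rounding (R² ≈ .376 would give F ≈ 1.29). The adjusted R² shows how strongly the raw R² is discounted once seven predictors and only 15 residual degrees of freedom are taken into account, which helps explain why a model with R² = .37 did not reach significance.

```latex
\begin{aligned}
F &= \frac{R^2 / k}{(1 - R^2)/(N - k - 1)}
   = \frac{.37/7}{.63/15}
   \approx \frac{0.0529}{0.0420}
   \approx 1.26 \\
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1}
   = 1 - .63 \times \frac{22}{15}
   \approx .08
\end{aligned}
```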

Table 3
Predictors of DAF Effect

                β               p
Sex              0.24            .380
Ability to read music              0.47            .166
Piano playing              -0.16            .683
Years of musical instruction              -0.40            .269
Reference tempo              -0.33            .161
N-back accuracy              -0.57            .029
N-back reaction time              0.34            .210

Note. Standardised β.

Discussion and Conclusion

            We hypothesised that participants with better working memory would be less affected by delayed auditory feedback during musical performance. Our results tentatively support this hypothesis. A moderate negative correlation was found between DAF effect and N-back accuracy, which was significant at p = .044 (one-tailed). Moreover, a multiple linear regression with DAF effect as the dependent variable found that N-back accuracy was the predictor with the largest effect size and the only predictor to reach statistical significance (p = .029), although the model itself was not significant. N-back reaction time, by contrast, did not show a significant relationship with DAF effect, nor did any of our other recorded variables (sex, dominant hand, ability to read music, piano playing, years of musical instruction, reference tempo).

            Although our results point towards a relationship between working memory and DAF effect, it is disappointing that several of our main findings did not reach significance at p < .05. Owing to the Covid-19 pandemic of 2020, we had access to a limited participant pool, resulting in N = 23, of whom only about half were piano players. To the best of our knowledge, all prior studies using a DAF-piano paradigm recruited only experienced piano players, for instance “students majoring in instrumental music education” (Havlicek, 1968, p. 311). Although we attempted to design a musical performance task suitable for complete novices, we cannot rule out a learning effect that may have interacted with our experiment, nor can we rule out that prior piano playing ability moderates the relationship between working memory and DAF effect. Indeed, in a post hoc analysis, we computed the correlation between DAF effect and N-back accuracy separately for pianists and non-pianists. The correlation was weaker for non-pianists (r = -.21, p = .264) than for pianists (r = -.46, p = .068), and it is regrettable that more pianists were not available for recruitment to this study. Moreover, including both pianists and non-pianists meant that reference tempo was highly variable, ranging from 60 to 110 beats per minute. Finney and Warren (2002) found that the delay producing maximal impairment depends on performance rate, whereas the present study took 250 ms as the peak impairment interval for all participants; the DAF effect may therefore have been underestimated for participants whose peak impairment occurred at a different delay.

            It is also interesting to note that DAF effect correlated significantly with N-back accuracy but not with N-back reaction time, even though both are purported measures of working memory. This discrepancy may reflect limitations of the task itself: although researchers have used the N-back task extensively as a working memory paradigm since the 1960s, more recent validation studies have cast doubt on its usefulness as a measure of individual differences in working memory (Jaeggi et al., 2010).

            Our results are in line with previous research indicating a relationship between working memory and sensorimotor integration in speech production (e.g. Guo et al., 2017; Li et al., 2015). However, to the best of our knowledge, ours is the first study to investigate working memory and sensorimotor integration with a DAF-musical performance paradigm. The results appear to support the model proposed by Guo et al. (2017), whereby working memory exerts a top-down influence on sensorimotor integration, possibly by facilitating the processing of feedback errors and inhibiting compensatory adjustments to altered feedback. That being said, it is not a given that the mechanisms and pathways of sensorimotor integration in speech production correspond to those of musical performance. For one, musical performance also relies on visual and tactile feedback, although the available research indicates that the auditory system is dominant in this regard (Comstock et al., 2018). The interactions between auditory, visual, and tactile feedback in the sensorimotor integration of musical performance nevertheless remain underexplored, and further research is needed to answer these questions.