Royal Holloway Graduate Music Research
Internet Journal written and edited by students of the Department of Music

Attention Whilst Sight-Reading Music


Sight-reading music is one task that requires much cognitive effort, from the initial visual processing of a score to the controlled motor action, via any higher-level interpretative cognition and additional language processing. Despite the degree of this cognitive effort, performers are still able to shift their attention to other non-musical events when sight-reading music.

The psychological study of attention has its roots in the late nineteenth century, with the work of the American William James (1890) and his introspective accounts of attention.

Every one knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others, and is a condition which has a real opposite in the confused, dazed, scatterbrained state which in French is called distraction, and Zerstreutheit in German. (1890: 404, italics added)
This scholarship was advanced in the 1950s, with a number of studies that looked at attention from an early information-processing viewpoint. Cherry (1953) proposed the notion of the cocktail party effect, in which people are unable to engage in more than one ongoing conversation, yet automatically switch their attention to the sound of a salient word, such as their name. This effect presented problems for the early models of attention that were based on the then prevalent serial processing of visual and auditory inputs (allowing, for instance, for a single conversation to be followed to the exclusion of others). Furthermore, the effect had implications for the cognitive capacity of the brain, and raised questions as to how many conversations (or visual inputs) could be processed even subconsciously.

Broadbent (1958) engaged with these issues in his filter model of attention, which combined initial parallel processing with later serial processing. Broadbent's model consists of a filter component that scans all incoming signals for salient information. This salient information may be the continuation of the current conversation (most probably defined by contextual linguistic and extra-linguistic information), or other keywords that are rated higher than all other lexemes, such as one's name or the word 'fire'. Once the filter has selected the appropriate input channel, the data pass through further general (linguistic) processing paths that allow conscious awareness of and response to them.
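The two stages described above can be illustrated with a minimal sketch. It follows the account of the filter given in this article rather than Broadbent's own formalism, and the names used (`Channel`, `select_channel`, `process`, and the keyword list, including the placeholder name "alex") are assumptions introduced purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical keyword list: the article names only one's own name and "fire".
SALIENT_KEYWORDS = {"fire", "alex"}


@dataclass
class Channel:
    name: str
    words: list[str]


def select_channel(channels: list[Channel], attended: str) -> Channel:
    """Filter stage: scan every channel and choose the one to pass on.

    A channel carrying a salient keyword wins; otherwise the currently
    attended conversation continues.
    """
    for channel in channels:
        if any(word.lower() in SALIENT_KEYWORDS for word in channel.words):
            return channel
    return next(c for c in channels if c.name == attended)


def process(channel: Channel) -> str:
    """Serial stage: only the selected channel reaches full processing."""
    return f"{channel.name}: " + " ".join(channel.words)


channels = [
    Channel("conversation", ["the", "concert", "went", "well"]),
    Channel("background", ["did", "you", "see", "Alex", "earlier"]),
]
selected = select_channel(channels, attended="conversation")
print(process(selected))
```

Even in this toy form, the filter has to inspect individual words before the serial stage does any linguistic work of its own.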

Though logically valid, Broadbent's model has a number of associated problems. The main problem is that the filter component must have some linguistic ability in order to detect keywords spoken by people with completely different accents and levels of speech. This undermines the efficiency of the model, since linguistic processing would have to occur more than once, and to differing degrees. It may be noted, however, that the alleged Echelon system of today bears a striking similarity to this model. The Echelon system uses vast computing power to process data from a large number of inputs, combing them for keywords useful to the intelligence services; suspect items are then passed to human operators for further linguistic and intelligence analysis. The other problem is that of control: the filter system, with its simplistic language-processing capabilities, is granted control of the data, yet there may be instances when the later, higher linguistic processes need to feed back to the filter. Such an instance might be a particularly emotional conversation that must take priority over all other input (the equivalent of Echelon's administrators increasing the weighting on detection of Osama Bin Laden's voice, for example), yet Broadbent does not include such interactionist features in his serial model.

Whilst Broadbent's model held a paradigmatic position during a period of normal science that lasted for over a decade, a crisis and an apparent paradigm shift started to take hold in attention research during the early 1970s, and a number of criticisms were levelled at the 1958 model. One major challenge was the pointedly subtitled work of Allport, Antonis and Reynolds (1972). In their work, Allport et al. carried out two experiments, and they found that participants could carry out auditory shadowing (the task of immediately repeating auditory stimuli) concurrent with memorisation of pictures.

Allport et al.'s second experiment again used shadowing; however, participants were required simultaneously to sight-read piano music. This experiment is therefore of particular interest to the present article. Participants took part in two experimental sessions, and were required to sight-read either easy (ABRSM Grade II standard) music or hard (ABRSM Grade IV standard) music. They shadowed easy (humorous) poetry, or hard (Norse) poetry. Whilst errors arose on the hard music-hard poetry condition, they were significantly reduced in the second session. Allport et al. regarded this as disproof of the single channel hypothesis.

The 1980s witnessed a period of chaos in the area of attention scholarship, as research led to increased doubts as to the everyday validity of Broadbent's model. Attention had been closely related to short-term memory, in that input is filtered, processed and retained in a short-term store, as in the multi-store model of memory (Atkinson and Shiffrin, 1968). One model of short-term memory, which has had implications for the study of attention, is the model of working memory (WM) of Baddeley (1981, 1992). The link between attention and memory is made explicit in that the control processes of Baddeley's 1981 model have been implicated in attentional disorders.

The result of Baddeley's work was the development of a model of working memory that consists of three components: a phonological loop, a visuospatial sketchpad, and a central executive. The first two components are dedicated to processing input that is either auditory (more specifically, speech-based) or visual, and/or has spatial dimensions. The third component, the central executive, plays the role of controlling the information flow; it is modality-independent. If one of the two slave systems becomes overloaded, the central executive can combine with that system and increase the available processing capacity of the overall system. The validity of a general processor is questionable, yet the model has survived the scrutiny of empirical study.
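The capacity-sharing idea in this description can be sketched as a toy model. This is an illustration of the account given here, not Baddeley's own specification: the class names, the method `assign`, and every capacity figure are arbitrary assumptions chosen only to make the behaviour visible.

```python
from dataclasses import dataclass


@dataclass
class SlaveSystem:
    name: str
    capacity: int  # arbitrary units; an assumption, not a measured value
    load: int = 0


class WorkingMemory:
    """Two modality-specific stores plus a modality-independent executive."""

    def __init__(self, loop_capacity=4, sketchpad_capacity=4, executive_capacity=3):
        self.systems = {
            "phonological": SlaveSystem("phonological loop", loop_capacity),
            "visuospatial": SlaveSystem("visuospatial sketchpad", sketchpad_capacity),
        }
        self.executive_spare = executive_capacity

    def assign(self, modality: str, demand: int) -> bool:
        """Try to take on a task; return False if capacity is exhausted."""
        system = self.systems[modality]
        spare = system.capacity - system.load
        # Whatever the slave system cannot hold is lent by the executive.
        shortfall = max(0, demand - spare)
        if shortfall > self.executive_spare:
            return False
        system.load += demand - shortfall
        self.executive_spare -= shortfall
        return True


wm = WorkingMemory()
print(wm.assign("visuospatial", 4))  # a visual task fills the sketchpad
print(wm.assign("phonological", 3))  # a speech-based task fits the loop
print(wm.assign("visuospatial", 2))  # overflow is absorbed by the executive
print(wm.assign("visuospatial", 2))  # the executive has too little left
```

Under these assumed capacities, tasks in different modalities do not compete until the shared executive is drawn upon, whereas two tasks in the same modality reach the limit quickly.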

The distinction between visual and auditory input provides one explanation for Allport et al.'s results. Sight-reading may be assumed to require primarily visual processing, with motor output. However, short chunks of auditory feedback may be used to add consistency, shape and other interpretive aspects to the performance. The dominance of the visuospatial aspects of the task of sight-reading can be compared to the auditory nature of the shadowing task, hence the two slave systems of Baddeley's model of working memory are unlikely to become overloaded.

Cole (2002) attempted to develop Allport et al.'s work by asking participants to carry out concurrent tasks, dominated by visuospatial processing. Participants were directed to focus on sight-reading piano music (specifically, composed pieces of ABRSM Grade IV standard) to the best of their ability. Concurrently, they were presented with coloured shapes, one at a time, as visual stimuli. Five experimental groups were asked to respond to all the circles, every square, every red shape, every blue shape, or all the stimuli regardless of colour or shape. It was found that participants responding to all the stimuli had a significantly slower response time than those in the colour (red or blue) or form (circle or square) conditions. However, the deficit was explained by the differential between the tempo of the piece and the rate of presentation of the stimuli, rather than factors related to attention. It appears that by Grade VIII level (the standard of Cole's participants), reading relatively easy pieces whilst carrying out concurrent visuospatial tasks lies within the limits of the attentional system.
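The five response rules can be written out as a short sketch, which makes clear how much more often the 'all stimuli' group had to respond. The stimulus sequence below is invented for illustration and is not data from the study; the names `CONDITIONS` and `should_respond` are likewise hypothetical, and the sketch assumes only the two colours and two shapes mentioned above.

```python
# Each rule takes (colour, shape) and says whether a response is required.
CONDITIONS = {
    "circle": lambda colour, shape: shape == "circle",
    "square": lambda colour, shape: shape == "square",
    "red": lambda colour, shape: colour == "red",
    "blue": lambda colour, shape: colour == "blue",
    "all": lambda colour, shape: True,
}


def should_respond(condition, colour, shape):
    """Apply the response rule for one experimental group to one stimulus."""
    return CONDITIONS[condition](colour, shape)


# Invented stimulus sequence, presented one at a time.
stimuli = [
    ("red", "circle"),
    ("blue", "square"),
    ("red", "square"),
    ("blue", "circle"),
    ("red", "circle"),
]

for condition in CONDITIONS:
    responses = sum(should_respond(condition, c, s) for c, s in stimuli)
    print(f"{condition}: {responses} of {len(stimuli)} stimuli need a response")
```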

Interestingly, the striking result arose that there was no difference in the number of errors in performance whether sight-reading a piece or performing one that had been practised for fifteen minutes beforehand under the controlled conditions of the experiment. Whilst methodological problems may have caused this result, the possibility remains that the attentional system had in fact reached its limit, yet the simplicity of the pieces led to there being few outward signs in the quality of the performance.

Other research into sight-reading has pointed to the automatic processes that take place in sight-readers of different levels. An experiment described by Sloboda in his 1985 book, The Musical Mind, places sight-readers into two distinct categories. Poor sight-readers read pieces one note at a time, hence they tend not to make low-level errors in pitch – rather, their interpretation and longer-term phrasing, expression and structure may be less developed. In contrast, good sight-readers read patterns of grouped notes, hence are more likely to make errors where there are differences between the expected pitch and actual pitch, yet their higher-level abilities will be well developed. Gillett (2002) tried to replicate Sloboda's results, and arrived at the surprise finding that there are in fact three levels of sight-reading ability. Participants in Gillett's study fell into groups comparable to the poor and good sight-readers of Sloboda. However, a third (minority) category of expert sight-readers – a single participant in Gillett's study – had moved beyond pattern recognition, and utilised the reading of individual notes whilst also retaining their contextual properties in relation to structure, phrasing and expression. The expert participant in Gillett's study regularly sight-read pieces and played much modern music in which set inter-piece patterns may be less likely to occur (as a result of, for example, the use of atonality and emphasis upon complex rhythm).

One shortcoming of a number of psychological experiments is their limited validity outside of the laboratory. Memory is one area to which this issue is particularly relevant, as people are rarely required (for example) to remember strings of digits, count them backwards from 100 in threes, and then recite them! Attention outside of the laboratory is also subject to many more input variables than the carefully controlled conditions necessary to ensure that such experiments are empirically valid. Thus, it is worth ending this short overview of the study of attention and sight-reading music by considering possible implications for everyday musical performance.

All performers, particularly those in ensembles with conductors, know their ability to perform music that they have practised whilst following the vague, emotionally-led swings of the conductor's arms. This ability may be put down to memory for passages, available thanks to the hours of practice put in before the rehearsal. However, the situation changes when performers are asked to sight-read ensemble pieces during an initial play-through. It is at this moment that attention must be divided between the visuospatial input of the conductor marking the tempo (and any changes thereto) and the visuospatial input, coupled with auditory feedback, necessary for sight-reading. Unless the page is full of barred notes that extend above and below the stave, it is not usual for sight-reading ability simply to break down at such moments. Of course, it is possible that the increased cognitive load may be made more manageable by heuristics such as pattern recognition, and that following the conductor's beat may be aided by internal timing mechanisms.

In conclusion, the ability of performers to follow two visual inputs at once whilst sight-reading music appears to be supported by the empirical evidence. Further research may clarify any methodological errors in these studies, and extend them with the aim of identifying the limits of attention beyond the psychological refractory period (Welford, 1952), which is rarely reached in music.


Allport, D. A., Antonis, B., and Reynolds, P. (1972). On the division of attention: A disproof of the single channel hypothesis. Quarterly Journal of Experimental Psychology, 24(2), 225-235.

Atkinson, R. C., and Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence and J. T. Spence (eds.), The psychology of learning and motivation: Advances in research and theory. New York: Academic Press, 89-195.

Baddeley, A. (1981). The concept of working memory: A view of its current state and probable future development. Cognition, 10(1-3), 17-23.

Baddeley, A. (1992). Working memory. Science, 255, 556-559.

Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.

Cherry, E. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25, 975-979.

Cole, N. (2002). Visual Attention for Colour and Form whilst Performing Rehearsed and Unrehearsed Piano Music. Undergraduate dissertation. Royal Holloway, University of London.

Gillett, K. (2002). Levels of Ability in Sight-readers. Undergraduate dissertation. Royal Holloway, University of London.

James, W. (1890). The Principles of Psychology. Repr. at

Sloboda, J. A. (1985). The Musical Mind: the cognitive psychology of music. Oxford: Oxford University Press.

Welford, A. T. (1952). The psychological refractory period and the timing of high-speed performance – a review and a theory. British Journal of Psychology, 43, 2-19.

© Nicholas Cole, 2002