Music Processing Modularity


Nature Neuroscience, July 2003, p.688

Modularity of music processing

Isabelle Peretz1 & Max Coltheart2

1University of Montreal, Box 6128, Succ. Centreville, Montreal, Quebec H3C 3J7, Canada.

2Macquarie Center for Cognitive Science, Macquarie University, Sydney NSW 2109, Australia.


The music faculty is not a monolithic entity that a person either has or does not. Rather, it comprises a set of neurally isolable processing components, each having the potential to be specialized for music. Here we propose a functional architecture for music processing that captures the typical properties of modular organization. The model rests essentially on the analysis of music-related deficits in neurologically impaired individuals, but provides useful guidelines for exploring the music faculty in normal people, using methods such as neuroimaging.

Musical ability has traditionally been studied as the product of a general-purpose cognitive architecture, but a growing number of studies are based on the premise that music is a cognitively unique and evolutionarily distinct faculty. Musical abilities are now studied as part of a distinct mental module with its own procedures and knowledge bases that are associated with dedicated and separate neural substrates. Thus, research concerning musical ability now tends to adhere, more or less explicitly, to the concept of modularity of cognitive functions as formulated by Fodor.

After briefly describing what is currently meant by modularity, we will illustrate how modularity shapes current thinking about how the mind processes music, relying particularly on evidence from individuals with abnormalities of musical ability.


According to Fodor, mental modules have the following characteristic properties: rapidity of operation, automaticity, domain-specificity, informational encapsulation, neural specificity and innateness. Fodor does not insist that any one of these properties is absolutely necessary for the ascription of the term 'modular'. For example, a system can be modular even if not innate; that is why there is no difficulty in describing the reading system as a module, even though reading is clearly not an innate ability. So each of these properties might best be described as a typical, rather than necessary or sufficient, feature of a modular system.

Fodor does, however, consider one property to be more important than the others: informational encapsulation. By this he means that the information processing within a mental module is immune from influence by the 'central system', a large and slowly operating encyclopedic-knowledge system involved in high-level cognitive operations, such as problem solving or belief evaluation. Our view is that domain-specificity is an equally important property: it would be very odd to describe some system as a module if its operation were not specific to some restricted domain of input or output. Hence we consider domain-specificity to be a necessary property for a processing system to be considered modular.

It is also important to realize that a module can be composed of smaller processing subsystems that can themselves be referred to as modules. For example, the language module contains component lexical and phonetic-processor modules.

To claim that there is a music-processing module is to claim that there is a mental information processing system whose operation is specific to the processing of music. That system may contain smaller modules whose processing domains may also be restricted to particular aspects of music. The possibility that such a cognitive architecture for music processing exists has been entertained for more than a decade.

Even if the human mind does contain a music module, it is conceivable that this module could lack the property of neural specificity, or neuroanatomical separability. For example, the neural substrate for music processing might overlap with that used for processing other complex patterns, such as speech sounds. In this case, brain damage would never affect musical abilities while sparing all other aspects of cognition (particularly auditory processing outside the domain of music). If, on the other hand, the putative music module does possess the property of neural specificity, then we should expect to find people in whom brain damage has selectively affected musical abilities. Many such people have been found.

A module for music processing

Support for the existence of a music-processing module can be found in reports of selective impairments in music recognition abilities after brain damage. Such patients can no longer recognize melodies (presented without words) that were highly familiar to them before the onset of their brain damage. In contrast, they are normal at recognizing spoken lyrics (and spoken words in general), familiar voices and other environmental sounds (such as animal cries, traffic noises and human vocal sounds). This condition is called 'acquired amusia'. Similarly, in 'congenital amusia', individuals suffer from lifelong difficulties with music but can recognize the lyrics of familiar songs even though they are unable to recognize the tune that usually accompanies them.

Most people are experts at recognizing spoken words, but amateurs at recognizing music. It might therefore be argued that there is no specific module for recognizing music, but just a general auditory recognition module; when that is damaged, amateur abilities such as music recognition suffer more than expert abilities such as speech recognition. This account predicts that one will not find people in whom brain damage has impaired the ability to recognize spoken words while sparing the ability to recognize music. But such cases do exist: non-musicians may lose their ability to recognize spoken words yet remain able to recognize music. Similarly, brain-damaged patients who are afflicted with verbal agnosia (or word deafness), and hence have lost their ability to recognize spoken words, can be normal at recognizing nonverbal sounds, including music. The existence of such cases of selective impairment and sparing of musical abilities is incompatible with the claim that there is a single processing system responsible for the recognition of speech, music and environmental sounds. Rather, the evidence points to the existence of at least two distinct processing modules: one for music and one for speech.

Modular architecture of music processing

The study of neurological deficits has revealed far more about music processing than merely that there is a mental module specific to the processing of music. A model of the functional architecture of music processing has been derived from case studies of specific music impairments in brain-damaged patients. In this model, a neurological anomaly could either damage a processing component (box) or interfere with the flow of information (arrow) between components.

Two modular aspects of the model deserve comment. First, the individuation of each box or arrow in the model arises from the study of its breakdown pattern in a brain-damaged patient. This fact confers upon the individuated component the modular property of neuroanatomical separability. Second, the model proposes various music-processing modules, each of which is concerned with a particular information-processing operation that contributes to the overall system.

An example of a distinct music-specific component inside the music module is the system concerned with tonal encoding of pitch.

Central to pitch organization is the perception of pitch along musical scales. A musical scale refers to the use of a small subset of pitches (usually seven) in a given musical piece. Scale tones are not equivalent and are organized around a central tone, called the tonic. Usually, a musical piece starts and ends on the tonic. The other scale tones are arranged in a hierarchy of importance or stability, with the fifth scale tone and the third scale tone being most closely related to the tonic. The remaining scale tones are less related to the tonic, and the non-scale tones are the least related; the latter often sound like 'foreign' tones. This tonal hierarchical organization of pitch facilitates perception, memory and performance of music by creating expectancies.
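The hierarchy described above can be made concrete with a minimal sketch. It assumes a major scale in twelve-tone equal temperament and pitches written as MIDI-style note numbers; the function name `tonal_level` and the numeric levels are hypothetical labels for the ordering in the text, not measured stability values from the article.

```python
# Semitone offsets of the seven major-scale tones above the tonic (assumption:
# major mode, twelve-tone equal temperament).
MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)


def tonal_level(pitch: int, tonic: int) -> int:
    """Rank a pitch relative to a tonic.

    0 = tonic, 1 = third or fifth scale tone, 2 = other scale tones,
    3 = non-scale ('foreign') tones. The numbers only encode an ordering.
    """
    interval = (pitch - tonic) % 12  # octave-equivalent distance above the tonic
    if interval == 0:
        return 0
    if interval in (4, 7):  # major third and perfect fifth above the tonic
        return 1
    if interval in MAJOR_SCALE_STEPS:
        return 2
    return 3


# With C (60) as tonic: C, G, D and F-sharp fall on successively lower levels.
levels = [tonal_level(p, 60) for p in (60, 67, 62, 66)]  # [0, 1, 2, 3]
```

A listener model built on such a ranking would expect a melody to close on a level-0 tone, which is one way to read the claim that the hierarchy creates expectancies.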

There is substantial empirical evidence that listeners use this tonal knowledge in music perception automatically. Tonal organization of pitch applies to most types of music, but it does not occur in processing other sound patterns, such as speech. Although the commonly used scales differ somewhat from culture to culture, most musical scales use pitches of unequal spacing that are organized around 5-7 focal pitches and afford the building of pitch hierarchies. The tonal encoding module seems to exploit musical predispositions, as infants show enhanced processing for scales with unequal pitch steps. Tonal encoding can be selectively impaired by brain damage; for example, some patients are no longer able to judge melodic closure properly and suffer from a severe reduction in pitch memory. In a recent functional neuroimaging study, Janata and collaborators point to the rostromedial prefrontal cortex as a likely brain substrate for the 'tonal encoding' module.

Unlike the tonal encoding module, other component music-processing modules might not be restricted to just music. For example, the 'contour analysis' component, which abstracts the pitch trajectories (in terms of pitch direction between adjacent tones without regard to the precise pitch intervals), could conceivably be involved in processing speech intonation as well as music.
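As an illustration of what abstracting a contour means, the sketch below reduces a pitch sequence to the direction of each step and discards the interval size. The representation of tones as MIDI-style note numbers and the function name `contour` are assumptions made for the example, not part of the model.

```python
from typing import List


def contour(pitches: List[int]) -> List[int]:
    """Return +1 (rise), -1 (fall) or 0 (repeat) for each pair of adjacent tones."""
    steps = []
    for prev, curr in zip(pitches, pitches[1:]):
        diff = curr - prev
        steps.append((diff > 0) - (diff < 0))  # sign of the pitch change
    return steps


# Opening of "Happy Birthday" (G G A G C B) written as note numbers.
tune = [67, 67, 69, 67, 72, 71]
# A different sequence with other intervals but the same ups and downs.
variant = [60, 60, 64, 62, 70, 65]

same_contour = contour(tune) == contour(variant)  # True: both are [0, 1, -1, 1, -1]
```

Because only the sign of each step survives, the same routine could be applied to a sampled speech intonation curve, which is the sense in which this component need not be specific to music.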

The model takes as input any acoustic stimulus that can be attributed to a unique source. This implies that auditory segregation of sound mixtures into distinct sound sources first occurs in an acoustic analysis module whose domain is all auditory stimuli, not just music. The output of this early acoustic analysis might be, for example, a representation of the song "Happy Birthday." In that case, the lyric component of the song is assumed to be processed in parallel in the language processing system (right of the figure). We suppose that activation of the music or the language processing modules is determined by the aspect of the input to which a module is tuned. That is, there is no 'gatekeeper' that decides which part of the auditory pattern should be sent to the musical modules and which part should be sent to the language system. All the information contained in the song line, for example, would be sent to all modules. Only the modules that are specialized for the extraction of such information will respond, just as the retina does not respond when a sound wave passes through it, nor the cochlea when light shines upon it.
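The 'no gatekeeper' idea can be sketched as a broadcast: the whole stimulus goes to every module, and each module stays silent unless the input carries the kind of information it is tuned to. The dictionary keys `pitches` and `words`, the two toy modules and the function `broadcast` are all hypothetical stand-ins chosen for this example.

```python
from typing import Callable, Dict, Optional

Stimulus = Dict[str, list]


def music_module(stimulus: Stimulus) -> Optional[str]:
    """Respond only if the stimulus carries pitch information."""
    pitches = stimulus.get("pitches")
    return None if pitches is None else f"melody of {len(pitches)} tones"


def language_module(stimulus: Stimulus) -> Optional[str]:
    """Respond only if the stimulus carries word information."""
    words = stimulus.get("words")
    return None if words is None else f"lyrics: {' '.join(words)}"


MODULES: Dict[str, Callable[[Stimulus], Optional[str]]] = {
    "music": music_module,
    "language": language_module,
}


def broadcast(stimulus: Stimulus) -> Dict[str, str]:
    """Send the full stimulus to every module; no router selects a recipient."""
    responses = {}
    for name, module in MODULES.items():
        output = module(stimulus)
        if output is not None:  # untuned modules simply do not respond
            responses[name] = output
    return responses


song = {"pitches": [67, 67, 69, 67], "words": ["happy", "birthday"]}
speech = {"words": ["happy", "birthday"]}
```

Calling `broadcast(song)` yields a response from both modules, while `broadcast(speech)` yields one from the language module only. The selection is done by each module's tuning, not by a component placed in front of them.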

The musical input modules are organized in two parallel and largely independent subsystems whose functions are to specify, respectively, the pitch content (the melodic contour and the tonal functions of the successive pitch intervals) and the temporal content, by representing the metric organization as well as the rhythmic structure of the successive durations. The 'rhythm analysis' component deals with the segmentation of the ongoing sequence into temporal groups on the basis of durational values without regard to periodicity; the 'meter analysis' component extracts an underlying temporal regularity or beat, corresponding to periodic alternation between strong and weak beats. The strong beats generally correspond to the spontaneous tapping of the foot.

Both the melodic and temporal pathways send their respective outputs to either the 'musical lexicon' or the 'emotion expression analysis' component. The musical lexicon is a representational system that contains all the representations of the specific musical phrases to which one has been exposed during one's lifetime. The same system also keeps a record of any new incoming musical input. Accordingly, successful recognition of a familiar tune depends on a selection procedure that takes place in the musical lexicon.

The output of the musical lexicon can feed two different components, depending on task requirements. If the goal is to sing a song like "Happy Birthday," the corresponding melody, represented in the musical lexicon, will be paired with its associated lyrics that are stored in the phonological lexicon and will be tightly integrated and planned in a way that is suitable for vocal production. If the task requires retrieving nonmusical information about a musical selection, such as naming the tune or retrieving a related experience from memory, the associated knowledge stored in the 'associative memories' component will be invoked.

In parallel with memory processes, but independently, the perceptual modules will feed their outputs into an emotion expression analysis component, allowing the listener to recognize and experience the emotion expressed by the music. This emotional pathway also contributes to recognition via the musical lexicon. Emotion expression analysis is a pivotal processing component because music has the power to elicit strong emotional responses. It takes as input emotion-specific musical features, such as mode (e.g. major or minor) and tempo (e.g. slow or fast) as computed by the melodic and temporal pathways, respectively. What is currently unclear is to what extent this emotion expression analysis component is specific to music as opposed to being involved in more general kinds of emotional processing. A patient who could recognize pieces of music but could not respond emotionally to them, while being able to respond emotionally to other media, would be informative here.
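The box-and-arrow logic of the model can be sketched as a small directed graph in which a lesion either removes a box or cuts an arrow. The graph below is a deliberately simplified reading of the prose: it collapses each pathway into a single node, omits the language and output components, and the names `ARROWS` and `reachable` are hypothetical. It is not a transcription of the published figure.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

# Simplified arrows, taken from the description in the text (an assumption).
ARROWS: Dict[str, List[str]] = {
    "acoustic analysis": ["melodic pathway", "temporal pathway"],
    "melodic pathway": ["musical lexicon", "emotion expression analysis"],
    "temporal pathway": ["musical lexicon", "emotion expression analysis"],
    "emotion expression analysis": ["musical lexicon"],
    "musical lexicon": ["associative memories"],
}


def reachable(
    start: str,
    damaged_boxes: FrozenSet[str] = frozenset(),
    cut_arrows: FrozenSet[Tuple[str, str]] = frozenset(),
) -> Set[str]:
    """Return the components that information from `start` can still reach."""
    if start in damaged_boxes:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        box = queue.popleft()
        for target in ARROWS.get(box, []):
            blocked = target in damaged_boxes or (box, target) in cut_arrows
            if blocked or target in seen:
                continue
            seen.add(target)
            queue.append(target)
    return seen


# Damaging a box: the lexicon is lost, yet emotion analysis is still reached.
box_lesion = reachable("acoustic analysis", damaged_boxes=frozenset({"musical lexicon"}))
# Cutting an arrow: the lexicon is intact but cannot reach associative memories.
arrow_lesion = reachable(
    "acoustic analysis",
    cut_arrows=frozenset({("musical lexicon", "associative memories")}),
)
```

In this toy graph the two lesions predict different patterns: a damaged lexicon would spare emotional judgment but abolish recognition, whereas a cut arrow would spare recognition while blocking access to nonmusical knowledge such as the tune's name.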

In sum, we propose a modular functional architecture for music processing that comprises several component modules. Our model also describes the pathways of information flow among these component modules. The characterization of each box and arrow represented in the model has been provided by the detailed study of brain-damaged patients with selective impairments or preservations of particular musical abilities. The inclusion of three new output modules again stems from the study of neurological patients: singing performance in aphasic patients and tapping abilities in adults suffering from congenital amusia. Thus, our proposed modular architecture for processing music provides a plausible framework for further investigating the neural mechanisms of music processing.

(end of paraphrase)

