Speech Organization in Cortex

Nature 495, 327–332 (21 March 2013)

Functional organization of human sensorimotor cortex for speech articulation

Department of Neurological Surgery and Department of Physiology, University of California, San Francisco, 505 Parnassus Avenue, San Francisco, California 94143, USA

Kristofer E. Bouchard, Nima Mesgarani & Edward F. Chang

Center for Integrative Neuroscience, 675 Nelson Rising Lane, University of California, San Francisco, California 94158, USA

Kristofer E. Bouchard, Nima Mesgarani & Edward F. Chang

Department of Linguistics, University of California, Berkeley, 1203 Dwinelle Hall, Berkeley, California 94720, USA

Keith Johnson

UCSF Epilepsy Center, University of California, San Francisco, 400 Parnassus Avenue, San Francisco, California 94143, USA

Edward F. Chang

[paraphrase]

Speaking is one of the most complex actions that we perform, but nearly all of us learn to do it effortlessly. Production of fluent speech requires the precise, coordinated movement of multiple articulators (for example, the lips, jaw, tongue and larynx) over rapid time scales. Here we used high-resolution, multi-electrode cortical recordings during the production of consonant-vowel syllables to determine the organization of speech sensorimotor cortex in humans. We found speech-articulator representations that are arranged somatotopically on ventral pre- and post-central gyri, and that partially overlap at individual electrodes. These representations were coordinated temporally as sequences during syllable production. Spatial patterns of cortical activity showed an emergent, population-level representation, which was organized by phonetic features. Over tens of milliseconds, the spatial patterns transitioned between distinct representations for different consonants and vowels. These results reveal the dynamic organization of speech sensorimotor cortex during the generation of multi-articulator movements that underlies our ability to speak.

Speech communication critically depends on the ability to produce the large number of sounds that compose a given language. The wide range of spoken sounds results from highly flexible configurations of the vocal tract, which filters sound produced at the larynx through precisely coordinated movements of the lips, jaw and tongue. Each articulator has extensive degrees of freedom, making a large number of different speech movements possible. How humans exert such precise control despite the wide variety of movement possibilities is a central unanswered question.

The cortical control of articulation is mediated primarily by the ventral half of the lateral sensorimotor (Rolandic) cortex (ventral sensorimotor cortex, vSMC), which provides corticobulbar projections to, and afferent innervation from, the face and vocal tract. The U-shaped vSMC is composed of the pre- and post-central gyri (Brodmann areas 1, 2, 3 and 6b) and the gyral area directly ventral to the termination of the central sulcus, called the guenon (Brodmann area 43). Using electrical stimulation, Foerster and Penfield described the somatotopic organization of face and mouth representations in human vSMC. However, focal stimulation could not evoke meaningful utterances, implying that speech is not stored in discrete cortical areas. Instead, the production of phonemes and syllables is thought to arise from a coordinated motor pattern involving multiple articulator representations.

To understand the functional organization of vSMC in articulatory sensorimotor control, we recorded neural activity directly from the cortical surface in three human subjects implanted with high-density multi-electrode arrays as part of their preparation for epilepsy surgery. Intracranial cortical recordings were synchronized with microphone recordings as subjects read aloud consonant-vowel syllables (19 consonants followed by /a/, /u/ or /i/) that are commonly used in American English. This task was designed to sample across a range of phonetic features, including different constriction locations (place of articulation) and different constriction degrees or shapes (manner of articulation) for a given articulatory organ.

We aligned cortical recordings to acoustic onsets of consonant-to-vowel transitions (t = 0) to provide a common reference point across consonant-vowel syllables. We focused on the high-gamma frequency component of local field potentials (85–175 Hz), which correlates well with multi-unit firing rates. For each electrode, we normalized the time-varying high-gamma amplitude to baseline statistics by transforming to z-scores.
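
As a rough illustration of this preprocessing step, the sketch below extracts a high-gamma amplitude envelope for one electrode and z-scores it against a baseline period. It is a simplified, single-band stand-in, not the authors' pipeline: it assumes an ideal frequency-domain band-pass (85 to 175 Hz) combined with the analytic (Hilbert) signal computed by FFT, and the function name `high_gamma_zscore` and the choice of baseline window are hypothetical.

```python
import numpy as np


def high_gamma_zscore(signal, fs, baseline, band=(85.0, 175.0)):
    """Baseline-normalized high-gamma amplitude for one electrode (simplified sketch).

    signal   : 1-D array of voltage samples
    fs       : sampling rate in Hz
    baseline : slice selecting baseline samples (hypothetical choice of window)
    band     : pass band in Hz; 85-175 Hz follows the text
    """
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(signal.size, d=1.0 / fs)
    # Keep only positive frequencies inside the band, doubled: the inverse FFT of this
    # one-sided spectrum is the analytic (Hilbert) signal of the band-passed data.
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    analytic = np.fft.ifft(np.where(in_band, 2.0 * spectrum, 0.0))
    amplitude = np.abs(analytic)          # instantaneous high-gamma amplitude (envelope)
    mu = amplitude[baseline].mean()
    sigma = amplitude[baseline].std()
    return (amplitude - mu) / sigma       # z-scores relative to baseline statistics
```

An ideal FFT mask rings at sharp transients and treats the recording as periodic, so a practical pipeline would more likely apply a smooth band-pass filter (or several sub-bands) before the Hilbert transform; the normalization step is the same either way.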

During syllable articulation, approximately 30 active vSMC electrode sites were identified per subject (covering approximately 1,200 mm²; a site counted as active if its z-score changed by more than 2 for any syllable). We examined cortical activity from selected electrodes distributed along the vSMC dorsoventral axis during /ba/, /da/ and /ga/. The plosive consonants (/b/, /d/, /g/) are produced by transient occlusion of the vocal tract by the lips, front tongue and back tongue, respectively, whereas the vowel /a/ is produced by a low, back tongue position during phonation. Dorsally located electrodes were active during production of /b/, which requires transient closure of the lips. In contrast, mid-positioned electrodes were active during production of /d/, which requires forward tongue protrusion against the alveolar ridge. A more ventral electrode was most active during production of /g/, which requires a posterior-oriented tongue elevation towards the soft palate. Other electrodes appear to be active during the vowel phase for /a/.
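
The activity criterion above can be written as a simple mask over electrodes. This is a sketch under assumptions: the text does not say over which time window the z-score change is measured, so the hypothetical `window` argument (averaging the z-scores inside it) stands in for that choice, and `active_electrodes` is an illustrative name.

```python
import numpy as np


def active_electrodes(z, window, threshold=2.0):
    """Indices of electrodes whose mean z-score in `window` exceeds `threshold` for any syllable.

    z      : array (n_syllables, n_electrodes, n_times) of baseline-normalized high gamma
    window : slice of time samples to average (an assumption; not specified in the text)
    """
    response = z[:, :, window].mean(axis=2)         # (n_syllables, n_electrodes)
    is_active = (response > threshold).any(axis=0)  # active if any syllable crosses threshold
    return np.flatnonzero(is_active)
```

Because the test uses "any syllable", an electrode tuned to a single articulator (for example one responding only during /b/) still qualifies, which is what allows articulator-specific sites to be compared afterwards.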

Cortical activity at different electrode subsets was superimposed to visualize spatiotemporal patterns across other phonetic contrasts. Consonants produced with different constriction locations of the tongue tip (for example, /θ/ (dental), /s/ (alveolar) and /∫/ (post-alveolar)) showed specificity across different electrodes in central vSMC, although this specificity was less categorical than that observed for consonants involving different articulators. Consonants with similar tongue constriction locations, but different constriction degree or constriction shape, were generated by overlapping electrode sets exhibiting different relative activity magnitudes (/l/ (lateral) versus /n/ (nasal stop) versus /d/ (oral stop)). Syllables with the same consonant followed by different vowels (/ja/, /ji/, /ju/) had similar activity patterns before the consonant-vowel transition. During vowel phonation, a dorsal electrode was clearly active during /u/ but not /i/ or /a/, whereas another electrode in the middle of vSMC had prolonged activity during /i/ and /u/ compared to /a/. These contrasting examples show that important phonetic properties can be observed qualitatively from the rich repertoire of vSMC spatiotemporal patterns.

The distributed organization of speech articulator representations led us to propose that coordination of the multiple articulators required for speech production would be associated with spatial patterns of cortical activity. We refer here to this population-derived pattern as the phonetic representation. To determine its organizational properties, we used principal component analysis to transform the observed cortical activity patterns into a ‘cortical state-space’ (approximately 60% of variance is explained by 9 spatial principal components for all subjects). k-means clustering during the consonant phase (25 ms before the consonant-vowel transition, t = −25 ms) showed that the cortical state-space was organized into three clusters (quantified by silhouette analysis) corresponding to the major oral articulators: labial, coronal tongue and dorsal tongue. During the vowel phase (250 ms after the consonant-vowel transition, t = 250 ms), we found clear separation of /a/, /i/ and /u/ vowel states. Similar clustering of consonants and vowels was found across subjects (P < 10⁻¹⁰ for clustering of both consonants and vowels).
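
The state-space construction and clustering described above can be sketched with NumPy alone. This is not the authors' code: it assumes trials are rows and electrodes are columns at a single time point (for example t = −25 ms), computes PCA by singular value decomposition, clusters with Lloyd's k-means using a farthest-point initialization (a choice made here for determinism), and scores the result with the mean silhouette width. All function names are hypothetical.

```python
import numpy as np


def pca_state_space(X, n_components):
    """Project trials onto the leading spatial principal components.

    X : array (n_trials, n_electrodes) of activity at one time point.
    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    Xc = X - X.mean(axis=0)                      # center each electrode
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # coordinates in the cortical state-space
    explained = S**2 / np.sum(S**2)              # fraction of variance per component
    return scores, explained[:n_components]


def kmeans(Y, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    while len(centers) < k:
        # next center: the point farthest from every center chosen so far
        dist = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)             # assign each trial to its nearest center
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def mean_silhouette(Y, labels):
    """Mean silhouette width: near 1 for compact, well-separated clusters."""
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    scores = []
    for i in range(len(Y)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():                       # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = D[i, same].mean()                    # mean distance within own cluster
        b = min(D[i, labels == c].mean()         # mean distance to nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Summing the returned explained-variance ratios gives the kind of figure quoted in the text (the share of variance captured by the leading spatial components). Running `kmeans` for several values of k and keeping the k with the highest `mean_silhouette` is one common way to let silhouette analysis choose the number of clusters.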

Theories of speech motor control and phonology have speculated that there is a hierarchical organization of phoneme representations, given the anatomical and functional dependencies of the vocal tract articulators during speech production. To evaluate such organization in vSMC, we applied hierarchical clustering to the cortical state-space. For consonants, this analysis confirmed that the primary tier of organization was defined by the major oral articulator features: dorsal, labial or coronal. These major articulators were superordinate to the constriction location within each articulator. For example, the labial cluster could be subdivided into bilabial and labiodental. Only at the lowest level of the hierarchy did we observe suggestions of organization according to constriction degree or shape, such as the separation of nasals (/n/), oral stops (/d/, /t/) and lateral approximants (/l/). Similarly, during the vowel period, a primary distinction was based on the presence or absence of lip rounding (/u/ versus /a/ and /i/), and a secondary distinction was based on tongue posture (height, and front or back position). Therefore, the major oral articulator features that organize consonant representations are similar to those for vowels.
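
Hierarchical clustering of this kind can be illustrated with a small agglomerative routine. The sketch below uses average (UPGMA) linkage on Euclidean distances; the text does not state which linkage or metric was used, so both are assumptions, and `average_linkage` and `cut_tree` are hypothetical helper names. Cutting the tree into a few clusters exposes the superordinate tier; cutting it into more clusters exposes the subordinate tiers.

```python
import numpy as np


def average_linkage(points):
    """Agglomerative clustering with average (UPGMA) linkage.

    points : array (n_items, n_dims), e.g. one state-space centroid per syllable.
    Returns merges as (members_a, members_b, distance) in the order they occur;
    the last merges form the top (superordinate) tiers of the hierarchy.
    """
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average distance between all members of the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges


def cut_tree(merges, n_items, n_clusters):
    """Replay merges until n_clusters remain; returns sorted member lists."""
    clusters = [[i] for i in range(n_items)]
    for left, right, _ in merges[: n_items - n_clusters]:
        clusters = [c for c in clusters if c != left and c != right] + [left + right]
    return sorted(sorted(c) for c in clusters)
```

Applied to one state-space centroid per consonant, the three-cluster cut is where a labial / coronal / dorsal split would appear if the data are organized as described above. The brute-force search is adequate for a few dozen syllables; SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"`) is an optimized alternative.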

The dynamics of neural populations have provided insights into the structure and function of many neural circuits. To determine the dynamics of phonetic representations, we investigated how state-space trajectories for consonants and vowels entered and departed target regions for phonetic clusters. Trajectories of individual consonant-vowel syllables were visualized by plotting their locations in the first two principal-component dimensions versus time.
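
One way to produce such trajectories is to fit spatial principal components once and then project every time point of every syllable onto them. The sketch below assumes the components are fit on all syllables and time points pooled together; the text does not specify this detail, and `state_space_trajectories` is a hypothetical name.

```python
import numpy as np


def state_space_trajectories(X, n_components=2):
    """Project time-resolved activity onto spatial principal components.

    X : array (n_syllables, n_electrodes, n_times) of z-scored high gamma
    Returns array (n_syllables, n_times, n_components): one trajectory per syllable.
    """
    n_syl, n_elec, n_t = X.shape
    # Treat every (syllable, time) pair as one observation of the electrode pattern.
    pooled = X.transpose(0, 2, 1).reshape(-1, n_elec)
    mean = pooled.mean(axis=0)
    _, _, Vt = np.linalg.svd(pooled - mean, full_matrices=False)
    proj = (pooled - mean) @ Vt[:n_components].T   # coordinates on the leading PCs
    return proj.reshape(n_syl, n_t, n_components)
```

Plotting `traj[s, :, 0]` and `traj[s, :, 1]` against time for each syllable `s` gives the principal-component-versus-time view described above.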

Visualization of the dynamic structure of the cortical state-space during production of all consonant-vowel syllables showed that, as the cortical state comes to reflect phonetic structure, different phonetic clusters diverge from one another, while the trajectories within the clusters converge. Furthermore, we observed correlates of the earlier articulatory specification for sibilants (/∫/, /z/, /s/). In addition, with all consonant-vowel syllables on the same axes, we observed that in comparison to vowels, consonants occupy a distinct region of cortical state-space, despite sharing the same articulators. The distribution of state-space distances was significantly greater in consonant-vowel comparisons than in consonant-consonant or vowel-vowel comparisons (P < 10⁻¹⁰ for all comparisons; Wilcoxon signed-rank test, n = 4623 in all cases). Finally, the consonant-to-vowel sequence reveals a periodic structure, which is sub-specified for consonant and vowel features.
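
The distance comparison can be prepared by grouping pairwise state-space distances by pair type. This sketch only builds the three distributions; the hypothetical `distances_by_pair_type` helper assumes Euclidean distance and one 'C' or 'V' label per state, neither of which the text spells out.

```python
from itertools import combinations

import numpy as np


def distances_by_pair_type(states, kinds):
    """Group pairwise Euclidean state-space distances by pair type.

    states : array (n_items, n_dims), e.g. consonant-phase and vowel-phase states
    kinds  : sequence of 'C' or 'V' labels, one per row of `states`
    Returns {'CC': [...], 'VV': [...], 'CV': [...]}.
    """
    groups = {"CC": [], "VV": [], "CV": []}
    for i, j in combinations(range(len(states)), 2):
        key = "".join(sorted(kinds[i] + kinds[j]))  # 'CV' regardless of pair order
        groups[key].append(float(np.linalg.norm(states[i] - states[j])))
    return groups
```

The resulting lists could then be compared with a rank-based test; `scipy.stats.wilcoxon` implements the Wilcoxon signed-rank test for paired samples, but how the original comparisons were paired is not stated here.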

Our broad-coverage, high-resolution direct cortical recordings enabled us to examine the spatial and temporal profiles of speech articulator representations in human vSMC. Cortical representations are somatotopically organized, with individual sites tuned for a preferred articulator and co-modulated by other articulators. The dorsoventral layout of articulator representations recapitulates the rostral-to-caudal layout of the vocal tract. However, we found an additional laryngeal representation located at the dorsal-most end of vSMC. This dorsal laryngeal representation seems to be absent in non-human primates, suggesting a unique feature of human vSMC for the specialized control of speech. Pre- and post-central gyrus neural activity occurred before vocalization, which may reflect the integration of motor commands with proprioceptive information for rapid feedback control during speaking.

Just as focal stimulation is insufficient to evoke speech sounds, it is not any single articulator representation, but the coordination of multiple articulator representations across the vSMC network that generates speech. Analysis of spatial patterns of activity showed an emergent hierarchy of network states that organizes phonemes by articulatory features. This functional hierarchy of network states contrasts with the anatomical hierarchy often considered in motor control. The cortical state-space organization probably reflects the coordinative patterns of articulatory motions during speech, and is notably similar to a theorized cross-linguistic hierarchy of phonetic features (‘feature geometry’). In particular, the findings support gestural theories of speech control³ over alternative acoustic theories (a hierarchy organized primarily by constriction degree)¹⁹ or vocal-tract geometry theories (no hierarchy of constriction location and degree).

The vSMC population showed convergent and divergent dynamics during the production of different phonetic features. The dynamics of individual phonemes were superimposed on a slower oscillation that characterizes the transition between consonants and vowels. Although trajectories were found to originate or terminate in different regions, they consistently passed through the same (target) region of the state-space for shared phonetic features. Consonants and vowels occupy distinct regions of the cortical state-space. Large state-space distances between consonant and vowel representations may explain why it is more common in speech errors to substitute consonants with one another, and vowels with vowels, but very rarely consonants with vowels or vowels with consonants (that is, in ‘slips of the tongue’).

We have shown that a relatively small set of articulator representations can combine flexibly to create the large variety of speech sounds in American English. The major organizational features found here define phonologies of languages from across the world. Consequently, these cortical organizational principles are likely to be conserved, with further specification for unique articulatory properties across different languages.

[end of paraphrase]
