Olfactory Perception of Odor Molecules from Chemical Features

Scientific Understanding of Consciousness
Consciousness as an Emergent Property of Thalamocortical Activity

Science 24 Feb 2017: Vol. 355, Issue 6327, pp. 820-826

Predicting human olfactory perception from chemical features of odor molecules

Andreas Keller, et al.

Laboratory of Neurogenetics and Behavior, The Rockefeller University, New York, NY 10065, USA.

School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA.

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.

Thomas J. Watson Computational Biology Center, IBM, Yorktown Heights, NY 10598, USA.

Department of Physiology, Faculty of Medicine, Semmelweis University, 1085 Budapest, Hungary.

Laboratory of Molecular Physiology, Hungarian Academy of Science, Semmelweis University (MTA-SE), 1085 Budapest, Hungary.

Monell Chemical Senses Center, Philadelphia, PA 19104, USA.

Department of Neuroscience, University of Pennsylvania, Philadelphia, PA 19104, USA.

Institution for Innovation, Ajinomoto Co., Inc., Kawasaki, Kanagawa 210-8681, Japan.

SAS Institute, Inc., Cary, NC 27513, USA.

Department of Public Health and Primary Care, KU Leuven, Kulak, 8500 Kortrijk, Belgium.

Department of Computer Science, KU Leuven, 3001 Leuven, Belgium.

Flanders Make, 3920 Lommel, Belgium.

Howard Hughes Medical Institute, New York, NY 10065, USA.

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

[paraphrase]

It is still not possible to predict whether a given molecule will have a perceived odor or what olfactory percept it will produce. We therefore organized the crowd-sourced DREAM Olfaction Prediction Challenge. Using a large olfactory psychophysical data set, teams developed machine-learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models accurately predicted odor intensity and pleasantness and also successfully predicted 8 among 19 rated semantic descriptors (“garlic,” “fish,” “sweet,” “fruit,” “burnt,” “spices,” “flower,” and “sour”). Regularized linear models performed nearly as well as random forest–based ones, with a predictive accuracy that closely approaches a key theoretical limit. These models help to predict the perceptual qualities of virtually any molecule with high accuracy and also reverse-engineer the smell of a molecule.

In vision and hearing, the wavelength of light and frequency of sound are highly predictive of color and tone. In contrast, it is not currently possible to predict the smell of a molecule from its chemical structure. This stimulus-percept problem has been difficult to solve in olfaction because odors do not vary continuously in stimulus space, and the size and dimensionality of olfactory perceptual space are unknown. Some molecules with very similar chemical structures can be discriminated by humans, and molecules with very different structures sometimes produce nearly identical percepts. Earlier computational efforts developed models to relate chemical structure to odor percept, but many relied on psychophysical data from a single 30-year-old study that used odorants with limited structural and perceptual diversity.

Twenty-two teams were given a large, unpublished psychophysical data set collected by Keller and Vosshall from 49 individuals who profiled 476 structurally and perceptually diverse molecules. We supplied 4884 physicochemical features of each of the molecules smelled by the subjects, including atom types, functional groups, and topological and geometrical properties that were computed using Dragon chemoinformatic software.

Using a baseline linear model developed for the challenge and inspired by previous efforts to model perceptual responses of humans, we divided the perceptual data into three sets. Challenge participants were provided with a training set of perceptual data from 338 molecules that they used to build models. The organizers used perceptual data from an additional 69 molecules to build a leaderboard to rank performance of participants during the competition. Toward the end of the challenge, the organizers released perceptual data from the 69 leaderboard molecules so that participants could get feedback on their model and to enable refinement with a larger training + leaderboard data set. The remaining 69 molecules were kept as a hidden test set available only to challenge organizers to evaluate the performance of the final models. Participants developed models to predict the perceived intensity, pleasantness, and usage of 19 semantic descriptors for each of the 49 individuals and for the mean and standard deviation across the population of these individuals.
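The three-way split and the kind of regularized linear model mentioned in this paraphrase can be sketched as follows. This is a minimal illustration on synthetic data, not the challenge's code. Several things are assumptions made for the example: the feature count is reduced to 50 (the challenge supplied 4884 per molecule), the ratings are simulated, the split is a plain random shuffle (the text says the real split was guided by a baseline linear model), and the helper `ridge_fit` and its penalty `lam` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MOLECULES = 476                      # molecules in the data set (from the text)
N_TRAIN, N_LEADERBOARD, N_TEST = 338, 69, 69
N_FEATURES = 50                        # assumption: reduced from 4884 for the demo

# Synthetic stand-ins for the chemoinformatic features and one perceptual attribute.
X = rng.standard_normal((N_MOLECULES, N_FEATURES))
true_w = rng.standard_normal(N_FEATURES)
y = X @ true_w + rng.standard_normal(N_MOLECULES)   # linear signal plus noise

# Shuffle molecule indices, then cut them into three disjoint sets.
order = rng.permutation(N_MOLECULES)
train_idx = order[:N_TRAIN]
leaderboard_idx = order[N_TRAIN:N_TRAIN + N_LEADERBOARD]
test_idx = order[N_TRAIN + N_LEADERBOARD:]


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Fit on the training molecules only, then score on the hidden test set.
w = ridge_fit(X[train_idx], y[train_idx], lam=1.0)
test_r = np.corrcoef(X[test_idx] @ w, y[test_idx])[0, 1]
print(f"Pearson r on hidden test set: {test_r:.3f}")
```

Keeping the test indices out of `ridge_fit` mirrors the role of the hidden test set: the reported correlation reflects molecules the model has never seen.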

We first examined the structure of the psychophysical data using the inverse of the covariance matrix calculated across all molecules as a proxy for connection strength between each pair of the 21 perceptual attributes. This yielded a number of strong positive interactions, including those between “garlic” and “fish”; “musky” and “sweaty”; “sweet” and “bakery”; and among “fruit,” “acid,” and “urinous”; as well as a negative interaction between pleasantness and “decayed.” The perception of intensity had the lowest connectivity to the other 20 attributes. To understand whether a given individual used the full rating scale or a restricted range, we examined subject-level variance across the ratings for all molecules. Applying hierarchical clustering on Euclidean distances for the variance of attribute ratings across all the molecules in the data set, we distinguished three clusters: subjects who responded with high variance for all 21 attributes; subjects with high variance for four attributes (intensity, pleasantness, “chemical,” and “sweet”) and low variance for the remaining 17; and subjects with high variance for the same four attributes and intermediate variance for the remaining 17.
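The inverse-covariance idea can be illustrated with a small sketch. It uses simulated ratings for four made-up attributes rather than the study's 21, and the rescaling to partial correlations is an added convenience for readability, not something the paraphrase describes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_molecules = 476

# Simulated ratings: attributes 0 and 1 share a latent factor; 2 and 3 are independent.
latent = rng.standard_normal(n_molecules)
ratings = np.column_stack([
    latent + 0.5 * rng.standard_normal(n_molecules),
    latent + 0.5 * rng.standard_normal(n_molecules),
    rng.standard_normal(n_molecules),
    rng.standard_normal(n_molecules),
])

# Covariance across molecules (rows are observations), then its inverse.
cov = np.cov(ratings, rowvar=False)
precision = np.linalg.inv(cov)

# Rescale to partial correlations: rho_ij = -P_ij / sqrt(P_ii * P_jj).
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(np.round(partial_corr, 2))
```

Unlike a plain correlation, each off-diagonal entry here measures how strongly two attributes are linked after accounting for all the others, which is why the inverse covariance is a reasonable proxy for direct connection strength.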

The accuracy of predictions of individual perception for the best-performing model was highly variable, but the correlation for six of the attributes was above 0.3. The best-predicted individual showed a correlation above 0.5 for 16 of 21 attributes. We asked whether the usage of the rating scale could be related to the predictability of each individual. Overall, we observed that individuals using a narrow range of attribute ratings (measured across all molecules for a given attribute) were more difficult to predict. This relation between rating range and prediction accuracy did not hold for intensity and pleasantness.

We analyzed the quality of model predictions for specific molecules in the population. The correlation between predicted and observed attributes exceeded 0.9 (t test, P < 10^–4) for 44 of 69 hidden test-set molecules when we used aggregated model predictions, and 28 of 69 when we averaged all model correlations. The quality of predictions varied across molecules, but for every molecule, the aggregated models exhibited higher correlations. The two best-predicted molecules were 3-methyl cyclohexanone followed by ethyl heptanoate. Conversely, the five molecules that were most difficult to predict were l-lysine and l-cysteine, followed by ethyl formate, benzyl ether, and glycerol.
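The difference between correlating an aggregated prediction and averaging per-model correlations can be sketched with simulated data. Everything here is an assumption for illustration: the ten models are hypothetical, their errors are drawn independently, and predictions are z-scored before averaging so that each model contributes on the same scale. How the challenge actually aggregated its models is not described in this paraphrase.

```python
import numpy as np

rng = np.random.default_rng(2)
n_attributes, n_models = 21, 10

# Simulated "observed" profile of one molecule across the 21 attributes.
observed = rng.standard_normal(n_attributes)

# Each hypothetical model predicts the profile with its own independent error.
predictions = observed + rng.standard_normal((n_models, n_attributes))


def zscore(a):
    """Standardize along the last axis (zero mean, unit variance)."""
    return (a - a.mean(axis=-1, keepdims=True)) / a.std(axis=-1, keepdims=True)


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Option 1: correlate each model separately, then average the correlations.
mean_of_correlations = float(np.mean([pearson(p, observed) for p in predictions]))

# Option 2: average the z-scored predictions first, then correlate once.
aggregated = zscore(predictions).mean(axis=0)
correlation_of_aggregate = pearson(aggregated, observed)

print(f"mean of per-model correlations:       {mean_of_correlations:.3f}")
print(f"correlation of aggregated prediction: {correlation_of_aggregate:.3f}")
```

When model errors are at least partly independent, averaging cancels some of the noise before the correlation is computed, which is one plausible reason the aggregated predictions scored higher for every molecule.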

The DREAM Olfaction Prediction Challenge has yielded models that generated high-quality personalized perceptual predictions. This work substantially expands on previous modeling efforts because it predicts not only pleasantness and intensity, but also 8 out of 19 semantic descriptors of odor quality. The predictive models enable the reverse-engineering of a desired perceptual profile to identify suitable molecules from vast databases of chemical structures and closely approach the theoretical limits of accuracy when accounting for within-individual variability. Although the predictions are highly significant, there is still much room for improvement, particularly in the individual predictions. Although the current models can only be used to predict the 21 attributes, the same approach could be applied to a psychophysical data set that measured any desired sensory attribute (e.g., “rose,” “sandalwood,” or “citrus”). How can the highly predictive models presented here be further improved? Recognizing the inherent limits of using semantic descriptors for odors, we think that alternative perceptual data, such as ratings of stimulus similarity, will be important.
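One simple way to picture the reverse-engineering step is a nearest-profile search: predict a perceptual profile for every candidate molecule, then rank candidates by how close they come to the desired profile. The sketch below is only an illustration under stated assumptions. The molecule identifiers are placeholders, all numbers are invented, only three attributes are shown, and Euclidean distance is an arbitrary choice of similarity measure, not one taken from the study.

```python
import numpy as np

# Hypothetical predicted profiles (rows) for placeholder molecules; columns are
# three illustrative attributes. All numbers are invented for the example.
attributes = ["intensity", "pleasantness", "fruit"]
molecule_ids = ["mol_A", "mol_B", "mol_C", "mol_D"]
predicted = np.array([
    [60.0, 30.0, 5.0],
    [45.0, 70.0, 55.0],
    [80.0, 20.0, 2.0],
    [50.0, 65.0, 40.0],
])

# Desired perceptual profile to reverse-engineer.
target = np.array([50.0, 70.0, 50.0])

# Rank candidates by Euclidean distance between predicted and desired profiles.
distances = np.linalg.norm(predicted - target, axis=1)
ranking = [molecule_ids[i] for i in np.argsort(distances)]
print(ranking)  # ['mol_B', 'mol_D', 'mol_A', 'mol_C']
```

In practice the `predicted` matrix would come from a trained model applied to a large chemical database, so the search itself stays cheap even when the candidate list is very long.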

What do our results imply about how the brain encodes an olfactory percept? We speculate that, for each molecular feature, there must be some quantitative mapping, possibly one to many, between the magnitude of that feature and the spatiotemporal pattern and activation magnitude of the associated olfactory receptors. If features rarely or never interact to produce perception, as suggested by the strong relative performance of linear models in this challenge, then these feature-specific patterns must sum linearly at the perceptual stage. Peripheral events in the olfactory sensory epithelium, including receptor binding and sensory neuron firing rates, might have nonlinearities, but the numerical representation of perceptual magnitude must be linear in these patterns. It is possible that stronger nonlinearity will be discovered when odor mixtures or the temporal dynamics of odor perception are investigated.

Many questions regarding human olfaction remain that may be successfully addressed by applying this method to future data sets that include more specific descriptors; more molecules that represent different olfactory percepts than those studied here; and subjects of different genetic, cultural, and geographic backgrounds.

Results of the DREAM Olfaction Prediction Challenge may accelerate efforts to understand basic mechanisms of ligand-receptor interactions, and to test predictive models of olfactory coding in both humans and animal models. Finally, these models have the potential to streamline the production and evaluation of new molecules by the flavor and fragrance industry.

[end of paraphrase]
