Scientific Understanding of Consciousness
Consciousness as an Emergent Property of Thalamocortical Activity

Social Learning


Science 9 April 2010: Vol. 328. no. 5975, pp. 208 - 213

Why Copy Others? Insights from the Social Learning Strategies Tournament

L. Rendell,1 R. Boyd,2 D. Cownden,3 M. Enquist,4,5 K. Eriksson,5,6 M. W. Feldman,7 L. Fogarty,1 S. Ghirlanda,5,8 T. Lillicrap,9 K. N. Laland1,*

1 Centre for Social Learning and Cognitive Evolution, School of Biology, University of St. Andrews, Queen's Terrace, St. Andrews, Fife KY16 9TS, UK.
2 Department of Anthropology, University of California, Los Angeles, CA 90095, USA.
3 Department of Mathematics and Statistics, Queen's University, Jeffery Hall, University Avenue, Kingston, Ontario K7L 3N6, Canada.
4 Department of Zoology, Stockholm University, 11691 Stockholm, Sweden.
5 Centre for the Study of Cultural Evolution, Stockholm University, 11691 Stockholm, Sweden.
6 School of Education, Culture, and Communication, Mälardalen University, 72123 Västerås, Sweden.
7 Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA.
8 Department of Psychology, University of Bologna, 40127 Bologna, Italy.
9 Centre for Neuroscience Studies, Queen's University, 18 Stuart Street, Kingston, Ontario K7L 3N6, Canada.


Social learning (learning through observation or interaction with other individuals) is widespread in nature and is central to the remarkable success of humanity, yet it remains unclear why copying is profitable and how to copy most effectively. To address these questions, we organized a computer tournament in which entrants submitted strategies specifying how to use social learning and its asocial alternative (for example, trial-and-error learning) to acquire adaptive behavior in a complex environment. Most current theory predicts the emergence of mixed strategies that rely on some combination of the two types of learning. In the tournament, however, strategies that relied heavily on social learning were found to be remarkably successful, even when asocial information was no more costly than social information. Social learning proved advantageous because individuals frequently demonstrated the highest-payoff behavior in their repertoire, inadvertently filtering information for copiers. The winning strategy (discountmachine) relied nearly exclusively on social learning and weighted information according to the time since acquisition.

Cultural processes facilitate the spread of adaptive knowledge, accumulated over generations, allowing individuals to acquire vital life skills. One of the foundations of culture is social learning, learning influenced by observation or interaction with other individuals, which occurs widely in various forms across the animal kingdom.

Social learning appears advantageous because it allows individuals to avoid the costs, in terms of effort and risk, of trial-and-error learning. However, social learning can also cost time and effort, and theoretical work reveals that it can be error-prone, leading individuals to acquire inappropriate or outdated information in nonuniform and changing environments. Current theory suggests that to avoid these errors individuals should be selective in when and how they use social learning. Accordingly, natural selection is expected to have favored social learning strategies that specify when individuals copy and from whom they learn.

We organized a computer tournament in which strategies competed in a complex and changing simulation environment. A first prize of €10,000 was offered. The organization of similar tournaments by Robert Axelrod in the 1980s proved an extremely effective means for investigating the evolution of cooperation and is widely credited with invigorating that field.

We received 104 entries, most, although not all, from academics across a wide range of disciplines and from all over the world.

The simulated environment for our tournament was a "multiarmed bandit," analogous to the "one-armed bandit" slot machine but with multiple "arms." In the tournament, the bandit had 100 arms, each representing a different behavior and each with a distinct payoff drawn independently from an exponential distribution. Furthermore, we posited a temporally varying environment, realized by changing the payoffs with a probability, p_c, per behavior per simulation round, with new payoffs drawn from the same distribution. The possibility of acquiring outdated information is seen as a crucial weakness of social learning.
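The environment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tournament's actual code: the class name `Bandit`, the default change probability `p_c=0.05`, and the mean payoff of 10 are hypothetical values chosen for the example. Only the 100 arms, the exponential payoffs, and the per-behavior, per-round change rule come from the text.

```python
import random


class Bandit:
    """Multiarmed bandit with time-varying payoffs (illustrative sketch)."""

    def __init__(self, n_arms=100, p_c=0.05, mean_payoff=10.0, seed=None):
        # p_c and mean_payoff are assumed values; the text does not give them.
        self.rng = random.Random(seed)
        self.p_c = p_c
        self.mean_payoff = mean_payoff
        self.payoffs = [self._draw() for _ in range(n_arms)]

    def _draw(self):
        # random.expovariate takes the rate, i.e. 1 / mean.
        return self.rng.expovariate(1.0 / self.mean_payoff)

    def step(self):
        """Advance one round: redraw each arm's payoff with probability p_c."""
        changed = []
        for arm in range(len(self.payoffs)):
            if self.rng.random() < self.p_c:
                self.payoffs[arm] = self._draw()
                changed.append(arm)
        return changed  # arms whose payoff changed this round
```

Calling `step()` once per simulation round returns the arms whose payoffs were redrawn; any information an agent holds about those arms is now outdated.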

Entered strategies had to specify how individual agents in a finite population choose between three possible moves in each round, namely Innovate, Observe, and Exploit.

Innovate represented asocial learning, that is, individual learning stemming solely from direct interaction with the environment, for example, through trial and error. An Innovate move always returned accurate information about the payoff of a randomly selected behavior previously unknown to the agent.

Observe represented any form of social learning or copying through which an agent could acquire a behavior performed by another individual, whether by observation of or interaction with that individual. An Observe move returned noisy information about the behavior and payoff currently being demonstrated in the population by one or more other agents playing Exploit. Playing Observe could return no behavior if none was being demonstrated or if the observed behavior was already in the agent’s repertoire. Observation was also always subject to error, such that the wrong behavior or the wrong payoff could be acquired. The probabilities of these errors occurring and the number of agents observed were parameters we varied.

Exploit represented the performance of a behavior from the agent’s repertoire, equivalent to pulling one of the multiarmed bandit’s levers. Agents could only obtain a payoff by playing Exploit.
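The three moves can be sketched as methods on a simple agent. This is a minimal illustration, not the tournament's implementation. The class `Agent`, the parameters `p_wrong_behavior` and `payoff_noise_sd`, the Gaussian form of the payoff noise, and the choice to observe a single demonstrator are all assumptions made for the example.

```python
import random


class Agent:
    """Agent with a repertoire of known behaviors (illustrative sketch)."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.repertoire = {}  # behavior index -> last known payoff

    def innovate(self, payoffs):
        """Learn the exact payoff of one random, previously unknown behavior."""
        unknown = [b for b in range(len(payoffs)) if b not in self.repertoire]
        if not unknown:
            return None
        behavior = self.rng.choice(unknown)
        self.repertoire[behavior] = payoffs[behavior]
        return behavior

    def observe(self, demonstrated, n_arms,
                p_wrong_behavior=0.1, payoff_noise_sd=1.0):
        """Copy one (behavior, payoff) pair being exploited by others, with error.

        The error model here is assumed: a uniformly random wrong behavior
        and Gaussian noise on the payoff.
        """
        if not demonstrated:
            return None  # nobody is playing Exploit
        behavior, payoff = self.rng.choice(demonstrated)
        if self.rng.random() < p_wrong_behavior:
            behavior = self.rng.randrange(n_arms)  # wrong behavior acquired
        if behavior in self.repertoire:
            return None  # already known, so nothing new is learned
        noisy = payoff + self.rng.gauss(0.0, payoff_noise_sd)
        self.repertoire[behavior] = max(0.0, noisy)
        return behavior

    def exploit(self, payoffs, behavior):
        """Perform a known behavior and collect its current payoff."""
        if behavior not in self.repertoire:
            raise ValueError("behavior not in repertoire")
        # Assumption: the agent sees, and records, the payoff it receives.
        self.repertoire[behavior] = payoffs[behavior]
        return payoffs[behavior]
```

Only `exploit` returns a payoff, which reflects the trade-off in the text: every round spent on `innovate` or `observe` is a round without reward.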

Axelrod’s cooperation tournaments were based on a widely accepted theoretical framework for the study of cooperation: the Prisoner’s Dilemma. Although there is no such currently established framework for social learning research, multiarmed bandits have been widely deployed to study learning across biology, economics, artificial intelligence research, and computer science, because they mimic a common problem faced by individuals who must make decisions about how to allocate their time in order to maximize their payoffs. Multiarmed bandits capture the essence of many difficult problems in the real world, for instance, where there are many possible actions, only a few of which yield a high payoff; where it is possible to learn asocially or through observation of others; where copying error occurs; and where the environment changes.

Winning strategies relied more heavily on recently acquired than older information.

The most important outcome of the tournament is the remarkable success of strategies that rely heavily on copying when learning, even though asocial learning carried no structural cost, an observation evocative of human culture.

The ability to evaluate current information on the basis of its age and to judge how valuable that information might be in the future, given knowledge of rates of environmental change, is also highlighted by the tournament.
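One way to make this kind of age-based evaluation concrete follows directly from the environment model. If a payoff is redrawn with probability p_c per round, a payoff learned `age` rounds ago is still valid with probability (1 - p_c)^age; otherwise the current payoff is a fresh draw whose expected value is the distribution mean. The sketch below is a worked example of that reasoning only. It is not the rule used by the winning strategy, and it assumes that the agent knows, or has estimated, both `p_c` and `mean_payoff`.

```python
def discounted_estimate(observed_payoff, age, p_c, mean_payoff):
    """Expected current payoff of a behavior whose payoff was seen `age` rounds ago.

    Assumes the payoff is redrawn with probability p_c each round from a
    distribution with mean `mean_payoff`.
    """
    p_unchanged = (1.0 - p_c) ** age
    return p_unchanged * observed_payoff + (1.0 - p_unchanged) * mean_payoff
```

For example, with `p_c = 0.5` and a mean payoff of 10, a payoff of 20 seen one round ago is worth 0.5 * 20 + 0.5 * 10 = 15 in expectation. As the information ages, the estimate decays toward the mean.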

By drawing attention to the importance of adaptive filtering by the copied individual and temporal discounting by the copier, the tournament helps to explain both why social learning is common in nature and why human beings happen to be so good at it.
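The filtering effect can be illustrated with a small simulation. If a demonstrator knows k behaviors with payoffs drawn independently from an exponential distribution and always performs the best one, the expected demonstrated payoff is the mean multiplied by the k-th harmonic number (1 + 1/2 + ... + 1/k), a standard result for the maximum of exponential variables. The function names and defaults below are hypothetical. The sketch ignores copying error and environmental change, so it shows the idea rather than reproducing the tournament.

```python
import random


def harmonic(k):
    """k-th harmonic number: 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))


def mean_demonstrated_payoff(k, mean_payoff=1.0, trials=20000, seed=0):
    """Average payoff shown by demonstrators who exploit the best of k behaviors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each demonstrator knows k independent exponential payoffs
        # and performs the highest one.
        total += max(rng.expovariate(1.0 / mean_payoff) for _ in range(k))
    return total / trials
```

With k = 5, `harmonic(5)` is about 2.28, so a copier sampling from such demonstrators sees payoffs more than twice as high on average as an innovator drawing a behavior at random.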

(end of paraphrase)


