Social Learning

Scientific Understanding of Consciousness
Consciousness as an Emergent Property of Thalamocortical Activity

Science, 9 April 2010: Vol. 328, no. 5975, pp. 208-213

Why Copy Others? Insights from the Social Learning Strategies Tournament

L. Rendell,^1 R. Boyd,^2 D. Cownden,^3 M. Enquist,^4,5 K. Eriksson,^5,6 M. W. Feldman,^7 L. Fogarty,^1 S. Ghirlanda,^5,8 T. Lillicrap,^9 K. N. Laland^1,*

1 Centre for Social Learning and Cognitive Evolution, School of Biology, University of St. Andrews, Queen's Terrace, St. Andrews, Fife KY16 9TS, UK.
2 Department of Anthropology, University of California, Los Angeles, CA 90095, USA.
3 Department of Mathematics and Statistics, Queen's University, Jeffery Hall, University Avenue, Kingston, Ontario K7L 3N6, Canada.
4 Department of Zoology, Stockholm University, 11691 Stockholm, Sweden.
5 Centre for the Study of Cultural Evolution, Stockholm University, 11691 Stockholm, Sweden.
6 School of Education, Culture, and Communication, Mälardalen University, 72123 Västerås, Sweden.
7 Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA.
8 Department of Psychology, University of Bologna, 40127 Bologna, Italy.
9 Centre for Neuroscience Studies, Queen's University, 18 Stuart Street, Kingston, Ontario K7L 3N6, Canada.

(paraphrase)

Social learning (learning through observation or interaction with other individuals) is widespread in nature and is central to the remarkable success of humanity, yet it remains unclear why copying is profitable and how to copy most effectively. To address these questions, we organized a computer tournament in which entrants submitted strategies specifying how to use social learning and its asocial alternative (for example, trial-and-error learning) to acquire adaptive behavior in a complex environment. Most current theory predicts the emergence of mixed strategies that rely on some combination of the two types of learning. In the tournament, however, strategies that relied heavily on social learning were found to be remarkably successful, even when asocial information was no more costly than social information. Social learning proved advantageous because individuals frequently demonstrated the highest-payoff behavior in their repertoire, inadvertently filtering information for copiers. The winning strategy (discountmachine) relied nearly exclusively on social learning and weighted information according to the time since acquisition.

Cultural processes facilitate the spread of adaptive knowledge, accumulated over generations, allowing individuals to acquire vital life skills. One of the foundations of culture is social learning, learning influenced by observation or interaction with other individuals, which occurs widely in various forms across the animal kingdom.

Social learning appears advantageous because it allows individuals to avoid the costs, in terms of effort and risk, of trial-and-error learning. However, social learning can also cost time and effort, and theoretical work reveals that it can be error-prone, leading individuals to acquire inappropriate or outdated information in nonuniform and changing environments. Current theory suggests that to avoid these errors individuals should be selective in when and how they use social learning. Accordingly, natural selection is expected to have favored social learning strategies that specify when individuals copy and from whom they learn.

We organized a computer tournament in which strategies competed in a complex and changing simulation environment. A first prize of €10,000 was offered. The organization of similar tournaments by Robert Axelrod in the 1980s proved an extremely effective means for investigating the evolution of cooperation and is widely credited with invigorating that field.

We received 104 entries, most, although not all, from academics across a wide range of disciplines and from all over the world.

The simulated environment for our tournament was a "multiarmed bandit", analogous to the "one-armed bandit" slot machine but with multiple "arms." In the tournament, the bandit had 100 arms, each representing a different behavior and each with a distinct payoff drawn independently from an exponential distribution. Furthermore, we posited a temporally varying environment realized by changing the payoffs with a probability, p_c, per behavior per simulation round, with new payoffs drawn from the same distribution. The possibility of acquiring outdated information is seen as a crucial weakness of social learning.
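The environment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tournament's actual code: the function names, the `mean` parameter of the exponential distribution, and the use of Python's `random` module are all hypothetical choices made here.

```python
import random

N_ARMS = 100  # number of behaviors ("arms"), as stated in the text


def new_payoffs(rng, n_arms=N_ARMS, mean=1.0):
    """Draw one payoff per arm independently from an exponential distribution.

    The mean of the distribution is an assumption; the text does not give it.
    """
    return [rng.expovariate(1.0 / mean) for _ in range(n_arms)]


def step_environment(payoffs, p_c, rng, mean=1.0):
    """Return the next round's payoffs.

    Each arm's payoff is redrawn from the same distribution with
    probability p_c; otherwise it carries over unchanged.
    """
    return [
        rng.expovariate(1.0 / mean) if rng.random() < p_c else payoff
        for payoff in payoffs
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    payoffs = new_payoffs(rng)
    payoffs = step_environment(payoffs, p_c=0.05, rng=rng)
    print(len(payoffs), "arms; best payoff this round:", max(payoffs))
```

With a small p_c most payoffs persist from round to round, so learned information stays useful for a while; with a large p_c it goes stale quickly.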

Entered strategies had to specify how individual agents in a finite population choose between three possible moves in each round, namely Innovate, Observe, and Exploit.

Innovate represented asocial learning, that is, individual learning stemming solely from direct interaction with the environment, for example, through trial and error. An Innovate move always returned accurate information about the payoff of a randomly selected behavior previously unknown to the agent.

Observe represented any form of social learning or copying through which an agent could acquire a behavior performed by another individual, whether by observation of or interaction with that individual. An Observe move returned noisy information about the behavior and payoff currently being demonstrated in the population by one or more other agents playing Exploit. Playing Observe could return no behavior if none was demonstrated or if the observed behavior was already in the agent’s repertoire. It also always occurred with error, such that the wrong behavior or wrong payoff could be acquired. The probabilities of these errors occurring and the number of agents observed were parameters we varied.

Exploit represented the performance of a behavior from the agent’s repertoire, equivalent to pulling one of the multiarmed bandit’s levers. Agents could only obtain a payoff by playing Exploit.
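The three moves can be sketched as plain Python functions acting on an agent's repertoire. This is a hedged illustration, not the tournament's implementation: the function names, the `p_copy_error` and `payoff_sd` parameters, the Gaussian noise model, the clamping of noisy payoffs at zero, and the order in which errors are applied are all assumptions made here, and only a single demonstrator is observed per move.

```python
import random


def innovate(repertoire, payoffs, rng):
    """Learn the exact payoff of one randomly chosen, previously unknown behavior."""
    unknown = [b for b in range(len(payoffs)) if b not in repertoire]
    if not unknown:
        return None  # nothing left to discover
    behavior = rng.choice(unknown)
    repertoire[behavior] = payoffs[behavior]  # accurate information
    return behavior


def observe(repertoire, demonstrated, payoffs, rng,
            p_copy_error=0.05, payoff_sd=1.0):
    """Copy one behavior that another agent is currently exploiting, with error.

    `demonstrated` lists the behaviors other agents exploited this round.
    Returns the acquired behavior, or None if nothing new was learned.
    """
    if not demonstrated:
        return None  # no one played Exploit, so nothing to observe
    behavior = rng.choice(demonstrated)
    if rng.random() < p_copy_error:
        # Assumed error model: a random behavior is acquired instead.
        behavior = rng.randrange(len(payoffs))
    if behavior in repertoire:
        return None  # already known: the move returns no behavior
    # Assumed noise model: Gaussian error on the payoff, clamped at zero.
    noisy = payoffs[behavior] + rng.gauss(0.0, payoff_sd)
    repertoire[behavior] = max(0.0, noisy)
    return behavior


def exploit(repertoire, behavior, payoffs):
    """Perform a known behavior and collect its true current payoff."""
    if behavior not in repertoire:
        raise KeyError("agents can only exploit behaviors in their repertoire")
    return payoffs[behavior]
```

Here `repertoire` is a dictionary mapping each known behavior to the payoff the agent believes it has. Only `exploit` returns a payoff, which mirrors the rule that learning moves cost a round of earnings.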

Axelrod’s cooperation tournaments were based on a widely accepted theoretical framework for the study of cooperation: the Prisoner’s Dilemma. Although there is no such currently established framework for social learning research, multiarmed bandits have been widely deployed to study learning across biology, economics, artificial intelligence research, and computer science, because they mimic a common problem faced by individuals who must make decisions about how to allocate their time in order to maximize their payoffs. Multiarmed bandits capture the essence of many difficult problems in the real world, for instance, where there are many possible actions, only a few of which yield a high payoff; where it is possible to learn asocially or through observation of others; where copying error occurs; and where the environment changes.

Winning strategies relied more heavily on recently acquired than older information.

The most important outcome of the tournament is the remarkable success of strategies that rely heavily on copying when learning, in spite of the absence of a structural cost to asocial learning, an observation evocative of human culture.

The ability to evaluate current information on the basis of its age and to judge how valuable that information might be in the future, given knowledge of rates of environmental change, is also highlighted by the tournament.
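One simple way to weight information by its age follows directly from the environment model: if each payoff changes with probability p_c per round and new payoffs come from a distribution with a known mean, a payoff measured `age` rounds ago is still valid with probability (1 - p_c)^age. The sketch below uses that expectation. It is an illustrative assumption, not the rule used by discountmachine or any other entry, and the function names and repertoire layout are hypothetical.

```python
def discounted_estimate(observed_payoff, age, p_c, mean_payoff):
    """Expected current payoff of a behavior last measured `age` rounds ago.

    With probability (1 - p_c) ** age the payoff is unchanged; otherwise it
    has been redrawn and its expected value is the distribution mean.
    """
    p_unchanged = (1.0 - p_c) ** age
    return p_unchanged * observed_payoff + (1.0 - p_unchanged) * mean_payoff


def best_behavior(repertoire, current_round, p_c, mean_payoff):
    """Pick the known behavior with the highest age-discounted estimate.

    `repertoire` maps behavior -> (observed_payoff, round_acquired).
    """
    return max(
        repertoire,
        key=lambda b: discounted_estimate(
            repertoire[b][0], current_round - repertoire[b][1], p_c, mean_payoff
        ),
    )
```

For example, with `p_c = 0.5` and a mean payoff of 2.0, a payoff of 10.0 measured one round ago is valued at `discounted_estimate(10.0, 1, 0.5, 2.0)`, which is 6.0. As information ages, its estimate drifts toward the mean, so a recent moderate payoff can outrank an old high one.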

By drawing attention to the importance of adaptive filtering by the copied individual and temporal discounting by the copier, the tournament helps to explain both why social learning is common in nature and why human beings happen to be so good at it.

(end of paraphrase)