Journal of Experimental Psychology
Vol. 55, No. 4, 1958
AN ANALYSIS OF TWO-PERSON GAME SITUATIONS IN
TERMS OF STATISTICAL LEARNING THEORY 1
RICHARD C. ATKINSON 2 AND PATRICK SUPPES
Applied Mathematics and Statistics Laboratory, Stanford University
This study represents an extension
of statistical learning theory to a class
of two-person, zero-sum game situa-
tions. Because the theory has been
mainly developed in connection with
experiments dealing with individual
learning problems, its predictive suc-
cess in an experimental area involving
interaction between individuals pro-
vides an additional measure of the
scope of its validity. It should be
emphasized that the study reported
here was not conceived as providing
an empirical test of the adequacy of
learning theory as opposed to game
theory; although we use the language
of game theory to describe the study,
the game characteristics of the situa-
tion were not apparent to Ss. This
point is amplified below.
For the purposes of this experiment,
a play of a game is a trial. On a given
trial each of the two players inde-
pendently makes a choice between
one of two alternatives—that is, he
makes one of two possible responses.
After the players have indicated their
choices, the outcome of the trial is
announced to each player.
On all trials, the game is described by the following pay-off matrix.

\[
\begin{array}{c|cc}
 & B_1 & B_2 \\ \hline
A_1 & a_1 & a_2 \\
A_2 & a_3 & a_4
\end{array}
\]

The players are designated A and B. The responses available to A are $A_1$ and $A_2$; similarly, the responses available to B are $B_1$ and $B_2$. If A selects $A_1$ and B selects $B_1$ then there is a probability $a_1$ that A is "correct" and B is "incorrect," and a probability $1 - a_1$ that A is "incorrect" and B is "correct." These two joint events are exhaustive since it is required that exactly one player is correct on each trial. The outcomes of the other three response pairs are identically specified in terms of $a_2$, $a_3$ and $a_4$.
The interaction of the players is limited
by two factors: (a) neither player is
shown the pay-off matrix, (b) neither
player is directly informed of the re-
sponses of the other player. Thus, from
the standpoint of the general theory of
rational behavior (4), S should not regard
himself as playing a 2 × 2 game with
known pay-off matrix but should view
the situation as a multi-stage decision
problem against an unknown opponent.
However, selection of an optimal strategy
in this multi-stage decision problem is
far from a trivial task mathematically,
and it is scarcely to be expected that any
S would use such a strategy. The virtue
of statistical learning theory is that it
yields a quantitative prediction of how
organisms actually do behave in such
situations.
Our theoretical analysis of the behavior of Ss in the situation described is
based on two distinct but closely related
models. Since a detailed mathematical
analysis of these models will be presented
elsewhere, the present statement will
concern only the most salient facts and
omit mathematical proofs.
Linear model. —The first model is an
extension of a linear model developed by
Estes and Burke (6). Experimental
tests of this formulation for one-person
learning situations have been reported
1 This research was supported by the Be-
havioral Sciences Division of the Ford Founda-
tion and by the Group Psychology Branch of the
Office of Naval Research. The authors are
indebted to W. K. Estes for several stimulating
discussions of the ideas on which this experiment
is based.
2 Now at the University of California, Los
Angeles.
(2, 9, 13). The basic assumption of the
model is that response probability on a
given trial is a linear function of the
probability on the preceding trial. When
a response is reinforced its probability
increases; the reinforcement of any other
response decreases its probability.
In the present situation, where two
responses are available to each S, if a
response occurs and is designated as
"correct," then the response is rein-
forced; if a response occurs and is desig-
nated as "incorrect," then the alternative
response is reinforced. More specifically, let $\alpha_n$ be the probability of response $A_1$ on Trial $n$. The rules of change are: (a) if $A_1$ is reinforced on Trial $n$, then $\alpha_{n+1} = (1 - \theta_A)\alpha_n + \theta_A$; (b) if $A_2$ is reinforced on Trial $n$, then $\alpha_{n+1} = (1 - \theta_A)\alpha_n$, where $0 < \theta_A < 1$. Identical rules are specified for $\beta_n$, the probability of a $B_1$ response, in terms of $\theta_B$.
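The two rules of change can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the starting probabilities of .5, the seed, and the placeholder pay-off probabilities (all set to .5) are introduced here only for the example.

```python
import random


def linear_update(p, response_1_reinforced, theta):
    """Apply one rule of change to p, the probability of response 1."""
    if response_1_reinforced:
        return (1.0 - theta) * p + theta  # rule (a): move toward 1
    return (1.0 - theta) * p              # rule (b): move toward 0


def simulate_pair(a, theta_a, theta_b, n_trials, alpha0=0.5, beta0=0.5, seed=0):
    """Simulate one pair of players under the linear model.

    a maps (response of A, response of B) to the probability that A is
    "correct" on that trial. Exactly one player is correct on each trial.
    """
    rng = random.Random(seed)
    alpha, beta = alpha0, beta0
    responses = []
    for _ in range(n_trials):
        resp_a = 1 if rng.random() < alpha else 2
        resp_b = 1 if rng.random() < beta else 2
        a_correct = rng.random() < a[(resp_a, resp_b)]
        # A correct response is reinforced; after an incorrect response
        # the alternative response is reinforced.
        a1_reinforced = (resp_a == 1) == a_correct
        b1_reinforced = (resp_b == 1) == (not a_correct)
        alpha = linear_update(alpha, a1_reinforced, theta_a)
        beta = linear_update(beta, b1_reinforced, theta_b)
        responses.append((resp_a, resp_b))
    return alpha, beta, responses


# Hypothetical pay-off probabilities, used only to exercise the code.
placeholder_a = {(1, 1): 0.5, (1, 2): 0.5, (2, 1): 0.5, (2, 2): 0.5}
alpha_n, beta_n, responses = simulate_pair(placeholder_a, 0.1, 0.1, 200)
```

Averaging `alpha_n` and `beta_n` over many simulated pairs (different seeds) would approximate the mean probabilities treated in the text.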
The following pair of recursive equations can then be derived for the mean probabilities $\bar{\alpha}_n$ and $\bar{\beta}_n$, where $\bar{\gamma}_n$ is the mean probability of the joint event that on Trial $n$ Player A will make response $A_1$ and Player B response $B_1$.

\[
\bar{\alpha}_{n+1} = [1 - \theta_A(2 - a_2 - a_4)]\,\bar{\alpha}_n + \theta_A(a_4 - a_3)\,\bar{\beta}_n + \theta_A(a_1 + a_3 - a_2 - a_4)\,\bar{\gamma}_n + \theta_A(1 - a_4)
\]
\[
\bar{\beta}_{n+1} = (1 - \theta_B a_3 - \theta_B a_4)\,\bar{\beta}_n + \theta_B(a_2 - a_4)\,\bar{\alpha}_n + \theta_B(a_3 + a_4 - a_1 - a_2)\,\bar{\gamma}_n + \theta_B a_4
\]

It may be shown that $\bar{\alpha}$, $\bar{\beta}$ and $\bar{\gamma}$, the asymptotic probabilities in the sense of Cesàro (11),³ exist and are independent of the initial probabilities $\alpha_0$, $\beta_0$, $\gamma_0$. However, in general these asymptotic quantities depend on $\theta_A$ and $\theta_B$, and no simple results are obtainable for the quantities individually. On the other hand, an interesting linear relation between $\bar{\alpha}$ and $\bar{\beta}$, which is independent of $\bar{\gamma}$, $\theta_A$ and $\theta_B$, can be derived, namely:

\[
[(a_3 + a_4 - a_1 - a_2) + (a_1 a_2 - a_3 a_4)]\,\bar{\alpha} = (a_1 a_3 - a_2 a_4)\,\bar{\beta} + \tfrac{1}{2}(a_3 + a_4 - a_1 - a_2) + a_4(a_2 - a_3). \tag{1}
\]

3 To be explicit,
\[
\bar{\alpha} = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \bar{\alpha}_i ,
\]
and similarly for $\bar{\beta}$ and $\bar{\gamma}$.

The line determined by this equation has been labeled the interaction line since the exact point on the line specifying the asymptotic probabilities $\bar{\alpha}$ and $\bar{\beta}$ is a function of both $\theta_A$ and $\theta_B$. It is interesting to observe that in the corresponding one-person learning situation, the interaction line degenerates to a point, while in the three-person situation an interaction surface is obtained.

Finite Markov model. —In this model the simplifying assumption is made that on all trials a player's response behavior is determined by a single stimulus, that is, the event associated with the onset of a trial. The S is described as being in one of two possible states: (a) if in State 1, the stimulus is conditioned to Response 1 and, in the presence of the stimulus, Response 1 will be elicited; (b) if in State 2, the stimulus is conditioned to Response 2 and, in the presence of the stimulus, Response 2 will be elicited. Thus, on any Trial $n$, the two players are described in terms of one of the following four states: <1,1>, <1,2>, <2,1> and <2,2>, where the first member of a couple indicates the state of Player A and the second, the state of Player B. For example, <2,1> means that Player A will make response $A_2$ and Player B will make response $B_1$. It is postulated that the change of states from one trial to the next is Markovian, and the following analysis is used to derive the transition matrix (10, 11) of the process.

When one of Player A's responses is reinforced on Trial $n$ there is (a) a probability $\theta_A$ that the stimulus governing Player A's response will be conditioned to the reinforced response and therefore, on Trial $n + 1$, Player A will make the response reinforced on Trial $n$, and (b) a probability $1 - \theta_A$ that the conditioned status of the stimulus will remain unchanged and therefore, on Trial $n + 1$, Player A will repeat the response made on Trial $n$. Identical rules describe the process for Player B in terms of $\theta_B$.⁴

4 The Markov process derived from these assumptions differs in certain respects from that which can be derived from the Estes and Burke stimulus sampling model (6). In their formulation the stimulus is conceptualized as being composed of a large number of stimulus elements each of which is sampled with probability $\theta$ and, once sampled, conditioned to the reinforced response with probability 1. Further, the probability of a response is defined as the proportion of stimulus elements in the sample conditioned to the response. In the model used in this paper it is assumed that the single stimulus is sampled on each trial with probability 1.

For this set of assumptions and the pay-off probabilities $a_1$, $a_2$, $a_3$ and $a_4$, the transition matrix describing the learning process can be derived and is as follows:

\[
\begin{array}{c|cccc}
 & \langle 1,1 \rangle & \langle 1,2 \rangle & \langle 2,1 \rangle & \langle 2,2 \rangle \\ \hline
\langle 1,1 \rangle & a_1(\theta_A - \theta_B) + (1 - \theta_A) & a_1\theta_B & (1 - a_1)\theta_A & 0 \\
\langle 1,2 \rangle & a_2\theta_B & a_2(\theta_A - \theta_B) + (1 - \theta_A) & 0 & (1 - a_2)\theta_A \\
\langle 2,1 \rangle & (1 - a_3)\theta_A & 0 & a_3(\theta_A - \theta_B) + (1 - \theta_A) & a_3\theta_B \\
\langle 2,2 \rangle & 0 & (1 - a_4)\theta_A & a_4\theta_B & a_4(\theta_A - \theta_B) + (1 - \theta_A)
\end{array}
\]

Rows designate the state on Trial $n$ and columns the state on Trial $n + 1$. Thus $(1 - a_3)\theta_A$, the entry in Row 3, Column 1, is the conditional probability of being in State <1,1> on Trial $n + 1$ given that the pair of Ss was in State <2,1> on Trial $n$, because:

\[
(1 - a_3)\theta_A = \theta_A\theta_B(1 - a_3) + \theta_A(1 - \theta_B)(1 - a_3) + (1 - \theta_A)\theta_B \cdot 0 + (1 - \theta_A)(1 - \theta_B) \cdot 0.
\]

From these one-stage transition probabilities an explicit solution is obtained for the Cesàro asymptotic probabilities of an $A_1$ and $B_1$ response; as in the case of the linear model these quantities are denoted as $\bar{\alpha}$ and $\bar{\beta}$ respectively. The general equations for $\bar{\alpha}$ and $\bar{\beta}$ are too lengthy to reproduce here but certain results are noteworthy. It can be shown that $\bar{\alpha}$ and $\bar{\beta}$ are related by the identical interaction line determined by Equation 1 of the linear model. For the Markov model, however, it can in addition be proved that the point on the interaction line describing a particular pair of Ss' asymptotic behaviors is uniquely determined by the ratio of $\theta_A$ to $\theta_B$. Further, even without a knowledge of the specific values of $\theta_A$ and $\theta_B$ one can specify a fairly narrow interval on the interaction line within which $\bar{\alpha}$ and $\bar{\beta}$ must fall by taking the limits of $\bar{\alpha}$ and $\bar{\beta}$ as the ratio $\theta_A/\theta_B$ approaches zero or becomes large.

Particular cases of the theoretical analysis may be illustrated by examining predictions for the parameter values employed in this experiment. Three sets of $a_i$ values were used corresponding to three classical cases of 2 × 2 games in the theory of zero-sum, two-person games (12).

The first case is labeled the Mixed Group, since both players have mixed minimax strategies. The $a_i$ values are given by the pay-off matrix

[pay-off matrix not recoverable from this copy]

The minimax strategy for Player A is to choose $A_1$ with probability 1/3, and the minimax strategy for B is to choose $B_1$ with probability 5/6. In the Markov model

\[
\bar{\alpha} = .600 \tag{2}
\]
\[
\bar{\beta} = \frac{35(\theta_A/\theta_B) + 22}{50(\theta_A/\theta_B) + 40}. \tag{3}
\]

Note that $\bar{\alpha}$ is independent of $\theta_A/\theta_B$. From Equation 3 one obtains as bounds on $\bar{\beta}$:

\[
.550 < \bar{\beta} < .700. \tag{4}
\]

If one assumes $\theta_A = \theta_B$, then $\bar{\beta} = .633$. For this case the interaction line is the line satisfying Equation 2.
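As a numerical check on the Markov model, the sketch below builds the transition matrix implied by the conditioning assumptions and iterates it to its stationary distribution. The Mixed Group pay-off probabilities used here ($a_1 = 1/3$, $a_2 = 1$, $a_3 = 1/2$, $a_4 = 1/6$) are an assumption: they are not quoted from the text but were chosen because they reproduce the stated predictions of .600 and .633 when $\theta_A = \theta_B$. The function names and the value .3 for both conditioning parameters are likewise illustrative.

```python
def transition_matrix(a, theta_a, theta_b):
    """Rows: state on Trial n; columns: state on Trial n + 1.

    State order is <1,1>, <1,2>, <2,1>, <2,2>; a = (a1, a2, a3, a4),
    where each entry is the probability that Player A is correct.
    """
    a1, a2, a3, a4 = a

    def stay(ai):
        # Probability that the pair remains in the same state.
        return ai * (1 - theta_b) + (1 - ai) * (1 - theta_a)

    return [
        [stay(a1), a1 * theta_b, (1 - a1) * theta_a, 0.0],
        [a2 * theta_b, stay(a2), 0.0, (1 - a2) * theta_a],
        [(1 - a3) * theta_a, 0.0, stay(a3), a3 * theta_b],
        [0.0, (1 - a4) * theta_a, a4 * theta_b, stay(a4)],
    ]


def asymptotic_probabilities(P, n_iter=5000):
    """Iterate pi <- pi P and return (alpha_bar, beta_bar)."""
    pi = [0.25, 0.25, 0.25, 0.25]
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
    alpha_bar = pi[0] + pi[1]  # Player A in State 1
    beta_bar = pi[0] + pi[2]   # Player B in State 1
    return alpha_bar, beta_bar


# ASSUMED values, inferred from the stated predictions (not from the text).
MIXED_ASSUMED = (1 / 3, 1.0, 1 / 2, 1 / 6)

P = transition_matrix(MIXED_ASSUMED, theta_a=0.3, theta_b=0.3)
alpha_bar, beta_bar = asymptotic_probabilities(P)
print(round(alpha_bar, 3), round(beta_bar, 3))  # 0.6 0.633
```

Changing `theta_a` and `theta_b` while holding their ratio fixed should leave both values unchanged, which is the dependence on $\theta_A/\theta_B$ alone described in the text.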
The second case is labeled the Pure Group, since both players have pure minimax strategies. The particular values are given by the matrix

[pay-off matrix not recoverable from this copy]

Here $a_1 = \tfrac{1}{2}$ is a saddle point of the matrix and from the standpoint of game theory the optimal strategy for Player A is to play $A_1$ with probability 1 and for B to play $B_1$ with probability 1. In the Markov model

\[
\bar{\alpha} = .667 \tag{5}
\]
\[
\bar{\beta} = \frac{6(\theta_A/\theta_B) + 5}{9(\theta_A/\theta_B) + 9}. \tag{6}
\]

As in the previous case, $\bar{\alpha}$ is independent of $\theta_A/\theta_B$ and the interaction line is the line satisfying Equation 5. From Equation 6 one obtains as bounds on $\bar{\beta}$:

\[
.555 < \bar{\beta} < .667. \tag{7}
\]

If one assumes that $\theta_A = \theta_B$, then $\bar{\beta} = .611$.

The third case is labeled the Sure Group since both players have sure-thing strategies (i.e., given the pay-off matrix one of the two responses available to each player is at least as good or better than the other response regardless of what his opponent does). The parameter values are given by the matrix

[pay-off matrix not recoverable from this copy]

The sure-thing strategies for Players A and B are $A_1$ and $B_1$ respectively. In the Markov model

\[
\bar{\alpha} = \frac{5(\theta_A/\theta_B) + 15}{7(\theta_A/\theta_B) + 23} \tag{8}
\]
\[
\bar{\beta} = \frac{5(\theta_A/\theta_B) + 16}{7(\theta_A/\theta_B) + 23} \tag{9}
\]

and as bounds one has:

\[
.652 < \bar{\alpha} < .714 \tag{10}
\]
\[
.696 < \bar{\beta} < .714. \tag{11}
\]

If one assumes that $\theta_A = \theta_B$, then $\bar{\alpha} = .667$ and $\bar{\beta} = .700$. For this case the interaction line is determined by the equation:

\[
3\bar{\alpha} = 10\bar{\beta} - 5. \tag{12}
\]

METHOD

Subjects.—The Ss were 120 undergraduates obtained from introductory courses in psychology and philosophy at Stanford University. They were randomly assigned to the Mixed, Pure, and Sure Groups with the restriction that there were 20 pairs of Ss in each group.

Apparatus.—The Ss, run in pairs, sat at opposite ends of an 8 × 3-ft. table. Mounted vertically on the table top facing each S was a 50-in. wide by 30-in. high black panel placed 22 in. from the end of the table. The E sat between the two panels and was not visible to either S. The apparatus, as viewed from S's side, consisted of two silent operating keys mounted 8 in. apart on the table top and 12 in. from the end of the table; upon the panel, three milk-glass panel lights were mounted. One of these lights, which served as the signal for S to respond, was centered between the keys at a height of 17 in. from the table top. Each of the two remaining lights, the reinforcing signals, was at a height of 11 in. directly above one of the keys. The presentation and duration of the lights were automatically controlled. The Ss were not visible to one another and could not see each other's responses or panel lights.

Procedure.—The Ss were read the following instructions: "We always run Ss in pairs because this is the way the equipment has been designed and also because it is the most economical procedure. Actually, however, you are both working on two completely different and independent problems.

"The experiment for each of you consists of a series of trials. The top center lamp on your panel will light for about 2 sec. to indicate the start of each trial. Shortly thereafter one or the other of the two lower lamps will light up. Your job is to predict on each trial which one of the two lower lamps will light and indicate your prediction by pressing the proper key. That is, if you expect the left lamp to light press the left key, if you expect the right lamp to light press the right key. On each trial press one or the other of the two keys but never both. If you are not sure which key to press then guess.

"Be sure to indicate your choice by pressing the proper key immediately after the onset of the signal light. That is, when the signal light goes on press one or the other key down and release it. Then wait until one of the lower lights goes on. If the light above the key you pressed goes on your prediction was correct, if the light above the key opposite from the one you pressed goes on you were incorrect, and should have pressed the other key. At times you may feel frustrated or irritated if you cannot understand what the experiment is all about. Nevertheless, continue trying to make the very best prediction you can on each trial."
For each pair of Ss, one was randomly selected
as Player A and the other as Player B. Further,
for each S one of the two response keys was
randomly designated Response 1 and the other
Response 2 with the restriction that the following
possible combinations occurred equally often in
each of the three experimental groups: (a) $A_1$ and $B_1$ on the right, (b) $A_1$ on the right and $B_1$ on the left, (c) $A_1$ on the left and $B_1$ on the right, and (d) $A_1$ and $B_1$ on the left.
Following the instructions, 200 trials were run
in continuous sequence. For each pair of Ss
sequences of reinforcing lights were generated in
accordance with assigned values of $a_i$ and observed responses.
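The way a reinforcing light follows from the two observed responses and the assigned $a_i$ value can be sketched as below. This is an illustration only: the function name, the dictionary layout for the pay-off probabilities, and the coding of lamps as 1 or 2 (the lamp above the Response 1 or Response 2 key) are assumptions, and the real apparatus was controlled automatically rather than by a program like this.

```python
import random


def reinforcing_lights(resp_a, resp_b, a, rng):
    """Return (light_a, light_b) for one trial.

    resp_a, resp_b: responses (1 or 2) of Players A and B.
    a: maps (resp_a, resp_b) to the probability that A is correct.
    Each light is coded by the response whose key it sits above.
    """
    other = {1: 2, 2: 1}
    a_correct = rng.random() < a[(resp_a, resp_b)]
    # The lamp above the pressed key lights for the correct player; the
    # lamp above the opposite key lights for the incorrect player.
    light_a = resp_a if a_correct else other[resp_a]
    light_b = other[resp_b] if a_correct else resp_b
    return light_a, light_b


# Hypothetical pay-off probabilities, used only to exercise the function.
example_a = {(1, 1): 0.5, (1, 2): 1.0, (2, 1): 0.0, (2, 2): 0.5}
lights = reinforcing_lights(1, 2, example_a, random.Random(0))
```

Because exactly one player is correct on each trial, the two lights are determined by a single random draw, not by two independent draws.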
On all trials the signal light was lighted for
3.5 sec.; the time between successive signal exposures was 10 sec. The reinforcing light followed the cessation of the signal light by 1.5 sec. and remained on for 2 sec.
At the end of the session each S was asked to
describe what he thought was involved in the
experiment. Only one S indicated that he be-
lieved the reinforcing events depended in any
way on a relationship between his responses and
the other player's responses. His record and
that of his partner were eliminated from the
analysis and replaced by another pair.
FIG. 1. Observed proportions of $A_1$ and $B_1$ responses in blocks of 40 trials for the three experimental groups. [Figure not reproduced; surviving panel titles: MIXED GROUP, SURE GROUP; abscissa: blocks of 40 trials.]
RESULTS AND DISCUSSION
Mean learning curves and asymptotic
results. —Figure 1 provides a descrip-
tion of behavior over all trials of the
experiment. In this figure the mean
proportions of $A_1$ and $B_1$ responses in
successive blocks of 40 trials are given
for the sequence of 200 trials. An in-
spection of this figure indicates that
responses are fairly stable over the
last 100 trials except possibly for $B_1$
responses in the Pure Group. To
check the stability of response be-
havior for individual data, t's for
paired measures were computed be-
tween response proportions for the
first and last halves of the final block
of 60 trials. In all cases the obtained
values of t fall short of significance at
the .05 level.
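The stability check described above is a t test for paired measures. A minimal sketch of the statistic is given below; the function name and the sample proportions are hypothetical, and the source does not report the individual values that entered the authors' tests.

```python
import math
import statistics


def paired_t(first_half, second_half):
    """Return (t, degrees of freedom) for paired measures."""
    diffs = [x - y for x, y in zip(first_half, second_half)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical response proportions for five pairs of Ss, first and last
# halves of the final block of trials.
first = [0.60, 0.63, 0.57, 0.70, 0.67]
last = [0.63, 0.60, 0.60, 0.67, 0.70]
t_value, df = paired_t(first, last)
```

The obtained `t_value` would then be compared with the tabled critical value for `df` degrees of freedom at the .05 level.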
It appears reasonable to assume
that a constant level of responding has
been reached; consequently the pro-
portions computed over the last 60
trials were used as an estimate of $\bar{\alpha}$ and $\bar{\beta}$. Table 1 presents the observed mean proportions of $A_1$ and $B_1$ responses in the last 60 trial block and
the SD's associated with these means.
TABLE 1
PREDICTED AND OBSERVED MEAN PROPORTIONS OF $A_1$ AND $B_1$ RESPONSES OVER THE LAST BLOCK OF 60 TRIALS

              Response A_1                 Response B_1
Group      Pred.    Obs.    SD          Pred.    Obs.    SD
Mixed      .600     .605    .0794       .633     .649    .0874
Pure       .667     .670    .0832       .611     .602    .0634
Sure       .667     .606    .1005       .700     .731    .0760