Dynamic Bayesian Networks:
Representation, Inference and Learning
by
Kevin Patrick Murphy
B.A. Hon. (Cambridge University) 1992
M.S. (University of Pennsylvania) 1994
A dissertation submitted in partial satisfaction
of the requirements for the degree of
Doctor of Philosophy
in
Computer Science
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor Stuart Russell, Chair
Professor Michael Jordan
Professor Peter Bickel
Professor Jeffrey Bilmes
Fall 2002
Dynamic Bayesian Networks:
Representation, Inference and Learning
Copyright 2002
by
Kevin Patrick Murphy
ABSTRACT
Dynamic Bayesian Networks:
Representation, Inference and Learning
by
Kevin Patrick Murphy
Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor Stuart Russell, Chair
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models
(HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For
example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been
used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs
by allowing the state space to be represented in factored form, instead of as a single discrete random variable.
DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian.
In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact
and approximate inference in DBNs, and how to learn DBN models from sequential data.
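To make the factored-state idea concrete, the following minimal sketch (not part of the thesis; the chain-structured factorization and function names are illustrative assumptions) compares the number of free transition parameters in a flat HMM over D binary state variables with those in a DBN where each variable depends on only a couple of variables in the previous time slice.

```python
# Illustrative sketch: transition-parameter counts for a flat HMM versus a
# factored DBN. Assumes D binary state variables; in the DBN, variable X_i[t]
# depends only on X_i[t-1] and X_{i-1}[t-1] (a simple assumed structure).

def flat_hmm_transition_params(D):
    """A flat HMM enumerates all 2^D joint states, so its transition
    matrix has 2^D * (2^D - 1) free parameters."""
    S = 2 ** D
    return S * (S - 1)

def factored_dbn_transition_params(D):
    """Each binary variable has at most two binary parents in the previous
    slice, so its CPT has at most 2^2 rows, each with 1 free parameter."""
    # The first variable has a single parent (itself at t-1); the rest have two.
    return sum(2 ** min(i + 1, 2) for i in range(D))

if __name__ == "__main__":
    for D in (2, 5, 10, 20):
        print(D, flat_hmm_transition_params(D), factored_dbn_transition_params(D))
```

Under these assumptions the flat representation grows exponentially in D while the factored one grows linearly, which is the sense in which DBNs extend the "expressive power" of HMMs.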
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where
T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T);
a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds
on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored
frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety
of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation
of the field of sequential data modelling.
ACKNOWLEDGMENTS
I would like to thank my advisor, Stuart Russell, for supporting me over the years, and for giving me so
much freedom to explore and discover new areas of probabilistic AI. My other committee members have also
been very supportive. Michael Jordan has long been an inspiration to me. His classes and weekly meetings
have proved to be one of my best learning experiences at Berkeley. Jeff Bilmes proved to be a most thorough
reviewer, as I expected, and has kept me honest about all the details. Peter Bickel brought a useful outsider’s
perspective to the thesis, and encouraged me to make it more accessible to non-computer scientists (although
any failings in this regard are of course my fault).
I would like to thank my many friends and colleagues at Berkeley with whom I have had the pleasure of
working over the years. These include Eyal Amir, David Andre, Serge Belongie, Jeff Bilmes, Nancy Chang,
Nando de Freitas, Nir Friedman, Paul Horton, Srini Narayanan, Andrew Ng, Mark Paskin, Sekhar Tatikonda,
Yair Weiss, Eric Xing, Geoff Zweig, and all the members of the RUGS and IR groups.
I would like to thank Jim Rehg for hiring me as an intern at DEC/Compaq/HP Cambridge Research Lab
in 1997, where my Bayes Net Toolbox (BNT) was born. I would like to thank Gary Bradski for hiring me
as an intern at Intel in 2000 to work on BNT, and for providing me with the opportunity to work with people
spanning three countries formerly known as superpowers — USA, China and Russia. In particular, I would
like to thank Wei Hu and Yimin Zhang, of ICRC, for their help with BNT. I would also like to thank the many
people on the web who have contributed bug fixes to BNT. By chance, I was able to work with Sebastian
Thrun during part of my time with Intel, for which I am very grateful.
I would like to thank my friends in Jennie Nation and beyond for providing a welcome distraction from
school. Finally, I would like to thank my wife Margaret for putting up with my weekends in the office, for
listening to my sagas from Soda land, and for giving me the motivation to finish this thesis.
Contents
1 Introduction 1
1.1 State-space models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Hidden Markov Models (HMMs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.3 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.4 The problem with HMMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Kalman Filter Models (KFMs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.4 The problem with KFMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Overview of the rest of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 A note on software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Declaration of previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2 DBNs: Representation 18
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 DBNs defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Representing HMMs and their variants as DBNs . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.1 HMMs with mixture-of-Gaussians output . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.2 HMMs with semi-tied mixtures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.3 Auto-regressive HMMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.4 Buried Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24