Auditory Perception and Spatial (3D) Auditory Systems
Bill Kapralos
Michael R. M. Jenkin
Evangelos Milios
Technical Report CS-2003-07
July 20, 2003
Department of Computer Science
4700 Keele Street, North York, Ontario, M3J 1P3, Canada
Auditory Perception and Spatial (3D) Auditory Systems (4)

B. Kapralos (1,3), M. Jenkin (1,3) and E. Milios (2,3)

1 Dept. of Computer Science, York University, Toronto, ON, Canada. M3J 1P3
2 Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada. B3H 1W5
3 Centre for Vision Research, York University, Toronto, ON, Canada. M3J 1P3

{billk, jenkin}@cs.yorku.ca
eem@cs.dal.ca
Abstract
In order to enable the user of a virtual reality system to be fully immersed in the virtual
environment, the user must be presented with believable sensory input. Although the
majority of virtual environments place the emphasis on visual cues, replicating the complex
interactions of sound within an environment will benefit the level of immersion and hence
the user's sense of presence. Three-dimensional (spatial) sound systems allow a listener to
perceive the position of sound sources, and the effect of the interaction of sound sources
with the acoustic structure of the environment. This paper reviews the biological and
technical literature relevant to the generation of accurate acoustic displays for virtual
environments, beginning with an introduction to the process of auditory perception in
humans. It then critically examines common methods and techniques that have been used
in the past, as well as methods and techniques currently in use, to generate spatial sound.
In doing so, the limitations, advantages, and disadvantages associated with these
techniques are also presented.
(4) The financial support of NSERC (Natural Sciences and Engineering Research Council of Canada),
CRESTech (Centre for Research in Earth and Space Technology) and IRIS (Institute for Robotics and
Intelligent Systems) is gratefully acknowledged.
Contents
1 Introduction
  1.1 What Exactly is Sound?
    1.1.1 Measuring Sound
    1.1.2 Near Field vs. Far Field
    1.1.3 Coordinate System
  1.2 Sound Localization
    1.2.1 Duplex Theory
    1.2.2 Head Related Transfer Function (HRTF)
    1.2.3 Reverberation
    1.2.4 Precedence Effect
    1.2.5 Head Movements
    1.2.6 Auditory Distance Perception
2 Recording Techniques
  2.1 Listener Sweet Spot
  2.2 Microphones
  2.3 Monaural Systems
  2.4 Stereophonic Techniques
    2.4.1 Artificial Stereo
    2.4.2 Coincident Microphone Techniques
    2.4.3 Spaced Microphone Techniques
    2.4.4 Combining Coincident and Spaced Microphone Techniques
  2.5 Binaural Audio
    2.5.1 Binaural Recording Techniques
  2.6 Surround Sound
    2.6.1 Quadraphonic
    2.6.2 Ambisonics
    2.6.3 Dolby Stereo
    2.6.4 Dolby Pro Logic
    2.6.5 Dolby Digital
    2.6.6 Digital Theater Systems (DTS) Digital Surround
3 Simulating Audio in a Virtual Environment
  3.1 Modeling the ITD
  3.2 Binaural Synthesis
  3.3 HRTF Measurement
    3.3.1 Interpolation of HRTFs
    3.3.2 The Use of Non-individualized ("Generic") HRTFs
    3.3.3 Available HRTF Datasets
    3.3.4 Equalization of the HRTF Impulse Response
  3.4 Modeling of Reverberation and Room Acoustics
    3.4.1 Auralization
  3.5 Distance Simulation
    3.5.1 Loudness as a Distance Cue
    3.5.2 Reverberation as a Distance Cue
    3.5.3 Source Spectral Content as a Distance Cue
    3.5.4 Binaural Cues
    3.5.5 Sound Source Familiarity
4 Conveying Sound in a Virtual Environment
  4.1 Headphone Listening
    4.1.1 Headphones and Comfort
    4.1.2 Inside-the-Head Localization
  4.2 Loudspeaker Displays
    4.2.1 Transaural Audio
    4.2.2 Amplitude Panning
5 Discussion
Chapter 1
Introduction
The sounds we hear provide us with detailed information about our surroundings and can
assist us in determining both the distance and direction to objects, at times very accurately
[159]. This ability is extremely beneficial for both humans and a variety of other species
and, in many situations, is crucial for survival. We can hear a sound in the dark, where we
may not necessarily be able to make use of vision (sight), and in contrast to the limited
visual field of view, the auditory system is omni-directional, allowing us to hear sounds
reaching us from any position in three-dimensional space. Given this omni-directional
aspect, hearing serves to guide our visual senses, or, to quote Cohen and Wenzel [30], "the
function of the ears is to point the eyes". Hearing, or audition, also serves to guide the more
"finely tuned" visual attention system, thereby easing the burden of the visual system [137].
Although sound is a critical cue to perceiving our environment, it is often overlooked
in immersive virtual environments, where, historically, emphasis has been placed on the
visual senses [30, 25]. The spatial audio cues present in many virtual environments are
rather poor and do not necessarily reflect natural cues, despite the fact that natural (spatial)
sound cues can allow users to orient themselves in a virtual environment. In addition,
audio cues can add a "pleasing quality" to the simulation, provide a better sense of "presence"
or "immersion", and compensate for poor visual cues (graphics) [3, 137]. Furthermore,
the virtual environments that actually employ spatial audio typically assume a far-field
source acoustical model, emphasizing only the direction (azimuth and elevation) to a sound
source and offering little, if any, sound source distance information [138, 108]. Despite
the importance of distance discrimination in maintaining a sense of realism among the
virtual sound sources [16], accurate sound source distance is often ignored in virtual audio