Multimodal Stimulation and Computer Games


Multimodal stimulation plays an important role in creating immersive computer games. This isn’t to say that developers hold meetings about how to address multimodal delivery in their latest game, but it is something teams account for whether they realise it or not. For example, if a river makes it into a game it clearly needs associated sound: the flow of water, the splashes of fish and the croak of nearby frogs. This can be enhanced further with haptic feedback, such as force feedback through the control pad when the player enters the river.

So let’s now take a look at the theory behind multimodal stimulation in order to understand its importance.

The Theory behind Multimodal Stimulation

Research into sensory stimuli is normally grouped into different schools such as auditory, visual or somatosensory. However, as noted by King and Calvert [1] and Vroomen and de Gelder [2], real life consists of stimuli from more than one sensory input, and these may influence each other. A single sensory input is rarely used on its own to gather a complete set of information. When walking through the countryside there is a distinct smell of fresh air, while the visual senses are stimulated by trees, the ground and mountains. Auditory senses are triggered by the wind, singing birds and the flow of a river. Combined, these sensory inputs provide the brain with detailed information about the environment. Therefore, when creating computer games, it makes sense to consider how each element in a game can benefit from multimodal stimulation.

Although multisensory stimuli should be treated as a whole, there is a key difference between audio and vision. In vision, spatial location is an indispensable attribute but colour (frequency) is not, while frequency is an indispensable attribute of audio but spatial location is not [3]. This theory is depicted in the figures below. The image on the left suggests that if two identical notes were played from one location a listener would only perceive one sound source. Similarly, if two speakers in different locations were both fed identical frequency content, a listener would still perceive one sound. The listener will only perceive two sound sources if the frequency content differs, so differentiating between two separate auditory stimuli depends on frequency, not spatial location. The image on the right shows the equivalent for visual stimuli: in the example shown, two light sources will always be perceived as one unless they differ in spatial location. Finally, it is worth adding that time is an indispensable attribute of both vision and audition. Two identical auditory events heard at different points in time are therefore distinguishable, as are two identical visual events.
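The rule described above can be written down as a small toy model. This is only a sketch of the idea in [3], not anything taken from that paper: the `Stimulus` class and the `perceived_sources` function are hypothetical names made up for illustration, and real perception is far less tidy than an exact comparison of values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    frequency: float  # pitch in Hz for a sound, colour for a light (illustrative units)
    location: float   # horizontal position in degrees
    onset: float      # start time in seconds


def perceived_sources(stimuli, modality):
    """Count distinct sources under the 'indispensable attribute' rule.

    Audio: sources are told apart by frequency and time, not location.
    Vision: sources are told apart by location and time, not colour.
    """
    if modality == "audio":
        keys = {(s.frequency, s.onset) for s in stimuli}
    elif modality == "vision":
        keys = {(s.location, s.onset) for s in stimuli}
    else:
        raise ValueError(f"unknown modality: {modality}")
    return len(keys)


# Two identical tones (or identically coloured lights) at different locations.
pair = [Stimulus(440.0, -30.0, 0.0), Stimulus(440.0, 30.0, 0.0)]
print(perceived_sources(pair, "audio"))   # 1: same frequency, so one sound
print(perceived_sources(pair, "vision"))  # 2: different locations, so two lights
```

Changing the `onset` of one stimulus makes the pair distinguishable in both modalities, which matches the point about time being indispensable to both.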


Although a player can enjoy a game to a certain degree without sound, there are several instances where sound complements the visuals or, in some situations, is essential. The motion-bounce illusion shows how auditory and visual elements work together and may actually change a player’s perception. The observer is first asked to watch the clip with no sound; the two blue balls appear to pass through each other’s path (see the left image). When viewing the same clip a second time, an auditory “blip” is heard at the point where the two balls meet, and this time the balls appear to bounce off each other (see the image on the right). You can observe this for yourself here:


With the motion-bounce illusion, visuals can appear to change simply based on the presence of audio. With the McGurk effect, however, the inverse happens: the auditory stimulus appears to alter in the presence of visual content. In the video demonstration of the effect, the man appears to chant “Ba, ba, ba” or “Fa, fa, fa” depending on which visuals you are observing. The audio never changes, but your brain interprets it differently because it is combining the visual stimulus with the auditory information. Another example of visual stimulus altering auditory feedback is the ventriloquist effect, something many of us have witnessed. “The ventriloquist talks without moving his lips and at the same time moves the dummy’s mouth. The illusion is compelling: it sounds like the dummy is actually talking. That is, you mislocalize the sound of the voice as coming from the dummy rather than real speaker.” [4] The ventriloquism effect, therefore, is an illusion where visuals dominate over auditory stimuli, so that the subject perceives the auditory event as coming from the same location as the visual stimulus. Bertelson and Aschersleben conducted an experiment on this effect and obtained supporting results, which Kubovy and Valkenburg summarised as follows: “When they presented the sound synchronously with an off-centre flash, the sound appeared to be shifted in the direction of the light flash.” [5]

So how does this relate to computer games? Mislocalization can be a useful aid within sound design, as creating speech to pair with a character is only possible if a player believes the sound they are hearing is coming from the array of pixels on screen. The character is not a real person, nor are the vocals really coming from the array of pixels; they are coming from speakers, generally to the left and right of the screen [6]. With correct use of spatialisation, the audio is perceived as if it is coming from the character’s mouth, the vehicle’s engine, the flowing river in the centre of the screen and so on.
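As a rough illustration of spatialisation with two speakers, here is a minimal sketch of constant-power stereo panning, a common technique for placing a sound between a left and a right speaker. The article does not say which panning method any particular game uses, so treat this as one plausible approach; `pan_gains` and its `screen_x` parameter are hypothetical names chosen for this example.

```python
import math


def pan_gains(screen_x: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a sound source on screen.

    screen_x is the horizontal position of the source, assumed here to be
    normalised so that 0.0 is the left edge and 1.0 is the right edge.
    The squared gains always sum to 1, so the overall loudness stays
    constant as the source moves across the screen.
    """
    x = min(max(screen_x, 0.0), 1.0)  # clamp off-screen positions to the edges
    angle = x * math.pi / 2           # map 0..1 onto a quarter circle
    return math.cos(angle), math.sin(angle)


# A river in the centre of the screen is fed equally to both speakers.
left, right = pan_gains(0.5)
print(round(left, 3), round(right, 3))  # 0.707 0.707
```

The sound only needs to be placed roughly where the object is drawn; the visuals then help pull the perceived location onto the object itself.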

In the context of auditory stimuli presented with visuals, Rumsey and McCormick state that auditory spatial cues are “strongly influenced” by information from other sensory stimuli, “particularly vision” [7]. Alais and Burr researched the ventriloquism effect and obtained results similar to those of Bertelson and Aschersleben. Their results [8,9] showed not only the ventriloquism effect but also an inverse ventriloquism effect. The experiment used low-contrast blobs of different sizes projected onto a screen, while “click” sounds provided the auditory stimulus. Results indicated that small blobs led to the ventriloquism effect, while larger blobs resulted in an inverse effect in which audition dominated.
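The titles of [8] and [9] describe the ventriloquist effect as near-optimal integration. One standard way to model that kind of integration is to weight each sense by how reliable it is (the inverse of its variance). The sketch below assumes that model; the function name `combine_estimates` and all the numbers are illustrative and are not data from the papers.

```python
def combine_estimates(visual_pos, visual_sigma, audio_pos, audio_sigma):
    """Combine a visual and an auditory location estimate.

    Each estimate is weighted by its reliability, 1 / sigma**2, where sigma
    is the assumed spread (uncertainty) of that sense's estimate.
    Returns (combined_position, visual_weight).
    """
    visual_reliability = 1.0 / visual_sigma ** 2
    audio_reliability = 1.0 / audio_sigma ** 2
    visual_weight = visual_reliability / (visual_reliability + audio_reliability)
    combined = visual_weight * visual_pos + (1.0 - visual_weight) * audio_pos
    return combined, visual_weight


# Illustrative values only: blob drawn at +5 degrees, click played at -5 degrees.
# Small, sharp blob: vision is precise, so the click is pulled towards the blob.
print(combine_estimates(5.0, 1.0, -5.0, 8.0))

# Large, blurry blob: vision is imprecise, so the blob is pulled towards the click.
print(combine_estimates(5.0, 32.0, -5.0, 8.0))
```

In the first call the visual weight is close to 1 (the ventriloquism effect); in the second it is close to 0 (the inverse effect), which is the pattern described above for small and large blobs.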

All of these findings reinforce the logic that audio and visuals go hand in hand when developing a game. Each can even be perceived differently depending on the other. This naturally brings us on to the question of quality. If visuals can affect audio and vice versa, can the quality of one degrade or improve the perceived quality of the other?

A Case for Better Quality Audio

Research has been conducted into how much a user will notice the degradation of sound quality while presented with visuals and, secondly, whether audio quality can affect perceived visual quality or vice versa. Due to the nature of multimodal stimulation, and indeed of a game, there are several elements that a user must concentrate on, so it is only natural that attention becomes divided. Experimental results [10] suggest that a player is less likely to notice a lag between visual and auditory stimuli when immersed in playing a game than when simply watching one. Taken alone, these results would suggest that high-quality visuals are the most important aspect of an audio-visual system and that poor, or indeed out-of-sync, audio is acceptable. However, further research and experimentation have examined the relationship between audio and visual stimuli more closely.

Storms conducted a series of experiments motivated in part by Brenda Laurel’s research.  In particular Storms focused on the statement: “…in the game business we discovered that really high-quality audio will actually make people tell you that the games have better pictures, but really good pictures will not make audio sound better”. [11]

The first of Storms’ experiments confirmed Laurel’s observation. Storms noted that when test subjects were asked to rate visual quality only, “quality perception of a high-quality visual display is increased when coupled with a high-quality auditory display.” [12] Simply put, high-quality visuals are perceived as being of higher quality than they actually are when presented with a high-quality auditory stimulus. This heightened perception of quality does not apply in reverse. If high-quality visual displays are used with low-quality audio and test subjects are asked to rate audio quality only, there is no perceived increase in audio quality. In fact the opposite can apply, making the audio seem worse than it actually is. Finally, when test subjects were asked to rate the quality of both the audio and the visual stimulus, the perceived quality of high-quality audio increased even when the visuals were poor.

Combining high-quality visual and auditory stimuli has been shown to improve the perceived quality of the visuals. In part this may be due to mechanisms within the superior colliculus: using both auditory and visual content stimulates neurons within the midbrain, so audition and vision together appear to create an output that is “greater than the sum of its parts”. However, as observed in Storms’ results, this doesn’t allow low-quality audio to be perceived as higher quality when high-quality visuals are present. Storms noted that this could be due to games starting out with only visuals, with sounds added later. Sounds therefore add to the overall experience, making the visuals appear enhanced.

Audio degradation has also been investigated by using visuals as a distractor from the audio quality [13]. The findings suggested that limiting the high and low frequency content over a 5.1 system resulted in a degradation of overall audio quality. The results also showed that the deterioration is mainly due to the front left and front right channels being limited, while (understandably) degradation of the centre and rear channels is less noticeable.

With a specific emphasis on audio quality in computer games, a further two papers tested how much a user will notice degradation when not only presented with a visual stimulus (the graphics) but also having to concentrate on a computer game and complete certain tasks within it. In the experiment, test subjects played a computer game while evaluating the audio quality. The first paper concluded that more research was needed [14], but a second paper based on this research showed a “significant but very small overall effect”. The paper noted that a “visual task decreased the consistency of audio quality grading” [15]. It would appear that when a player is absorbed in a task, a puzzle or a particularly tricky part of the game, they may not notice a degradation of audio quality. However, this would also suggest that less taxing areas of a game (for example, empty corridors in Dead Space) allow the player to concentrate more on audible feedback. In these areas a player is generally put on edge, and audible feedback often includes eerie sounds, scrapes or that classic, a baby’s cry.

Audio as a Primary Stimulus

Audio is not simply an auxiliary feature of television, a film or a computer game. Hendrix believes that auditory stimuli can not only be used as a primary source of information but also as an alternative or complementary stimulus to “improve human performance” [16]. Storms states that “during signal detection the auditory channel proves dominant over the visual channel” [17], while Hendrix goes on to explain that auditory displays are used as warning signals because they produce a shorter response time than visual stimuli. This dominance of auditory over visual stimuli eliminates blind spots, for example: “…you can’t see around walls, but you can hear around them.  And if you can hear the virtual door open behind you then you can turn around.  Your ears steer your eyes.” [18]

Audio can be used as a complementary stimulus when visuals are used in very bright or very dark areas (a satellite navigation system, for example) or when the visual channel is already saturated. Auditory and visual stimuli should be used together to create a multimodal model that enhances a computer game, or rather brings it closer to reality, as multimodal stimulation is all around us in the real world. The final diagram compares several variations of auditory and visual stimuli working together to enhance the player’s experience:


[1] King, A., Calvert, G., (2001) Multisensory Integration: Perceptual Grouping by Eye and Ear, Current Biology, Elsevier 2001


[3] Kubovy, M., Valkenburg, V., (2001) Auditory and Visual Objects, Elsevier 2001, p.109

[4] Bedford, F., (2001) Towards a General Law of Numerical/Object Identity, University of Arizona, Tucson, USA, p.8

[5] Kubovy, M., Valkenburg, V., (2001) Auditory and Visual Objects, Elsevier 2001, p. 100

[6] Alais, D., Burr, D., (2003) The Ventriloquist Effect Results From Near Optimal Crossmodal Integration, Neuron 2003, p.2

[7] Rumsey, F., McCormick, T., (2006) Sound and Recording: an Introduction, 5th ed, Oxford, Focal Press, p.38

[8] Alais, D., Burr, D., (2003) The Ventriloquist Effect Results From Near Optimal Crossmodal Integration, Neuron 2003

[9] Alais, D., Burr, D., (2004) The Ventriloquist Effect Results From Near Optimal Bimodal Integration, Current Biology, Vol. 14, 2004

[10] Ward, P., et al (2004) Can Playing a Computer Game Affect Perception of Audio-Visual Synchrony? Audio Engineering Society, 117th Convention, October 2004

[11] Storms, R., (1998) Auditory-Visual Cross-Modal Perception Phenomena, Doctoral Dissertation, Monterey, Naval Postgraduate School, p.62

[12] Storms, R., (1998) Auditory-Visual Cross-Modal Perception Phenomena, Doctoral Dissertation, Monterey, Naval Postgraduate School, p.126

[13] Zieliński, S., et al (2003) Effects of Bandwidth Limitation on Audio Quality in Consumer Multichannel Audiovisual Delivery Systems, Journal of the Audio Engineering Society, Vol. 51, No 6, June 2003

[14] Kassier, R., et al (2003) Computer Games and Multichannel Audio Quality Part 2 – Evaluation of Time-Variant Audio Degradations Under Divided and Undivided Attention, Audio Engineering Society, 15th Convention, October 2003

[15] Zieliński, S., et al (2003) Computer Games and Multichannel Audio Quality – The Effect of Division of Attention Between Auditory and Visual Modalities, Audio Engineering Society, 24th International Conference, May 2003

[16] Hendrix, C., (1994) Exploratory Studies on the Sense of Presence in Virtual Environments as a Function of Visual and Auditory Display Parameters, MSc Engineering Thesis, University of Washington, p.19

[17] Storms, R., (1998) Auditory-Visual Cross-Modal Perception Phenomena, Doctoral Dissertation, Monterey, Naval Postgraduate School, p.56

[18] Travis, C., (1996) Virtual Reality Perspective on Headphone Audio, Audio Engineering Society, 11th Conference: Audio for New Media, March 1996, p.2

